Exercise 7: Topic Modeling

All the following topics are based on the whole corpus.

41 topics, 1000 iterations, 20 words printed per topic

Public Men Do Public Man Things: people power country laws state government law great liberty public men nation equally constitution present influence justice interest order private

Love Story: man woman good love make thought men women young world creature wife word poor find sex thing give don girl

50 topics, 500 iterations, 12 words printed per topic

Topic Modeling If Fo Fatiffying: fuch fo faid reverie fame moft fome thefe foon himfelf thing

Seventeen(sixty) Magazine: beauty lady fine eyes young fair beautiful air women eye dress appearance

50 topics, 500 iterations, 5 words printed per topic

Young Pickle!: peregrine pm gentleman young pickle

: man good make men give

Reading Tristram Shandy: tears heaven soul grief distress

80 topics, 1000 iterations, 12 words printed per topic

Putting the ‘List’ in ‘Orientalist’: japan taycho great dairo chinese empire orator cuboy farm people japonese fika

So Torrid: passion love heart lover mistress affection object sentiments tender tenderness soul loved

Authorial Modesty: author great genius learning learned read book works work poet books taste

I hoped that narrowing down the number of words per topic would yield more coherent topics, but I actually found that shorter topics were vaguer and more random-seeming, perhaps because by the time you are looking for the five words that are most likely to co-occur, you’re likely to get sort of generic, everyday words like “man good make men give” -- the ones I list above are the most coherent of the 5-word topics.
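One possible reason the shorter lists feel more generic: if the tool ranks each topic's words by weight and prints only the first N (an assumption about the software, which isn't named here), then a 5-word topic is simply the head of the 20-word list, not a tighter topic. A minimal sketch with made-up weights (the numbers are hypothetical, not output from the corpus):

```python
# Hypothetical word weights for one topic; real values would come from the model.
topic_weights = {
    "man": 0.031, "good": 0.029, "make": 0.027, "men": 0.024, "give": 0.022,
    "love": 0.015, "woman": 0.014, "wife": 0.012, "young": 0.011, "girl": 0.009,
}

def top_words(weights, n):
    """Return the n highest-weighted words, highest first."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:n]

five = top_words(topic_weights, 5)
ten = top_words(topic_weights, 10)

print(five)  # the most frequent, most generic words come first
print(ten)   # the more distinctive words only show up further down the list
```

Under that assumption, printing fewer words hides exactly the lower-ranked words that tend to make a topic recognizable.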

Armstrong claims that the transition from using the class system to using a person’s internal moral qualities as a measure of a person’s worth occurred through the novel, since the novel imbued the middle-class woman with individuated subjectivity. The topics that are explicitly gendered male seem to confirm her hypothesis, if we accept that in the 1760s, the part of the shift in which all middle-class people, including men, could be measured by their internal moral qualities had not yet happened. The topic Public Men Do Public Man Things lacks nouns that could be coded as having to do with the internal subjectivity of characters. We might infer that novels where that topic is prevalent conform more closely to an old model of fiction in which social stature determines worth, even if their particular ideas about the social order (liberty! equally!) are relatively modern.

However, Armstrong’s primary point, that the interiority of middle-class women began to matter in novels, is not precisely confirmed by the topics that are explicitly gendered female. The topic I called Seventeen(sixty) Magazine is representative of a number of topics which seem to be the “women’s novel” topic. Interestingly, while they certainly do not place emphasis on social class in the way that public, manly topics do, they also place no more emphasis on interior virtues. Rather, they bring to light a middle ground between the privacy of “essential qualities” and the publicly visible social order: physical appearance. The topic Seventeen(sixty) Magazine features appearance words laden with positive connotations, like beauty, fair, and young. It would be interesting to see whether, in novels where this topic is prevalent, female characters’ worth is defined more by their “essential qualities of mind” or their physical beauty.

The topics that reflect romance open up onto a number of questions about Armstrong’s thesis. So Torrid and Love Story, topics reflective of the romance genre, are composed of words that do explicitly refer to essential qualities of mind: good, love, passion, affection, tenderness, soul... According to topic modeling, it is in the romance genre that characters are most likely to be endowed with inner moral virtues. The romance, a particular type of domestic fiction, seems to match most closely with Armstrong’s argument. Do these (heterosexual) romances, which presumably demand both men and women main characters to function, imbue both their male and female characters with qualities of mind? Armstrong says that domestic novels “seized the authority to say what was female” -- were these feminized novels doing that through both male and female characters, or is it the feminized nature of the genre alone that allows Armstrong to make that claim (468)?

*I’m not sure if it’s fair game to compare a topic chosen from a list of 50, generated through 500 iterations, and containing 12 words to topics chosen from a list of 41, generated through 1000 iterations, and containing 20 words. I think that the main thing that matters here is that they are all topics generated from the same corpus, groups of words that are likely to co-occur in that corpus, and that should make it acceptable to compare them???
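One rough way to compare topics that come from different runs is to ignore the settings and just measure how many words two lists share. The sketch below uses Jaccard overlap (shared words divided by all distinct words), which is my own choice of measure and not something the exercise prescribes. It doesn't settle whether the comparison is fair, but it makes the comparison explicit:

```python
def jaccard(words_a, words_b):
    """Share of distinct words that two topic lists have in common (0.0 to 1.0)."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Two topics quoted above, from runs with different settings.
so_torrid = ("passion love heart lover mistress affection object "
             "sentiments tender tenderness soul loved").split()
love_story = ("man woman good love make thought men women young world "
              "creature wife word poor find sex thing give don girl").split()

shared = sorted(set(so_torrid) & set(love_story))
score = jaccard(so_torrid, love_story)

print(shared)
print(round(score, 3))
```

Because a longer list has more chances to share a word, a measure like this is itself sensitive to the number of words printed, which is part of the worry in the footnote.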



David Blei’s article in the topic modeling issue of the JDH helped me focus my thoughts after messing around with the topic modeling software for a while. I wish that I had a more complete understanding of the math at work behind the software (or math past the beginnings of AP Calculus at all, actually…) because I think that would really help me to focus the tool of topic modeling towards accomplishing my goals. This is actually what I’m grappling with in thinking through the implications of topic modeling for criticism and reading practices, and I am finding myself challenged by seeing not only the limitations but also the affordances of this software. Specifically I guess I am trying to take Blei’s article, my own feelings about the topic modeling exercise, and Stephen Ramsay’s piece, and put them into a productive conversation or make them help me reconcile some questions I have about close-reading vs. computer-assisted approaches.

Basically, I had difficulty approaching topic modeling as something I could use for anything other than “proving” my pre-existing conceptions about a book or a corpus. This feels as though it shouldn’t be the point — as Ramsay and Blei both explain, the point of using computer-assisted approaches like topic modeling in order to produce criticism is not to confirm preexisting assumptions or provide “proof” for a theory. Blei writes of the humanist scholar using topic modeling, “She discovers that her model falls short in several ways. She revises and repeats,” and Ramsay similarly points out that the end goal of literary criticism is not really an end goal at all, but merely a desire to open up further and more interesting and more productive conversations about texts, conversations which can be jump-started and supplemented by the implications of digital approaches. So, if the point isn’t to prove our pre-existing theories about corpuses or books, then how do we let topic modeling open up new questions for us?

I guess that the particular issue with which I’m struggling is the basic assumption of topic modeling, that texts are composed out of topics, when, as we discussed in class, the topics are actually created out of the texts. The loops of this are complicated enough, but it seems as though the dominant idea behind topic modeling is that counting these bags of words can actually tell us something significant about the corpus as a whole or about individual books. I can’t figure out how to reconcile this with what I feel like is a basic assumption of literary theory: that close reading, by humans, is the answer to creating good criticism, that we are sort of looking for the “figure in the carpet,” which is both in the text and on the page but not plainly comprehensible or countable (though the conception of literary criticism in James’ story is odd, as it refers to Corvick’s barely-begun work on Vereker as “the last word,” as though a critic could or would want to write a “last word” on any work of fiction actually worth reading and discussing…). This exercise brought up some of the same ideas we discussed when using the NER, of a flattening of data. As the software doesn’t know anything about what the words actually mean and thinks of them as numbers, it makes them all appear equally important (despite its probabilistic structure). It assumes that at least some of these collections of words are significant in some way, or that we can adjust the settings on the software enough to create significant meaning out of these topics, or that we can achieve Ramsay’s goal of having productive new discussions using these topics/that Blei’s ideal humanist can be pushed to revise.
I don’t want to fall into the intentional fallacy, but it seems like the very point of reading, at least after interpretation and basic comprehension become in some ways the same thing in novels (as we’ve marked this shift in James), is that all words — even after one eliminates stop words — in a text are not created equal and do not have equal significance. This isn’t to say that one can’t still use topic modeling in a productive way and I hope I’m not just uselessly handwringing here — it’s just to question some of the basic assumptions that go into creating the approach in order to pay attention to these as we try to build a criticism out of these technologies.
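To make the "bags of words" worry concrete, here is a minimal sketch of what a bag-of-words view keeps and what it throws away. The two sentences are invented for illustration and the helper name is hypothetical:

```python
from collections import Counter

def bag_of_words(text):
    """Count each word, discarding word order entirely."""
    return Counter(text.lower().split())

a = bag_of_words("the man loved the woman")
b = bag_of_words("the woman loved the man")

# Different sentences, identical bags: the counts cannot tell them apart.
same_bag = a == b
print(same_bag)
print(a)
```

Who loves whom is exactly the kind of distinction a close reader cares about, and it is invisible once the text has been reduced to counts.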

To try to pull all of what I am saying into a more coherent and final expression: my main question is if topic modeling and other approaches like these represent just a digital form of close reading, one of which humans are incapable, or if topic modeling and similar approaches are completely at odds with close reading practices and assume that meaning is actually machine-readable in some way. I guess my real concern is that the line critics must walk seems extremely fine — that is to say, distinguishing between Ramsay’s conversation-opening function of computer-assisted approaches and the more hunt-down-evidence ideas I can’t seem to free myself of while using this software/which Blei seems to somewhat reference, writing that he hopes topic modeling will “point us to such evidence” for the ideal humanist scholar. How do we/I deal with/combat the basic assumptions built into topic modeling in order to make it generative and useful as a tool? How have others done this? What is the nature of the criticism that can come out of using topic modeling? I don’t want to set up a straw man of humanists not reading and just using computer-assisted approaches, but I want to be careful that I am still human-reading and close-reading my books while using these technologies, treating them as supplementary methods rather than the end of human reading. Of course I don’t think anyone is doing this — but I am hesitant or at a total loss, really, to see what I could say about the topics I got out of this exercise having not read the books in the corpus!!


Exercise 7

50 topics, 1000 iterations, 20 printed words, stop words removed

feelings and science

  1. ear mother devil captain fair er graces bath spirit doctor air playing hall delight vii theatrical water ne specimen dancing

a story of nature and humans

  1. nature great years soul eyes prince eye ground light twenty book fear back human turn earth friends city family caught

a pilgrimage

  1. kind thy thou eye turning divine appointed held drew bore glory ad noble grief preface alas worthy weak break en

10 topics, 1000 iterations, 20 printed words, stop words removed

a war between england and france

  1. king people war english england time men army great country power kingdom duke prince queen france nation began french general

love and family

  1. time miss made mr lady manner love make found great told person young thought brother replied house passion woman part

50 topics, 1000 iterations, 20 printed words, no stop words removed

love letters

  1. i and which you or your for am letter of upon page as any love are our it is from


  1. knight sir launcelot mr you your crowe squire captain clarke aurelia crabshaw greaves adventurer being darnel dolly t ye justice

50 topics, 500 iterations, 20 printed words, stop words removed

young love

  1. love happy life find man give men fine hear day tis beauty young ill dear er charms delight live formed

the court

  1. country service public making man gave gentleman pay appearance lordship company thy character fortune conduct manner interest laws care thee


  1. fo ship men sea fome made richard feveral found capt falconer water indians adventures god boat board ifland shore captain

Reading through these topics reminded me very much of Tristram Shandy. The variation of words in a topic list seemed similar to Tristram Shandy’s tangents and detours—although Tristram Shandy guided the reader through these tangents rather smoothly while these topics are lists of rather disjointed words. Without someone to explain their connectedness, it can be rather difficult to find an overarching theme. I doubt some of my labels do the topics justice. I should note that not all of the topics were this disjointed; some of them seemed to fit together rather well.

I thought I’d try to connect this to Armstrong’s _Desire and Domestic Fiction_, but surprisingly the topics (out of the ten I listed here) I found dealing with writing, subjectivity, and love aren’t necessarily gendered. Topic five mentions both men and women while topic six does not mention a gender. Topic eight mentions only men, though with the combination of the two words “find man”, it could be about a woman looking for a man to marry or fall in love with. I feel like this case is harder to make when “woman” or “women” isn’t mentioned in the topic, though.

It could be that I also skipped over some topics that might have connected subjectivity to gender. There were quite a few.

(Okay, I went back and looked. One is [lady great time letter company found conversation behaviour woman immediately order mistress satisfaction love room acquaintance town obliged proper london], and another is [good made thought great make time give person reason woman put answer knew pleased long occasion words heard head leave]. So they're there, I just didn't include them in my ten.)

I think I can understand what is meant when it is said that topic modeling can make the novel seem like all reality effect. Topic modeling gives lists of words, frequently including nouns (there are even more nouns with stopwords removed), that sometimes seem to have no connection to each other. They’re lists of words with no context to explain their presence. They’re just there. I think it’s this lack of context that makes the novel seem like all reality effect. Without any context or connection to each other, these words just… exist. And like the reality effect, words or nouns that don’t have a deeper meaning or connection to anything else (words that just exist) are there only to say that they are real. The reality effect is definitely at play here. One thing I think can diminish the reality effect in topic modeling is to find connection or meaning between the words listed in each topic. Sometimes the words seem like they’re just “there”, but other times it’s easier to find a connection between them, or one gets the feeling that the words in this topic do have a connection but one hasn’t found it yet. For example, topics four and six in my list seem to have a straightforward connection. In this case, we don’t really see the reality effect as much because the words do have another meaning and they’re not there only because they exist. I think another thing about topic modeling that makes the reality effect more prevalent is that it takes words out of context. It could be that the words in this list do have other meanings or multiple meanings in the novels in which they appear, but without context we don’t really know what they are. These words might do more than just exist in the novels they appear in.


Topic Modeling

1000 iterations, 50 topics

japan taycho dairo cuboy chinese japonese fika orator kaka strot yak brut tiffi empire farm yesso beast body tartary council -- Travel, foreign words or making fun of foreign words

mr lady gentleman miss ll wife town man don honour sir money made began landlord master fine company people husband -- Titles and honour

war country great enemy general king army made long enemies length men peace part command head high brought fell number -- Let’s go to war

ather friends young time found passion son heaven soul death utmost friendship longer means happiness appeared felt tears purpose arms -- Letters to my father

fo ship men fome sea made found captain water richard capt adventures falconer god indians feveral board boat till ifland --Let’s go on an adventure!

don chevalier memoirs love de king gold fine young spain point arc knight pier pierpoint excellent things large fet age -- More adventure and looking for treasure

Contrary to my expectations, the list of 10 topics did not make a lot of sense because it was a lot less focused. There were more words and they did not fit together as well. I thought that if I narrowed my topics I would need to also increase the iterations. I did this (increasing iterations to 1500 instead) and it was a little better. Names throw things off I think. I think the stop words are supposed to get rid of names? But they somehow slip through.
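On the names: as far as I know, standard stop-word lists cover function words ("the", "of", "and") and not proper names, which would explain why names slip through. A minimal sketch of adding names by hand, assuming a tiny stand-in stop list and a hand-picked name list (both hypothetical, not the lists the software ships with):

```python
# Tiny stand-in for a real stop-word list (assumption: the real one is much longer).
stop_words = {"the", "of", "and", "to", "a", "in"}

# Names chosen by hand from the topics above; any real list would need curating.
extra_names = {"richard", "falconer"}

def filter_tokens(tokens, stops):
    """Keep only tokens whose lowercase form is not in the stop set."""
    return [t for t in tokens if t.lower() not in stops]

tokens = "Richard Falconer and the captain went to sea in a ship".split()

default_only = filter_tokens(tokens, stop_words)
with_names = filter_tokens(tokens, stop_words | extra_names)

print(default_only)  # names survive the default list
print(with_names)    # names removed once they are added explicitly
```

The trade-off is that removing names also removes the signal that a topic belongs to one particular novel, which may or may not be what one wants.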

Looking at the lists that were generated, I thought about what Professor Buurma said about how topic modeling could be seen as just excessive reality effect (hopefully I am not completely incorrect in this paraphrase…). But at first I couldn’t see how they related. I think of reality effect in terms of describing objects or locations, probably because most of the examples Barthes uses are about describing a location, a room, etc. He mentions the barometer. The thing is, the barometer is probably only mentioned once in the book, so I don’t think it would appear on these lists. I think most of the examples of actual reality effect elements would not be significant enough to appear in topic modeling. But then I thought that maybe there is another way to think about reality effect and topic modeling. Topic modeling assumes that books are just random (I think random?) words chosen out of bags of topics. Reality effect is also a sort of random process. The barometer is random because it is an object the author randomly decided to include in the novel. Another way of relating these lists to the reality effect: I asked myself, are these lists saying to me, “We are real, we are real, we are real”? Hmm, well, the fact that you can take something nonrandom, treat it as random, then put it through an algorithm and get nonrandom results says something about reality. Is reality just a random set of events that we interpret to make meaning out of it?? So this got me thinking about Barthes and how life is meaningless (is that too harsh a word?) until you assign your own meaning to it. And that’s what we do with these lists! They’re not made according to a given topic title, but once the lists are generated, we give a title to each of the lists.
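The "random words chosen out of bags of topics" assumption can be acted out in a few lines. This is only a toy version of that story, with invented topics, weights, and mixture. It is not the software's actual procedure, which works in the other direction (from the books back to the topics):

```python
import random

# Hypothetical topics: each is a bag of words with made-up weights.
topics = {
    "war": {"king": 3, "army": 2, "enemy": 1},
    "sea": {"ship": 3, "captain": 2, "boat": 1},
}

# Hypothetical topic mixture for one imaginary book: mostly war, some sea.
book_mixture = {"war": 0.7, "sea": 0.3}

def generate_words(mixture, topics, n_words, seed=0):
    """For each word slot, pick a topic by the mixture, then a word from that topic's bag."""
    rng = random.Random(seed)
    words = []
    for _ in range(n_words):
        topic = rng.choices(list(mixture), weights=list(mixture.values()))[0]
        bag = topics[topic]
        word = rng.choices(list(bag), weights=list(bag.values()))[0]
        words.append(word)
    return words

sample = generate_words(book_mixture, topics, n_words=12)
print(" ".join(sample))
```

The output is a word salad with no plot, which is the point: the model pretends books were written this way so that it can work backwards to the bags.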


Topic Modeling

10 topics, 100 iterations

1) European power structures [king people country england power english war time men great prince general lord army france french enemy earl kingdom laws]

2) The businessman [made time manner make thought gave account fortune received give found care gentleman replied till affair money proper opportunity long]

50 Topics, 1000 iterations

3) Family [father young family mother fortune lady daughter son wife made years man time great good husband brother woman child marriage]

4) Literature [author great book genius read taste learned learning life wit works good piece history years public character work poet stage]

5) The good life (for men) [man nature life virtue human men good world natural opinion general means happy true degree light advantage equal make makes]

6) Adventures at Sea [fo ship men great made water sea richard fome adventures capt indians falconer told feveral god good time ifland make]

7) JustGirlyThings? [love heart passion affection soul mind happiness tender sentiments tenderness friendship heaven object present felt beauty eyes longer fortune mistress]

8) England v France [king english england duke army time war parliament france queen french henry crown general men earl began thousand made kingdom]

9) Pamela [mrs good master sir poor dear pamela ll mr jewkes lady hope thing god don jervis pray make mother father]

10) Power and Government [people country power laws government state present nation court order subjects ambition public great liberty kingdom constitution part arts authority]

A few common themes I noticed across the arrays of topics that I generated were: topics containing words related to people (mr, mrs, madam, his, her, etc.), topics relating to adventure or exploration, and topics centered around a particular culture/nationality or combination of cultures (British and Chinese, England and France). There were a number of topics similar to the Pamela topic that seemed to relate to just one volume in particular. These types of topics were indicated primarily by the presence of specific character names within the topic. I found it particularly interesting that a few gender-related topics appeared, which demonstrated the introduction of female subjectivity as described by Armstrong. I was also amused by the one or two topics that were just a bunch of words containing the letter "f" instead of the letter "s".
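The "f" topics presumably come from the long s (ſ) of 18th-century printing being read as "f" when the books were digitized. A rough sketch of one way to undo that before modeling: try swapping "f" for "s" and keep the result only if it lands on a known word. The vocabulary here is a tiny hypothetical stand-in for a real dictionary, and the function name is my own:

```python
from itertools import product

def fix_long_s(token, vocab):
    """Return a known word reachable by turning some 'f's into 's's, else the token unchanged."""
    if token in vocab:
        return token
    positions = [i for i, ch in enumerate(token) if ch == "f"]
    # Try every combination of s/f at the 'f' positions, all-'s' first.
    for choice in product("sf", repeat=len(positions)):
        chars = list(token)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        candidate = "".join(chars)
        if candidate in vocab:
            return candidate
    return token

# Hypothetical mini-dictionary; a real run would need a full word list.
vocab = {"so", "some", "several", "island", "ship", "men", "sea", "made", "found", "water"}

tokens = "fo ship men sea fome made richard feveral found capt falconer water".split()
cleaned = [fix_long_s(t, vocab) for t in tokens]
print(" ".join(cleaned))
```

Words like "fame" are genuinely ambiguous (it could be "fame" or "same"), so a dictionary check like this leaves them alone and cannot replace reading the passage.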

Using a larger number of topics and iterations definitely produced a greater variety of topics, but there was a lot of overlap between some of the topics. Using only 10 topics and 100 iterations produced some much more generalized topics.


Exercise 7: Topic Modeling

Domestic life: house company gentleman person time acquaintance lordship great place conversation young lord received friend met day agreeable evening visit lived

Family: father son young daughter mother family child years wife fortune good children life age great year time married estate left

Woe is me: tears heart heaven death grief soul distress life eyes tender comfort melancholy despair pity unhappy moment sorrow alas days felt

The first Moby Dick: ship captain sea made men board found wind great land shore place boat richard water till voyage put sail immediately

Humanity: nature man human virtue life state god men world religion sense spirit natural power divine reason creatures mankind subject soul

Sophistication: country people men manner great find found order pleasure proper generally present english china society art world mankind taught peculiar

I kill, therefore I am: war army general french english enemy country men battle peace time enemies england number success forces command france troops part

Let’s get it on: passion love made found time person de horatio mistress lover affection loved knew louisa gave words nature tho thought thoughts

Someone studied for his SATs: peregrine pm pickle young hero consequence order pipes disposition sooner began view gentleman immediately satisfaction commodore great manner opportunity countenance

Authors who can’t spell: count ihe ed ft termes fee duchefs becaufe madam duke ihould mc fa paris myfelf cafardo mifs fe days wa

Coming into the assignment I wasn’t entirely convinced as to the usefulness of topic modeling; it seemed too mathematical and arbitrary to be of much help. After running the program, however, I was pretty impressed. Some topics were indiscernible, but some of them represented clear obsessions of the 18th century novel. And while some topics were compiled almost entirely from a single novel, some, such as the “Woe is me” topic, were sourced from a wide variety. I make this observation because topics from single novels are interesting if we are curious about that specific novel, but generally unhelpful for an overview of themes of the century. The fact that some of the themes come from a diverse set of novels, however, demonstrates the usefulness of the tool. I would be interested to do a similar thing on a set of contemporary novels. Many of these themes were pretty predictable and fit into my prior perception of novels of the period based on what we’ve read. I don’t have a similar sense of what topics I would predict for today’s novels, and I wonder if that’s because they’re less repetitive (doubtful), there are simply more of them, or just because I don’t have the benefit of hindsight.
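The distinction between single-novel topics and widely sourced ones could be checked rather than eyeballed, if the program also reports how much of each novel it assigns to each topic. The sketch below assumes such per-novel proportions are available. The numbers and novel names are invented placeholders, and the 0.05 cutoff is arbitrary:

```python
# Hypothetical topic proportions per novel (made-up numbers, placeholder titles).
doc_topic = {
    "novel_a": {"woe_is_me": 0.10, "pickle": 0.00},
    "novel_b": {"woe_is_me": 0.08, "pickle": 0.01},
    "novel_c": {"woe_is_me": 0.06, "pickle": 0.45},
    "novel_d": {"woe_is_me": 0.09, "pickle": 0.00},
}

def novels_featuring(topic, doc_topic, threshold=0.05):
    """List the novels in which the topic's share meets the (arbitrary) threshold."""
    return sorted(
        name for name, props in doc_topic.items()
        if props.get(topic, 0.0) >= threshold
    )

spread = {t: novels_featuring(t, doc_topic) for t in ("woe_is_me", "pickle")}
for topic, novels in spread.items():
    print(topic, len(novels), novels)
```

In this toy data the "woe_is_me" topic shows up in every novel while "pickle" is confined to one, which is the pattern described above.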


Topic Modeling of Pamela

After generating the first set of topics with the numbers given in the assignment, I noticed that they were mostly the ones we had discussed in class, so I went through the chunks in the folder to find Pamela and decided to use two chunks from the beginning and two from the end, to compare them and see how they change. Since my mind categorizes things more easily when they are in smaller pieces (and because the first run took half an hour on my computer), I chose to use 10 topics, 200 iterations, and 10 topic words for these Pamela chunk runs. I also included the stopwords list in all of my runs because the one practice run I ran without stopwords was overpowered with words that I did not find meaning in. (In the lists below, I added commas where my brain grouped words together to create meaning.)

First Run

  • [The Pamela Topic] : mrs master dear good poor pamela lady mr ll sir hope jewkes god letter jervis don williams goodness thought mother

  • [$$$] : money made time house means interest pounds present fortune thousand sum obliged hundred favour order paid found person put case

Beginning of Pamela

  • [leaving for class]: good life, mourning bed, god kind, loss grieved, part early
  • [Russet Potato Pamela] : crying praised indulge bring ashamed feared hard russet parents harm
  • [Everything Dreary] : english poor died trouble crying picture slightest sentiments marked created
  • [Dreary as well..]: tears afraid, back story wanted rest, reason heart silly matter

Ending of Pamela

  • [where did those 9 months go?] : saturday mother, thursday receive relation, reading remorse, mind married london

  • [epistolary plot-twist] : dear reward, born character wife unforeseen, reduced examples, lasted piety

  • [Pamela found God in Jamaica, ...when?] : leave jamaica, meeting earnest, story pleasure, gloucester set, received almighty (where did jamaica come in from?)
  • [Happy ending] : calne happy glad baby happily agreed lying reformation gracious present

Looking at the chunks from the beginning and ending of Pamela, it was interesting to track the changes, as the topics did show a definite shift in connotation (the words generated from the beginning of Pamela were darker, while the words generated from the ending were lighter in feeling). However, I tried to get chunks that were fairly separate from each other, as I was not sure how much of the beginning or ending the chunks contained. The ending topics mostly seemed to focus on a light and happy mood and also on the child, while those generated from the beginning relate how Pamela was constantly being attacked and afraid for her virtue, although the word virtue only appeared once in my lists. I am not sure if these lists can fully support Armstrong’s argument; however, they did exhibit features that are more individual-based and do not contain any economic connotations. A lot of the topics generated at the end of Pamela focus primarily on maternal aspects. I also thought it was interesting how I instantly related all the words in the topics to Pamela rather than Mr. B, who constitutes a large presence in the novel. This is probably a result of the novel being mostly told in her words, so that everything Mr. B does is told through her response to it.


Time Made

I think I'm going to approach this a bit more artistically and a bit less scholarly than may be intended, but I can't help myself.

I was most taken with the topics generated from the prescribed settings (50 topics, 1000 iterations, 20 words per topic):

Space Pirates: strap captain narcissa ship chap board time behaviour morgan surgeon immediately body uncle mate cried expence put told banter thomson

time travel, space ships, best friends, beautiful aliens, and witty remarks from the medic.

The next three all have exceptionally good final three words. I wonder how much the order matters to my understanding of the topics, and how random the order is.

Evening Passion: eyes purpose attention voice tears peace stood silence instantly fixed ground soul night distress place led felt length rose equally

two lovers part in a moonlit garden.

Americans Abroad: peregrine pm pickle lord pipes hero commodore gentleman mrs emilia hatchway love trunnion lieutenant jolter painter company view french behaviour

men in double-breasted suits aboard steamers talk of art and war over lunch.

I'm having a lot of fun with these. They remind me of poems without linebreaks. The NY Times has a running column that makes poems out of missed connections postings on Craigslist, which remind me of this. It makes me really want to write found poetry for my experimental bibliography.

I generated two other lists of topics: one simple, and one complex. Both were disappointing.

10 topics, 100 iterations, 10 words per topic:

pastoral epic: time power pleasure present life nature happiness english country thousand

In fact, the simplicity of the settings has led me to the most complex, or at the very least, abstract topics. I have asked the computer to distill novels to their basest forms for me. If I consider novels an imitation or representation of reality, then I am nearly asking a computer to find the meaning of life. That, of course, did not work out so nicely in The Hitchhiker's Guide to the Galaxy.

20 topics, 2000 iterations, 15 words, no stopwords:

this took 30 mins for the program to complete.

The most interesting topic was this:

an ode to ee cummings: the to of i in a it not that but for be have as my

Satisfyingly the opposite of the "simple" results, but otherwise too basic, too superficial.

Of course, I could write about the easily labeled topics: church, or voyages, or one topic that was very obviously Pamela. What's the fun in that? Topic modeling effectively takes something sciencey and relieves it of any obligation to be scientific. We take all these data that have been collected in the most absolutely unbiased process and require that they be nearly arbitrarily (certainly subjectively) named, labeled, and sorted.

I've been thinking about applications for topic modeling. Is it practical for telling about large amounts of writing? How could I actually use it in a real situation? Not just by generating lists, I think. But what about connecting the words in the topics to the full information? Could we hyperlink each word to direct back to its appearance(s) in the original text(s)? I'm thinking about something along the model of The Perseus Project. Could we create topic concordances, with links to locations of every instance of the word chosen in the topic? Could we generate statistical metadata, showing frequency, placement, etc? Lastly, could we superscore iterations of topics? Is that already being done by iterating (I don't have a strong enough grip on the actual process)? I'd like to see a super topic model, where only the strongest words remain, only those used over and over and over again, or used from topic to topic to topic. Is that close to my simple settings? What would happen if I asked the program to iterate once, generate one topic, and choose one word? I assume it would choose the most used word in 1760's novels.
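The topic concordance idea is simple enough to sketch: for each word in a topic, record every place it occurs. The version below assumes the texts are available as plain strings keyed by chunk name. The two "chunks" are invented sentences, not text from the corpus, and the function name is hypothetical:

```python
from collections import defaultdict

def build_concordance(docs, topic_words):
    """Map each topic word to a list of (chunk name, word position) pairs."""
    wanted = set(topic_words)
    index = defaultdict(list)
    for name, text in docs.items():
        for position, token in enumerate(text.lower().split()):
            word = token.strip(".,;:!?\"'")  # drop surrounding punctuation
            if word in wanted:
                index[word].append((name, position))
    return dict(index)

# Invented stand-ins for two chunks of a novel.
docs = {
    "chunk_001": "Sir, the time was made short.",
    "chunk_002": "The lady made no answer, sir.",
}

concordance = build_concordance(docs, ["time", "made", "sir"])
for word, places in concordance.items():
    print(word, places)
```

Each (chunk, position) pair is what a hyperlink would point to, and the length of each list gives the frequency counts for free.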


I did this and returned with "sir". With 2 topics and two words each, I got "time made" and "sir lady." I'm intrigued by time made. I will leave it on this note.


Capitalism, pickles, and a very sad guy named Harry

3 min read

(All topics were produced with stop words removed)

The highlights:

5 topics, 1000 iterations, 5 words, full text corpus

Title: Imperialism. “king people don made war”.

50 topics, 1000 iterations, 5 words, full text corpus

Title: Social mobility. “time made mr house great”.

Title: ??? “peregrine pm mr pickle pipes”.

Title: Novels are EVIL. “author genius book bad devil”.

Title: Harry had a rough day. “harry mr fool quality cried”.

Title: France and England, a History. “war prince french england english”.

Title: The Structure of a Novel. “set hundred laid part story”.

Title: Virtue’s Fear of Flying. “honour heart flie foul happened”.

5 topics, 1000 iterations, 5 words, ONLY the chunks of Pamela.

Title: Sparknotes version of Pamela. “mrs jewkes thought poor thing”

10 topics, 3000 iterations, seven words, full text corpus.

Title: England Over Everything. “king people country power england english time”.


The most striking thing I’m sure you all will notice about my topics is that they’re exceptionally short compared to the baseline of 20 words that the exercise guidelines outlined as an ideal starting point. I played with several different versions of topic modeling outputs over the course of this assignment, and I found that five-word topics made the most sense to me; I felt I could draw the most meaning out of them. Thus, five-word topics abound in my highlights section.

Personally I have a tough time drawing a lot of meaning out of these topics, even in my more concise versions. I feel like the algorithm just doesn’t really create topics in a way that allows us to draw powerful conclusions from them, at least in this setting. Maybe there’s something I’m not seeing, but many of the topics just seem terribly incoherent.

I did notice there’s a little bit of Armstrong-y type stuff going on here, specifically in relation to “author genius book bad devil”. This reminded me of our discussions of Pamela and Shamela, and of how, as the female domestic novel emerged onto the literary scene, it often doubled/masqueraded as/was supposed to be a conduct book of sorts while it helped to shape what we know today as the novel.

There’s also some imagined communities stuff floating around, a prime example being this “king people country power england english time” thing. This topic seems to suggest a unified English identity through the unity of king, people, and country over the course of time. It’s difficult to say whether this topic was drawn primarily from a single text within the corpus or not, but if it wasn’t, this topic gives powerful evidence to support Anderson’s theory that the novel was playing a crucial role in creating these imagined communities.

It’s possible to glean some interpretation from these topics, but as it stands right now, I liked them better for their comedic qualities.


Exercise 7 - Topic Modeling

4 min read

Coolest Topics (all with stopwords removed)

Fulltext, 50 Topics, 1000 Iterations (20 topic word printings)

“Pamela in a Nutshell” → mrs sir master good pamela mr dear poor…

“Your Average Domestic Novel” → dear heart lady love father latter good hand hope…

Fulltext, 25 Topics, 100 Iterations (10 topic word printings)

“Time for Army Adventures” → king people war english england general adventures army…

“A Lovely Letter” → love dear letter heart adieu happy happiness moment friendship emily

Chunks, 25 Topics, 100 Iterations (10 topic word printings)

“Adventure = Money” → adventures guinea made make money moment moment sir master give

“Nouns R Important” → lady friend men woman world heart lord thing present happiness

Chunks, 25 Topics, 200 Iterations (10 topic words printed)

“A Brief History of England” → king people england war prince english power army

Chunks, 50 Topics, 1000 Iterations (20 topic word printings)

“A Gentleman’s Handbook” → honour time power thought favour give part liberty

“Intro to English Gov” → king people england duke france kingdom prince queen parliament…

“Probably Plot of Chrysal” → master guinea adventures made directly service general business person

Chunks, 50 Topics, 1000 Iterations (5 topic word printings)

“Nonsense” → de ia le ft la

Most interesting things in fulltext topic modeling: When I did the first run-through of 50 topics, 1000 iterations, and 20 topic word printings, I was struck by how some topics were very specific to one book. The topic I named “Pamela in a Nutshell” accounted for 40% of the words in Pamela. Throughout my experiments with different numbers of topics, iterations, and topic word printings (all using the fulltext folder), I found that it was indeed possible to find genres in some of these topics, and some of them are reminiscent of Tristram Shandy; however, I think it’s hard to evaluate because I’m going off what I know about existing genres to “check” these topics, so there’s no way for me to check the topics related to genre nuances I’m unaware of.

Cool things about chunks of novels topic modeling: It seems like with the chunks of novels we get a higher probability that each topic will correspond more directly with a particular book -- though I think this depends on the topic itself, because some are more general and others become more specific (like the ones about HistoryEngland). One weirdness was a topic that was suddenly very nonsensical (entitled “Nonsense”) that felt like stopwords had edged their way in there. When I looked at the topic more closely, it constituted 92% of Shandy1_22, which, upon further investigation, is a chunk that’s mostly in French -- so the topic confusion makes sense. Going further, it would be so helpful if a next version of this algorithm could scan for different languages within a text and either make a note of that or just create the topics in that language. Luckily, our corpus of works is (almost) all in English, so we don’t require this for our current use of the program, but it would be cool. Very cool.
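For what it's worth, a very crude version of the "make a note of that" option can be sketched in a few lines of Python. This is not real language detection: it just measures how many of a chunk's words are common English function words and flags chunks where the share is low. The word list, the 0.15 threshold, the function names, and the sample chunks are all assumptions invented for the example, and the check would have to run on the raw text before stopwords are removed.

```python
import re

# Hypothetical hint list: a handful of very common English function words.
ENGLISH_HINTS = {
    "the", "and", "of", "to", "a", "in", "that", "it", "was",
    "he", "she", "i", "my", "his", "her", "not", "with", "for",
}


def english_share(text):
    """Fraction of tokens in the text that are common English function words."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in ENGLISH_HINTS for t in tokens) / len(tokens)


def flag_non_english(chunks, threshold=0.15):
    """Return the names of chunks whose English share falls below the threshold."""
    return [name for name, text in chunks.items() if english_share(text) < threshold]


# Made-up sample chunks, not taken from the corpus.
chunks = {
    "english_chunk": "It was the best of days, and he went to the house with his friend.",
    "french_chunk": "Le roi est dans la maison de son ami et il ne parle pas.",
}
flagged = flag_non_english(chunks)
print(flagged)
```

A flagged chunk could then be set aside or modeled separately instead of producing a topic like “Nonsense”.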

General observations: The more specific novels have an easier time getting a topic all to themselves; for example, the HistoryEngland doc was represented with topics like “Time for Army Adventures” in the fulltext topic modeling and the topics “A Brief History of England” and “Intro to English Gov” in the chunks of novels topic modeling. All of the topic modeling feels a bit like a reality effect exercise, because it starts by essentially stripping away any possible meanings of the words and simply grouping them with other words they appear close to. This idea that words/objects could be in a novel without a meaning, to just be there, feels like what Barthes was talking about. (Maybe? I don’t have a full grasp on all the details of the reality effect.)

I preferred topic modeling the chunks of novels over modeling whole novel docs because modeling the chunks made it easier for me to understand how topics related to each individual book. Certain docs showed up with topics highly represented (high percentage of words in doc assigned to topic), which tells us that the particular topic in question is very important for that section/chunk of the novel. The other possibility was that one novel dominated the list of docs that included the topic in question, which means that the topic is important throughout the whole book because it shows up in more than one chunk of the novel. I liked being able to compare significances of topics for whole books with sections of those books -- this could be really useful for tracking themes throughout novels, and then throughout the history of novels, but I feel like we’d need a tool that lets us be more precise about what we want to do.
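One way to be a little more precise about the two cases above (a topic that spikes in one chunk versus a topic that runs through a whole book) is to roll the chunk percentages up by novel. The Python sketch below assumes chunk names follow a "Title_number" pattern like Shandy1_22 and that we already have, for one topic, the share of each chunk's words assigned to it. The function name, the 0.25 cutoff for a "strong" chunk, and the numbers are all invented for illustration.

```python
from collections import defaultdict


def summarize_topic_by_novel(chunk_shares, strong=0.25):
    """Group one topic's per-chunk shares by novel and summarize each novel."""
    per_novel = defaultdict(list)
    for chunk_name, share in chunk_shares.items():
        # Assumes names look like "Title_12": everything before the last "_" is the novel.
        novel = chunk_name.rsplit("_", 1)[0]
        per_novel[novel].append(share)
    summary = {}
    for novel, shares in per_novel.items():
        summary[novel] = {
            "chunks": len(shares),
            "mean_share": sum(shares) / len(shares),
            "peak_share": max(shares),
            "strong_chunks": sum(s >= strong for s in shares),
        }
    return summary


# Made-up shares for a single topic; not real output from the exercise.
shares = {
    "NovelA_1": 0.5, "NovelA_2": 0.25, "NovelA_3": 0.75,
    "NovelB_1": 0.0, "NovelB_2": 1.0,
}
summary = summarize_topic_by_novel(shares)
for novel, stats in summary.items():
    print(novel, stats)
```

In this toy data both novels have the same average share, but NovelA carries the topic in every chunk while NovelB concentrates it in one, which is exactly the difference between "important throughout the book" and "important for one section".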


Exercise 7 - Topic Modeling

3 min read


All had stop words removed with 1,000 Iterations.

Initial – 50 Topics, 20 words

Love in a Country House: time found house great made person young gave received immediately passion desire mistress desired opportunity satisfaction happened master answered ordered

War in England: king french war english general house parliament commons lord spain ministry nation forces fleet army march troops hundred england men

On an Adventure: ship men sea captain water richard made capt falconer indians board adventures boat feveral god ifland shore foon faid laft

10 Topics, 20 words

Writing a Letter: good mr sir master poor god man lady put great make ll mrs fool made thing till day dear honour

People: sir lady dear man charles miss mr love lord heart grandison good madam brother letter harriet clementina make woman Byron

Labels: thou ihe count thy ed thee callirrhoe letter madam day king night duke prince woman thefe art ft termes wife

20 Topics, 10 Words

Simple Country Life: man lady people country wife great money head good town

OCR Trouble: fuch fo reverie faid moft heart fome thefe fame thing

25 Topics, 15 Words, Text Chunks

Daily Schedule: day morning time night house till return place leave days returned left found set home

Evil: lord purpose danger length violence appeared received power revenge vengeance william stood instantly fears friends


To answer the question about the reality effect, Topic Modeling definitely led me to see names in a different light. Looking at the "People" topic, the names don't really suggest a particular meaning; the topic isn't really about anything. I would even hesitate to say that names like "Charles", "Harriet", and "Clementina" are meant to take on any symbolic meaning. They are simply names used to make the reader perceive the story as being real, so that they think, "that seems like a reasonable name for a person to have," and go on reading without having doubts as to the validity of the story.

Topic Modeling also raised some of the questions we worked with in Tristram Shandy about the authenticity of digital facsimiles of novels. It's interesting that the Topic Modeling tool didn't necessarily understand the words in the "OCR Trouble" topic, which had all of the long s's replaced with f's. But since the tool only grouped words based on how often they appear together, it may not have mattered. It's possible, though, that the words "such so reverie said most heart some these same thing" also appeared in texts printed without the long s, and that these texts didn't contribute to the topic because the tool didn't recognize "such" and "fuch" as actually being the same word.
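If we wanted the tool to treat "fuch" and "such" as one word, the texts could be cleaned before modeling. Here is a minimal Python sketch of one possible approach, not something the class tool does: for a word that isn't in a known word list, try swapping each "f" for an "s" and accept the result only if exactly one swap produces a known word. The tiny vocabulary and the function names are assumptions made up for the example; a real run would need a full dictionary.

```python
from itertools import product


def long_s_candidates(token):
    """All spellings obtained by replacing any subset of the f's with s's (excluding the token itself)."""
    options = [("f", "s") if ch == "f" else (ch,) for ch in token]
    return {"".join(combo) for combo in product(*options)} - {token}


def normalize_long_s(token, vocabulary):
    """Return the modern spelling if the token is unknown and exactly one f-to-s swap is a known word."""
    if token in vocabulary:
        return token
    matches = sorted(long_s_candidates(token) & vocabulary)
    return matches[0] if len(matches) == 1 else token


# Hypothetical mini-vocabulary; "fame" is included on purpose because it is a real word.
vocabulary = {"such", "so", "said", "most", "some", "these", "same",
              "fame", "thing", "himself", "reverie", "heart"}

topic = "fuch fo reverie faid moft heart fome thefe fame thing".split()
cleaned = [normalize_long_s(word, vocabulary) for word in topic]
print(" ".join(cleaned))
```

Note the limitation: "fame" is left alone because it is already a real word, so the sketch can't tell whether the printed text meant "fame" or "same". That ambiguity is part of the same question about how far a digital text can stand in for the printed page.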