The Rise of the Novel

Exercise 8: Narrative form + artistic representation

1 min read

In thinking about topic modeling and metadata for a group of 18th-century novels, a particular challenge for me is finding a question to explore with these methods that is neither so large that its answer would not be particularly interesting or meaningful nor so small that it is not much different from close reading. So, in trying to both narrow down and expand my interests in genre, form, etc., I really like the idea of combining algorithmic criticism with “middle-distance reading” to find out something about a group of novels that traditional literary criticism, with its dichotomous close and distant reading, can’t really do. I would like to take a group of novels written in the third person OR in free indirect discourse and look at the topics related to or involving artistic representation in the form of writing (mentions of books, novels, poetry, and other kinds of writing) to see if any commonalities pop up and if there might be some kind of relationship to be extrapolated or theorized about narrative form and the presence of narrative artistic representations in the novels.

Exercise8

Exercise 7: Topic Modeling

4 min read

All the following topics are based on the whole corpus.

41 topics, 1000 iterations, 20 words printed per topic

Public Men Do Public Man Things: people power country laws state government law great liberty public men nation equally constitution present influence justice interest order private

Love Story: man woman good love make thought men women young world creature wife word poor find sex thing give don girl

50 topics, 500 iterations, 12 words printed per topic

Topic Modeling If Fo Fatiffying: fuch fo faid reverie fame moft fome thefe foon himfelf thing

Seventeen(sixty) Magazine: beauty lady fine eyes young fair beautiful air women eye dress appearance

50 topics, 500 iterations, 5 words printed per topic

Young Pickle!: peregrine pm gentleman young pickle

: man good make men give

Reading Tristram Shandy: tears heaven soul grief distress

80 topics, 1000 iterations, 12 words printed per topic

Putting the ‘List’ in ‘Orientalist’: japan taycho great dairo chinese empire orator cuboy farm people japonese fika

So Torrid: passion love heart lover mistress affection object sentiments tender tenderness soul loved

Authorial Modesty: author great genius learning learned read book works work poet books taste

I hoped that narrowing down the number of words per topic would yield more coherent topics, but I actually found that shorter topics were vaguer and more random-seeming, perhaps because by the time you are looking for the five words that are most likely to co-occur, you’re likely to get sort of generic, everyday words like “man good make men give” -- the ones I list above are the most coherent of the 5-word topics.

Armstrong claims that the transition from using the class system to using a person’s internal moral qualities as a measure of a person’s worth occurred through the novel, since the novel imbued the middle-class woman with individuated subjectivity. The topics that are explicitly gendered male seem to confirm her hypothesis, if we accept that in the 1760s, the part of the shift in which all middle-class people, including men, could be measured by their internal moral qualities had not yet happened. The topic Public Men Do Public Man Things lacks nouns that could be coded as having to do with the internal subjectivity of characters. We might infer that novels where that topic is prevalent conform more closely to an old model of fiction in which social stature determines worth, even if their particular ideas about the social order (liberty! equally!) are relatively modern.

However, Armstrong’s primary point, that the interiority of middle-class women began to matter in novels, is not precisely confirmed by the topics that are explicitly gendered female. The topic I called Seventeen(sixty) Magazine is representative of a number of topics which seem to be the “women’s novel” topic. Interestingly, while they certainly do not place emphasis on social class in the way that public, manly topics do, they also place no more emphasis on interior virtues. Rather, they bring to light a middle ground between the privacy of “essential qualities” and the publicly visible social order: physical appearance. The topic Seventeen(sixty) Magazine features appearance words laden with positive connotations, like beauty, fair, and young. It would be interesting to see whether, in novels where this topic is prevalent, female characters’ worth is defined more by their “essential qualities of mind” or their physical beauty.

The topics that reflect romance open up onto a number of questions about Armstrong’s thesis. So Torrid and Love Story, topics reflective of the romance genre, are composed of words that do explicitly refer to essential qualities of mind: good, love, passion, affection, tenderness, soul... According to topic modeling, it is in the romance genre that characters are most likely to be endowed with inner moral virtues. The romance, a particular type of domestic fiction, seems to match most closely with Armstrong’s argument. Do these (heterosexual) romances, which presumably demand both men and women main characters to function, imbue both their male and female characters with qualities of mind? Armstrong says that domestic novels “seized the authority to say what was female” -- were these feminized novels doing that through both male and female characters, or is it the feminized nature of the genre alone that allows Armstrong to make that claim (468)?

*I’m not sure if it’s fair game to compare a topic chosen from a list of 50, generated through 500 iterations, and containing 12 words to topics chosen from a list of 41, generated through 1000 iterations, and containing 20 words. I think that the main thing that matters here is that they are all topics generated from the same corpus, groups of words that are likely to co-occur in that corpus, and that should make it acceptable to compare them???

 

Topic-Modeling Metadata

2 min read

At first glance, topic modeling doesn’t seem to provide us with a lot of metadata; the outputs that the algorithm generates for us are pretty straightforward strings of words. But I think there’s more to topic modeling than meets the eye, and I would be interested in exploring and analyzing what little metadata topic modeling has to offer us. My research question would be something along the lines of: What can the metadata of topic modeling tell us about topic modeling as a practice, and about the novels they are attempting to topic model?

In order to effectively analyze the metadata of topics, we’d be burdened with the task of creating, or at least documenting, the metadata that’s available for each one. I would begin by tagging each topic with the number of other topics produced alongside it, the number of iterations, the number of printed words, and the presence of stop words or not. I would then create some basic content-related labels surrounding the topics based on what we’ve seen so far, such as “money” or “family” or “hilarious” or “???” depending on the topic. I’d also like to figure out a way to assign the topics a “relevancy score”, or some metric that indicates how much the topic “makes sense” to a human reader or how much meaning we can draw from it.
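The tagging scheme described above could be sketched as a small record type. This is only an illustration: the class name `TopicRecord`, the field names, the example labels, and the 1-to-5 relevancy rating are all assumptions, not part of the exercise or of any topic modeling tool. The example topic and its run settings are taken from one of the five-word topics reported elsewhere on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TopicRecord:
    """One topic plus the run settings that produced it (hypothetical schema)."""
    words: List[str]
    n_topics: int            # how many topics were requested in the run
    iterations: int          # number of sampling iterations
    words_printed: int       # how many top words were printed per topic
    stopwords_removed: bool  # whether a stopword list was applied
    labels: List[str] = field(default_factory=list)  # hand-assigned content tags
    relevancy: Optional[int] = None  # made-up 1-5 "makes sense to a human" rating


def mean_relevancy(records):
    """Average the human ratings, ignoring topics nobody has rated yet."""
    rated = [r.relevancy for r in records if r.relevancy is not None]
    return sum(rated) / len(rated) if rated else None


record = TopicRecord(
    words="author genius book bad devil".split(),
    n_topics=50,
    iterations=1000,
    words_printed=5,
    stopwords_removed=True,
    labels=["literature", "hilarious"],  # illustrative labels only
    relevancy=4,                         # illustrative rating only
)
```

With a list of such records, one could group by `n_topics` or `words_printed` and compare average ratings, which is one possible way to approach the "relevancy score" idea.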

I think this research question, and its answers, would provide us with a SUPER macro-level picture of what’s going on in a corpus of novels, one worth discussing.

exercise8

Stuck!

6 min read

David Blei’s article in the topic modeling issue of the JDH helped me focus my thoughts after messing around with the topic modeling software for a while. I wish that I had a more complete understanding of the math at work behind the software (or math past the beginnings of AP Calculus at all, actually…) because I think that would really help me to focus the tool of topic modeling towards accomplishing my goals. This is actually what I’m grappling with in thinking through the implications of topic modeling for criticism and reading practices, and I am finding myself challenged by seeing not only the limitations but also the affordances of this software. Specifically I guess I am trying to take Blei’s article, my own feelings about the topic modeling exercise, and Stephen Ramsay’s piece, and put them into a productive conversation or make them help me reconcile some questions I have about close-reading vs. computer-assisted approaches.

Basically, I had difficulty approaching topic modeling as something I could use for anything other than “proving” my pre-existing conceptions about a book or a corpus. This feels as though it shouldn’t be the point — as Ramsay and Blei both explain, the point of using computer-assisted approaches like topic modeling in order to produce criticism is not to confirm preexisting assumptions or provide “proof” for a theory. Blei writes of the humanist scholar using topic modeling, “She discovers that her model falls short in several ways. She revises and repeats,” and Ramsay similarly points out that the end goal of literary criticism is not really an end goal at all, but merely a desire to open up further and more interesting and more productive conversations about texts, conversations which can be jump-started and supplemented by the implications of digital approaches. So, if the point isn’t to prove our pre-existing theories about corpuses or books, then how do we let topic modeling open up new questions for us?

I guess that the particular issue with which I’m struggling is the basic assumption of topic modeling, that texts are composed out of topics, when, as we discussed in class, the topics are actually created out of the texts. The loops of this are complicated enough, but the dominant idea behind topic modeling seems to be that counting these bags of words can actually tell us something significant about the corpus as a whole or about individual books. I can’t figure out how to reconcile this with what I feel like is a basic assumption of literary theory: that close reading, by humans, is the answer to creating good criticism, that we are sort of looking for the “figure in the carpet,” which is both in the text and on the page but not plainly comprehensible or countable (though the conception of literary criticism in James’ story is odd, as it refers to Corvick’s barely-begun work on Vereker as “the last word,” as though a critic could or would want to write a “last word” on any work of fiction actually worth reading and discussing…). This exercise brought up some of the same ideas we discussed when using the NER, of a flattening of data. As the software doesn’t know anything about what the words actually mean and thinks of them as numbers, it makes them all appear equally important (despite its probabilistic structure). It assumes that at least some of these collections of words are significant in some way, or that we can adjust the settings on the software enough to create significant meaning out of these topics, or that we can achieve Ramsay’s goal of having productive new discussions using these topics/that Blei’s ideal humanist can be pushed to revise. 
I don’t want to fall into the intentional fallacy, but it seems like the very point of reading, at least after interpretation and basic comprehension become in some ways the same thing in novels (as we’ve marked this shift in James), is that all words — even after one eliminates stop words — in a text are not created equal and do not have equal significance. This isn’t to say that one can’t still use topic modeling in a productive way and I hope I’m not just uselessly handwringing here — it’s just to question some of the basic assumptions that go into creating the approach in order to pay attention to these as we try to build a criticism out of these technologies.
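The "flattening" described above can be made concrete with a minimal sketch of the bag-of-words step. It assumes a very simple tokenizer (lowercase runs of the letters a to z) and an optional stopword set; the function name and the two example sentences are invented for illustration and are not drawn from the corpus or from the course software.

```python
import re
from collections import Counter


def bag_of_words(text, stopwords=frozenset()):
    """Reduce a text to word counts, discarding order and sentence structure."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


# Two invented sentences with opposite meanings but identical words.
a = bag_of_words("The master feared Pamela")
b = bag_of_words("Pamela feared the master")
print(a == b)  # the two bags are indistinguishable
```

Once word order is gone, who fears whom is no longer recoverable from the counts, which is one way of seeing why the words "are not created equal" for a reader but are for the counting step.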

To try to pull all of what I am saying into a more coherent and final expression: my main question is if topic modeling and other approaches like these represent just a digital form of close reading, one of which humans are incapable, or if topic modeling and similar approaches are completely at odds with close reading practices and assume that meaning is actually machine-readable in some way. I guess my real concern is that the line critics must walk seems extremely fine — that is to say, distinguishing between Ramsay’s conversation-opening function of computer-assisted approaches and the more hunt-down-evidence ideas I can’t seem to free myself of while using this software/which Blei seems to somewhat reference, writing that he hopes topic modeling will “point us to such evidence” for the ideal humanist scholar. How do we/I deal with/combat the basic assumptions built into topic modeling in order to make it generative and useful as a tool? How have others done this? What is the nature of the criticism that can come out of using topic modeling? I don’t want to set up a straw man of humanists not reading and just using computer-assisted approaches, but I want to be careful that I am still human-reading and close-reading my books while using these technologies, treating them as supplementary methods rather than the end of human reading. Of course I don’t think anyone is doing this — but I am hesitant or at a total loss, really, to see what I could say about the topics I got out of this exercise having not read the books in the corpus!!

topicmodeling

Topic Modeling

3 min read

1000 iterations, 50 topics

japan taycho dairo cuboy chinese japonese fika orator kaka strot yak brut tiffi empire farm yesso beast body tartary council -- Travel, foreign words or making fun of foreign words

mr lady gentleman miss ll wife town man don honour sir money made began landlord master fine company people husband -- Titles and honour

war country great enemy general king army made long enemies length men peace part command head high brought fell number -- Let’s go to war

father friends young time found passion son heaven soul death utmost friendship longer means happiness appeared felt tears purpose arms -- Letters to my father

fo ship men fome sea made found captain water richard capt adventures falconer god indians feveral board boat till ifland --Let’s go on an adventure!

don chevalier memoirs love de king gold fine young spain point arc knight pier pierpoint excellent things large fet age -- More adventure and looking for treasure

Contrary to my expectations, the list of 10 topics did not make a lot of sense because it was a lot less focused. There were more words and they did not fit together as well. I thought that if I narrowed my topics I would need to also increase the iterations. I did this (increasing iterations to 1500 instead) and it was a little better. Names throw things off I think. I think the stop words are supposed to get rid of names? But they somehow slip through.

Looking at the lists that were generated, I thought about what Professor Buurma said about how topic modeling could be seen as just excessive reality effect (hopefully I am not completely incorrect in this paraphrase…). But at first I couldn’t see how they related. I think of reality effect in terms of describing objects or locations. I think because most of the examples Barthes uses are about describing a location, a room, etc. He mentions the barometer. The thing is, the barometer is probably only mentioned once in the book, so I don’t think it would appear on these lists. I think most of the examples of actual reality effect elements would not be significant enough to appear in topic modeling. But then I thought about maybe there is another way to think about reality effect and topic modeling. If you think about how topic modeling assumes that books are just random (I think random?) words chosen out of bags of topics. Reality effect is also a sort of random process. The barometer is random because it is an object the author randomly decided to include in the novel. Another way of thinking about relating these lists to the reality effect—I asked myself are these lists saying to me, “We are real, we are real, we are real”? Hmm well the fact that you can take something nonrandom, treat it as random, then put it through an algorithm and get nonrandom results says something about reality. Is reality just a random set of events that we interpret to make meaning out of it?? So this got me thinking about Barthes and how life is meaningless (is that too harsh a word?) until you assign your own meaning to it. And that’s what we do with these lists! They’re not made according to a given topic title, but once the lists are generated, we give a title to each of the lists.
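The "words chosen out of bags of topics" assumption can be sketched as a toy generator. This is a deliberate simplification, not the actual model: in real topic models each topic weights its words unequally, whereas here every word in a bag is equally likely. The function name, the two topic names, and the 70/30 mix are assumptions; the words are borrowed from the adventure and war lists above.

```python
import random


def generate_document(topic_mix, topics, n_words, seed=0):
    """Write a 'book' by repeatedly picking a topic, then a word from its bag."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    names = list(topic_mix)
    weights = [topic_mix[name] for name in names]
    words = []
    for _ in range(n_words):
        # Step 1: choose which topic this word slot belongs to.
        topic = rng.choices(names, weights=weights, k=1)[0]
        # Step 2: draw a word from that topic's bag (uniformly, for simplicity).
        words.append(rng.choice(topics[topic]))
    return words


topics = {
    "sea": ["ship", "sea", "captain", "water", "boat"],
    "war": ["war", "country", "enemy", "general", "king", "army"],
}
toy_book = generate_document({"sea": 0.7, "war": 0.3}, topics, n_words=12)
print(" ".join(toy_book))
```

The algorithm runs this story backwards: it starts from the real books and searches for bags and mixes that would make them likely under this random process, which is why the lists arrive without titles.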

exercise7

Topic Modeling

3 min read

10Topics, 100 iterations

1) European power structures [king people country england power english war time men great prince general lord army france french enemy earl kingdom laws]
2) The businessman [made time manner make thought gave account fortune received give found care gentleman replied till affair money proper opportunity long]

50 Topics, 1000 iterations

3) Family [father young family mother fortune lady daughter son wife made years man time great good husband brother woman child marriage]
4) Literature [author great book genius read taste learned learning life wit works good piece history years public character work poet stage]
5) The good life (for men) [man nature life virtue human men good world natural opinion general means happy true degree light advantage equal make makes]
6) Adventures at Sea [fo ship men great made water sea richard fome adventures capt indians falconer told feveral god good time ifland make]
7) JustGirlyThings? [love heart passion affection soul mind happiness tender sentiments tenderness friendship heaven object present felt beauty eyes longer fortune mistress]
8) England v France [king english england duke army time war parliament france queen french henry crown general men earl began thousand made kingdom]
9) Pamela [mrs good master sir poor dear pamela ll mr jewkes lady hope thing god don jervis pray make mother father]
10) Power and Government [people country power laws government state present nation court order subjects ambition public great liberty kingdom constitution part arts authority]

A few common themes I noticed across the arrays of topics that I generated were: topics containing words related to people (mr, mrs, madam, his, her, etc.), topics relating to adventure or exploration, and topics centered around a particular culture/nationality or combination of cultures (British and Chinese, England and France). There were a number of topics similar to the Pamela topic that seemed to relate to just one volume in particular. These types of topics were indicated primarily by the presence of specific character names within the topic. I found it particularly interesting that a few gender-related topics appeared, which demonstrated the introduction of female subjectivity as described by Armstrong. I was also amused by the one or two topics that were just a bunch of words containing the letter "f" instead of the letter "s".
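The "f instead of s" topics most likely come from the long s in the scanned editions being read as an f. One possible clean-up step, sketched below, tries swapping each "f" for an "s" and keeps the result only if it lands on a known word. The function name and the tiny vocabulary are assumptions made for this example; a real pass would need a full period-appropriate word list, and it cannot resolve words such as "fame", which is valid both as itself and as a misread "same".

```python
from itertools import product


def restore_long_s(word, vocabulary):
    """Return a known spelling obtained by turning some 'f's into 's', if any."""
    if word in vocabulary:
        return word  # already a real word (possibly ambiguous, e.g. "fame")
    positions = [i for i, ch in enumerate(word) if ch == "f"]
    # Try every combination of keeping 'f' or replacing it with 's'.
    for choice in product("fs", repeat=len(positions)):
        letters = list(word)
        for pos, ch in zip(positions, choice):
            letters[pos] = ch
        candidate = "".join(letters)
        if candidate in vocabulary:
            return candidate
    return word  # nothing matched, so leave the word untouched


# Hypothetical vocabulary, just large enough for the words seen in the topics.
vocabulary = {"so", "such", "said", "same", "fame", "some",
              "himself", "several", "island", "ship"}

for raw in ["fo", "fome", "feveral", "ifland", "himfelf", "fame", "falconer"]:
    print(raw, "->", restore_long_s(raw, vocabulary))
```

Running something like this before topic modeling would fold "fome" and "some" into one word, so the long-s topics might dissolve into the others.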

Using a larger number of topics and iterations definitely produced a greater variety of topics, but there was a lot of overlap between some of the topics. Using only 10 topics and 100 iterations produced some much more generalized topics.
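One rough way to put a number on the overlap between topics is the Jaccard index: the count of shared words divided by the count of distinct words across both lists. The sketch below applies it to two of the topics listed above, "European power structures" from the 10-topic run and "England v France" from the 50-topic run. The function name is an assumption, and this measure ignores word rank within a topic, so it is only a first approximation.

```python
def jaccard(words_a, words_b):
    """Shared words divided by all distinct words across the two topics."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


european_power = ("king people country england power english war time men great "
                  "prince general lord army france french enemy earl kingdom laws").split()
england_v_france = ("king english england duke army time war parliament france queen "
                    "french henry crown general men earl began thousand made kingdom").split()

overlap = jaccard(european_power, england_v_france)
shared = sorted(set(european_power) & set(england_v_france))
print(round(overlap, 2), shared)
```

A score near 1 would suggest two runs found essentially the same topic, while a score near 0 would suggest the topics have little vocabulary in common.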

Exercise7

Topic Modeling of Pamela

3 min read

After generating the first set of topics with the numbers given in the assignment, I noticed that they were mostly the ones we had discussed in class, so I went through the chunks folder, found Pamela, and decided to run two chunks from the beginning and two from the end to compare them and see how they change. Since my mind categorizes things more easily when they are in smaller pieces (and because the first run took half an hour on my computer), I chose to use 10 topics, 200 iterations, and 10 topic words for these Pamela chunk runs. I also included the stopwords list in all of my runs because the one practice run I ran without stopwords was overpowered with words that I did not find meaning in. (Commas added to the topics below to show how my brain grouped the words into meaning.)

First Run

  • [The Pamela Topic] : mrs master dear good poor pamela lady mr ll sir hope jewkes god letter jervis don williams goodness thought mother

  • [$$$] : money made time house means interest pounds present fortune thousand sum obliged hundred favour order paid found person put case

Beginning of Pamela

  • [leaving for class]: good life, mourning bed, god kind, loss grieved, part early
  • [Russet Potato Pamela] : crying praised indulge bring ashamed feared hard russet parents harm
  • [Everything Dreary] : english poor died trouble crying picture slightest sentiments marked created
  • [Dreary as well..]: tears afraid, back story wanted rest, reason heart silly matter

Ending of Pamela

  • [where did those 9 months go?] : saturday mother, thursday receive relation, reading remorse, mind married london

  • [epistolary plot-twist] : dear reward, born character wife unforeseen, reduced examples, lasted piety

  • [Pamela found God in Jamaica, ...when?] : leave jamaica, meeting earnest, story pleasure, gloucester set, received almighty (where did jamaica come in from?)
  • [Happy ending] : calne happy glad baby happily agreed lying reformation gracious present

Looking at chunks from the beginning and ending of Pamela made it interesting to track the changes, as the topics showed a definite shift in connotation (the words generated from the beginning of Pamela were darker, while the words generated from the ending were lighter in feeling). However, I tried to pick chunks that were fairly separate from each other, as I was not sure how much of the beginning or ending each chunk contained. The ending topics mostly seemed to focus on a light and happy mood and also on the child, while those generated from the beginning relate how Pamela was constantly being attacked and afraid for her virtue, although the word virtue only appeared once in my lists. I am not sure if these lists can fully support Armstrong’s argument; however, they did exhibit features that are more individual-based, and they do not contain any economic connotations. A lot of the topics generated at the end of Pamela focus primarily on maternal aspects. I also thought it was interesting how I instantly related all the words in the topics to Pamela rather than to Mr. B, who is a large presence in the novel. This is probably a result of the novel being told mostly in her words, so that everything Mr. B does is told through her response to it.

exercise7

Capitalism, pickles, and a very sad guy named Harry

3 min read

(All topics were produced with stop words removed)

The highlights:

5 topics, 1000 iterations, 5 words, full text corpus

Title: Imperialism. “king people don made war”.

50 topics, 1000 iterations, 5 words, full text corpus

Title: Social mobility. “time made mr house great”.

Title: ??? “peregrine pm mr pickle pipes”.

Title: Novels are EVIL. “author genius book bad devil”.

Title: Harry had a rough day. “harry mr fool quality cried”.

Title: France and England, a History. “war prince french england english”.

Title: The Structure of a Novel. “set hundred laid part story”.

Title: Virtue’s Fear of Flying. “honour heart flie foul happened”.

5 topics, 1000 iterations, 5 words, ONLY the chunks of Pamela.

Title: Sparknotes version of Pamela. “mrs jewkes thought poor thing”

10 topics, 3000 iterations, seven words, full text corpus.

Title: England Over Everything. “king people country power england english time”.

Thoughts:

The most striking thing I’m sure you all will notice about my topics is that they’re exceptionally short when compared to the baseline of 20 that the exercise guidelines outlined as an ideal starting point. I played with several different versions of topic modeling outputs over the course of this assignment, and I found that the five-word topic made the most sense to me and felt like I could draw the most meaning out of it. Thus, five-word topics abound in my highlights section.

Personally I have a tough time drawing a lot of meaning out of these topics, even in my more concise versions. I feel like the algorithm just doesn’t really create topics in a way that allows us to draw powerful conclusions from them, at least in this setting. Maybe there’s something I’m not seeing, but many of the topics just seem terribly incoherent.

I did notice there’s a little bit of Armstrong-y type stuff going on here, specifically with relation to “author genius book bad devil”. This reminded me of our discussions on Pamela and Shamela, and how as the female domestic novel emerged onto the literary scene, it often doubled/masqueraded as/was supposed to be a conduct book of sorts while it helped to shape what we know of today as the novel.

There’s also some imagined communities stuff floating around as well, a prime example being this “king people country power england english time” thing. This topic seems to suggest a unified English identity through the unity of king, people, and country over the course of time. It’s difficult to say whether this topic was drawn primarily from a single text within the corpus or not, but if it wasn’t, this topic gives powerful evidence to support Anderson’s theory that the novel was playing a crucial role in creating these imagined communities.

It’s possible to glean some interpretation from these topics, but as it stands right now, I liked them better for their comedic qualities.

ohwell

Exercise 7 - Topic Modeling

4 min read

Coolest Topics (all with stopwords removed)


Fulltext, 50 Topics, 1000 Iterations (20 topic word printings)

“Pamela in a Nutshell” → mrs sir master good pamela mr dear poor…

“Your Average Domestic Novel” → dear heart lady love father latter good hand hope…

Fulltext, 25 Topics, 100 Iterations (10 topic word printings)

“Time for Army Adventures” → king people war english england general adventures army…

“A Lovely Letter” → love dear letter heart adieu happy happiness moment friendship emily

Chunks, 25 Topics, 100 Iterations (10 topic word printings)

“Adventure = Money” → adventures guinea made make money moment moment sir master give

“Nouns R Important” → lady friend men woman world heart lord thing present happiness

Chunks, 25 Topics, 200 Iterations (10 topic words printed)

“A Brief History of England” → king people england war prince english power army

Chunks, 50 Topics, 1000 Iterations (20 topic word printings)

“A Gentleman’s Handbook” → honour time power thought favour give part liberty

“Intro to English Gov” → king people england duke france kingdom prince queen parliament…

“Probably Plot of Chrysal” → master guinea adventures made directly service general business person

Chunks, 50 Topics, 1000 Iterations (5 topic word printings)

“Nonsense” → de ia le ft la

Most interesting things in fulltext topic modeling: When I did the first run-through of 50 topics, 1000 iterations, and 20 topic word printings, I was struck by how some topics were very specific to one book. The topic I named “Pamela in a Nutshell” was a list which constituted 40% of Pamela. Throughout my experiments with different numbers of topics, iterations, and topic word printings (all using the fulltext folder), I found that it was indeed possible to find genres in some of these topics, and some of them are reminiscent of Tristram Shandy; however, I think it’s hard to evaluate because I’m going off what I know about existing genres to “check” these topics, so there’s no way for me to check the topics related to genre nuances I’m unaware of.

Cool things about chunks of novels topic modeling: It seems like with the chunks of novels we get a higher probability that each topic will more directly correspond with a particular book -- though I think this depends on the topic itself, because some are more general and others become more specific (like the ones about HistoryEngland). One weirdness was a topic that was suddenly very nonsensical (entitled “Nonsense”) that felt like stopwords had edged their way in there. When I looked at the topic more closely, it constituted 92% of Shandy1_22, which, upon further investigation, is a chunk that’s mostly in French -- so the topic confusion makes sense. Going further, it would be so helpful if a next version of this algorithm could scan for different languages within a text and either make a note of that or just create the topics in that language. Luckily, our corpus of works is (almost) all in English, so we don’t require this for our current use of the program, but it would be cool. Very cool.
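The "scan for different languages" idea can be roughed out in a few lines. This is only a sketch of one possible approach, not something the topic modeling tool does: the word list `FRENCH_MARKERS`, the function names, and the 0.2 threshold are all made up for illustration, and a real language detector would be far more careful.

```python
import re

# Hypothetical, hand-picked list of common French function words (not exhaustive).
FRENCH_MARKERS = {
    "de", "la", "le", "les", "et", "un", "une", "des",
    "est", "que", "qui", "dans", "pour", "ne", "pas",
}

def french_share(text):
    """Return the fraction of word tokens that appear in FRENCH_MARKERS."""
    tokens = re.findall(r"[a-zà-ÿ]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in FRENCH_MARKERS)
    return hits / len(tokens)

def flag_french_chunks(chunks, threshold=0.2):
    """Return the names of chunks whose French-marker share meets the threshold.

    `chunks` maps a chunk name to its text; the threshold is an arbitrary guess.
    """
    return [name for name, text in chunks.items()
            if french_share(text) >= threshold]
```

Flagged chunks could then be set aside or modeled separately, so that a mostly French chunk like Shandy1_22 doesn't produce a topic that looks like leftover stopwords.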

General observations: The more specific novels have an easier time getting a topic all to themselves; for example, the HistoryEngland doc was represented with topics like “Time for Army Adventures” in the fulltext topic modeling and the topics “A Brief History of England” and “Intro to English Gov” in the chunks of novels topic modeling. All of the topic modeling feels a bit like a reality effect exercise, because it starts by essentially stripping away any possible meanings of the words and simply grouping them with other words they appear close to. This idea that words/objects could be in a novel without a meaning, to just be there, feels like what Barthes was talking about. (Maybe? I don’t have a full grasp on all the details of the reality effect.)

I preferred topic modeling the chunks of novels over modeling whole novel docs because modeling the chunks made it easier for me to understand how topics related to each individual book. Certain docs showed up with topics highly represented (high percentage of words in doc assigned to topic), which tells us that the particular topic in question is very important for that section/chunk of the novel. The other possibility was that one novel dominated the list of docs that included the topic in question, which means that the topic is important throughout the whole book because it shows up in more than one chunk of the novel. I liked being able to compare significances of topics for whole books with sections of those books -- this could be really useful for tracking themes throughout novels, and then throughout the history of novels, but I feel like we’d need a tool that lets us be more precise about what we want to do.
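Here is a small sketch of that chunk-versus-book comparison. It assumes chunk names follow the pattern "Shandy1_22" (book id, underscore, chunk number) and that the share of each chunk's words assigned to one topic has already been copied out of the tool's output. The function names are hypothetical, and apart from the 92% for Shandy1_22 the numbers are invented.

```python
from collections import defaultdict

def book_of(chunk_name):
    """Assume names look like 'Shandy1_22': book id, underscore, chunk number."""
    return chunk_name.rsplit("_", 1)[0]

def summarize_topic(chunk_shares):
    """Summarize one topic per book.

    `chunk_shares` maps chunk name -> share of that chunk's words assigned
    to the topic. Returns book -> (mean share over its chunks, highest
    single-chunk share).
    """
    by_book = defaultdict(list)
    for name, share in chunk_shares.items():
        by_book[book_of(name)].append(share)
    return {book: (sum(shares) / len(shares), max(shares))
            for book, shares in by_book.items()}

# Only the Shandy1_22 value comes from the exercise; the rest are made up.
example = {"Shandy1_22": 0.92, "Shandy1_3": 0.02,
           "Pamela_1": 0.40, "Pamela_2": 0.50}
summary = summarize_topic(example)
```

A high maximum with a low mean points to a topic that matters for one section of a book, while a high mean points to a topic that runs through the whole book.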

Exercise7

Exercise 7 - Topic Modeling


Topics

All had stop words removed with 1,000 Iterations.

Initial – 50 Topics, 20 words

Love in a Country House: time found house great made person young gave received immediately passion desire mistress desired opportunity satisfaction happened master answered ordered

War in England: king french war english general house parliament commons lord spain ministry nation forces fleet army march troops hundred england men

On an Adventure: ship men sea captain water richard made capt falconer indians board adventures boat feveral god ifland shore foon faid laft

10 Topics, 20 words

Writing a Letter: good mr sir master poor god man lady put great make ll mrs fool made thing till day dear honour

People: sir lady dear man charles miss mr love lord heart grandison good madam brother letter harriet clementina make woman Byron

Labels: thou ihe count thy ed thee callirrhoe letter madam day king night duke prince woman thefe art ft termes wife

20 Topics, 10 Words

Simple Country Life: man lady people country wife great money head good town

OCR Trouble: fuch fo reverie faid moft heart fome thefe fame thing

25 Topics, 15 Words, Text Chunks

Daily Schedule: day morning time night house till return place leave days returned left found set home

Evil: lord purpose danger length violence appeared received power revenge vengeance william stood instantly fears friends

Writing:

To answer the question about the reality effect, Topic Modeling definitely led me to see names in a different light. Looking at the "People" topic, the names don't really suggest a particular meaning; the topic isn't really about anything. I would even hesitate to say that names like "Charles", "Harriet", and "Clementina" are meant to take on any symbolic meaning. They are simply names used to make the reader perceive the story as being real, so that they think, "that seems like a reasonable name for a person to have," and go on reading without having doubts as to the validity of the story.

Topic Modeling also raised some of the questions we worked with in Tristram Shandy about the authenticity of digital facsimiles of novels. It's interesting that the Topic Modeling tool didn't necessarily understand the words in the "OCR Trouble" topic, which had all of the long s's replaced with f's. But since the tool only grouped words based on their proximity to each other, it may not have mattered. It's possible, though, that the words "such so reverie said most heart some these same thing" also appeared in texts printed after the long s fell out of use, and that these texts didn't contribute to the topic because the tool didn't recognize "such" and "fuch" as the same word.
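One way to get "fuch" and "such" counted as the same word would be to clean up the long s before running the tool. The sketch below is a guess at how that could work, not a feature of the tool: `normalize_long_s` and the `known_words` set are hypothetical, and the word list would need to come from a real dictionary.

```python
from itertools import combinations

def normalize_long_s(token, known_words):
    """Try to undo long-s OCR errors in a single lowercase token.

    If the token is already a known word, or has no 'f', it is returned
    unchanged. Otherwise every combination of 'f' -> 's' swaps is tried,
    fewest swaps first, and the first candidate found in known_words wins.
    """
    if token in known_words or "f" not in token:
        return token
    f_positions = [i for i, ch in enumerate(token) if ch == "f"]
    for count in range(1, len(f_positions) + 1):
        for combo in combinations(f_positions, count):
            chars = list(token)
            for i in combo:
                chars[i] = "s"
            candidate = "".join(chars)
            if candidate in known_words:
                return candidate
    return token

# A tiny illustrative word list; a real run would load a full dictionary.
known_words = {"such", "so", "said", "most", "some", "these",
               "same", "fame", "first", "for"}
cleaned = [normalize_long_s(w, known_words) for w in ["fuch", "fo", "faid", "fame"]]
```

Note the limit of this approach: "fame" is itself a real word, so the function leaves it alone even where the original text said "same". Cases like that would still need a human reader or surrounding context to sort out.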
