The Rise of the Novel

Final Paper Plan(s)

3 min read

Throughout the course, I have noticed a shocking number of similarities between the novels and theories that we have examined and Don Quixote, which I am reading for my Spanish class. This may not seem surprising since Don Quixote is often considered a candidate for the title of “the first novel,” but it is interesting to me because Don Quixote was first published well over a century before the first novels that we examined in this class. It is also intriguing to look at the different traditions that our novels and Don Quixote evolved out of, as well as the cultural elements that influenced the novels. I have made a lot of interesting connections between our novels and Don Quixote throughout the class, but I have narrowed my ideas for the final paper down to two potential options.

My first idea would be comparing and connecting Don Quixote to Northanger Abbey. They are an unlikely match, since Don Quixote is a legendary work in its own right that essentially made Cervantes, whereas Northanger Abbey is a lesser-known work by a famous writer. Additionally, Don Quixote was published more than two hundred years prior to Northanger Abbey, and there are obvious gender differences (authors and characters) as well. Nonetheless, I noticed a lot of striking similarities between the two novels which I think would be really interesting to explore, including:

• Both novels parody the conventions of their genre
• Yet at the same time they make a case for the novel
  o And they are both better versions of the previous novels that they are lambasting. In mocking the genre, they are simultaneously improving it.
• Both protagonists are not naturally-born heroes. They are unlikely heroes who must develop into their protagonist role, which is a rupture with the previous literary traditions.
• They both also establish an interesting relationship between novels and real life. Through their protagonists, they make fun of those who would try to see novels as exact representations of real life. Yet they also prove that novels still hold value without attempting to adhere directly to reality.
  o This is related to the debate/evolution of reading as pure entertainment or something more involved/useful. Both novels definitely deal with this theme, although I believe their messages diverge a bit.

I am not 100% sure how I am going to synthesize all these connections into a clear takeaway about the impact of comparing Don Quixote and Northanger Abbey on the rise of the Anglo-American novel. However, I do think there is a lot to be said about the work that these novels do in terms of advancing the novel, tweaking and developing its existing conventions, and dealing with the task of reading.

If I do not go that route, I would like to look at Don Quixote as it connects to Daisy Miller. There are fewer overlaps between these two novels, but a particularly striking connection between them is the idea of a character’s hidden subjectivity and an unreliable narrator. I think I could say a lot about the way that these elements, present in different ways in both novels, deal with subjectivity on the part of the reader and the advent of reading as an involved, interpretive task, rather than pure entertainment. Don Quixote especially goes even further than Daisy Miller in this area, so it would be interesting to look at the implications of that on the rise of the Anglo-American novel. Obviously, like my other idea, I still need to think through an exact synthesis of the impact of these connections.

Experimental Bibliography

2 min read

For my descriptive bibliography, I chose to do Clara Reeve's The Champion of Virtue. It has a beautiful frontispiece depicting a scene from the novel as well as a quote from Horace on the title page that translates to "Fictions meant to please should approximate the truth."

When considering what is missed by the traditional bibliography, I thought about plot, context, and visual depictions that would give more information than printed words. After a bit of research, I learned that The Champion of Virtue was republished in 1778, one year after its initial publication, under the name The Old English Baron (which was also edited by Samuel Richardson's daughter). This republication has a preface that essentially outlines what Clara Reeve looked to accomplish in this novel. Both The Champion of Virtue and The Old English Baron are gothic novels based on the outline put out by Horace Walpole's The Castle of Otranto, which is often regarded as the first gothic novel. Clara Reeve's preface to The Old English Baron tells the reader what her issue was with the gothic novel ideal that Walpole's novel laid out: it wasn't realistic. In McKeon's terms, the gothic novel should be a mixing of both romance idealism and naive empiricism. The Castle of Otranto's extravagant, supernatural explanations for everything were not believable, making the novel too fictitious to be interesting. Reeve's preface says, "...the Castle of Otranto; a work which, as already has been observed, is an attempt to unite the various merits and graces of the ancient Romance and modern Novel... Had the story been kept within the utmost verge of probability, the effect had been preserved, without losing the least circumstance that excites or detains the attention."

My idea is to compare and contrast, mostly quantitatively, the three novels noted above. Popularity rankings, TextWrangler lists, topic modeling, and other quantitative methods could help elucidate the "realistic" nature of each novel as well as show their reception. My intention is to visualize several of these methods to show how each novel compares in terms of realism and hopefully place The Champion of Virtue in its historical context as a defining novel in the creation of the gothic novel genre.

Vanishing Point - The New Yorker

Extraordinarily relevant to our discussions on imagined communities and the relationship between social media and the novel.

Exercise 7

3 min read

50 topics, 1000 iterations, 20 printed words:

A business life: made time gave found leave place manner day days return company told long received paris acquainted happened returned knew till

Typical Saturday night: good table money fellow wine company people half hundred give eat box glass guineas pretty made poor drink turned peace

War victory story: war army general english french enemy time country men battle enemies england forces command officer number field troops part success

Conceited Autobiography: great genius taste learning learned wit character opinion poet piece stage works play author judgment characters read friend age merit

All things divine: god man good religion world church heaven true soul divine spirit things body fear human christian truth faith life death

25 topics, 1000 iterations, 10 printed words:

Sounds kinky: passion time made found person husband affection mistress lover fortune

Maritime adventure: captain ship made men great board sea found time place

Basically Robinson Crusoe: man make thing thought time give good great find world

50 topics, 1000 iterations, 5 printed words:

Typical Swarthmore student response to “How are you?”: tears heaven death life grief

Me after reading one page of a novel: great page world learned learning

Conclusion: At first, I thought that reducing the number of printed words would give us a more concrete and accurate topic for the novels, but it seems to do the opposite. With 20 or even 10 words, I was able to grasp a bit more of what the novel was about, whereas a 5-word printing limit produced only a somewhat superficial topic.

Here are a number of common themes I found through the different iterations of modeling: war, navy, adventure, aristocracy, money, and of course virtue. The importance of the notion of virtue can be summed up in the topic: honour conduct character virtue reason.

Now I come to how this exercise reminded me of Tristram Shandy. While reading Tristram Shandy, I kept thinking that the novel was about nothing, and everything at once. After about 50 pages, I couldn’t rule out any subject in the world as a potential digression topic for the narrator of Tristram Shandy. The same can be said for topic modeling. And even though Tristram Shandy forms a more coherent narrative than these topics, I’m not sure if it can be reduced to one or even a number of topics. Indeed, I do think that Tristram Shandy is the most interesting novel about nothing. Even if we were to try to reduce it to a topic, it would probably be muddled with mundane words such as “uncle”, “father”, “make”, “give”, and others of the sort.

These mundane words are also sprinkled across all of the topics, no matter how many words are printed. This, in turn, creates a sort of reality effect. These seemingly useless words are necessary to remind us that novels cannot be reduced to substantial and important topics, and this makes these topics more believable. For example, “good table money fellow wine company people half hundred give eat box glass guineas pretty made poor drink turned peace” makes for a much more interesting and intelligible topic than if we were to take some words out and produce “money wine company glass drink poor peace”. I do think that it is much easier to construct a plausible story from the former than the latter. Novels and topics both need these mundane words to produce a more intelligible and “real” story. I think this also relates to my point about topics with fewer words seeming more superficial than topics with more.

The Real Don Quixote - Radiolab

Thanks to Sophia for pointing this Radiolab episode out.

Experimental Bibliography

2 min read

As I skimmed Anonymous’s The Indian Adventurer: History of Mr. Vanneck and found what little critical writing I could, I was surprised to learn that it was published just before a “crackdown on sexually graphic material” in 1787 (Malhotra, Ashok, Making British Indian Fictions 152). As such, being the sensational novel about an intensely horny foreigner that it is, it is at times pretty racy. Although some of the content we’ve read has been sexual in nature, it seems to me that the representation of sex acts is more-often-than-not something to be alluded to rather than explicitly stated. The Indian Adventurer, standing in opposition, is at times almost downright nasty. I wondered while I read what this novel would reflect if placed in conversation with a more modern medium.

Although the narrator begins his story as an almost-successful German surgeon, he ends up flunking out of med school because of too much partying, joins the Dutch East India Company, and gets shipped off to several “oriental” cities: Delhi, Patna, Calcutta, and Cossimbuzzar. As such, he becomes wrapped up in the proto-English imperial project, and although he is sometimes visibly altruistic (he begins offering free health care to the cities’ impoverished citizens), he remains an extension of the corporate colonisation of South Asia. As with the sexual aspects of the work, I wondered what we could glean from juxtaposing the text with a representation of contemporary imperialism.

The final aspect I’d like to investigate is the religious tinge to the work. Although our unnamed narrator finds a humble nobility in the Brahmins of the region (much reminiscent of popular representations of the American Indian), he subtly denigrates their religion as well as Protestantism’s main rival: Catholicism (see Malhotra (it’s really good)). I’d like to explore the oriental approach to religion as well as its attachment to rising nationalistic sentiment. My plan is to put these three thematic topics in conversation with a more modern manifestation of themselves, through a collage of commercials, films, and television.

Topic Modeling

2 min read

This was a pretty entrancing process and reminded me a lot of the Dadaist / Burroughsian "cut-up method," in which a text is disassembled and rearranged, usually randomly, to create a new text. It struck me that we were doing something similar, only instead of using newspaper clippings or randomly picked words, we're using a corpus of 1760s novels, and instead of scissors, we're using the topic modeling tool. I realize that the end goal of topic modeling is to learn something about the corpus or about traits of the English novel of this time period, but I think it may prove instructive (or fun, at least) to view some of these "bags of words" as poem-like texts in and of themselves. I've constructed a few of them stanzaically (sorry for using this "word;" it's the best I got) and named them.

House of Three
by 10 topics, 200 iterations, 10 words
Mr. time told great house Found gentleman Lady day mother

Q&A
by 10 topics, 200 iterations, 10 words
Miss Madam? Mr. Man Dear Lady? Good Sir Charles?

English England People
by 50 topics, 1000 iterations, 20 words
English England people resolved war country power Parliament made men great London forces victory

This last one, from 5 topics, 200 iterations, 5 words, doesn't quite work as a poem, but I love it as a title. Imagine hearing, "How are you today, 'Mr. Good-Lady-Time, sir?'"

Maybe this approach doesn't hold much water formally or academically, but personally I felt like once I pretended that these jumbled up assortments of topic words were cohesive, intentional texts, it grounded them in my mind as things that are actually saying something. By imagining them as poems based around a theme (i.e. a topic), I was better able to interpret them and then broadly apply this interpretation to the corpus.

Stuck!

6 min read

David Blei’s article in the topic modeling issue of the JDH helped me focus my thoughts after messing around with the topic modeling software for a while. I wish that I had a more complete understanding of the math at work behind the software (or math past the beginnings of AP Calculus at all, actually…) because I think that would really help me to focus the tool of topic modeling towards accomplishing my goals. This is actually what I’m grappling with in thinking through the implications of topic modeling for criticism and reading practices, and I am finding myself challenged by seeing not only the limitations but also the affordances of this software. Specifically I guess I am trying to take Blei’s article, my own feelings about the topic modeling exercise, and Stephen Ramsay’s piece, and put them into a productive conversation or make them help me reconcile some questions I have about close-reading vs. computer-assisted approaches.

Basically, I had difficulty approaching topic modeling as something I could use for anything other than “proving” my pre-existing conceptions about a book or a corpus. This feels as though it shouldn’t be the point — as Ramsay and Blei both explain, the point of using computer-assisted approaches like topic modeling in order to produce criticism is not to confirm preexisting assumptions or provide “proof” for a theory. Blei writes of the humanist scholar using topic modeling, “She discovers that her model falls short in several ways. She revises and repeats,” and Ramsay similarly points out that the end goal of literary criticism is not really an end goal at all, but merely a desire to open up further and more interesting and more productive conversations about texts, conversations which can be jump-started and supplemented by the implications of digital approaches. So, if the point isn’t to prove our pre-existing theories about corpuses or books, then how do we let topic modeling open up new questions for us?

I guess that the particular issue with which I’m struggling is the basic assumption of topic modeling, that texts are composed out of topics, when, as we discussed in class, the topics are actually created out of the texts. The loops of this are complicated enough, but it seems as though the dominant idea behind topic modeling is that counting these bags of words can actually tell us something significant about the corpus as a whole or about individual books. I can’t figure out how to reconcile this with what I feel like is a basic assumption of literary theory: that close reading, by humans, is the answer to creating good criticism, that we are sort of looking for the “figure in the carpet,” which is both in the text and on the page but not plainly comprehensible or countable (though the conception of literary criticism in James’ story is odd, as it refers to Corvick’s barely-begun work on Vereker as “the last word,” as though a critic could or would want to write a “last word” on any work of fiction actually worth reading and discussing…). This exercise brought up some of the same ideas we discussed when using the NER, of a flattening of data. As the software doesn’t know anything about what the words actually mean and thinks of them as numbers, it makes them all appear equally important (despite its probabilistic structure). It assumes that at least some of these collections of words are significant in some way, or that we can adjust the settings on the software enough to create significant meaning out of these topics, or that we can achieve Ramsay’s goal of having productive new discussions using these topics/that Blei’s ideal humanist can be pushed to revise.
I don’t want to fall into the intentional fallacy, but it seems like the very point of reading, at least after interpretation and basic comprehension become in some ways the same thing in novels (as we’ve marked this shift in James), is that all words — even after one eliminates stop words — in a text are not created equal and do not have equal significance. This isn’t to say that one can’t still use topic modeling in a productive way and I hope I’m not just uselessly handwringing here — it’s just to question some of the basic assumptions that go into creating the approach in order to pay attention to these as we try to build a criticism out of these technologies.

To try to pull all of what I am saying into a more coherent and final expression: my main question is if topic modeling and other approaches like these represent just a digital form of close reading, one of which humans are incapable, or if topic modeling and similar approaches are completely at odds with close reading practices and assume that meaning is actually machine-readable in some way. I guess my real concern is that the line critics must walk seems extremely fine — that is to say, distinguishing between Ramsay’s conversation-opening function of computer-assisted approaches and the more hunt-down-evidence ideas I can’t seem to free myself of while using this software/which Blei seems to somewhat reference, writing that he hopes topic modeling will “point us to such evidence” for the ideal humanist scholar. How do we/I deal with/combat the basic assumptions built into topic modeling in order to make it generative and useful as a tool? How have others done this? What is the nature of the criticism that can come out of using topic modeling? I don’t want to set up a straw man of humanists not reading and just using computer-assisted approaches, but I want to be careful that I am still human-reading and close-reading my books while using these technologies, treating them as supplementary methods rather than the end of human reading. Of course I don’t think anyone is doing this — but I am hesitant or at a total loss, really, to see what I could say about the topics I got out of this exercise having not read the books in the corpus!!

topicmodeling

Exercise 7

5 min read

50 topics, 1000 iterations, 20 printed words, stop words removed

feelings and science

  1. ear mother devil captain fair er graces bath spirit doctor air playing hall delight vii theatrical water ne specimen dancing

a story of nature and humans

  2. nature great years soul eyes prince eye ground light twenty book fear back human turn earth friends city family caught

a pilgrimage

  3. kind thy thou eye turning divine appointed held drew bore glory ad noble grief preface alas worthy weak break en

10 topics, 1000 iterations, 20 printed words, stop words removed

a war between england and france

  4. king people war english england time men army great country power kingdom duke prince queen france nation began french general

love and family

  5. time miss made mr lady manner love make found great told person young thought brother replied house passion woman part

50 topics, 1000 iterations, 20 printed words, no stop words removed

love letters

  6. i and which you or your for am letter of upon page as any love are our it is from

adventure

  7. knight sir launcelot mr you your crowe squire captain clarke aurelia crabshaw greaves adventurer being darnel dolly t ye justice

50 topics, 500 iterations, 20 printed words, stop words removed

young love

  8. love happy life find man give men fine hear day tis beauty young ill dear er charms delight live formed

the court

  9. country service public making man gave gentleman pay appearance lordship company thy character fortune conduct manner interest laws care thee

exploration

  10. fo ship men sea fome made richard feveral found capt falconer water indians adventures god boat board ifland shore captain

Reading through these topics reminded me very much of Tristram Shandy. The variation of words in a topic list seemed similar to Tristram Shandy’s tangents and detours—although Tristram Shandy guided the reader through these tangents rather smoothly while these topics are lists of rather disjointed words. Without someone to explain their connectedness, it can be rather difficult to find an overarching theme. I doubt some of my labels do the topics justice. I should note that not all of the topics were this disjointed; some of them seemed to fit together rather well.

I thought I’d try to connect this to Armstrong’s _Desire and Domestic Fiction_, but surprisingly the topics (out of the ten I listed here) I found dealing with writing, subjectivity, and love aren’t necessarily gendered. Topic five mentions both men and women while topic six does not mention a gender. Topic eight mentions only men, though with the combination of the two words “find man”, it could be about a woman looking for a man to marry or fall in love with. I feel like this case is harder to make when “woman” or “women” isn’t mentioned in the topic, though.

It could be that I also skipped over some topics that might have connected subjectivity to gender. There were quite a few.

(Okay, I went back and looked. One is [lady great time letter company found conversation behaviour woman immediately order mistress satisfaction love room acquaintance town obliged proper london], and another is [good made thought great make time give person reason woman put answer knew pleased long occasion words heard head leave]. So they're there, I just didn't include them in my ten.)

I think I can understand what is meant when it is said that topic modeling can make the novel seem like all reality effect. Topic modeling gives lists of words, frequently including nouns (there are even more nouns with stop words removed), that sometimes seem to have no connection to each other. They’re lists of words with no context to explain their presence. They’re just there. I think it’s this lack of context that makes the novel seem like all reality effect. Without any context or connection to each other, these words just… exist. And like the reality effect, words or nouns that don’t have a deeper meaning or connection to anything else (words that just exist) are there only to say that they are real. The reality effect is definitely at play here. One thing I think can diminish the reality effect in topic modeling is to find connection or meaning between the words listed in each topic. Sometimes the words seem like they’re just “there”, but other times it’s easier to find a connection between them, or one gets the feeling that the words in this topic do have a connection but one hasn’t found it yet. For example, topics four and six in my list seem to have a straightforward connection. In this case, we don’t really see the reality effect as much because the words do have another meaning and they’re not there only because they exist. I think another thing about topic modeling that makes the reality effect more prevalent is that it takes words out of context. It could be that the words in this list do have other meanings or multiple meanings in the novels in which they appear, but without context we don’t really know what they are. These words might do more than just exist in the novels they appear in.

Exercise7

Topic Modeling

3 min read

1000 iterations, 50 topics

japan taycho dairo cuboy chinese japonese fika orator kaka strot yak brut tiffi empire farm yesso beast body tartary council -- Travel, foreign words or making fun of foreign words

mr lady gentleman miss ll wife town man don honour sir money made began landlord master fine company people husband -- Titles and honour

war country great enemy general king army made long enemies length men peace part command head high brought fell number -- Let’s go to war

father friends young time found passion son heaven soul death utmost friendship longer means happiness appeared felt tears purpose arms -- Letters to my father

fo ship men fome sea made found captain water richard capt adventures falconer god indians feveral board boat till ifland --Let’s go on an adventure!

don chevalier memoirs love de king gold fine young spain point arc knight pier pierpoint excellent things large fet age -- More adventure and looking for treasure

Contrary to my expectations, the list of 10 topics did not make a lot of sense because it was a lot less focused. There were more words and they did not fit together as well. I thought that if I narrowed my topics I would need to also increase the iterations. I did this (increasing iterations to 1500 instead) and it was a little better. Names throw things off I think. I think the stop words are supposed to get rid of names? But they somehow slip through.
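On the question of names slipping through: as far as I understand it, a stop list is just a fixed list of very common words, so a character name only disappears if someone adds it by hand. Here is a minimal sketch of that idea in Python. The stop list, the sample sentence, and the function name are all made up for illustration; this is not how the tool we used is actually implemented.

```python
import re

# Hypothetical stop list for illustration; real tools ship much longer lists.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "was", "her", "his", "said"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words, and drop anything on the stop list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words]

# Made-up sample sentence, not a quotation from the corpus.
sample = "Pamela said to Mrs Jewkes that the master was in the garden"

# With the default list, the names survive because they are not on it.
filtered = remove_stop_words(sample)

# Names only disappear if they are added to the list explicitly.
with_names = remove_stop_words(sample, STOP_WORDS | {"pamela", "mrs", "jewkes"})

print(filtered)
print(with_names)
```

If this is roughly right, the names are not slipping past the stop list so much as never being on it in the first place.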

Looking at the lists that were generated, I thought about what Professor Buurma said about how topic modeling could be seen as just excessive reality effect (hopefully I am not completely incorrect in this paraphrase…). But at first I couldn’t see how they related. I think of reality effect in terms of describing objects or locations, probably because most of the examples Barthes uses are about describing a location, a room, etc. He mentions the barometer. The thing is, the barometer is probably only mentioned once in the book, so I don’t think it would appear on these lists. I think most of the examples of actual reality effect elements would not be significant enough to appear in topic modeling. But then I thought that maybe there is another way to think about reality effect and topic modeling. Topic modeling assumes that books are just random (I think random?) words chosen out of bags of topics, and reality effect is also a sort of random process. The barometer is random because it is an object the author randomly decided to include in the novel. Another way of relating these lists to the reality effect: I asked myself, are these lists saying to me, “We are real, we are real, we are real”? Hmm, well, the fact that you can take something nonrandom, treat it as random, then put it through an algorithm and get nonrandom results says something about reality. Is reality just a random set of events that we interpret to make meaning out of it? So this got me thinking about Barthes and how life is meaningless (is that too harsh a word?) until you assign your own meaning to it. And that’s what we do with these lists! They’re not made according to a given topic title, but once the lists are generated, we give a title to each of the lists.
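To make the "random words chosen out of bags of topics" idea concrete, here is a rough sketch in Python of the story the model pretends is true: a document is written by repeatedly picking a topic and then picking a word from that topic's bag. The two topics, their word weights, and the function name are invented for illustration, and this only shows the assumed writing process, not the software's actual algorithm, which works backwards from the finished texts to guess the bags.

```python
import random

# Hypothetical topics: each "bag" maps words to weights (made up for illustration).
TOPICS = {
    "war": {"army": 5, "king": 4, "enemy": 3, "battle": 2},
    "sea": {"ship": 5, "captain": 4, "sea": 3, "board": 2},
}

def generate_document(topic_mix, length, seed=0):
    """Build a 'document' by repeatedly picking a topic, then a word from that topic's bag."""
    rng = random.Random(seed)  # fixed seed so the same call gives the same result
    topic_names = list(topic_mix)
    topic_weights = list(topic_mix.values())
    words = []
    for _ in range(length):
        # Step 1: choose which bag to draw from, according to the document's topic mix.
        topic = rng.choices(topic_names, weights=topic_weights)[0]
        # Step 2: choose a word from that bag, according to the word weights.
        bag = TOPICS[topic]
        word = rng.choices(list(bag), weights=list(bag.values()))[0]
        words.append(word)
    return words

# A made-up "novel" that is mostly about war with some seafaring mixed in.
doc = generate_document({"war": 0.7, "sea": 0.3}, length=12)
print(" ".join(doc))
```

The output is a jumble with no word order or plot, which is exactly the point: the model treats a novel as if it had been produced this way, and the titles are something we add afterwards.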

exercise7

Topic Modeling


10 Topics, 100 iterations

1) European power structures [king people country england power english war time men great prince general lord army france french enemy earl kingdom laws]

2) The businessman [made time manner make thought gave account fortune received give found care gentleman replied till affair money proper opportunity long]

50 Topics, 1000 iterations

3) Family [father young family mother fortune lady daughter son wife made years man time great good husband brother woman child marriage]

4) Literature [author great book genius read taste learned learning life wit works good piece history years public character work poet stage]

5) The good life (for men) [man nature life virtue human men good world natural opinion general means happy true degree light advantage equal make makes]

6) Adventures at Sea [fo ship men great made water sea richard fome adventures capt indians falconer told feveral god good time ifland make]

7) JustGirlyThings? [love heart passion affection soul mind happiness tender sentiments tenderness friendship heaven object present felt beauty eyes longer fortune mistress]

8) England v France [king english england duke army time war parliament france queen french henry crown general men earl began thousand made kingdom]

9) Pamela [mrs good master sir poor dear pamela ll mr jewkes lady hope thing god don jervis pray make mother father]

10) Power and Government [people country power laws government state present nation court order subjects ambition public great liberty kingdom constitution part arts authority]

A few common themes I noticed across the arrays of topics that I generated were: topics containing words related to people (mr, mrs, madam, his, her, etc.), topics relating to adventure or exploration, and topics centered around a particular culture/nationality or combination of cultures (British and Chinese, England and France). There were a number of topics similar to the Pamela topic that seemed to relate to just one volume in particular. These types of topics were indicated primarily by the presence of specific character names within the topic. I found it particularly interesting that a few gender-related topics appeared, which demonstrated the introduction of female subjectivity as described by Armstrong. I was also amused by the one or two topics that were just a bunch of words containing the letter "f" instead of the letter "s".

Using a larger number of topics and iterations definitely produced a greater variety of topics, but there was a lot of overlap between some of the topics. Using only 10 topics and 100 iterations produced some much more generalized topics.

Exercise7

Exercise 7: Topic Modeling


Domestic life: house company gentleman person time acquaintance lordship great place conversation young lord received friend met day agreeable evening visit lived

Family: father son young daughter mother family child years wife fortune good children life age great year time married estate left

Woe is me: tears heart heaven death grief soul distress life eyes tender comfort melancholy despair pity unhappy moment sorrow alas days felt

The first Moby Dick: ship captain sea made men board found wind great land shore place boat richard water till voyage put sail immediately

Humanity: nature man human virtue life state god men world religion sense spirit natural power divine reason creatures mankind subject soul

Sophistication: country people men manner great find found order pleasure proper generally present english china society art world mankind taught peculiar

I kill, therefore I am: war army general french english enemy country men battle peace time enemies england number success forces command france troops part

Let’s get it on: passion love made found time person de horatio mistress lover affection loved knew louisa gave words nature tho thought thoughts

Someone studied for his SATs: peregrine pm pickle young hero consequence order pipes disposition sooner began view gentleman immediately satisfaction commodore great manner opportunity countenance

Authors who can’t spell: count ihe ed ft termes fee duchefs becaufe madam duke ihould mc fa paris myfelf cafardo mifs fe days wa

Coming into the assignment I wasn’t entirely convinced as to the usefulness of topic modeling; it seemed too mathematical and arbitrary to be of much help. After running the program, however, I was pretty impressed. Some topics were indiscernible, but some of them represented clear obsessions of the 18th century novel. And while some topics were compiled almost entirely from a single novel, some, such as the “Woe is me” topic, were sourced from a wide variety. I make this observation because topics from single novels are interesting if we are curious about that specific novel, but generally unhelpful for an overview of themes of the century. The fact that some of the themes come from a diverse set of novels, however, demonstrates the usefulness of the tool. I would be interested to do a similar thing on a set of contemporary novels. Many of these themes were pretty predictable and fit into my prior perception of novels of the period based on what we’ve read. I don’t have a similar sense of what topics I would predict for today’s novels, and I wonder if that’s because they’re less repetitive (doubtful), there are simply more of them, or just because I don’t have the benefit of hindsight.

Exercise7

Assignment 7: Amusing Gibberish


I experimented with a few combinations of algorithms. Although some topics were complete gibberish, the few that did make sense were quite amusing and relatable. In fact, because there is no context at all, I just let my imagination take over, giving me an artificial realistic effect. Topic modeling allowed me to explore different ways of finding trends in the text. In thinking of “The Reality Effect”, the mixtures of seemingly random words actually stand out to give a unique sentence structure and a realistic effect. I found many topics with words that I could relate ideas, parts of history, and even novels to.

Topics

//50 topics, 20 printed words, 1000 iterations

-Wisdom: “love passion heart make happy real tender friendship lover loved sentiments knew moments sincerity inclinations consent indifference cruel letters hopes”

-“Angela’s Ashes”: “child town mother married poor remember lay life friends husband pray wife work quit world died concluded cards nurse discovery”

It is amusing that some topics read like fragments of sentences. For example, I found the words of the first topic to be quite wise: “love” and “passion” do make “real tender friendship”, and “indifference” and “cruel letters” are detrimental to “hopes”. The words of the second topic reminded me of a novel I read, “Angela’s Ashes”, in which Angela, the mother of the author, leads a tough life trying to take care of her children while living in poverty with a husband who is an alcoholic.

//40 topics, 10 printed words, 100 iterations

-Group Project: “life brought friends hands great fair idea proceeded weak advice”

-War: “people war interest government troops expence favour measures law king”

-Live your Life: “leave world hope home coach love town evening live moment”

The third topic is most relatable when I think of group projects. Especially in engineering labs, we have great ideas on designing circuits, but our methods and ways of implementation are rather “weak”. The fourth topic easily reminds me of old kings and disillusioned governments sending young soldiers to die in wars. The fifth topic reminds me of the song by Rihanna in which she reminds us to just live our lives for once, instead of being controlled by the countless pressures of life.

//30 topics, 10 printed words, 100 iterations

-Artificial Creation: “man made time person called make lost place forward ran”

-Life: “mind heart pleasure honour heaven give peace person friendship happiness”

-Love: “love passion time answered mind made lover sense beauty affection”

//20 topics, 10 printed words, 100 iterations

-Desire: “love pleasure long heaven left found passion beauty prevent desire”

The four topics above remind me of the desires of humans. We desire to have an enjoyable life, finding love and settling down. Yet, we limit ourselves with the artificial conception of time, which controls our lives, from when we wake up to when we die.

//10 topics, 10 printed words, 100 iterations

-Freedom of Speech: “man favour voice heart heard person fear knew present longer”

This last topic reminds me of one of the strongest human desires: the need to speak our minds and follow our hearts, only fearing that our voices will be taken away.

Topic Modeling of Pamela


After generating the first set of topics with the numbers given in the assignment, I noticed that they were mostly the ones we had discussed in class, so I went through the chunks in the folder, found Pamela, and decided to run two chunks from the beginning and two from the end to compare them and see how they change. Since my mind categorizes things more easily when they are in smaller pieces (and because the first run took half an hour on my computer), I chose to use 10 topics, 200 iterations, and 10 topic words for these Pamela chunk runs. I also included the stopwords list in all of my runs because the one practice run I did without stopwords was overpowered by words that I did not find meaning in. (Commas in the lists below were added to show how my brain created meaning.)

First Run

  • [The Pamela Topic] : mrs master dear good poor pamela lady mr ll sir hope jewkes god letter jervis don williams goodness thought mother

  • [$$$] : money made time house means interest pounds present fortune thousand sum obliged hundred favour order paid found person put case

Beginning of Pamela

  • [leaving for class]: good life, mourning bed, god kind, loss grieved, part early
  • [Russet Potato Pamela] : crying praised indulge bring ashamed feared hard russet parents harm
  • [Everything Dreary] : english poor died trouble crying picture slightest sentiments marked created
  • [Dreary as well..]: tears afraid, back story wanted rest, reason heart silly matter

Ending of Pamela

  • [where did those 9 months go?] : saturday mother, thursday receive relation, reading remorse, mind married london

  • [epistolary plot-twist] : dear reward, born character wife unforeseen, reduced examples, lasted piety

  • [Pamela found God in Jamaica, ...when?] : leave jamaica, meeting earnest, story pleasure, gloucester set, received almighty (where did jamaica come in from?)
  • [Happy ending] : calne happy glad baby happily agreed lying reformation gracious present

Comparing the chunks from the beginning and ending of Pamela was an interesting way to track changes, as the topics did show a definite shift in connotation (the words generated from the beginning of Pamela were darker while the words generated from the ending were lighter in feeling). However, I tried to get chunks that were fairly separate from each other, as I was not sure how much of the beginning or ending the chunks contained. The ending topics mostly seemed to focus on a light and happy mood and also on the child, while those generated from the beginning relate how Pamela was constantly being attacked and afraid for her virtue, although the word virtue only appeared once in my lists. I am not sure if these lists can fully support Armstrong’s argument; however, they did exhibit features that are more individual-based and do not carry any economic connotations. A lot of the topics generated at the end of Pamela focus primarily on maternal aspects. I also thought it was interesting how I instantly related all the words in the topics to Pamela rather than to Mr. B, who constitutes a large presence in the novel. This is probably a result of the novel being mostly told in her words, with everything Mr. B does told through her response to it.

exercise7

Topic Modeling


Topic: female roles; set-up: 50 topics, 1000 iterations, 20 printed
life female art pleasure company public manners appearance friends love natural education opinion generally world fortune beauty emily young pretty

Topic: male roles; set-up: 30 topics, 750 iterations, 15 printed
people country men laws nature state man spirit religion equally natural society great generally influence

Topic: female appearances; set-up: 30 topics, 750 iterations, 15 printed
company lady ladies fine good women person woman time beauty young gentleman agreeable fair appearance

Topic: female domesticity; set-up: 50 topics, 1000 iterations, 20 printed
family time manner account woman part sister possibly moment degree mildmay nature present colonel naturally louisa give air delicacy means

Topic: female domesticity and appearance; set-up: 50 topics, 1000 iterations, 20 printed
company fine ladies play pretty people dress table gentlemen face dinner appearance figure laugh good set head hair taste glass

Topic: female appearances as ornaments; set-up: 50 topics, 1000 iterations, 20 printed
fine beauty ladies eyes round beautiful taste rich eye dress fair place fancy full air large country appearance figure picture

Topic: manpower; set-up: 10 topics, 500 iterations, 10 printed
man great nature good world author people public history genius
**also interesting about this folder was that there was only one mention of a female term, “lady,” which co-occurred with male terms

Topic: female socialization; set-up: 50 topics, 1000 iterations, 5 printed
company ladies young person acquaintance

Topic: male legal matters; set-up: 50 topics, 1000 iterations, 20 printed
justice gentleman law man person affair made revenge put gave brought prison order carried knew court fellow bring common committed

Topic: male contributions to society; set-up: 50 topics, 1000 iterations, 20 printed
pleasure world man find nature men wisdom mankind friend art pleasures happiness taught china society manner knowledge objects order rest

I was interested in lists that mentioned either only female or only male terms and their relationship to Nancy Armstrong’s argument about novels’ construction of societal gender roles. I found that a lot of these topics, when they only mentioned specifically female or male terms such as “man” or “lady,” generally also included nouns that have become associated with gender roles in 18th and 19th century society (ladies, pretty, appearance, home, etc. versus man, public, laws, state, etc.). Since the MALLET tool does not recognize the actual meaning of these words, there is no way it could group these topics together based on associations. Instead, it would only list them together based on their common co-occurrence in novels, therefore supporting Armstrong’s claim that novels create these gendered societal associations and not the other way around. I began investigating Armstrong’s argument in the context of these topics because of an initial topic list that seemed to contradict or at least complicate her claim, the list in which female and public appeared together. This seemed to suggest to me that perhaps women were represented as public rather than private figures in fiction but that real-life society relegated them to the home. However, the rest of the topics that involved “female” terms associated women with the home, or with public appearances and beauty. After these results, I looked back at that original topic list and it too appeared to be in line with Armstrong’s assertion about the novel’s influence on gender roles/norms. Even though female and public co-occurred in that intriguing list, the rest of the words in the list such as art, manners, pleasure, beauty, appearance, etc. did not suggest women as public figures as I had originally assumed, but rather women as public ornaments.
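To make the co-occurrence point concrete, here is a minimal sketch that counts how often pairs of words share a chunk. MALLET's actual model is more involved than counting pairs, so this only illustrates the kind of signal it has to work with; the four mini "chunks" and the stopword list are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Made-up mini "chunks"; the real corpus is the folder of novel chunks.
chunks = [
    "the ladies at dinner admired her dress and pretty figure",
    "the ladies praised her dress and her fine taste",
    "the laws of the state bind every man in the country",
    "a man of the state must know the laws",
]
# A tiny illustrative stopword list (an assumption, not the class's list).
STOPWORDS = {"the", "at", "her", "and", "of", "in", "a", "must", "every"}


def cooccurrence_counts(docs, stopwords):
    """Count how many documents each unordered pair of words shares."""
    counts = Counter()
    for doc in docs:
        vocab = sorted({w for w in doc.split() if w not in stopwords})
        # Sorting first means every pair key is in alphabetical order.
        counts.update(combinations(vocab, 2))
    return counts


pairs = cooccurrence_counts(chunks, STOPWORDS)
print(pairs[("dress", "ladies")])  # chunks containing both words
print(pairs[("ladies", "laws")])   # these two never share a chunk
print(pairs[("laws", "state")])
```

Nothing in the code knows what "ladies" or "laws" mean. Any grouping that comes out of counts like these reflects only which words the novels themselves put near each other.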

Topic Modeling


50 Topics, 20 printed words, 100 iterations, full text -

Pamela Overview - "master mrs ll pamela good poor dear jewkes"

Imperialism - "king english people army england country kingdom"

Domesticity - "emily moment friendship heart woman tenderness women happy"

50 topics, 20 printed words, 100 iterations, chunks -

The "s" - "fo fuch fome faid fhe fee moft thefe ihe foon ed firft thofe"

The Public sphere - "money proper interest fortune present public success state"

Seafaring - "captain ship board made sea men wind place land shore"

The house and private sphere - "night morning till bed day"

20 topics, 20 printed words, 1000 iterations, full text -

Letter writing - "letter page love dear adieu life"

I thought that this was an incredibly interesting assignment. It was pretty eye opening to see lots of rather strong and thought provoking topics come out of this wholly syntactic algorithm as well as some rather humorous topics to see where the program failed. I chose to tinker around a bit with the settings and see how exactly it would affect the outputs. For the most part, many of the same reoccurring themes, as noted above, came up in all contexts. Seafaring, imperialism, English nationalism, domestic items and ideas, women, letter writing, and many other rough topics would appear in nearly all of the constraints that I could conjure up. Interestingly, lessening the number of topics produced seems to produce topics that are more diverse and difficult to interpret. This isn't too surprising, seeing as limiting the number of topics would force some words to go together that possibly wouldn't in the case of a larger survey (I'm still a bit fuzzy on the specifics so maybe I have this a bit mixed up). But, something that struck me as slightly odd was the constant reoccurrence of topics in which the only discernible theme was the modified "s" character. I guessed this is likely because some novels use this character and some do not, so the ones that did would have those terms grouped together often. But, this wouldn't explain why nearly all of the words in the topic contain this character. Also, the breakdown of these topics seems to include an even number of references in all of the novels, suggesting that it may not be due to this focusing of the elongated "s" character in some novels.

These topics certainly showed some of the ideas we have seen portrayed in the readings. For example, one of Nancy Armstrong's main arguments in Desire and Domestic Fiction was that the creation of female subjectivity in novels challenged social norms and produced a new form of the novel. We see that several topics (Domesticity above) group women with the domestic realm and characteristics that we would define as psychological rather than political, thus creating this gender divide that she references. Likewise, Habermas argues that the division of the public and private sphere is something that can be seen in the architecture of houses in this time period. The division creates a divide between public and private that has political and economic repercussions. Several topics engage entirely with the idea of the house and family, while others that discuss money and power are grouped solely with the public sphere. With regards to "The Reality Effect," the topics essentially make up this eclectic (with some level of order) mixture of words and ideas that theoretically can be used to construct whole novels. In this way, we see many topics that are collections of nouns that seem meaningless - possibly interpreted as residue from realistic portrayals of the world. References to names, times, and places without any sort of context seem to be, in essence, the reality effect.

Time Made


I think I'm going to approach this a bit more artistically and a bit less scholarly than may be intended, but I can't help myself.

I was most taken with the topics generated from the prescribed settings (50 topics, 1000 iterations, 20 words per topic):

Space Pirates: strap captain narcissa ship chap board time behaviour morgan surgeon immediately body uncle mate cried expence put told banter thomson

time travel, space ships, best friends, beautiful aliens, and witty remarks from the medic.

The next three all have exceptionally good final three words. I wonder how much the order matters to my understanding of the topics, and how random the order is.

Evening Passion: eyes purpose attention voice tears peace stood silence instantly fixed ground soul night distress place led felt length rose equally

two lovers part in a moonlit garden.

Americans Abroad: peregrine pm pickle lord pipes hero commodore gentleman mrs emilia hatchway love trunnion lieutenant jolter painter company view french behaviour

men in double-breasted suits aboard steamers talk of art and war over lunch.

I'm having a lot of fun with these. They remind me of poems without linebreaks. The NY Times has a running column that makes poems out of missed connections postings on Craigslist, which remind me of this. It makes me really want to write found poetry for my experimental bibliography.

I generated two other lists of topics: one simple, and one complex. Both were disappointing.

10 topics, 100 iterations, 10 words per topic:

pastoral epic: time power pleasure present life nature happiness english country thousand

In fact, the simplicity of the settings has led me to the most complex, or at the very least, abstract topics. I have asked the computer to distill novels to their basest forms for me. If I consider novels an imitation or representation of reality, then I am nearly asking a computer to find the meaning of life. That, of course, did not work out so nicely in The Hitchhiker's Guide to the Galaxy.

20 topics, 2000 iterations, 15 words, no stopwords:

this took 30 mins for the program to complete.

The most interesting topic was this:

an ode to ee cummings: the to of i in a it not that but for be have as my

Satisfyingly the opposite of the "simple" results, but otherwise too basic, too superficial.

Of course, I could write about the easily labeled topics: church, or voyages, or one topic that was very obviously Pamela. What's the fun in that? Topic modeling effectively takes something sciencey and relieves it of any obligation to be scientific. We take all these data that have been collected in the most absolutely unbiased process and require that they be nearly arbitrarily (certainly subjectively) named, labeled, and sorted.

I've been thinking about applications for topic modeling. Is it practical for telling about large amounts of writing? How could I actually use it in a real situation? Not just by generating lists, I think. But what about connecting the words in the topics to the full information? Could we hyperlink each word to direct back to its appearance(s) in the original text(s)? I'm thinking about something along the model of The Perseus Project. Could we create topic concordances, with links to locations of every instance of the word chosen in the topic? Could we generate statistical metadata, showing frequency, placement, etc? Lastly, could we superscore iterations of topics? Is that already being done by iterating (I don't have a strong enough grip on the actual process)? I'd like to see a super topic model, where only the strongest words remain, only those used over and over and over again, or used from topic to topic to topic. Is that close to my simple settings? What would happen if I asked the program to iterate once, generate one topic, and choose one word? I assume it would choose the most used word in 1760's novels.
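The "most used word" guess can be checked without a topic model at all. Here is a rough sketch of a plain word count with stopwords removed. The sample sentence and the one-word stopword list are stand-ins I made up, not the course corpus or its stopword file.

```python
from collections import Counter
import re

# Stand-in text and stopword list; both are assumptions for illustration.
text = "Love and fortune, love and virtue, love and money."
STOPWORDS = {"and"}


def most_common_words(raw_text, stopwords, n=1):
    """Return the n most frequent lowercase words that are not stopwords."""
    tokens = re.findall(r"[a-z]+", raw_text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return [word for word, _ in counts.most_common(n)]


print(most_common_words(text, STOPWORDS))  # ['love']
```

With one topic there is only one bag for every word to land in, so it seems reasonable to expect the top word of that single topic to match the top word of a count like this one.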

...

I did this and returned with "sir". With 2 topics and two words each, I got "time made" and "sir lady." I'm intrigued by time made. I will leave it on this note.

Exercise7

Capitalism, pickles, and a very sad guy named Harry


(All topics were produced with stop words removed)

The highlights:

5 topics, 1000 iterations, 5 words, full text corpus

Title: Imperialism. “king people don made war”.

50 topics, 1000 iterations, 5 words, full text corpus

Title: Social mobility. “time made mr house great”.

Title: ??? “peregrine pm mr pickle pipes”.

Title: Novels are EVIL. “author genius book bad devil”.

Title: Harry had a rough day. “harry mr fool quality cried”.

Title: France and England, a History. “war prince french england english”.

Title: The Structure of a Novel. “set hundred laid part story”.

Title: Virtue’s Fear of Flying. “honour heart flie foul happened”.

5 topics, 1000 iterations, 5 words, ONLY the chunks of Pamela.

Title: Sparknotes version of Pamela. “mrs jewkes thought poor thing”

10 topics, 3000 iterations, seven words, full text corpus.

Title: England Over Everything. “king people country power england english time”.

Thoughts:

The most striking thing I’m sure you all will notice about my topics is that they’re exceptionally short when compared to the baseline of 20 that the exercise guidelines outlined as an ideal starting point. I played with several different versions of topic modeling outputs over the course of this assignment, and I found that the five-word topic made the most sense to me and felt like I could draw the most meaning out of it. Thus, five-word topics abound in my highlights section.

Personally I have a tough time drawing a lot of meaning out of these topics, even in my more concise versions. I feel like the algorithm just doesn’t really create topics in a way that allows us to draw powerful conclusions from them, at least in this setting. Maybe there’s something I’m not seeing, but many of the topics just seem terribly incoherent.

I did notice there’s a little bit of Armstrong-y type stuff going on here, specifically with relation to “author genius book bad devil”. This reminded me of our discussions on Pamela and Shamela, and how as the female domestic novel emerged onto the literary scene, it often doubled/masqueraded as/was supposed to be a conduct book of sorts while it helped to shape what we know today as the novel.

There’s also some imagined communities stuff floating around as well, a prime example being this “king people country power england english time” thing. This topic seems to suggest a unified English identity through the unity of king, people, and country over the course of time. It’s difficult to say whether this topic was drawn primarily from a single text within the corpus or not, but if it wasn’t, this topic gives powerful evidence to support Anderson’s theory that the novel was playing a crucial role in creating these imagined communities.

It’s possible to glean some interpretation from these topics, but as it stands right now, I liked them better for their comedic qualities.

ohwell

Exercise 7: Topic Modeling


Full text, 50 Topics, 1000 Iterations, 20 topic words printed:

“Sea Adventure”: [ship men sea water richard captain made capt adventures falconer…]

“Every 18th Century Epistolary Novel”: [lady lord letter dear heart brother love man mother happy poor goodness family…]

Full text, 20 Topics, 200 Iterations, 10 topic words printed:

“Chinese Nationalism”: [country great men people order number laws spirit chinese china]

“Authoritative Power”: [king england war people army english power time duke prince]

“I Think Your “S” Key is Broken”: [fo fuch fhe faid firft fome moft thing thefe muft]

Full text, 10 Topics, 100 Iterations, 10 topic words printed:

“Topic Gone Wrong”: [ihe ed flie ing wife day friend good woman tion]

Chunks, 50 Topics, 1000 Iterations, 20 topic words printed:

“Virtue”: [happy heart love hope dearest gratitude good goodness god generous duty…]

“The Domestic”: [door bed room house night chamber found heard put servants…]

“That’s Depressing…”: [death distress grief melancholy poor soul despair tears unhappy wretched sorrow despair…]

“Authorship”: [author read book history work great books life written stage poet…]

Writing:

It was incredibly satisfying to see the software successfully identify a topic, yet simultaneously hilarious when it failed to identify another. One trend I noticed is that a lot of the same topics appeared in each new iteration. For example, almost every cluster contained a topic filled with words that used the long s. While the software obviously didn’t pick up on the fact that the “f” was meant to be an “s”, it still identified a trend. Similarly, each cluster featured a topic that was complete gibberish, not unlike the “Topic Gone Wrong” that I’ve highlighted above. I’m not entirely sure why this is, but I’m guessing it has something to do with words being cut off by pages, lines, etc.

Some topics were also a little difficult to contextualize, featuring words that, if anything, were only loosely related. Tristram Shandy sometimes feels like a bunch of deviating and unrelated stories/themes that are subsumed into one novel, so I think there’s definitely a connection to be made. Other topics were a bit more linear and consistent, so they didn’t feel quite as reminiscent of Sterne’s writing specifically. Lots of them dealt with goodness/virtue, authoritative powers, the epistolary, adventure: themes that are indicative of most, if not all, of the novels we’ve read this semester.

Generally speaking, my final iteration was the most successful as it picked up perhaps the most specific topics. The words in each topic felt more closely related to one another, making it far easier to give the cluster an appropriate title. It would seem that using chunks from novels is far more effective than using the full texts. I guess that as the texts get smaller and more specific, so do the topics/categories.

Exercise 7 - Topic Modeling


Coolest Topics (all with stopwords removed)


Fulltext, 50 Topics, 1000 Iterations (20 topic word printings)

“Pamela in a Nutshell” → mrs sir master good pamela mr dear poor…

“Your Average Domestic Novel” → dear heart lady love father latter good hand hope…

Fulltext, 25 Topics, 100 Iterations (10 topic word printings)

“Time for Army Adventures” → king people war english england general adventures army…

“A Lovely Letter” → love dear letter heart adieu happy happiness moment friendship emily

Chunks, 25 Topics, 100 Iterations (10 topic word printings)

“Adventure = Money” → adventures guinea made make money moment moment sir master give

“Nouns R Important” → lady friend men woman world heart lord thing present happiness

Chunks, 25 Topics, 200 Iterations (10 topic words printed)

“A Brief History of England” → king people england war prince english power army

Chunks, 50 Topics, 1000 Iterations (20 topic word printings)

“A Gentleman’s Handbook” → honour time power thought favour give part liberty

“Intro to English Gov” → king people england duke france kingdom prince queen parliament…

“Probably Plot of Chrysal” → master guinea adventures made directly service general business person

Chunks, 50 Topics, 1000 Iterations (5 topic word printings)

“Nonsense” → de ia le ft la

Most interesting things in fulltext topic modeling: When I did the first run-through of 50 topics, 1000 iterations, and 20 topic word printings, I was struck by how some topics were very specific to one book. The topic I named “Pamela in a Nutshell” was a list which constituted 40% of Pamela. Throughout my experiments with different numbers of topics, iterations, and topic word printings (all using the fulltext folder), I found that it was indeed possible to find genres in some of these topics, and some of them are reminiscent of Tristram Shandy; however, I think it’s hard to evaluate because I’m going off what I know about existing genres to “check” these topics, so there’s no way for me to check the topics related to genre nuances I’m unaware of.

Cool things about chunks of novels topic modeling: It seems like with the chunks of novels we get a higher probability that each topic will more directly correspond with a particular book -- though I think this depends on the topic itself, because some are more general and others become more specific (like the ones about HistoryEngland). One weirdness was a topic that was suddenly very nonsensical (entitled “Nonsense”) that felt like stopwords had edged their way in there. When I looked at the topic more closely, it constituted 92% of Shandy1_22, which, upon further investigation, is a chunk that’s mostly in French -- so the topic confusion makes sense. Going further, it would be so helpful if a next version of this algorithm could scan for different languages within a text and either make a note of that or just create the topics in that language. Luckily, our corpus of works is (almost) all in English, so we don’t require this for our current use of the program, but it would be cool. Very cool.
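As a very rough idea of what "scanning for different languages" could look like, here is a sketch that guesses a chunk's language by counting common function words. This is not a feature of the topic modeling program. The two word lists are tiny and made up for the example, and a real check would need much fuller lists or a proper language-detection library.

```python
# Tiny function-word lists; real use would need fuller lists (an assumption).
FUNCTION_WORDS = {
    "english": {"the", "of", "and", "to", "in", "that", "it", "was"},
    "french": {"le", "la", "de", "et", "les", "des", "un", "que"},
}


def guess_language(chunk_text):
    """Guess a chunk's language by which function-word list it hits most."""
    tokens = chunk_text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in FUNCTION_WORDS.items()
    }
    # On a tie, max() keeps the first language listed (english here).
    return max(scores, key=scores.get)


print(guess_language("le comte de la marche et la duchesse"))            # french
print(guess_language("it was the opinion of my father that the world"))  # english
```

A chunk flagged this way could be set aside or modeled separately before the run, which might keep a topic like “Nonsense” from forming in the first place.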

General observations: The more specific novels have an easier time getting a topic all to themselves; for example, the HistoryEngland doc was represented with topics like “Time for Army Adventures” in the fulltext topic modeling and the topics “A Brief History of England” and “Intro to English Gov” in the chunks of novels topic modeling. All of the topic modeling feels a bit like a reality effect exercise, because it starts by essentially stripping away any possible meanings of the words and simply grouping them with other words they appear close to. This idea that words/objects could be in a novel without a meaning, to just be there, feels like what Barthes was talking about. (Maybe? I don’t have a full grasp on all the details of the reality effect.)

I preferred topic modeling the chunks of novels over modeling whole novel docs because modeling the chunks made it easier for me to understand how topics related to each individual book. Certain docs showed up with topics highly represented (high percentage of words in doc assigned to topic), which tells us that the particular topic in question is very important for that section/chunk of the novel. The other possibility was that one novel dominated the list of docs that included the topic in question, which means that the topic is important throughout the whole book because it shows up in more than one chunk of the novel. I liked being able to compare significances of topics for whole books with sections of those books -- this could be really useful for tracking themes throughout novels, and then throughout the history of novels, but I feel like we’d need a tool that lets us be more precise about what we want to do.

Exercise7