
Exercise 8


One thing I'd like to use topic modeling and metadata for is to look for a shift from religious communities/dynastic realms and messianic time to nationalism and homogeneous time (as described in Anderson's Imagined Communities). To do this, I think it would be best to split a corpus of eighteenth-century novels into groups based on their publication date. One suggested grouping is 1700-1750 and 1751-1799, although one could also create more than two groups by making the year ranges smaller (for example, 1700-1733, 1734-1766, 1767-1799). However, the later decades may contain more novels, in which case it might be better to use uneven year ranges so that the number of novels per group is not too disproportionate, though a large time range could obscure when the shift took place. It might be preferable to run this multiple times with different groupings to see how the groupings affect the results.
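Here is a minimal sketch of the two grouping strategies described above: fixed year ranges and ranges chosen so that each group holds roughly the same number of novels. The function names and the `(title, year)` record shape are my own assumptions for illustration, not part of any particular metadata tool, and the titles and years are placeholders.

```python
from collections import defaultdict

def group_by_ranges(novels, boundaries):
    """Assign each (title, year) pair to the first range containing its year.

    `boundaries` is a list of (start, end) tuples, inclusive on both ends.
    Novels whose year falls outside every range are left out.
    """
    groups = defaultdict(list)
    for title, year in novels:
        for start, end in boundaries:
            if start <= year <= end:
                groups[(start, end)].append(title)
                break
    return dict(groups)

def equal_count_ranges(novels, n_groups):
    """Pick year ranges so each group holds roughly the same number of novels.

    Assumes 1 <= n_groups <= len(novels). If many novels share a year,
    fewer than n_groups ranges may come back.
    """
    years = sorted(year for _, year in novels)
    ranges = []
    start = years[0]
    for i in range(1, n_groups + 1):
        cut = (len(years) * i) // n_groups  # novels covered by the first i groups
        end = years[cut - 1]
        if end >= start:
            ranges.append((start, end))
            start = end + 1
    return ranges

# Placeholder records, not real metadata.
novels = [("A", 1705), ("B", 1720), ("C", 1745), ("D", 1760),
          ("E", 1771), ("F", 1778), ("G", 1785), ("H", 1799)]

fixed = group_by_ranges(novels, [(1700, 1750), (1751, 1799)])
balanced = group_by_ranges(novels, equal_count_ranges(novels, 2))
```

With this placeholder data the fixed split puts three novels in the first half of the century and five in the second, while the balanced split puts four in each. Running the topic model on both would be one way to see how much the grouping affects the results.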

To identify the shift, I think one thing to check is whether there are any topics related to imagined communities: that is, any topic where an imagined community (a community where the constituents don't interact with all other constituents) is associated with descriptive words or traits. One particularly useful thing to look for would be the association of country names with other words. Topic modeling could also be used to look for simultaneity in novels, which Anderson argues enables the concept of nationalism and imagined communities. Here it would be helpful to see if topic modeling can identify instances or trends of simultaneity (where characters are acting separately but at the same time). (Topic modeling may not be the best tool for this, but I'm not sure what would be. Perhaps it would be better to look at the frequency of words like "meanwhile".) It would still be interesting to see if words like "meanwhile" show up in topic modeling, and what words they are associated with if they do. Another thing to look for that doesn't require topic modeling is how time is described across eighteenth-century novels. Anderson discusses a shift from sacred time to modern time, and this shift could be identified through a rise in the usage of standardized or clock time. Anderson also talks about the newspaper's role in enabling the new sense of time/imagined community, so it might be worth looking at how frequently newspapers are mentioned or seeing if the word "newspaper" appears in the topic modeling at all. If it does, it would be interesting to see what words the newspaper is associated with.
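The frequency check mentioned above could be sketched roughly like this. It counts marker words such as "meanwhile" and "newspaper" per 1,000 tokens, so that groups of different sizes can be compared. The function name, the very simple tokenizer, and the choice of marker words are my own assumptions.

```python
import re
from collections import Counter

def per_thousand(text, words):
    """Return occurrences of each word per 1,000 tokens of `text`.

    Tokens are runs of the letters a-z after lowercasing, so a form like
    "o'clock" would be split in two; a real study would want a better tokenizer.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] / total * 1000 if total else 0.0) for w in words}

# Invented sample sentence, standing in for the text of one date group.
sample = "Meanwhile the newspaper arrived. Meanwhile, she read."
rates = per_thousand(sample, ["meanwhile", "newspaper", "clock"])
```

Applied to the combined text of each date group, a rising rate for these words across groups would be (weak) evidence for the shift Anderson describes.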


Exercise 7


50 topics, 1000 iterations, 20 printed words, stop words removed

feelings and science

  1. ear mother devil captain fair er graces bath spirit doctor air playing hall delight vii theatrical water ne specimen dancing

a story of nature and humans

  1. nature great years soul eyes prince eye ground light twenty book fear back human turn earth friends city family caught

a pilgrimage

  1. kind thy thou eye turning divine appointed held drew bore glory ad noble grief preface alas worthy weak break en

10 topics, 1000 iterations, 20 printed words, stop words removed

a war between england and france

  1. king people war english england time men army great country power kingdom duke prince queen france nation began french general

love and family

  1. time miss made mr lady manner love make found great told person young thought brother replied house passion woman part

50 topics, 1000 iterations, 20 printed words, no stop words removed

love letters

  1. i and which you or your for am letter of upon page as any love are our it is from


  1. knight sir launcelot mr you your crowe squire captain clarke aurelia crabshaw greaves adventurer being darnel dolly t ye justice

50 topics, 500 iterations, 20 printed words, stop words removed

young love

  1. love happy life find man give men fine hear day tis beauty young ill dear er charms delight live formed

the court

  1. country service public making man gave gentleman pay appearance lordship company thy character fortune conduct manner interest laws care thee


  1. fo ship men sea fome made richard feveral found capt falconer water indians adventures god boat board ifland shore captain

Reading through these topics reminded me very much of Tristram Shandy. The variation of words in a topic list seemed similar to Tristram Shandy’s tangents and detours—although Tristram Shandy guided the reader through these tangents rather smoothly while these topics are lists of rather disjointed words. Without someone to explain their connectedness, it can be rather difficult to find an overarching theme. I doubt some of my labels do the topics justice. I should note that not all of the topics were this disjointed; some of them seemed to fit together rather well.

I thought I’d try to connect this to Armstrong’s _Desire and Domestic Fiction_, but surprisingly the topics I found dealing with writing, subjectivity, and love (out of the ten I listed here) aren’t necessarily gendered. Topic five mentions both men and women, while topic six does not mention a gender. Topic eight mentions only men, though with the combination of the two words “find man”, it could be about a woman looking for a man to marry or fall in love with. I feel like this case is harder to make when “woman” or “women” isn’t mentioned in the topic, though.

It could be that I also skipped over some topics that might have connected subjectivity to gender. There were quite a few.

(Okay, I went back and looked. One is [lady great time letter company found conversation behaviour woman immediately order mistress satisfaction love room acquaintance town obliged proper london], and another is [good made thought great make time give person reason woman put answer knew pleased long occasion words heard head leave]. So they're there, I just didn't include them in my ten.)

I think I can understand what is meant when it is said that topic modeling can make the novel seem like all reality effect. Topic modeling gives lists of words, frequently including nouns (there are even more nouns with stopwords removed), that sometimes seem to have no connection to each other. They’re lists of words with no context to explain their presence. They’re just there. I think it’s this lack of context that makes the novel seem like all reality effect. Without any context or connection to each other, these words just… exist. And like the reality effect, words or nouns that don’t have a deeper meaning or connection to anything else (words that just exist) are there only to say that they are real. The reality effect is definitely at play here. One thing I think can diminish the reality effect in topic modeling is to find connection or meaning between the words listed in each topic. Sometimes the words seem like they’re just “there”, but other times it’s easier to find a connection between them, or one gets the feeling that the words in this topic do have a connection but one hasn’t found it yet. For example, topics four and six in my list seem to have a straightforward connection. In this case, we don’t really see the reality effect as much because the words do have another meaning and they’re not there only because they exist. I think another thing about topic modeling that makes the reality effect more prevalent is that it takes words out of context. It could be that the words in this list do have other meanings or multiple meanings in the novels in which they appear, but without context we don’t really know what they are. These words might do more than just exist in the novels they appear in.


Experimental bibliography


The novel I chose for my bibliography project is The Sylph, by Georgiana Cavendish. The Sylph is an epistolary novel about a woman (Julia Grenville) who has married into high society and moved to London. She becomes disillusioned, and she begins receiving letters of advice from someone called "the sylph". I haven't read the book yet (I skimmed some pages), so I looked up what it was about online. With my experimental bibliography, I'd like to represent some aspect of the content of the book. Though the traditional bibliography doesn't completely capture the physicality of a book, it doesn't quite capture the content either, and I think the content is more important. There are a few things here to think about: the advice Julia gets from the Sylph, the Sylph's anonymity (until the end, presumably), the lowered moral standards Julia apparently finds in the new society she enters. Given the anonymity of the sylph, the event of a masquerade in the book, Julia's disillusionment with London society, and the use of trickery and keeping secrets (though I'm not sure how prevalent deception is in the book; I just know it's there), I think it would be nice to create a mask to represent the novel, or perhaps multiple masks if I wanted to represent a different thing with each mask. One idea is to create a mask by using its text, and cover a mask with text from the book--though this might imply that the novel itself is a mask, and I'm not really sure what that means. Another idea is to design multiple masks based off different characters or concepts in the novel. Some characters in the novel actually wore masks, so in a way some masks could be seen as items or props from the novel.


Descriptive Bibliography


Devonshire, Georgiana Cavendish, Duchess of. The Sylph; a novel. In two volumes. London: printed for T. Lowndes, No. 77, Fleet-Street, MDCCLXXIX., 1779.

| THE | SYLPH; | A | NOVEL. | IN TWO VOLUMES. | "Ye Sylphs and Sylphids, to your chief give ear, | "Faes, Fairies, Genii, Elves, and Demons, hear! | "Ye know the spheres, and various tasks assign'd | "By laws eternal to th' aerial kind: | "Some in the fields of purest water play, | "And bask, and whiten, in the blaze of day; | "Some guide the course of wand'ring orbs on high, | "Or roll the planets thro' the boundless sky: | "Our humbler province is to tend the Fair, | "Not a less pleasing, nor less glorious care." | POPE'S Rape of the Lock. | VOL. I. | [Ornament, size unknown (as a digital facsimile was consulted), looks like a Celtic knot] | LONDON: PRINTED FOR T. LOWNDES, NO. 77, FLEET-STREET. | MDCCLXXIX.


Vol. I, 264 p., Vol. II, 215 p., 12mo; Vol. I, A1, B1-B12, C1-C12, D1-D12, E1-E12, F1-F12, G1-G12, H1-H12, I1-I12, K1-K12, L1-L12, M1-M11; Vol. II, A1, B1-B12, C1-C12, D1-D12, E1-E12, F1-F12, G1-G12, H1-H12, I1-I12, K1-K12


Vol. I: A1r title, A1v blank, B1r-M9v text, M10-M11 advertisements; Vol. II: A1r title, B1-K4v text, K5-K12 advertisements


Epistolary. Source location is Harvard University Houghton Library. Retrieved from Eighteenth Century Collections Online. In both volumes, there appears to be no J gathering. A1v of Vol. I has a stamp and writing on it. The stamp reads | HARVARD COLLEGE LIBRARY | THE GIFT OF | FRIENDS OF THE LIBRARY . The writing reads | By Georgiana, duchess of | Devonshire . There are also some numbers written in the bottom right corner; they read | 7415 | 41-141 | 242 . Other writing on the page is illegible. In Vol. II, A1v is not shown in the digital facsimile. M12 of Vol. I is either not present or not shown.


Assignment 6

One thing I learned from the publication date bar graph is that most of the novels in the collection were published after 1740. This could be either because more novels were published after that date or because fewer of the novels published before 1740 were collected. The narrative form pie chart was also really interesting. The top three forms seem to be “third person”, “epistolary”, and “first person”. This suggests that the epistolary form was popular, as previous exercises also suggested.

After that, I used a word cloud to look at the titles of the novels. Frequent words seem to be “adventures”, “history”, “volumes”, “Miss”, “Lady”, “Written”, “Memoirs”, “Vol”, “Edition”, “Life”, “Spy” and “Travels”. The words “Miss” and “Lady” suggest a female protagonist, while the words “history”, “life”, and “memoirs” suggest that the novel focuses on the protagonist’s private life. It would be unsurprising if these latter words were also associated with a female protagonist, as one of the popular novel forms seen so far seems to be about a lady’s inner or private life or thoughts. Back to the list of frequent words, the word “written” is rather interesting. One thing it could mean is that author anonymity is becoming less and less frequent. It could mean instead that the title page insists the novel is written by the protagonist, or that the novel is written by a lady (who may remain unnamed). It would be interesting to go back and check whether the word “written” corresponds to a named author or an anonymous one. One of the words that surprised me was “spy”, as it doesn’t seem suited to an epistolary novel or necessarily to an adventure novel. It could be that novels about spies are also becoming popular, although I don’t think it’s a subject/genre we’ve really looked at before.
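The follow-up check on “written” could be sketched as a simple tally. This assumes each metadata record is a dict with a `title` key and an `author` key that is empty or `None` when the novel is anonymous; those field names are hypothetical and the real metadata may be organized differently.

```python
import re

def written_by_attribution(records):
    """Count titles containing the word "written", split by whether an author is named.

    `records` is a list of dicts with hypothetical keys "title" and "author"
    (author is None or "" for an anonymous novel).
    """
    tally = {"named": 0, "anonymous": 0}
    for rec in records:
        words = re.findall(r"[a-z]+", rec["title"].lower())
        if "written" in words:
            key = "named" if rec.get("author") else "anonymous"
            tally[key] += 1
    return tally
```

If most of the “written” titles fell under “anonymous”, that would favor the reading that the phrase points to the protagonist or to “a lady” rather than to a named author.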

Exercise 5


Many of the novels seem to focus on the subjective world of a single individual. Many of the novels have female main characters, although some focus on a male character. The epistolary form also seems to be popular. Some of the novels state that they are epistolary novels in their title; some use the word “letters” in their title. One example is “Letters from Henrietta to Morvina”. Most of the titles tend to be fewer than three or four lines long. While quite a few of the titles include the character’s name (like “Memoirs of the Countess D’Anois”), others are more vague (like “The Sylph” and “The Rival Friends; or, the Noble Recluse”). These latter titles seem a bit more like modern book titles. Some of the novels also emphasize their usefulness for teaching virtue. One thing that interested me is that some of the novels were about an upper-class person and said so in the title, like the example of Countess D’Anois. Neither Evelina nor Pamela was born into upper-class society, so this change was interesting to see. I wonder if there’s a shift in reading about less common people or if it’s still a remnant from before. Evelina’s title takes up the majority of the title page, but the letters are large and well-spaced, making it easy to read. The volume and edition number are smaller and more crammed than the title, but still easy to read. The smallest and most crammed words are the publisher information, but those are still easy to read. There is no author information on this page. Here it seems that the title is given the most importance. “The Sylph” also has a large title and small publisher information, but it also has a quote and an illustration. Here it seems important that the title page is aesthetically pleasing, and the quote must have been chosen to set up or prepare for the novel itself. “Letters from Henrietta” is styled much like “Evelina”.
It seems that the aesthetics of the title page might be becoming more important, judging from “The Sylph”. When using ARTEMIS, I decided to look at books from 1770-1800. Both “history” and “epistolary novel” are frequently occurring terms, so there seems to be a trend of focusing on an individual’s private life. “Lady” is also a prominent word while “gentleman” is not (although “King” is). This suggests that the novels tend to have female protagonists. “Author” is also a prominent word, but not as prominent as one would expect when one considers that every novel has an author. The word “history” is twice as frequent as “author”. It could be that the word “history” is mentioned more than once in the book, or that quite a few of the authors prefer to remain anonymous. It seems likely to be a combination of both.


Assignment 4


For this assignment, I chose to OCR chapter 20 of Tristram Shandy. I wanted to see how the software would deal with images that contained French text as well as English. I used a few different OCR programs for this assignment: the demo versions of FineReader and Prizmo and a couple of free ones I found online. Unfortunately, FineReader only let me export the first three pages of chapter 20, and the free online ones only did the first page or so. Prizmo did OCR the French text, but I have nothing to compare it to.

The most common mistake in the OCR versions of the text was that the letter s was sometimes replaced with the letter f or the letter l. I think it's because of the typeface used for s in the images. The FineReader version also messed up the punctuation a couple of times, inserting ^ and superscripted cc and ee. The Prizmo version, when converting the French text, sometimes replaced letters with symbols like ~ and <<. It also sometimes left out individual letters. This was done when the language was set to French, as the program wasn't good at catching accents when the language was set to English.

One mistake that could be autocorrected is definitely the use of f when s should have been used. It shouldn't be hard to check when placing an s instead of an f in a word makes more sense.
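A rough sketch of what that check could look like: for a word that isn't in a word list, try swapping each f for an s and accept the result only if exactly one candidate is a known word. The function name and the tiny vocabulary are my own illustrations; a real run would need a large word list, ideally one with eighteenth-century spellings.

```python
from itertools import product

def fix_long_s(word, vocabulary):
    """Return `word` with f corrected to s where a word list supports it.

    Words already in `vocabulary` are left alone, so a real ambiguity like
    "fame"/"same" is NOT resolved here. Assumes lowercase input; the number
    of candidates doubles with every f in the word.
    """
    if word in vocabulary:
        return word
    positions = [i for i, ch in enumerate(word) if ch == "f"]
    candidates = []
    for choice in product("fs", repeat=len(positions)):
        chars = list(word)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        candidate = "".join(chars)
        if candidate in vocabulary:
            candidates.append(candidate)
    # Only correct when there is exactly one plausible reading.
    return candidates[0] if len(candidates) == 1 else word

# Toy word list for illustration only.
vocabulary = {"some", "several", "island", "found", "so", "sift", "fist"}
corrected = [fix_long_s(w, vocabulary) for w in ["fome", "feveral", "ifland", "found"]]
```

The same idea could be extended to the s-as-l mistake, though l is common enough that there would be many more ambiguous cases.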

FineReader did a pretty good job of preserving the spacing and font of the original text, but with the online free OCR software and Prizmo, these aspects of the text were lost.

Even if the spacing and font of the original text are preserved, there's a notable difference between looking at an electronic version of a text (whether it's machine-readable or a digital facsimile) and looking at a physical copy of the book. The former has to be looked at on a screen while the latter can be held and paged through and physically interacted with. I think the most significant thing lost when a physical text is converted to an electronic text is the ability to write on it. Obviously one wouldn't want to write on an original version of the text, but I find that writing on a physical modern edition of the text (underlining parts, making comments and questions) helps me to better understand it. Sometimes I think writing on a text is pretty vital in helping me comprehend it. When a text is converted to an electronic copy, this process is lost or becomes more difficult to do. Another difference between the two is that some people may not want to stare at a screen for long periods of time.


Assignment 3: Dialogue, Silence, and Writing

When I first glanced at the word cloud, I saw that the most frequent words were pretty generic and weren't really surprising. These included words like "and", "you", "the", "my", "me", "to", "he", "of", "said", "a", "so". The frequency of "me" and "my" does show the importance of the first person (and Pamela's voice) in this novel, but that's already known because Pamela is, after all, a series of letters. The word "said" might be the most interesting out of this generic list, showing the frequency of dialogue (or references to dialogue). A great part of the letters consists of Pamela's account of the events that occurred and her interactions with other people. I decided to compare the occurrence of this word with the occurrence of the word "silent" (somewhat its opposite).

While "said" is used quite frequently in the novel, neither "silent" nor its variations seems to appear at all. I find this very interesting. It seems the narrator doesn't think silence important enough to mention. In a way, there is always something being said, even if there aren't any characters speaking at the moment. As Pamela writes, she is speaking to the reader--there is no silence anywhere.

I then decided to compare "said" to "write", as writing plays a significant role in the novel as well. The graph is included in the post.

Surprisingly, the word "write" and its variations did not appear very frequently in the novel, at least when compared to "said". More attention seems to be focused on the dialogue, but I thought writing would be important enough to appear more frequently.
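For anyone wanting to redo the comparison outside the word cloud tool, here is a minimal sketch that counts a word together with its variations. The function name and the variant lists are my own assumptions (the tool I used may group variations differently), and the sample sentences are invented, not quoted from Pamela.

```python
import re
from collections import Counter

def count_variants(text, variants):
    """Count how many tokens in `text` match any of the listed word forms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return sum(counts[v] for v in variants)

# Hand-picked variant lists; these are guesses at what "variations" should cover.
SAID = ["said"]
SILENT = ["silent", "silently", "silence"]
WRITE = ["write", "writes", "writing", "wrote", "written", "writ"]

# Invented sample text for illustration.
sample = 'He said he would write. "I wrote it," she said. Writing is hard.'
totals = {name: count_variants(sample, forms)
          for name, forms in [("said", SAID), ("silent", SILENT), ("write", WRITE)]}
```

Running this on the full text of the novel would give one total per group, which makes it easier to see exactly which forms are included in each comparison.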

Assignment 2


It was really interesting to see the locations visited or mentioned in Robinson Crusoe spread out on a map. As other people have pointed out, the map has many errors (like placing the Amazon in the United States). Europe seems to have the largest cluster of locations, while North America seems to have the second largest. The locations in the second group are all errors; the novel doesn't mention any North American locations. (America might pass as not being an error but, oddly enough, the map placed it in the middle of the United States, whereas in the novel Robinson Crusoe is referring to South America.) It should be noted that I also didn't purge my list of erroneous locations, so that added to the confusion. For example, St. Augustine in the novel refers to a monastery and not the city. One thing I found really interesting (which was also one of the causes of the errors) was the length of time between when the novel was published and now, the time in which we are reading and analyzing it. In this assignment, we're taking locations from 1719 and matching them to places in 2015. That's 296 years of change. Though the land/geography itself hasn't changed over the years, the human cities certainly must have. For example, though they occupy the same location, 1719 Madrid is not 2015 Madrid. This large gap in time is partially what allows for some of the errors: North America in Robinson Crusoe was pretty much unlabeled, while now many more places in North America have names. These new names include copycats, which enable errors such as placing Yorkshire in the United States.


Assignment 1: Times in Robinson Crusoe


I thought that the 'times' list generated by the NER program was rather interesting. The list of unique times gives only "night, last night, evening, afternoon, this morning, morning, this night, midnight, and two hours before midnight". This list seems to be pretty accurate as far as I can tell, though I think it left out times like "within half-an-hour of sunset" and "a little after sunset". The program might have left out other times as well, but I'm not sure if it did or how many. One interesting thing about this list is that the times are a bit vague except for the last two. One likely possibility is that the speaker did not know the specific time, or it could be that he did not need to be more specific than that. An interesting thing to do would be to backtrack and see when the specific times were used. The unique list does lose some information that the non-unique list has, namely the frequency with which the times are used. Looking over the non-unique list, it looks like "night" and "evening" are used quite a bit, and "morning" less frequently.
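To show the difference between the unique list and the non-unique list, here is a small sketch that produces both from the same input. It assumes the NER output has already been read into a plain Python list of strings; the function name and the sample list are made up for illustration and are not the program's real output.

```python
from collections import Counter

def summarize_times(times):
    """Return (unique times in first-seen order, (time, count) pairs by frequency)."""
    unique = list(dict.fromkeys(times))    # drops repeats but keeps order
    ranked = Counter(times).most_common()  # keeps the frequency information
    return unique, ranked

# Invented sample, not the actual NER output for Robinson Crusoe.
times = ["night", "evening", "night", "morning", "night", "evening"]
unique, ranked = summarize_times(times)
```

The unique list only says which times occur, while the ranked list also shows how often, which is the information that would be needed to backtrack to the rarer, more specific times.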

Another interesting thing I noticed is that the program sorts 'Heaven' into the 'Organization' list while it does not seem to be in the 'Location' list. I suppose Heaven could be seen as an organization by referring to God and the angels, but it could also be seen as the place in which they live.


p. 23


"It was my lot first of all to fall into pretty good company in London, which does not always happen to such loose and unguided young fellows as I then was; the devil generally not omitting to lay some snare for them very early."