
Research Question!


Suppose I give you a book but don’t tell you the title or what kind of book it is supposed to be. Could you figure out the genre just by reading it? Probably; we all have some sense of what makes a fantasy novel different from an autobiography. I am curious whether a computer could similarly distinguish among genres of writing - specifically, could one use topic modeling to classify a book? Suppose we develop an algorithm that rates the “genre-similarity” of every pair of books in a huge literary corpus. What genres would we find there? Would they match up with the genres we are familiar with, or would the algorithm group together novels that we would consider very different? I imagine that we would find some weird clusters of books, since this is essentially topic modeling on a larger scale, and the topics we discovered when working with topic modeling were sometimes coherent to a human reader and sometimes not. But I think it would be pretty cool to see the final product of this type of analysis: a big picture of a huge number of literary works, grouped together by features of their language.
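One very rough way to make "genre-similarity" concrete is to compare how often two books use the same words. The sketch below is not topic modeling; it swaps in plain cosine similarity over word counts as a stand-in, and the three "books" are hypothetical one-sentence snippets made up for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each alphabetic word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical snippets standing in for whole books.
books = {
    "fantasy_a": "the dragon guarded the castle and the wizard cast a spell",
    "fantasy_b": "a wizard and a dragon fought beside the old castle",
    "memoir_a": "i was born in a small town and my mother taught school",
}

counts = {name: word_counts(text) for name, text in books.items()}
for first, second in [("fantasy_a", "fantasy_b"), ("fantasy_a", "memoir_a")]:
    score = cosine_similarity(counts[first], counts[second])
    print(f"{first} vs {second}: {score:.2f}")
```

On real books, very common words like "the" and "a" would dominate the score, so one would probably want to drop them or weight them down before comparing. Grouping books by these pairwise scores is where the unfamiliar clusters would show up.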


Descriptive Bibliography Exercise


Thistlethwaite, James. The Man of Experience. London, 1778.


Vol. I 263 p.; Vol. II 250 p.; 12mo.

Contents. Volume 1. A1r half title, A1v title page, A2r-3v dedication, B1r-M6v text. Volume 2. A1r half title, A1v title page, B1r-M5v text, M6r-v advertisement.

Notes. Sourced from the British Library. Digital facsimile obtained from ECCO database.

Metadata Analysis


After playing around with the data for a bit, I became curious about the relationship between the dates of publication and other aspects of the data. There seems to be a spike in the production of novels somewhere around 1770. In contrast, the first 20 years of the 1700s seem to have been a slow season for novels (although many people at that time might not even have considered what they were reading to be a different literary genre, as we have discussed in class). I searched on the keyword "translate" to see what the publication date trend among translated novels looks like, and noticed that there is a spike somewhat earlier, around 1750. The two main source languages seem to be French and Spanish (although mostly French). When I searched on "french" and "spanish" I noticed that there seem to be specific moments at which many novels were being published in translation. Perhaps this reflects the political climate at the time, since it is conceivable that there were moments in history when it was less appropriate to publish a work that was originally in French (or Spanish). It would be nice to have data that accounts for texts that weren't originally published in English, since a number of texts in this database were published in English much later than they originally appeared in French or Spanish. This would let us focus on a specific group of multinational texts and look at how they were modified as they moved from language to language.
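The keyword searches described above amount to filtering the records by a word in the title and then counting what is left per decade. Here is a minimal sketch of that idea; the records, the `titles_per_decade` helper, and the (title, year) layout are all hypothetical placeholders, not the actual database or its format.

```python
from collections import Counter

# Hypothetical records: (title as catalogued, year of publication).
records = [
    ("Placeholder Novel One. Translated from the French", 1751),
    ("Placeholder Novel Two", 1772),
    ("Placeholder Novel Three, translated from the Spanish", 1754),
    ("Placeholder Novel Four", 1778),
    ("Placeholder Novel Five. Translated from the French", 1769),
]


def titles_per_decade(records, keyword=None):
    """Count records per decade, optionally keeping only titles that contain keyword."""
    decades = Counter()
    for title, year in records:
        if keyword is None or keyword.lower() in title.lower():
            decades[year // 10 * 10] += 1
    return dict(sorted(decades.items()))


print(titles_per_decade(records))              # every record
print(titles_per_decade(records, "translat"))  # matches "translated", "translation"
print(titles_per_decade(records, "french"))
```

Comparing the filtered counts with the overall counts decade by decade is what would show whether translations peak earlier than novels in general.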


Exercise 4 - Tristram Shandy in the OCR


I used ABBYY Pro 12 to look at a couple of pages of Chapter VIII, which I chose because of the various odd typographical things going on there. As expected, the software consistently mistook the old-fashioned long "s" for an "f" and stripped out some of the formatting of the original text (the word "HOBBY-HORSES" in the text becomes a confused "HOBBY HoRses"). The dashes are also a mess in the file I extracted, although I can't say much for the original text either... Overall, though, the OCR did a good job of capturing the words on the page. I get the impression that the OCR doesn't use any sort of word-predictive algorithm like the ones Google Search or iPhones use, which would definitely help catch those words that are clearly not correct.
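To illustrate the kind of check that could catch the long-"s" mistakes, here is a small sketch of a dictionary lookup: if a word isn't in the word list, try turning some of its f's back into s's and see whether that produces a real word. This says nothing about how ABBYY works internally; the tiny `WORDS` set and the function names are assumptions made up for the example.

```python
import re
from itertools import product

# Tiny hypothetical word list; a real one would be a full dictionary.
WORDS = {"his", "wise", "must", "first", "for", "of", "self", "horses", "father"}


def candidates(word):
    """Yield every spelling obtained by turning some of the f's back into s's."""
    positions = [i for i, ch in enumerate(word) if ch == "f"]
    for choice in product("fs", repeat=len(positions)):
        letters = list(word)
        for pos, ch in zip(positions, choice):
            letters[pos] = ch
        yield "".join(letters)


def fix_long_s(word, words=WORDS):
    """Return a dictionary word reachable by f -> s swaps, else the word unchanged."""
    lower = word.lower()
    if lower in words:
        return word  # already a known word, so leave it alone
    for cand in candidates(lower):
        if cand in words:
            return cand  # note: the corrected word comes back in lowercase
    return word


def correct_text(text):
    """Apply fix_long_s to every run of letters in the text."""
    return re.sub(r"[A-Za-z]+", lambda m: fix_long_s(m.group()), text)


print(correct_text("the firft of his horfes"))
```

Words that are already in the list ("of", "for", "father") are left alone, which matters because plenty of real words contain a genuine "f". The sketch tries every combination of swaps, which is fine for single words but is not meant to be efficient.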

I must admit that seeing Tristram Shandy reduced to an unformatted and misspelt Word document in Times New Roman, font size 55, made me sad. It just seemed so out of context. Even if all the errors were corrected, I think something wouldn't feel right. The broader question here, I think, is how do we experience a novel like Tristram Shandy? The obvious answer is that we read it - we see words on the page, make connections between them and the story and our lives, and come to some sort of understanding about their meaning. But the experience of reading is as much visual and tactile as it is textual. The way words are presented to us has a profound impact on the way in which we interpret them, and any sort of translation technology will affect the way we think about whatever it is that we're reading. Just consider for a moment the two black pages in the Oxford edition of Tristram Shandy. Obviously, turning the page unsuspectingly into that black abyss is a much different experience from scrolling down casually into those same pages. On a screen the effect is lost on the reader and it just looks silly. Any transformation of a text from one medium to another must keep this in mind; the same tricks simply won't work, and the work has to be adapted to make sense in the new medium.


The Voyant Voyage


This program is really cool. I am impressed by how easy it is to use, and by how quickly it can convert a 500-something-page novel into a format that is super simple to analyze. First I had to confirm my suspicion that the word "sex" never once refers to the physical act, but instead refers to a particular gender (more often than not Pamela's, but occasionally the male sex is referred to as well). This continues to surprise me, seeing as Pamela's discussions with her parents about her chastity, virtue, and other such matters are extensive and occur throughout the course of the novel. In a similar vein, I also confirmed my suspicion that the word "rape" never once occurs in the novel, although as we have pointed out in class there are many moments in which we would say (as modern readers) that Mr. B attempts to rape Pamela. When one reads a novel like Pamela and finds a lack of profanity (e.g., the word "damn" is apparently bad enough to be dashed out) and a general prudishness regarding sexual matters (i.e., no one ever explicitly says what they're talking about), one might think for a moment that the world these characters inhabit is less profane and less sex-crazed than our own profane and sex-crazed society. But that's nonsense. This novel is absolutely sex-crazed - Mr. B haunts Pamela throughout the entire story, and his ultimate desire (to have sex with her) is achieved by the end of the novel via their marriage. There are certainly differences between their world and ours, but it is important for us to realize that the language they use (or don't use) can be misleading if we don't think deeply about what is really going on.
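Checking how a word like "sex" is used comes down to pulling out every occurrence together with a few words on either side, which is often called a keyword-in-context listing. Below is a minimal sketch of that idea; it is not Voyant's own code, the `keyword_in_context` function is a name made up for the example, and the sample sentence is hypothetical rather than a quotation from Pamela.

```python
import re


def keyword_in_context(text, keyword, width=3):
    """Return each occurrence of keyword with `width` words on either side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Hypothetical sentence, not a quotation from the novel.
sample = "She spoke of the honour of her sex, and of the duty the other sex owed to it."
for left, word, right in keyword_in_context(sample, "sex"):
    print(f"{left} [{word}] {right}")
```

An empty result for a word, as with "rape" here, only shows that the word itself is absent; it says nothing about whether the thing it names is present in the story.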

Who we write for, where we write from


Seeing half of Africa called Negroland reminded me of a teacher I might have had in elementary school who thought Africa was a country. But that’s an aside. Not surprisingly, I failed to find a single specific location in Africa on the map of places that the named-entity recognition (NER) tool found in Robinson Crusoe. We have Senegal, Guinea, Gambia, Morocco and Egypt for countries, and then Madeira and the Canary Islands, but that’s about it. This is only slightly more specific than the map of the world (Morocco doesn’t appear on the map, for instance), so it seems reasonable that Defoe had some map like this one in the back of his mind as he was writing his story. The only specific location that Google Maps thought was in Africa was Winterton, but in the text Defoe is referring to Winterton Ness, which is a location in Europe. Many of the other locations Google Maps couldn’t pick out were alternative spellings of places in Europe (Sallee and Languedoc, for instance). All of these observations make sense in the context of our discussion on Wednesday, when we talked about the nature of the descriptions Crusoe gives in the novel. The specific is exclusively European, so any details we receive about the world he inhabits are bound to defer to the European experience (at the expense of, well, everyone else’s).

"He told me, I might judge of the happiness of this [middle] state; by this one thing, viz., that this was the state of life which all people envied..." pg. 6