
exercise 8

1 min read

I've been thinking about this for some time, and I'm still left scratching my head. I feel very deeply that topic modeling, data analysis, and metadata can be combined to provide interesting insight into old and new questions. But when I try to think about really good uses, I keep drawing a blank. I'm not sure what I want to know about the corpus, and even worse, I'm not sure what my possibilities are for utilizing the digital technology we have.

I fall back on my previous idea of a hyperlinked concordance, but I'm not sure what question I would try to answer with it. I've also been thinking about how it would be interesting to try to reconstruct a text or a corpus using the outputs of the tools--especially topic modeling, but also fusion tables, word clouds, maps, etc. What could we learn about trying to resee the thing we research through the researching tools? Would it be inaccurate nonsense, like when you use an online translator to translate a word into another language and then back to the original? Or might it broaden our perspectives?

Time Made

4 min read

I think I'm going to approach this a bit more artistically and a bit less scholarly than may be intended, but I can't help myself.

I was most taken with the topics generated from the prescribed settings (50 topics, 1000 iterations, 20 words per topic):

Space Pirates: strap captain narcissa ship chap board time behaviour morgan surgeon immediately body uncle mate cried expence put told banter thomson

time travel, space ships, best friends, beautiful aliens, and witty remarks from the medic.

The next three all have exceptionally good final three words. I wonder how much the order matters to my understanding of the topics, and how random the order is.

Evening Passion: eyes purpose attention voice tears peace stood silence instantly fixed ground soul night distress place led felt length rose equally

two lovers part in a moonlit garden.

Americans Abroad: peregrine pm pickle lord pipes hero commodore gentleman mrs emilia hatchway love trunnion lieutenant jolter painter company view french behaviour

men in double-breasted suits aboard steamers talk of art and war over lunch.

I'm having a lot of fun with these. They remind me of poems without linebreaks. The NY Times has a running column that makes poems out of missed connections postings on Craigslist, which remind me of this. It makes me really want to write found poetry for my experimental bibliography.

I generated two other lists of topics: one simple, and one complex. Both were disappointing.

10 topics, 100 iterations, 10 words per topic:

pastoral epic: time power pleasure present life nature happiness english country thousand

In fact, the simplicity of the settings has led me to the most complex, or at the very least, abstract topics. I have asked the computer to distill novels to their basest forms for me. If I consider novels an imitation or representation of reality, then I am nearly asking a computer to find the meaning of life. That, of course, did not work out so nicely in The Hitchhiker's Guide to the Galaxy.

20 topics, 2000 iterations, 15 words, no stopwords:

this took 30 mins for the program to complete.

The most interesting topic was this:

an ode to ee cummings: the to of i in a it not that but for be have as my

Satisfyingly the opposite of the "simple" results, but otherwise too basic, too superficial.

Of course, I could write about the easily labeled topics: church, or voyages, or one topic that was very obviously Pamela. What's the fun in that? Topic modeling effectively takes something sciencey and relieves it of any obligation to be scientific. We take all these data that have been collected in the most absolutely unbiased process and require that they be nearly arbitrarily (certainly subjectively) named, labeled, and sorted.

I've been thinking about applications for topic modeling. Is it practical for telling about large amounts of writing? How could I actually use it in a real situation? Not just by generating lists, I think. But what about connecting the words in the topics to the full information? Could we hyperlink each word to direct back to its appearance(s) in the original text(s)? I'm thinking about something along the model of The Perseus Project. Could we create topic concordances, with links to locations of every instance of the word chosen in the topic? Could we generate statistical metadata, showing frequency, placement, etc? Lastly, could we superscore iterations of topics? Is that already being done by iterating (I don't have a strong enough grip on the actual process)? I'd like to see a super topic model, where only the strongest words remain, only those used over and over and over again, or used from topic to topic to topic. Is that close to my simple settings? What would happen if I asked the program to iterate once, generate one topic, and choose one word? I assume it would choose the most used word in 1760's novels.


I did this and returned with "sir". With 2 topics and two words each, I got "time made" and "sir lady." I'm intrigued by time made. I will leave it on this note.
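The topic concordance I wondered about above could be sketched roughly like this. It is only a minimal illustration, not how any topic modeling tool actually works: the function name `topic_concordance`, the `width` parameter, and the sample sentence are all my own inventions, and the three topic words are borrowed from the "Space Pirates" topic.

```python
import re
from collections import defaultdict

def topic_concordance(text, topic_words, width=3):
    """Map each topic word to every place it occurs, with a little context.

    Positions are token indexes into the text, so each entry could in
    principle become a hyperlink back to that spot in the original.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    wanted = {w.lower() for w in topic_words}
    hits = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok in wanted:
            left = max(0, i - width)
            context = " ".join(tokens[left:i + width + 1])
            hits[tok].append((i, context))
    return dict(hits)

# Made-up sample sentence, standing in for a full novel.
sample = "The captain came on board the ship. The surgeon told the captain to wait."
concordance = topic_concordance(sample, ["captain", "ship", "surgeon"])

for word, places in sorted(concordance.items()):
    # Frequency and placement: the "statistical metadata" I had in mind.
    print(word, len(places), places)
```

Each entry pairs a position with a snippet of context, which is the raw material a Perseus-style linked concordance would need.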


Hark, a Sylph

1 min read

I'd like to try my hand at a Kate Beaton-esque sequential art work.

For example:

I'd like to inject a little humor into the description/bibliography of a novel, and I'd like to make it less stuffy by turning it into a sketchy comic. I'm a little worried that I won't be as funny as Kate Beaton, and I'm also worried that I won't accurately represent my novel, The Sylph, because I haven't actually read it. What I can tell about it, though, is that it is a story of a pretty naive girl just married into high society who receives mysterious and helpful letters from a woman who signs herself only as "The Sylph." I think I can do something with that, especially in light of the previous epistolary novels we've read, perhaps using satirical pastiche elements of Northanger Abbey as a guide.

Like many of Beaton's comics, I plan to draw four strips, each 3 cels long.

Descriptive Bibliography

2 min read

Devonshire, Georgiana Cavendish, Duchess of (1757-1806) The sylph; a novel. In two volumes. London: T. Lowndes, 1779. 1st edition.

THE| SYLPH;| A| NOVEL.| IN TWO VOLUMES.| "Ye Sylphs and Sylphids, to your chief give ear,|"Fays, Fairies, Genii, Elves, and Demons, hear!|"Ye know the ſpheres, and various taſks aſſign'd|"By laws eternal to th' aërial kind:|"Some in the fields of pureſt aether play,|"And baſk, and whiten, in the blaze of day;|"Some guide the courſe of wand'ring orbs on high,|"Or roll the planets thro' the boundleſs ſky:|"Our humbler province is to tend the Fair,|"Not a leſs pleaſing, nor leſs glorious care."| Pope's Rape of the Lock.| VOL I.|[printer's ornament]|LONDON:|PRINTED FOR T. LOWNDES, No. 77, FLEET-STREET.|MDCCLXXIX.|

I 264p; II 215p. 12mo; VI B-M, VII B-K; Neither volume has an A gathering, and both volumes' final gatherings are missing pages: vI, M12v; vII, K12v and K12r (perhaps blanks excised in the digital facsimile).


Vol I: ir title. iv blank. B-M9v text. M10r-M12r advertisement. Vol II: ir title. B-K4v text. K5r-K11v advertisement.


Sourced from Harvard University Houghton Library. Digital facsimile retrieved from Eighteenth Century Collections Online. 2nd edition consulted, sourced from The Library of the University of Michigan. Digital facsimile retrieved from HathiTrust Digital Library. 1st edition: Blank page after title has stamp reading HARVARD COLLEGE LIBRARY|THE GIFT OF|FRIENDS OF THE LIBRARY| followed by indecipherable pen marks and notation of author and call number.

Keeping on Trying

2 min read

I wrote an entire post and then lost it when I tried to insert some graphs. I'm going to take this opportunity to summarize what I wrote, since my findings were interesting enough, but not particularly different from what I expect everyone else found. I did write a sentence with the word "stymying" in it, which was worth re-typing. I'm linking my fusion tables here:

Basically, after some wrangling, Google Fusion Tables provided me with some pretty cool tools. I was especially appreciative of the bar graph with filters. The pie chart and word cloud (not fusion, I know) were satisfyingly visual but provided me with little more than the most basic information. A lot of this exercise had that problem, really. It was hard to draw conclusions with the gaps in data (is this because there weren't a lot of data, or is it because not enough data were compiled?). How can you make conclusions about trends in narrative over time if the number of books published steadily increases over time? I know this would require only a small bit of statistics, but I don't have that at my disposal.
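The "small bit of statistics" might be as simple as dividing each count by the total number of books published in the same period, so that a form is measured as a share of the market rather than as a raw count. Here is a minimal sketch of that idea. The function name `share_by_period` is hypothetical and the counts are made up for illustration; they are not figures from the bibliography.

```python
def share_by_period(form_counts, total_counts):
    """Return each period's count as a fraction of all books in that period."""
    shares = {}
    for period, total in total_counts.items():
        if total == 0:
            continue  # no books recorded, so no meaningful share
        shares[period] = form_counts.get(period, 0) / total
    return shares

# Hypothetical counts, invented purely to show the effect.
epistolary = {"1760s": 10, "1770s": 20}
all_novels = {"1760s": 40, "1770s": 100}

shares = share_by_period(epistolary, all_novels)
for period in sorted(shares):
    print(period, f"{shares[period]:.0%}")
```

In this invented example the raw count of epistolary novels doubles while their share of everything published actually falls, which is exactly the kind of trend a raw bar graph would hide.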

I thought the network graph might be cool. I couldn't figure out how to use it, but I thought it might be able to do things I wanted, like look at the relationships between publishing location and publishing date, or location and narrative form. Visualizing data can be so helpful, or it can be nearly superfluous. It depends both on the visualization tools and on the type and amount of data collected.


Words Words Words

6 min read

There were some very clear trends that could be seen even in this small sampling of a year of novels. Mainly anonymous authorship, though the notes on many specified that the novels were written by “a lady.” The majority of the novels published were about ladies, in fact. I thought a pretty good example was an anonymously written novel whose subtitle was “A test of Virtue.” That seems to have been the basic premise of all stories about women. A critic wrote of this one, “the same story might have been told more agreeably by the same writer in a smaller compass. It is something, however, in a modern novel, to find half of it worth reading.” That was pretty funny. It’s nice to find that I’m not the only one who finds the repetitive scenes of dramatic virtue to be excessive. Certainly Austen, later on, agrees, much to my satisfaction.

The novels that were not written about women (these can be characterized as stories about marriage, I think) were about men, and the overwhelming majority of them were stories about adventures. It makes me think, if women are looking for marriage, what are the men doing who are marrying them? They can’t be on adventures and they won’t go on adventures after marrying, so this means they must have gone on adventures already. Which means that their stories have already concluded by the time the women’s begin.

Then I got a little carried away by the pricing of the books. Generally they went for between 5 and 7 shillings, more when bound rather than sewed. Evelina, bound, went for a premium of 9s. This means a book cost between one quarter and nearly half of a skilled worker’s weekly wages. That’s saying that if someone now who worked 50 hours a week at McDonald’s for $9 per hour (their current wage) wanted to buy a book, they’d be paying roughly between one and two hundred dollars for the book. This calculation may even be inaccurately low. A construction worker (probably more analogous with a worker from the 18th century) is paid around $15/hr. This ups the price of the book to between roughly $150 and $300. In a way, it makes sense that there was so much fuss about why the book was a worthy read: it wasn’t just a use of your time, but of a huge amount of your money. As much as a frivolous tale may be fun to read, a morally enriching story that will improve your personal character is a much more worthy purchase.
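The arithmetic behind those modern equivalents can be laid out in a few lines. This is only a sketch of the comparison as I understand it: the function name `book_price_range` is made up, the one-quarter and one-half shares come from the paragraph above, and the 40-hour week in the second call is an assumption, since that is the week length at which $15/hr gives the $150 to $300 range.

```python
def book_price_range(hourly_wage, hours_per_week, low_share=0.25, high_share=0.5):
    """Price of a book if it cost between low_share and high_share of a week's wages."""
    weekly_wages = hourly_wage * hours_per_week
    return weekly_wages * low_share, weekly_wages * high_share

print(book_price_range(9, 50))   # (112.5, 225.0)  fast-food wage, 50-hour week
print(book_price_range(15, 40))  # (150.0, 300.0)  construction wage, assumed 40-hour week
print(book_price_range(15, 50))  # (187.5, 375.0)  construction wage, 50-hour week
```

Holding the week at 50 hours for both workers would push the construction figure higher still, to roughly $190 to $375.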

I think the best questions you can ask that will be answered by this bibliography would be about what the public wanted to read. Was it epistolary novels? Stories of suspense and adventure? Stories of suspense and marriage? Stories told by women? How much were these stories worth to them? How much did they cost? From these answers we can then ask, why? Why did they want this, and why did they buy this?

Next, I looked at three of the books listed. The Travels of Hildebrand Bowman into Carnovirria, Taupiniera, Olfactaria, and Auditante, in New-Zealand was very Robinson Crusoe-esque. It is written by the protagonist, or so we are told. From what I can gather from some initial skimming, Bowman is left on a savage land by himself, and he makes friends with an uncivilized but helpful native. The entire piece seems to be part of a very slightly earlier style. It has an extensive table of contents with chapter summaries, for example.

The Memoirs of the Countess D’Anois, written by Henriette Julie de Castelnau Murat, was in some ways comparable. It unexpectedly aligned more with an archetypal male adventure novel than a female domestic novel. All of it is written at the end of the Countess’ life, rather than at the beginning. It uses the style of claiming to be written by the main character. It also has an interesting letter to the reader claiming that the book is a justification of her life and a defense against what seem to be accusations of coquetry. It seems to me to be almost the opposite of the editorial prefacing letters in domestic novels which attest to the moral enrichment the story demonstrates.

The final novel I was able to find a copy of was The Unfortunate Union: or, The Test of Virtue. A story founded on facts, and calculated to promote the cause of virtue in younger minds. It has the very long and explanatory (as well as moral-enrichment-claiming) title. Beyond the title though, there is no long preface, no note from editors or protagonist. After the title page, the novel simply begins. It also uses an epistolary format, unlike Crusoe but just like Evelina. It is astonishingly similar, in fact, to Evelina. In the first few pages I found the family name Villars referred to, as well as multiple locations often visited in Evelina, including Ranelagh. I have no idea if these are coincidences or what this means exactly.

It is hard to really draw any conclusions from these searches, because I know so little about the entire novels or the novels contemporary with them. I would be uncomfortable outlining many trends, though I do think we’ve already established that a simplification of title and preamble is growing. As for Evelina’s place among these books, it seems more refined and contemporary than the first two, though it is hard to say about the nearly identical-seeming Unfortunate Union.

The ARTEMIS tool is actually very cool. I’m sure that it would prove frustrating if I were doing more focused research, but to get a feel for what types of novels were published in the span of thirty years, it’s great. The visual aids really help. There’s a nice little histogram of the number of books published per decade, and then term frequency makes cool graphs just like voyant, but for a lot of novels rather than one. I like that there’s a popularity option, rather than only looking at the search term by frequency. The term clusters were good for browsing a large body of search results, but they seem to (understandably) miss out on a lot of books and content. As we know from before, OCR is full of errors and inabilities.

Assignment 3: Voyant

3 min read

What a cool program voyant is. My first experiences with the word cloud alone brought up some interesting stuff. The two high-frequency words that really caught my eye were "should" (676) and "master" (627). Both imply some form of submission and obligation. This is pretty surface level, but I'd still like to point out the dissonance between the images of Pamela and the words in it. Pamela resists her master, refuses to bend to Mrs. Jewkes, and in general sticks to her guns against immense pressure. Yet the most prevalent words used to tell her story feature a focus on her servility and subjection. Pamela is so praised because she serves her master as well as she can, though this includes defying him. Pamela is a completely obedient character: following her parents' wishes, obeying the writ of the land in social terms (for example, refusing to enter into anything that would make her seem of a higher class, such as wearing nice clothes, as well as refusing Mr. B's advances), going so far as asking, but never taking (her constant request to be sent home, but never any actual attempt to go home when she still could). Pamela is not defiant. She is pious, so much so that her acts of "rebellion" (i.e., refusing Mr. B's advances) are in fact acts of compliance with a more righteous guideline.

There was a lot to find with voyant, beyond that first revealing glance. I edited stopwords (mostly taking out names) for the frequency cloud and became fascinated by "good" (855) and "poor" (534). Perhaps because the words were similar in size (frequency) and shape ("oo"). But of course, there is a definite correlation between the two in the novel. Poor Pamela and her poor parents are the ultimate good characters.

My last interesting observation came from looking at a word from a scene and its graph through the novel. I looked at the second attempted rape scene, when Pamela is held down between Mrs. Jewkes and Mr. B. I graphed the frequency throughout the novel of "wicked" (128). It spikes at the two rape scenes, and drops sharply after book two. Then I looked at the names I remembered Mr. B calling Pamela: sauce-box, hussy, and slut. I had expected to find a much higher frequency of these words, because he verbally abused her so much more than she him, but all clock in under 20. What does this mean? Is it because Pamela is narrating, and so refers to Mr. B more often than to herself (via him)? Is 128 high frequency? Does this entire investigation yield anything? I liked finding through voyant that the big picture I got from the book often didn't line up with the micro picture afforded by voyant.
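For what it's worth, here is roughly what I take a frequency graph like the one for "wicked" to be doing: cut the text into equal slices and count the word in each. This is a guess at the general idea and not voyant's actual implementation. The function name `trend`, the `segments` parameter, and the toy sentence are all hypothetical.

```python
import re

def trend(text, word, segments=10):
    """Count occurrences of `word` in each of `segments` equal slices of the text."""
    tokens = re.findall(r"[a-z'-]+", text.lower())
    counts = [0] * segments
    if not tokens:
        return counts
    target = word.lower()
    for i, tok in enumerate(tokens):
        if tok == target:
            # Integer arithmetic maps token position i to a slice index
            # between 0 and segments - 1.
            counts[i * segments // len(tokens)] += 1
    return counts

# Toy text: the word clusters at the start, like a spike early in a novel.
toy = "wicked man wicked deed good good good good"
print(trend(toy, "wicked", segments=2))
```

Dividing each slice's count by the slice's length would turn these raw counts into relative frequencies, which might help answer whether 128 is actually high.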


Assignment 2: Maps

2 min read

The map of Crusoe's journey is vastly different from the map of the locations Crusoe has mentioned. Even when cleaned up, so that old names now have their updated ones, and small Ohio towns do not take the mantle of European cities, this map has far flung locations, from America to Egypt. These are evidence of the extent of places Crusoe is comfortable and capable of referencing. Then, there is a dense mass of locations bristling through Western Europe. I saw this and understood why so many locations were there, but it made me cross, because that density did not in fact denote importance. Crusoe spent only the very beginning and the very end there, but, especially in the case of the end, he hit so many locations because he was on a journey. The most important location of all is an island that is never even named, and even if it were, would show up only once on my map.

I don't find this map particularly informative, because it isn't mining the right kind of data. Some of the places mentioned, Crusoe never sets foot on. Most, he appears in only briefly. The crux of the novel is in a small, unnamed space. For in fact, Robinson Crusoe is not a novel of a journey, if even a novel of an adventure. It is a tale of survival, but a trapped survival, which by definition does not span great distances. I imagine two possible informative maps: the first, of the world and Crusoe's journey through it, but putting importance (via colors or thickness of line) on duration of time spent in each place. The second, a magnification of the island. I would like to see where he was shipwrecked, where he built his tent, where his cave, his summer bower. Where were the goats penned? Where did he anchor his canoes? Where was the mainland, and where the cannibal ritual spot? These would be informative, and because they are fictional, cannot be plotted by google maps.