The Rise of the Novel

Exercise 8: Narrative form + artistic representation

1 min read

In thinking about topic modeling and metadata for a group of 18th century novels, a particular challenge for me is finding a question to explore with these methods that is neither so large that its answer would not be particularly interesting or meaningful nor so small that it is not much different from close reading. So, in trying to both narrow down and expand my interests in genre, form, etc., I really like the idea of combining algorithmic criticism with “middle-distance reading” to find something out about a group of novels that traditional literary criticism, with its dichotomy of close and distant reading, can’t really do. I would like to take a group of novels written in the third person OR in free indirect discourse and look at the topics related to or involving artistic representation in the form of writing (mentions of books, novels, poetry, and other kinds of writing) to see if any commonalities pop up and if there might be some kind of relationship to be extrapolated or theorized here about narrative form and the presence of narrative artistic representations in the novels.


Exercise 8

2 min read

I’d like to explore the relationship between the most prominent topics within a corpus of novels and some pieces of metadata that may correlate to the novel’s availability to different socioeconomic strata of readers in order to see if we can track any changes in these factors along with the assertions of Ian Watt in “The Rise of the Novel,” namely that the novel was born alongside (and helped to create / propagate) the burgeoning middle class created by modern capitalistic systems. The metadata that I would look at would be:

  • price of the volume (can we find any correlations between the supposed content of the novels as revealed by topic modeling? do these prices change in subsequent republishing? when is the price included in the paratext?)
  • place of publication (what kinds of novels were published in Dublin as opposed to London as opposed to Philadelphia? how do these reflect the state of the middle class in each respective area?)
  • year (does the novel come to reflect more middle-class values as it ages?)

I would group the novels by these divisions of metadata (e.g. all the novels that cost 2-3 shillings or were published in Dublin from 1775 to 1780) and use topic modeling to see if I could find any reflection of these factors in its most prominent topics.
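
A minimal sketch of that grouping step, assuming the metadata has already been loaded into a list of records. The field names (`place`, `year`, `price_shillings`) and the four sample novels are hypothetical, not the columns or contents of the course dataset. Each selected group of titles could then be handed to the topic modeling tool as its own small corpus.

```python
# Hypothetical records; field names and values are illustrative assumptions.
novels = [
    {"title": "A", "place": "Dublin", "year": 1776, "price_shillings": 2.5},
    {"title": "B", "place": "London", "year": 1778, "price_shillings": 3.0},
    {"title": "C", "place": "Dublin", "year": 1779, "price_shillings": 6.0},
    {"title": "D", "place": "Philadelphia", "year": 1794, "price_shillings": 2.0},
]


def select(records, place=None, years=None, price=None):
    """Return the titles that satisfy every filter that was supplied.

    `years` and `price` are (low, high) pairs and are treated as inclusive.
    """
    chosen = []
    for n in records:
        if place is not None and n["place"] != place:
            continue
        if years is not None and not (years[0] <= n["year"] <= years[1]):
            continue
        if price is not None and not (price[0] <= n["price_shillings"] <= price[1]):
            continue
        chosen.append(n["title"])
    return chosen


dublin_late_1770s = select(novels, place="Dublin", years=(1775, 1780))
two_to_three_shillings = select(novels, price=(2, 3))
```

Keeping each filter optional means the same function covers single-factor groups (all Dublin novels) and combined ones (Dublin, 1775 to 1780, under 3 shillings).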

Exercise 8

1 min read

Because the novel I am using for my experimental bibliography project contains a prologue, and since my project engages with the content of the prologue, I have been thinking recently about prologues in general. Relevant questions about prologues include: what information do they convey? In what types of novels do they appear? What is their relation to the novel as a whole? Do they appear more often in first edition copies or subsequent editions of novels? I am also interested in this research question after talking with Professor Buurma about my project last week and learning that one suggestion for choosing which 18th century novels Penn should digitize is the novels with prologues. Preliminary information about these prologues could be very helpful in making this decision. I think topic modeling could tell us what themes are covered in the prologues. Other questions could deal with comparing resulting topics from metadata of novels with prologues vs. novels without prologues. I could also divide the novels with prologues into subcategories derived from metadata information (such as narrative form, publisher, or author).

Exercise 8

3 min read

One thing I'd like to use topic modeling and metadata for is to look for a shift from religious communities/dynastic realms and messianic time to nationalism and homogeneous time (as described in Anderson's Imagined Communities). To do this, I think it would be best to split a corpus of eighteenth century novels into groups based on their publication date. One suggested grouping is 1700-1750 and 1751-1799, although one could also create more than two groups by making the year ranges smaller (for example, 1700-1733, 1734-1766, 1767-1799). However, it is possible that the later dates may contain a larger number of novels, in which case it might be better to have uneven groupings so that the split of novels between groups is not too disproportionate--though a large time range could obscure when the shift took place. It might be preferable to run this multiple times with different groupings to see how the groupings affect the results.
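
One way to try the uneven groupings is to cut the corpus by count rather than by calendar range. The sketch below is only an illustration: the list of years is made up, and `equal_count_groups` is a hypothetical helper, not part of any tool used in the exercises.

```python
def equal_count_groups(years, n_groups):
    """Split publication years into n_groups chronological groups of near-equal size."""
    ordered = sorted(years)
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups take one additional novel each.
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


# Made-up publication years, weighted toward the later part of the century.
years = [1705, 1722, 1740, 1751, 1760, 1766, 1771, 1778, 1785, 1790, 1795, 1799]

halves = equal_count_groups(years, 2)
ranges = [(group[0], group[-1]) for group in halves]
```

With these sample years a fixed cut at 1750 would put 3 novels in the first group and 9 in the second, while the count-based cut gives 6 and 6 at the cost of a much longer first period. One limitation of this sketch: novels from the same year can end up on either side of a boundary.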

To identify the shift, I think one thing to look at would be to see whether there are any topics related to imagined communities: that is, any topic where an imagined community (a community where the constituents don't interact with all other constituents) is associated with descriptive words or traits. One particularly useful thing to look for would be the association of country names with other words. Topic modeling could also be used to look for simultaneity in novels, which Anderson argues enables the concept of nationalism and imagined communities. Here it would be helpful to look to see if topic modeling can help identify instances/trends of simultaneity (where characters are acting separately but at the same time). (Topic modeling may not be the best tool for this, but I'm not sure what would be. Perhaps it would be better to look at the frequency of words like "meanwhile".) It would still be interesting to see if words like "meanwhile" show up in topic modeling, and what words they are associated with if they do. Another thing to look for that doesn't require topic modeling is to see how time is described across eighteenth-century novels. Anderson discusses a shift from sacred time to modern time, and this shift could be identified through a rise of the usage of standardized or clock time. Anderson also talks about the newspaper's role in enabling the new sense of time/imagined community, so it might be worth looking at the mention of newspapers (frequency-wise) or seeing if the word "newspaper" appears in the topic modeling at all. If the word "newspaper" does appear in the topic modeling, it would be interesting to see what words the newspaper is associated with.
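
The frequency check for words like "meanwhile" and "newspaper" does not need topic modeling at all. A rough sketch, assuming each period's novels have been joined into one plain-text string; the two sample texts are invented for illustration.

```python
import re
from collections import Counter


def rate_per_thousand(text, terms):
    """Return how often each term occurs per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {t: (1000 * counts[t] / total if total else 0.0) for t in terms}


# Invented stand-ins for the combined text of each period's novels.
periods = {
    "1700-1750": "The captain prayed and the ship sailed on.",
    "1751-1799": "Meanwhile the captain read the newspaper. Meanwhile, she wrote.",
}

rates = {
    label: rate_per_thousand(text, ["meanwhile", "newspaper"])
    for label, text in periods.items()
}
```

Normalizing per 1,000 words matters here because the later groups may contain far more text than the earlier ones.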


Topic-Modeling Metadata

2 min read

At first glance, topic modeling doesn’t seem to provide us with a lot of metadata; the outputs that the algorithm generates for us are pretty straightforward strings of words. But I think there’s more to topic modeling than meets the eye, and I would be interested in exploring and analyzing what little metadata topic modeling has to offer us. My research question would be something along the lines of: What can the metadata of topic modeling tell us about topic modeling as a practice, and about the novels they are attempting to topic model?

In order to effectively analyze the metadata of topics, we’d be burdened with the task of creating, or at least documenting, the metadata that’s available for each one. I would begin by tagging each topic with the number of other topics produced alongside it, the number of iterations, the number of printed words, and the presence of stop words or not. I would then create some basic content-related labels surrounding the topics based on what we’ve seen so far, such as “money” or “family” or “hilarious” or “???” depending on the topic. I’d also like to figure out a way to assign the topics a “relevancy score”, or some metric that indicates how much the topic “makes sense” to a human reader or how much meaning we can draw from it.

I think this research question, and its answers, would provide us with a SUPER macro-level picture of what’s going on in a corpus of novels, one worth discussing.


Exercise 8

1 min read

Inspired by Habermas's arguments regarding individual subjectivity and private writing oriented towards an other, I would be interested in tracking how often direct address to a recipient appears in a group of 18th-century epistolary novels, and how the frequency of these occurrences changes over time. To what degree are letters simply diary entries with a recipient's name at the top, or news reports, and to what degree is the recipient an important and relevant part of the discourse? Is the letter a monologue or interaction with an imagined other? I would want to track instances of the use of the word "you" within letters, as well as the use of rhetorical questions ending with a question mark, and see if a pattern emerges over time. Both topic modeling and metadata would be useful in putting together the variables of direct address and date of publication of the novel.
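
Counting "you" and question marks is simple enough to sketch directly. This assumes each letter has already been separated out as its own string; the sample letter is invented, and the function cannot tell a rhetorical question from a genuine one, so the question count is only an upper bound.

```python
import re


def direct_address(letter):
    """Count uses of "you" and question marks in one letter."""
    words = re.findall(r"[a-z']+", letter.lower())
    you_count = sum(1 for w in words if w == "you")
    return {
        "words": len(words),
        "you": you_count,
        "questions": letter.count("?"),
        "you_per_100_words": 100 * you_count / len(words) if words else 0.0,
    }


# Invented sample letter.
sample = "Dear Madam, you ask what became of me. Would you believe it? I scarcely do."
stats = direct_address(sample)
```

Pairing each letter's counts with the novel's publication date from the metadata would then give the over-time pattern described above.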

Exercise 8

1 min read

Research Question: How did the ways in which novels talked about women and social class change from the beginning to the end of the 18th century?

I think that an interesting way to go about answering this question would be to first use metadata, i.e. years of publication, to separate groups of novels written from 1700-1720, 1720-1740, 1740-1760, 1760-1780, and finally, 1780-1800. One could then use a topic modeling tool to create topics for the novels in each of these corpora, and look for patterns in topics that mention gender or class. Do they tend to focus on beauty and wealth? Or on virtue? Maybe on intelligence and kindness? These questions go alongside the Questions of Virtue raised by McKeon, and could help to think about how novels answered these questions throughout the century. A further analysis could even look at the titles of the works written during each of the time periods, and a simple word cloud could help to give a sense of what words were common and what words were not.
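
A small sketch of the bucketing and the title word counts behind such a word cloud. It makes one assumption worth flagging: the periods are treated as half-open (1700-1719, 1720-1739, and so on) so that a novel from 1720 falls into exactly one group. The titles are shortened and the whole list is only illustrative.

```python
from collections import Counter


def period_label(year, start=1700, width=20):
    """Map a year to a half-open period label such as '1720-1739'."""
    lower = start + ((year - start) // width) * width
    return f"{lower}-{lower + width - 1}"


def title_words_by_period(novels):
    """Count title words per period; novels is a list of (year, title) pairs."""
    counts = {}
    for year, title in novels:
        counts.setdefault(period_label(year), Counter()).update(title.lower().split())
    return counts


# Shortened, illustrative titles.
sample = [
    (1740, "Pamela or Virtue Rewarded"),
    (1748, "Clarissa or the History of a Young Lady"),
]
by_period = title_words_by_period(sample)
```

The per-period counters can be fed to any word cloud tool, or compared directly to see which title words rise and fall.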


Exercise 8

2 min read

How much do titles foreshadow what the novel is truly about? To what extent do the elaborate descriptive titles of the 18th century novels we’ve looked at reflect the themes with which the novel is occupied? Do the words that appear on the title page reappear throughout, or are they simply there to attract readers?

I’m not entirely sure how to execute this using only the exact topic modeling and metadata tools of the past two assignments, but very similar technology could answer these questions. The topic modeling would need to be limited to a single novel (if we wanted to do this very inefficiently, with tons of iterations), or there would need to be a way to connect the topic modeling to the metadata in such a way that matches novels with themselves. That was poorly explained. What I’m trying to say is that the two technologies would need to be combined in such a way that would allow us to compare words in titles to themes within individual novels. This would allow us to determine—albeit pretty abstractly and inconclusively—how much of a correlation there is between what the title promises the reader and what is delivered.

Alternatively, there could be a cool tool that uses the basis of topic modeling—co-occurrence of words—but examining the titles as well as the body of the text. In novels with “virtue” in the title, what percentage of the words are “virtue” or related terms? And what topic does “virtue” belong to? What does that tell us about novels with “virtue” in the title?
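
The percentage question can be sketched without any topic modeling. The example below assumes we have (title, full text) pairs for each novel; both sample novels and the list of "related terms" are invented, and choosing those terms is itself an interpretive step.

```python
import re


def term_share(text, terms):
    """Return the percentage of words in text that belong to terms."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in terms)
    return 100 * hits / len(words)


def share_for_titled(novels, title_word, terms):
    """Compute term_share for every novel whose title contains title_word."""
    return {
        title: term_share(body, terms)
        for title, body in novels
        if title_word in title.lower()
    }


# Invented titles and text.
sample = [
    ("Virtue Rewarded", "her virtue was her only virtuous treasure"),
    ("The Fortunate Mistress", "money and virtue"),
]
shares = share_for_titled(sample, "virtue", {"virtue", "virtuous"})
```

Running the same function over novels without "virtue" in the title would give the baseline needed to say whether the title word actually predicts anything.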


exercise 8

1 min read

I've been thinking about this for some time, and I'm still left scratching my head. I feel very deeply that topic modeling, data analysis, and metadata can be combined to provide interesting insight into old and new questions. But when I try to think about really good uses, I keep drawing a blank. I'm not sure what I want to know about the corpus, and even worse, I'm not sure what my possibilities are for utilizing the digital technology we have.

I fall back on my previous idea of a hyperlinked concordance, but I'm not sure what question I would try to answer with it. I've also been thinking about how it would be interesting to try to reconstruct a text or a corpus using the outputs of the tools--especially topic modeling, but also fusion tables, word clouds, maps, etc. What could we learn about trying to resee the thing we research through the researching tools? Would it be inaccurate nonsense, like when you use an online translator to translate a word into another language and then back to the original? Or might it broaden our perspectives?

Exercise 8

2 min read

I have been thinking a lot about novels’ consciousness and narration as I have been planning my final paper, and in particular I have been concerned with shifts in these areas in novels at different times throughout the rise of the novel. If we are meant to use the novel metadata file on GitHub, I think we might be able to use topic modeling to explore these relationships. Obviously, topic modeling would not match specific years with narrative form or something like that, even though looking at specific dates, perhaps on a line graph, might be the easiest way to examine these shifts. Instead, I think you could do a topic modeling exercise where you use that metadata file and look at matchups between narrative form and the types of titles/words in titles. My research question would essentially be: “how can we look at shifts in narration and novels’ consciousness of their form and fictionality in a non-numeric way?” Like I said, I think we could examine this question by topic modeling the co-occurrence of certain narrative forms and book titles. For instance, we would probably see lists that grouped epistolary, first-person novels with long titles such as “the history of so-and-so” with lots of extra details included in the title. This would be contrasted with lists of more advanced narrative techniques with less laboriously-detailed and formulaic book titles. The difference between these various sets of co-occurring topics would hopefully reveal a shift in the way novels describe and think about themselves externally, in the title, and internally, in the narration.

Exercise 8

2 min read

The corpus metadata we have at our disposal allows us to ask a lot of interesting questions. For me, the change in the location names that appear in titles in this dataset was rather striking. When mapping Robinson Crusoe, we saw a mostly Eurocentric distribution of countries, but now it seems a bit more widespread, particularly in the United States. But this doesn't allow for a fully exhaustive analysis of the actual places mentioned in each book. Exercise 7 allowed us to look at topics in the actual text, and we found that countries and characteristics of those countries (imperialism and England or dynasticism and China etc.) were often present. My question looks into the locations of these novels (and possibly those mentioned) and how they change over time. I predict that early 18th-century novels like Robinson Crusoe are likely to have a very Eurocentric focus and be accompanied by topics of seafaring and imperialism, whereas novels in the later 18th century would likely be more "worldly" and invoke countries and topics far beyond the reach of Europe. Also, it would be interesting to analyze how publication location fits into this. This dataset doesn't include many publication locations outside of London, but maybe with a more comprehensive list or looking further into the 19th century, we could see how novel publications outside of this area deal with location. I would imagine that other locations may focus on areas outside of British empire control, and deal with completely different topics.

To do this, I would use the Google Fusion metadata to tag title nouns, adjectives, and locations that are associated with a country or feature. Likewise, topic modeling of this corpus would reveal prevalent country related topics and what novels they correspond to. In this way, I could pinpoint which novels deal with what countries and topics and then place them in categories by their publication date. Also, adding in other publication locations into this same analysis would give us a bit more diversity. This could then be visualized in 10 or 20 year chunks to give an idea of how location and perception of location changed with respect to time and publication location.

Exercise 8

1 min read

Although the metadata provided us with a lot of quantitative information about the group of novels we studied, there were several aspects of this collection of works that went un-described by the dataset. While completing the exercise, I found myself wondering about the popularity of these novels. Additionally, during the topic modeling exercise, we saw that while some of the topics generated seemed to be random amalgamations of unrelated words, topics that were relatively cohesive and identifiable did appear.

I would like to address the popularity of different topics during the 18th century and combine both metadata and topic modeling to track what subjects people were most interested in reading about. I would first choose several well defined topics (possibilities could include exploration/travel, family, literature/fine arts etc.) and collect data on the individual novels that comprise each topic. Obtaining information on how many copies of each of these books were printed or sold would most likely require some digging, but if getting a hold of this data were possible, I could then monitor the popularity of each topic over a specified time period.


Assignment 8: The Intended Audience

2 min read

Most of our exercises focused on understanding the novels and the literary trends of the 18th century. However, it is also important to understand the 18th century audience that the novels are trying to reach. From the metadata and through topic modeling, it is possible to better understand the intended audience or rather the point of view of the intended audience. Furthermore, as most of the novels we have read were published in 18th century Britain and during the era of the British Empire, it would be interesting to see the places that the novels mention, allowing us a better understanding of the world from the point of view of an 18th century Briton. I would create a map of “TitlePlaces” from the metadata, paying close attention to the locations that were mentioned the most in the metadata. If the novels were focusing on specific locations, it is possible that the audience could easily recognize or relate to those locations, as the authors would want their novels to relate to or influence the audience (e.g. Defoe mentions the “Brazils” and the plantations there to idealize the enslavement of the New World). Afterwards, using the Topic Modeling Tool, I would look for the names of those locations. I would focus on finding any trends or patterns of words that are associated with popular locations (perhaps England is associated with “civilization” and Paris is associated with “progress”). The purpose of this research is to better understand an average 18th century reader, thereby giving us a better understanding of why authors chose certain techniques, locations, and themes in their novels.


Europe #1?

2 min read

Considering that many of the texts we’ve read were originally published in 18th century England, it would be interesting to revisit Eurocentrism as a potential research topic. These 18th century novels seem to place an emphasis on European exceptionalism, primarily by glorifying European social and ideological customs. Since it might be hard to explore this Eurocentrism with the guidance of a single question, I would probably turn to Ian Watt’s idea of formal realism. In the simplest of terms, Watt argues that the novel serves as a reflection of the individual human experience. Contextualizing this with the novels we’ve absorbed this semester, the human experience that Watt identifies seems to be a predominantly European one, specifically pertaining to England.

We discussed this concept briefly in our discussions of Robinson Crusoe, but I imagine it can be further contextualized by using the software from exercises 6 and 7. Using the available metadata, I would create a word cloud for the ‘TitlePlaces’ column and keep an eye out for European locations. I might also experiment with the ‘TitleNames’ and ‘TitleNouns’ categories to see if any results seem specific to Europe (European names, objects, etc.). Additionally, I would run digital facsimiles through the topic modeling software. I would probably use chunks from the novel to do this, as that seemed to be the most effective method in exercise 7. After doing this, I’d pinpoint topics that evoke European social conventions, locations, and ideologies. Given the importance of location and setting in this proposed research project, I would also be enticed to input the results of the ‘TitlePlaces’ category into Google My Maps, which is similar to what we accomplished in exercise 2.


Exercise 6: Metadata

4 min read

Exercise 6

Playing around with the metadata confirmed a trend we’ve talked about in class: Most of the novels in the corpus were published in London, with a smaller but significant percentage coming out of Dublin and the remainder scattered among other locations. Between 1700 and 1740, 94.9% of the novels in the sample were published in London, only 2.2% in Dublin, and none in other publishing cities such as Edinburgh or Bath. Meanwhile, in the second half of the sampled era, between 1740 and 1779, only 85.7% of the novels were published in London, with 12.3% in Dublin and small but notable percentages in Edinburgh, Glasgow, and a couple of other cities. I think it’s safe to say that this trend speaks to the increasing popularity of novels outside of centers like London and the growing tradition of reprinting and pirating books. However, this sample of 855 novels is dubiously representative of All 18th Century Novels, and it seems possible that this trend and others speak just as much to idiosyncrasies and oversampling in this particular corpus as to actual patterns. Partly because of this, and because of the unwieldy and idiosyncratic nature of categories like TitleNouns and AuthorDates, I had trouble seeing the utility of the metadata and finding anything really exciting in it when playing around with Google Fusion.

The data, and the tools we have to analyze it, are somewhat limited. I thought it would be interesting to trace the prominence of particular types of paratext in conjunction with each other over time. Specifically, I wanted to see how often pieces of paratext coded as “Preface” and pieces of paratext coded as “To the reader” occurred in the same novel over time. Their co-occurrence might be a rough proxy for the amount of hedging, snark, and/or authorial self-abasement addressed to readers and editors. However, because all the types of paratext (preface, advertisement, errata, etc.) are lumped together in one column (paratextTitleControlled) charting the rise of a couple of individual types of paratext doesn’t seem to be possible. For instance, I would want a filter to pick up on Samuel Richardson’s Clarissa as having both a preface and a “To the reader” section, as the novel is described as having “Preface, Character information, To the reader, Errata.” But from what I can figure out, a Google Fusion bar chart of publication date, filtered by “Preface” and “To the reader” in the category paratextTitleControlled, would only show novels whose paratext has been coded in that order, leaving out novels whose paratext was coded in a different order. The search treats “Preface, Character information, To the reader, Errata” as a different value from “Preface, To the reader” rather than recognizing it as the combined occurrence of a preface AND a “To the reader” along with some other stuff (Errata and Character information) that’s irrelevant in my search.

To illustrate, Image 1 is a chart of paratext over time, filtered by “Preface, To the reader.”

Image 2 is a chart of paratext over time, filtered by “To the reader, Preface.” Even this chart -- the same two types of paratext, listed in a different order -- is totally different.

Notably, neither of the two charts above includes Clarissa at all, since the filter can’t pick out the two types of paratext in the orders listed.
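
Outside Google Fusion, the ordering problem goes away if each paratext cell is split into a set of labels before filtering. The sketch below is a rough illustration under that assumption: the Clarissa value is the one quoted above, while "Novel X" and "Novel Y" are invented rows, and the helper names are hypothetical.

```python
def paratext_set(cell):
    """Turn a comma-separated paratext cell into a set of lowercase labels."""
    return {part.strip().lower() for part in cell.split(",") if part.strip()}


def has_all(cell, wanted):
    """True if the cell contains every wanted label, in any order."""
    return {w.lower() for w in wanted} <= paratext_set(cell)


rows = [
    ("Clarissa", "Preface, Character information, To the reader, Errata"),
    ("Novel X", "To the reader, Preface"),  # invented row
    ("Novel Y", "Preface, Errata"),         # invented row
]

matches = [title for title, cell in rows if has_all(cell, ["Preface", "To the reader"])]
```

Because sets ignore order and extra members, "Preface, To the reader" and "To the reader, Preface" match the same novels, and Clarissa is picked up despite its additional paratext. The same approach would apply to pairs of title nouns.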

Similarly, it might be interesting to look at the volume and frequency of particular title nouns (or adjectives, but I was looking at nouns) over time. The word cloud I made (Image 3) points out which nouns occur most, as raw numbers, out of the whole corpus we have data on, but it doesn’t let you visualize changes over time. A bar chart would be more helpful for that, but again, if you wanted to look at, say, “French” and “amour” in conjunction over time to see what if anything you could learn about how novelists imagined the French, the filter would only be able to pick out titles where the coder had listed “French” and “amour” in the order you entered the terms into the filter.

The human arbitrariness of the way the novels were coded (e.g. some novels’ paratext includes “To the reader, Preface” while some includes “Preface, To the reader”), the way the categories are formatted, and the relative simplicity of Google Fusion combine to make looking at how multiple values interact difficult. More sophisticated analysis tools, and a more sophisticated understanding of how to use them on my part, would let me get at more multidimensional ways that different values interact for the different categories of metadata.


Time Made

4 min read

I think I'm going to approach this a bit more artistically and a bit less scholarly than may be intended, but I can't help myself.

I was most taken with the topics generated from the prescribed settings (50 topics, 1000 iterations, 20 words per topic):

Space Pirates: strap captain narcissa ship chap board time behaviour morgan surgeon immediately body uncle mate cried expence put told banter thomson

time travel, space ships, best friends, beautiful aliens, and witty remarks from the medic.

The next three all have exceptionally good final three words. I wonder how much the order matters to my understanding of the topics, and how random the order is.

Evening Passion: eyes purpose attention voice tears peace stood silence instantly fixed ground soul night distress place led felt length rose equally

two lovers part in a moonlit garden.

Americans Abroad: peregrine pm pickle lord pipes hero commodore gentleman mrs emilia hatchway love trunnion lieutenant jolter painter company view french behaviour

men in double-breasted suits aboard steamers talk of art and war over lunch.

I'm having a lot of fun with these. They remind me of poems without linebreaks. The NY Times has a running column that makes poems out of missed connections postings on Craigslist, which remind me of this. It makes me really want to write found poetry for my experimental bibliography.

I generated two other lists of topics: one simple, and one complex. Both were disappointing.

10 topics, 100 iterations, 10 words per topic:

pastoral epic: time power pleasure present life nature happiness english country thousand

In fact, the simplicity of the settings has led me to the most complex, or at the very least, abstract topics. I have asked the computer to distill novels to their basest forms for me. If I consider novels an imitation or representation of reality, then I am nearly asking a computer to find the meaning of life. That, of course, did not work out so nicely in The Hitchhiker's Guide to the Galaxy.

20 topics, 2000 iterations, 15 words, no stopwords:

this took 30 mins for the program to complete.

The most interesting topic was this:

an ode to ee cummings: the to of i in a it not that but for be have as my

Satisfyingly the opposite of the "simple" results, but otherwise too basic, too superficial.

Of course, I could write about the easily labeled topics: church, or voyages, or one topic that was very obviously Pamela. What's the fun in that? Topic modeling effectively takes something sciencey and relieves it of any obligation to be scientific. We take all these data that have been collected in the most absolutely unbiased process and require that they be nearly arbitrarily (certainly subjectively) named, labeled, and sorted.

I've been thinking about applications for topic modeling. Is it a practical way of describing large amounts of writing? How could I actually use it in a real situation? Not just by generating lists, I think. But what about connecting the words in the topics to the full information? Could we hyperlink each word to direct back to its appearance(s) in the original text(s)? I'm thinking about something along the model of The Perseus Project. Could we create topic concordances, with links to locations of every instance of the word chosen in the topic? Could we generate statistical metadata, showing frequency, placement, etc.? Lastly, could we superscore iterations of topics? Is that already being done by iterating (I don't have a strong enough grip on the actual process)? I'd like to see a super topic model, where only the strongest words remain, only those used over and over and over again, or used from topic to topic to topic. Is that close to my simple settings? What would happen if I asked the program to iterate once, generate one topic, and choose one word? I assume it would choose the most used word in 1760s novels.


I did this and returned with "sir". With 2 topics and two words each, I got "time made" and "sir lady." I'm intrigued by time made. I will leave it on this note.


Experimental Descriptive Bibliography

Novel: The Wedding Ring, or, History of Miss Sidney. In a series of letters. In three volumes. By Anonymous.

Context: In the preface, the author refers to an obsession with knowing the identity of the author in order to place the story of the novel into context. The author says that readers want to know whether the author “be of dark or fair complexion, mild or choleric disposition” or “married or single.” Readers also want to know the motives of the author and reasons for writing—whether the novel was "written by the importunity of a friend, or whether the author’s natural temper and inclination directed the choice of the subject.” This obsession with authorship to me seems contradictory to the traditional way of thinking about literature—shouldn’t we value the text based on the quality of the writing and the story?

But the author is hinting at something that was and remains true about literature: text is often inextricable from context. It is not just petty curiosity that drives past and present readers’ obsession with authorship. The identity of the author gives readers clues about how to read the text. In most literature classes, reading a new book often begins with a lecture on the background of the period and a biography of the author.

Thinking about the relationship between text and author is even more interesting when one considers that past and present-day readers will have different reasons for wanting to know the identity. At the time of its publication, concerns over the author were probably related to credibility—was the author really a woman, what class was she, and had she written anything noteworthy before? Today, the identity of this eighteenth-century author would help us place this novel into historical context in order to analyze data and form a thesis about gender and authorship.

The traditional bibliography leaves out the complicated relationship between the author and the text that I have described above. This relationship is addressed directly in the preface, and it extends throughout the reading of the book, because as I mentioned above, the identity of the author often gives us a lens through which we read the novel.

This project is also motivated by my own musings and confusion over the obsession with gender in literature and in life. Professor Buurma mentioned in class that data analysts who are not necessarily lit scholars often use gender as an example of theses we can make about metadata. But this example is often simplistic and also—who cares??? Can’t we look at other things with all this data? Finally, the obsession with author identity/gender past and present makes me wonder— is it necessary to treat gender in such strict binary terms when analyzing literature? What do we lose and what do we gain if we move beyond that binary?

Project Plan: I plan to explore what the bibliography leaves out by exploring possible identities of the author and what we imagine the identity might tell us about the text. I like the idea of exploring these questions in a modern context. I plan to photograph students (maybe specifically student writers?) who identify as female. I want to ask them to dress as if they were trying to disguise themselves, as if they were going out in public and didn’t want to be recognized, celebrity hiding from paparazzi-style. I will ask them to write a letter, because The Wedding Ring is written in epistolary style. I imagine there will be a variety of methods of writing letters—whether it is an email, a letter on paper, a quick post-it note to a friend. I will also ask them to write the letter in a space they feel is most appropriate/natural for them to write this letter.

The collection of photographs will serve as possibilities for the author of “The Wedding Ring.” What can we discern about the possible authors even though they are wearing disguises? What do their clothes and their location for writing say about their identity? Why are they writing on a laptop as opposed to paper? What do we think they are writing?? Can we tell what they’re writing just by this appearance? I think more questions and different ways of interpreting the photos will arise once I have taken the photos.

I like leaving aspects of this project to the discretion of the students (how to dress, how they want to write the letter, where to write the letter, with whom they will write, etc.), because to me this seems more experimental, which is fun, and it represents the lack of control from both the author and the readers in the writing and reading of a novel.

I’m not sure yet what format would be most appropriate for displaying these photos. I could make a collage, where the photos can be viewed close together, or a photo essay, which is more conducive to close scrutiny of each individual photo as well as comparison. I also need to think about whether these photos should be displayed online or if I should print them. What is at stake with materiality, especially when considering bibliography, which strives to convey the materiality/physicality of a book through formatted description?


Exercise 6: Metadata

I thought it would be really interesting to see what places were mentioned in titles to help us think about the obsessions of the time—what places were interesting, what places people wanted to read about. The results of the mapping, though, weren’t all that surprising. The highest concentration was in Europe (mostly the UK), with a sprinkling of places in Africa and Asia and a good number in the USA (what is now the USA). The reliability of this map, however, should be questioned. My favorite example of the flaws of the geocoding is that “The adventures of Abdalia, son of Hanif, sent by the sultan of the Indies, to make a discovery of the island of Borico, where the fountain which restores past youth is supposed to be found. Also an account of the travels of Rouschen, a Persian lady, to the topsy-turvy island, undiscover'd to this. The whole intermix'd with several curious and instructive histories. Translated into French from an Arabick manuscript found at Batavia by Mr. de Sandison : and now done into English by William Hatchett, Gent. Adorn'd with cuts” somehow geocoded to Illinois. There are lots of locations in that title; how it decided it was in Illinois is beyond me. So while the mapping experiment is interesting, it should be taken with a grain or two of salt (much like the mapping Robinson Crusoe project).

As for most of the Fusion charts, I found it difficult to draw meaningful conclusions from them because the data doesn’t necessarily mean all that much. There is, for some reason, a big spike around 1770 in the number of novels, but I’m not aware of a significant reason for that and it could be due simply to the dataset from which we drew. There’s a fairly steady increase in epistolary novels and in the use of non-narrative forms, but since general publications increased and since the data doesn’t take into account percentage of publications, those increases are to be expected and don’t mean much. This is borne out by the fact that the pie chart shows a fairly even distribution among epistolary novels, third-person narrations, and first-person narrations.

I’m a fan of word clouds, so I found the last part interesting. A word cloud of titles revealed (more confirmed than revealed, I suppose) a tendency to give lots of information in the title. Words like “containing,” “price,” “life,” “edition,” “travels,” “volumes,” “history,” “adventures” all show up prominently and all imply a certain piece of information being given in the title beyond the kind of title we would expect from a contemporary novel. Overall, I think these programs are cool and fun to play around with, but drawing definitive (or even speculative) conclusions from them is difficult. I think the further research question I’d be most interested in given the tools (which wouldn’t be that hard) would be the share (percentage) of epistolary vs third person vs first person narratives over time.
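The share question raised above could be approached roughly as follows. This is a sketch under assumptions rather than a finished analysis: the records are made up for illustration (they are not the class dataset), the grouping by decade is my own choice, and `form_shares_by_decade` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def form_shares_by_decade(records):
    """Return {decade: {form: share}}, where shares within a decade sum to 1."""
    counts = defaultdict(Counter)
    for year, form in records:
        decade = (year // 10) * 10  # e.g. 1748 -> 1740
        counts[decade][form] += 1
    shares = {}
    for decade, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[decade] = {form: n / total for form, n in counter.items()}
    return shares


# Made-up (year, narrative form) records for illustration only (assumption).
records = [
    (1741, "epistolary"), (1742, "third person"),
    (1748, "epistolary"), (1749, "first person"),
    (1771, "epistolary"), (1772, "epistolary"),
    (1775, "third person"), (1778, "first person"),
    (1779, "epistolary"), (1779, "third person"),
]

shares = form_shares_by_decade(records)
for decade, by_form in shares.items():
    print(decade, {form: round(share, 2) for form, share in by_form.items()})
```

Dividing by the decade total is what separates a real change in the popularity of a form from the general rise in the number of publications.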


Assignment 6

One thing I learned from the publication date bar graph is that most of the novels in the collection were published after 1740. This could be either because more novels were published after that date or because fewer of the novels published before 1740 were collected. The narrative form pie chart was also really interesting. The top three forms seem to be “third person”, “epistolary”, and “first person”. This suggests that the epistolary form was popular, as previous exercises also suggested.

After that, I used a word cloud to look at the titles of the novels. Frequent words seem to be “adventures”, “history”, “volumes”, “Miss”, “Lady”, “Written”, “Memoirs”, “Vol”, “Edition”, “Life”, “Spy” and “Travels”. The words “Miss” and “Lady” suggest a female protagonist, while the words “history”, “life”, and “memoirs” suggest that the novel focuses on the protagonist’s private life. It would be unsurprising if these latter words were also associated with a female protagonist, as one of the popular novel forms seen so far seems to be about a lady’s inner or private life or thoughts. Back to the list of frequent words, the word “written” is rather interesting. One thing it could mean is that author anonymity is becoming less and less frequent. It could mean instead that the title page insists the novel is written by the protagonist, or that the novel is written by a lady (who may remain unnamed). It would be interesting to go back and look at whether the word “written” corresponds to a named author or an anonymous one. One of the words that surprised me was “spy”, as it doesn’t seem suited to an epistolary novel or necessarily to an adventure novel. It could be that novels about spies are also becoming popular, although I don’t think it’s a subject/genre we’ve really looked at before.

The Continued Study of Popular Novels and Their Effect on Literature

Before I begin, I’d just like to say that everything written below is based on the assumption that the vast number of novels that were not picked up by the database were spread evenly throughout the timeline. By this, I mean that the novels not picked up by the database followed the trend shown in the graphs. For example, I’m assuming it is not the case that half of the novels the database didn’t pick up were all published in 1768, and the other half in 1776. So long as the trends stay the same (i.e., there is still a sharp increase from 1740 to 1741 in the publication of novels), then I should still be fine.

In the last exercise, I examined whether or not the books we studied in class had a measurable effect on the short-term publication of novels, determining whether or not novels such as Pamela or Evelina changed how novels were written or named. I found that it was difficult to notice this effect over the span of a few years based on the data available. Perhaps there was simply no measurable effect at all in such a short time frame. However, when I began this exercise, I decided that I would continue to look at the effects these popular novels had on the European world of literature. Since I was working with a much larger time frame this time around, I was hoping to find some greater evidence that these popular novels impacted the literary sphere in some measurable way. Since I found that trying to see how themes or narrative style changed over time was difficult, I decided to use the publication date graph for this exercise, seeing if the publication of these popular novels resulted in a spike in publication.

The novels that I was basing this study on were Robinson Crusoe, Pamela, and Tristram Shandy. Just like the last exercise, I found it was difficult to measure this effect. Robinson Crusoe, which was published in 1719, was one of three novels published in that year according to the data. The only noticeable increase was in the next year, 1720, in which eight novels were published according to the data; after 1720, however, the numbers drop down to the 2-5 range for the next decade or so. Tristram Shandy and Pamela, on the other hand, seemed to have a much larger effect. After Pamela was published, the number of published novels spiked from one to fifteen, and for more than a decade the number of published novels per year did not drop lower than seven. After Tristram Shandy, novel publication jumped from 4 to 33, and then stayed relatively high compared to earlier years. So, even if one of our influential novels didn’t seem as influential as we first believed, perhaps these two novels make up for it.

However, this may be assuming too much. The data only tells us how many books were published per year; it does not match these increases or decreases to any other numbers. Thus, it’s impossible to claim that these popular novels were the sole, or even major, contributors to the increase in novel publication. For example, improvements in printing technology could have caused this increase, as could a multitude of other social factors. So, unfortunately, it’s difficult to make a claim based on this data alone. If I were to return to this point and do some more research, I feel like I would really need to dive into the time period in which we see a rapid increase in novel publication. I need to have a stronger grasp on the social sphere of Europe during these times which, unfortunately, would be difficult, if not outright impossible, to accomplish using something like this. To its credit, however, this software has certainly pointed me in the right direction. And, even if it can’t give me a definite answer to my question, or may even point me to the wrong place, it is certainly a start.
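The before-and-after comparison described above can be made a little more explicit with a small calculation. This is only a sketch under assumptions: the yearly counts are invented for illustration and are not the class data, the five-year window is arbitrary, and `mean_per_year` and `before_after` are hypothetical helper names. As the post itself notes, a difference between the two averages says nothing about what caused it.

```python
def mean_per_year(counts, start, end):
    """Average yearly count over start..end inclusive; missing years count as 0."""
    years = range(start, end + 1)
    return sum(counts.get(y, 0) for y in years) / len(years)


def before_after(counts, year, window=5):
    """Compare average yearly counts in the windows before and after a given year."""
    before = mean_per_year(counts, year - window, year - 1)
    after = mean_per_year(counts, year + 1, year + window)
    return before, after


# Invented novels-per-year counts for illustration only (assumption).
counts = {
    1735: 2, 1736: 3, 1737: 1, 1738: 2, 1739: 2, 1740: 1,
    1741: 15, 1742: 9, 1743: 8, 1744: 7, 1745: 11,
}

before, after = before_after(counts, 1740)
print(f"average per year before: {before}, after: {after}")
```

Comparing averages over a window, rather than one year against the next, makes a single unusual year less likely to pass for a trend.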