Experimental Bibliography Plan

3 min read

Making my traditional bibliography for The History of Eliza Warwick, it seemed like a lot of the information expected in such a bibliography has a representative function, or else describes an anomaly. Representative information, that could tell us something about the work as a whole, like the title and publication date, and information about anomalies that you wouldn’t expect given only the representative information, such as ornaments or weird pagination, both make the cut. Some of the information that the highly codified form of the traditional bibliography demands, like pagination and collation, struck me as not particularly useful in representing the novel, instead simply uniting to say, “We are a book with pages.” On the other hand, information about plot and characters had no place in the traditional bibliography. How do you decide what information is representative, and what information is just incidental? Isn’t this distinction kind of arbitrary?

These questions remind me of Barthes’s claim that one can distinguish between details that are symbolic and details that serve no other purpose than to unite with other otherwise meaningless details to create reality effect. As a number of people pointed out in class, the distinction between Flaubert’s pyramid of boxes and his barometer seems arbitrary -- how does Barthes know the barometer isn’t a symbol? How can you tell what is reality effect, and what has a representative or symbolic purpose?

Just as the barometer might conceivably be an ignored symbol, might in fact want to convey a meaning other than “We are the real,” the many pieces of information left out of the traditional bibliography, judged not to be necessary for cataloguing the novel, might also be relevant to its categorization. The traditional bibliography includes information about volumes, but not the sub-divisions within volumes. Eliza Warwick, an epistolary novel, is divided into letters, bracketed by salutations, valedictions, and the occasional postscript, but the traditional bibliography doesn’t capture that information. So, my experimental bibliography will.

I’m taking the formal components of the letters that make up Eliza Warwick -- salutations, valedictions, and postscripts -- and superimposing them onto barometers. For each component of the letter, the barometer’s arrow points to one thing: “We are the epistolary novel.” Except, the letters’ openings and signoffs signify more than just reality effect; signoffs like “Your affectionate friend, C. Huntley” also convey information about the main characters and their affect towards one another. My goal is twofold: to convey this information and to suggest that it, like the barometer that is measuring it and classifying it as reality effect, can’t be easily classified as either reality effect or representative, just as people in class felt that Barthes should not be so quick to dismiss Flaubert’s barometer as a symbol.

Exercise 8: Narrative form + artistic representation

1 min read

In thinking about topic modeling and metadata for a group of 18th century novels, a particular challenge for me seems to be figuring out a question to explore using these methods which is neither so large that its answer would not be particularly interesting or meaningful nor so small that it is not much different from close reading. So, in trying to both narrow down and expand my interests in genre, form, etc. I really like the idea of combining algorithmic criticism with “middle-distance reading” to find something out about a group of novels that traditional literary criticism -- with its dichotomous close and distant reading -- can’t really do. I would like to take a group of novels written in the third person OR in free indirect discourse and look at the topics related to or involving artistic representation in the form of writing -- so mentions of books, novels, poetry, other kinds of writing -- to see if any commonalities pop up and if there might be some kind of relationship to be extrapolated or theorized here about narrative form and the presence of narrative artistic representations in the novels.

Exercise8

Exercise 7: Topic Modeling

4 min read

All the following topics are based on the whole corpus.

41 topics, 1000 iterations, 20 words printed per topic

Public Men Do Public Man Things: people power country laws state government law great liberty public men nation equally constitution present influence justice interest order private

Love Story: man woman good love make thought men women young world creature wife word poor find sex thing give don girl

50 topics, 500 iterations, 12 words printed per topic

Topic Modeling If Fo Fatiffying: fuch fo faid reverie fame moft fome thefe foon himfelf thing

Seventeen(sixty) Magazine: beauty lady fine eyes young fair beautiful air women eye dress appearance

50 topics, 500 iterations, 5 words printed per topic

Young Pickle!: peregrine pm gentleman young pickle

: man good make men give

Reading Tristram Shandy: tears heaven soul grief distress

80 topics, 1000 iterations, 12 words printed per topic

Putting the ‘List’ in ‘Orientalist’: japan taycho great dairo chinese empire orator cuboy farm people japonese fika

So Torrid: passion love heart lover mistress affection object sentiments tender tenderness soul loved

Authorial Modesty: author great genius learning learned read book works work poet books taste

I hoped that narrowing down the number of words per topic would yield more coherent topics, but I actually found that shorter topics were vaguer and more random-seeming, perhaps because by the time you are looking for the five words that are most likely to co-occur, you’re likely to get sort of generic, everyday words like “man good make men give” -- the ones I list above are the most coherent of the 5-word topics.

Armstrong claims that the transition from using the class system to using a person’s internal moral qualities as a measure of a person’s worth occurred through the novel, since the novel imbued the middle-class woman with individuated subjectivity. The topics that are explicitly gendered male seem to confirm her hypothesis, if we accept that in the 1760s, the part of the shift in which all middle-class people, including men, could be measured by their internal moral qualities had not yet happened. The topic Public Men Do Public Man Things lacks nouns that could be coded as having to do with the internal subjectivity of characters. We might infer that novels where that topic is prevalent conform more closely to an old model of fiction in which social stature determines worth, even if their particular ideas about the social order (liberty! equally!) are relatively modern.

However, Armstrong’s primary point, that the interiority of middle-class women began to matter in novels, is not precisely confirmed by the topics that are explicitly gendered female. The topic I called Seventeen(sixty) Magazine is representative of a number of topics which seem to be the “women’s novel” topic. Interestingly, while they certainly do not place emphasis on social class in the way that public, manly topics do, they also place no more emphasis on interior virtues. Rather, they bring to light a middle ground between the privacy of “essential qualities” and the publicly visible social order: physical appearance. The topic Seventeen(sixty) Magazine features appearance words laden with positive connotations, like beauty, fair, and young. It would be interesting to see whether, in novels where this topic is prevalent, female characters’ worth is defined more by their “essential qualities of mind” or their physical beauty.

The topics that reflect romance open up onto a number of questions about Armstrong’s thesis. So Torrid and Love Story, topics reflective of the romance genre, are composed of words that do explicitly refer to essential qualities of mind: good, love, passion, affection, tenderness, soul... According to topic modeling, it is in the romance genre that characters are most likely to be endowed with inner moral virtues. The romance, a particular type of domestic fiction, seems to match most closely with Armstrong’s argument. Do these (heterosexual) romances, which presumably demand both men and women main characters to function, imbue both their male and female characters with qualities of mind? Armstrong says that domestic novels “seized the authority to say what was female” -- were these feminized novels doing that through both male and female characters, or is it the feminized nature of the genre alone that allows Armstrong to make that claim (468)?

*I’m not sure if it’s fair game to compare a topic chosen from a list of 50, generated through 500 iterations, and containing 12 words to topics chosen from a list of 41, generated through 1000 iterations, and containing 20 words. I think that the main thing that matters here is that they are all topics generated from the same corpus, groups of words that are likely to co-occur in that corpus, and that should make it acceptable to compare them???

 

Exercise 8

2 min read

I’d like to explore the relationship between the most prominent topics within a corpus of novels and some pieces of metadata that may correlate to the novel’s availability to different socioeconomic strata of readers in order to see if we can track any changes in these factors along with the assertions of Ian Watt in “The Rise of the Novel,” namely that the novel was born alongside (and helped to create / propagate) the burgeoning middle class created by modern capitalistic systems. The metadata that I would look at would be:

  • price of the volume (can we find any correlations between the supposed content of the novels as revealed by topic modeling? do these prices change in subsequent republishing? when is the price included in the paratext?)
  • place of publication (what kinds of novels were published in Dublin as opposed to London as opposed to Philadelphia? how do these reflect the state of the middle class in each respective area?)
  • year (does the novel come to reflect more middle-class values as it ages?)

I would group the novels by these divisions of metadata (e.g. all the novels that cost 2-3 shillings or were published in Dublin from 1775 to 1780) and use topic modeling to see if I could find any reflection of these factors in its most prominent topics.

Research Question!

1 min read

Suppose I give you a book but don’t tell you the title or what kind of book it is supposed to be. Could you figure out the genre just by reading it? Probably; we all have some sense of what makes a fantasy novel different from an autobiography. I am curious if a computer could similarly distinguish different genres of writing - specifically, could one use topic modeling to classify a book? Suppose we develop some algorithm to somehow rate the “genre-similarity” of a huge literary corpus. What genres would we find there? Would they match up with the genres we are familiar with, or would the algorithm group together what we would consider very different novels? I imagine that we would find weird clusters of books, since this is essentially topic modeling but on a larger scale, and the topics we discovered when working with topic modeling were sometimes humanly coherent and sometimes not. But to see the final product of this type of analysis - some big picture of a huge number of literary works, grouped together by some features of language, I think would be pretty cool.

Exercise8

Exercise 8

1 min read

Because the novel I am using for my experimental bibliography project contains a prologue, and since my project engages with the content of the prologue, I have been thinking recently about prologues in general. Relevant questions about prologues include: what information do they convey? In what types of novels do they appear? What is their relation to the novel as a whole? Do they appear more often in first edition copies or subsequent editions of novels? I am also interested in this research question after talking with Professor Buurma about my project last week and learning that one suggestion for choosing which 18th century novels Penn should digitize is the novels with prologues. Preliminary information about these prologues could be very helpful in making this decision. I think topic modeling could tell us what themes are covered in the prologues. Other questions could deal with comparing resulting topics from metadata of novels with prologues vs. novels without prologues. I could also divide the novels with prologues into subcategories derived from metadata information (such as narrative form, publisher, or author).

Exercise 8

3 min read

One thing I'd like to use topic modeling and metadata for is to look for a shift from religious communities/dynastic realms and messianic time to nationalism and homogeneous time (as described in Anderson's Imagined Communities). To do this, I think it would be best to split a corpus of eighteenth century novels into groups based on their publication date. One suggested grouping is 1700-1750 and 1751-1799, although one could also create more than two groups by making the year ranges smaller (for example, 1700-1733, 1734-1766, 1767-1799). However, it is possible that the later dates may contain a larger number of novels, in which case it might be better to have uneven groupings so that the split of novels between groups is not too disproportionate--though a large time range could obscure when the shift took place. It might be preferable to run this multiple times with different groupings to see how the groupings affect the results.
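As a rough sketch of the two grouping strategies described above (fixed year ranges versus uneven ranges that hold similar numbers of novels), something like the following could be a starting point. The function names and the list of publication years are hypothetical and only illustrate the idea; a real run would read the years from the corpus metadata.

```python
from bisect import bisect_right


def fixed_bins(years, edges):
    """Group years by fixed ranges; `edges` holds each range's first year, sorted."""
    groups = {edge: [] for edge in edges}
    for year in years:
        idx = bisect_right(edges, year) - 1
        if idx >= 0:  # skip years earlier than the first edge
            groups[edges[idx]].append(year)
    return groups


def equal_count_bins(years, n_groups):
    """Split years into n_groups of roughly equal size (so year ranges are uneven)."""
    ordered = sorted(years)
    size, extra = divmod(len(ordered), n_groups)
    bins, start = [], 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


# Hypothetical publication years, skewed toward the later part of the century.
years = [1719, 1722, 1740, 1748, 1751, 1759, 1764, 1766, 1771, 1778, 1785, 1794]

by_half = fixed_bins(years, [1700, 1751])  # 1700-1750 and 1751-1799
by_count = equal_count_bins(years, 2)      # two groups with six novels each
```

With fixed ranges the second group ends up twice the size of the first in this made-up sample, while the equal-count split moves the boundary later; comparing both is one way to see how much the grouping affects the results.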

To identify the shift, I think one thing to look at would be to see whether there are any topics related to imagined communities: that is, any topic where an imagined community (a community where the constituents don't interact with all other constituents) is associated with descriptive words or traits. One particularly useful thing to look for would be the association of country names with other words. Topic modeling could also be used to look for simultaneity in novels, which Anderson argues enables the concept of nationalism and imagined communities. Here it would be helpful to look to see if topic modeling can help identify instances/trends of simultaneity (where characters are acting separately but at the same time). (Topic modeling may not be the best tool for this, but I'm not sure what would be. Perhaps it would be better to look at the frequency of words like "meanwhile".) It would still be interesting to see if words like "meanwhile" show up in topic modeling, and what words they are associated with if they do. Another thing to look for that doesn't require topic modeling is to see how time is described across eighteenth-century novels. Anderson discusses a shift from sacred time to modern time, and this shift could be identified through a rise of the usage of standardized or clock time. Anderson also talks about the newspaper's role in enabling the new sense of time/imagined community, so it might be worth looking at the mention of newspapers (frequency-wise) or seeing if the word "newspaper" appears in the topic modeling at all. If the word "newspaper" does appear in the topic modeling, it would be interesting to see what words the newspaper is associated with.
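The frequency check mentioned above (how often words like "meanwhile" or "newspaper" appear) does not need topic modeling at all. A minimal sketch, assuming each novel is available as plain text, might look like this; the function name and the sample sentence are invented for illustration.

```python
import re
from collections import Counter


def per_thousand(text, targets):
    """Return occurrences of each target word per 1,000 word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {word: 1000 * counts[word] / total for word in targets}


# Invented sample sentence, not a quotation from any novel.
sample = "Meanwhile the captain read the newspaper; meanwhile, at home, she wrote."
rates = per_thousand(sample, ["meanwhile", "newspaper", "clock"])
```

Normalizing per 1,000 tokens makes long and short novels comparable, which matters if the rates are then plotted against publication date.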

Exercise8

Topic-Modeling Metadata

2 min read

At first glance, topic modeling doesn’t seem to provide us with a lot of metadata; the outputs that the algorithm generates for us are pretty straightforward strings of words. But I think there’s more to topic modeling than meets the eye, and I would be interested in exploring and analyzing what little metadata topic modeling has to offer us. My research question would be something along the lines of: What can the metadata of topic modeling tell us about topic modeling as a practice, and about the novels they are attempting to topic model?

In order to effectively analyze the metadata of topics, we’d be burdened with the task of creating, or at least documenting, the metadata that’s available for each one. I would begin by tagging each topic with the number of other topics produced alongside it, the number of iterations, the number of printed words, and the presence of stop words or not. I would then create some basic content-related labels surrounding the topics based on what we’ve seen so far, such as “money” or “family” or “hilarious” or “???” depending on the topic. I’d also like to figure out a way to assign the topics a “relevancy score”, or some metric that indicates how much the topic “makes sense” to a human reader or how much meaning we can draw from it.
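One way to document this metadata would be a small record per topic. The sketch below is only an assumption about what such a record could hold: the field names, the stop-word flag values, and the 0-5 "relevancy" ratings are all hypothetical, and the two sample topics are borrowed from the Exercise 7 lists earlier on this page.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TopicRecord:
    words: list
    n_topics: int             # how many topics were generated in the same run
    iterations: int
    words_printed: int
    stop_words_removed: bool  # placeholder value; the runs do not record this
    labels: list = field(default_factory=list)
    relevancy: int = 0        # hypothetical 0-5 human rating of coherence


records = [
    TopicRecord(
        "passion love heart lover mistress affection object sentiments "
        "tender tenderness soul loved".split(),
        n_topics=80, iterations=1000, words_printed=12,
        stop_words_removed=True, labels=["romance"], relevancy=5,
    ),
    TopicRecord(
        "man good make men give".split(),
        n_topics=50, iterations=500, words_printed=5,
        stop_words_removed=True, labels=["???"], relevancy=1,
    ),
]


def mean_relevancy_by(records, attribute):
    """Average the relevancy rating for each value of one run setting."""
    grouped = defaultdict(list)
    for record in records:
        grouped[getattr(record, attribute)].append(record.relevancy)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


by_length = mean_relevancy_by(records, "words_printed")
```

With enough tagged topics, the same helper could be pointed at `n_topics` or `iterations` to see whether any run setting tends to go along with more readable topics.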

I think this research question, and its answers, would provide us with a SUPER macro-level picture of what’s going on in a corpus of novels, one worth discussing.

exercise8

Exercise 8

1 min read

Inspired by Habermas's arguments regarding individual subjectivity and private writing oriented towards an other, I would be interested in tracking how often direct address to a recipient appears in a group of 18th-century novels composed of letters, and how the frequency of these occurrences changes over time. To what degree are letters simply diary entries with a recipient's name at the top, or news reports, and to what degree is the recipient an important and relevant part of the discourse? Is the letter a monologue or interaction with an imagined other? I would want to track instances of the use of the word "you" within letters, as well as the use of rhetorical questions ending with a question mark, and see if a pattern emerges over time. Both topic modeling and metadata would be useful in putting together the variables of direct address and date of publication of the novel.
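A minimal sketch of the counting step, assuming each letter has already been separated out as its own string. The function name and the sample letter are invented; note that counting question marks cannot tell rhetorical questions from genuine ones, so that number is only a rough proxy.

```python
import re


def address_markers(letter, second_person=("you",)):
    """Count tokens, second-person words, and question marks in one letter."""
    tokens = re.findall(r"[a-z']+", letter.lower())
    wanted = set(second_person)
    return {
        "tokens": len(tokens),
        "second_person": sum(1 for token in tokens if token in wanted),
        "questions": letter.count("?"),
    }


# Invented sample letter, not a quotation from any novel.
letter = (
    "My dear Charlotte, you cannot imagine my distress. "
    "Did you receive my last? Your affectionate friend."
)
result = address_markers(letter)
```

Passing a longer tuple such as `("you", "your", "thou", "thee")` would widen the definition of direct address; pairing each letter's counts with its novel's publication date would then give the trend over time.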

Exercise 8

1 min read

Research Question: How did the ways in which novels talked about women and social class change from the beginning to the end of the 18th century?

I think that an interesting way to go about answering this question would be to first use metadata, i.e. years of publication, to separate groups of novels written from 1700-1720, 1720-1740, 1740-1760, 1760-1780, and finally, 1780-1800. One could then use a topic modeling tool to create topics for the novels in each of these corpuses, and look for patterns in topics that mention gender or class. Do they tend to focus on beauty and wealth? Or on virtue? Maybe on intelligence and kindness? These questions go alongside the Questions of Virtue that are raised by McKeon, and could help to think about how novels answered these questions throughout the century. A further analysis could even look at the titles of the works written during each of the time periods, and a simple word cloud could help to give a sense of what words were common and what words were not.

Exercise 8

2 min read

How much do titles foreshadow what the novel is truly about? To what extent do the elaborate descriptive titles of the 18th century novels we’ve looked at reflect the themes with which the novel is occupied? Do the words that appear on the title page reappear throughout, or are they simply there to attract readers?

I’m not entirely sure how to execute this using only the exact topic modeling and metadata tools of the past two assignments, but very similar technology could answer these questions. The topic modeling would need to be limited to a single novel (if we wanted to do this very inefficiently, with tons of iterations), or there would need to be a way to connect the topic modeling to the metadata in such a way that matches novels with themselves. That was poorly explained. What I’m trying to say is that the two technologies would need to be combined in such a way that would allow us to compare words in titles to themes within individual novels. This would allow us to determine—albeit pretty abstractly and inconclusively—how much of a correlation there is between what the title promises the reader and what is delivered.

Alternatively, there could be a cool tool that uses the basis of topic modeling—co-occurrence of words—but examining the titles as well as the body of the text. In novels with “virtue” in the title, what percentage of the words are “virtue” or related terms? And what topic does “virtue” belong to? What does that tell us about novels with “virtue” in the title?
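The "what percentage of the words are 'virtue' or related terms" question could be approximated with a plain word count before any topic modeling is involved. This is only a sketch: the function names are hypothetical, the list of related terms is an assumption someone would have to choose, and the body texts below are invented stand-ins rather than quotations.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_share(body, terms):
    """Fraction of the body's tokens that belong to the chosen set of terms."""
    tokens = tokenize(body)
    if not tokens:
        return 0.0
    return sum(token in terms for token in tokens) / len(tokens)


# Titles are real; the body strings are invented placeholders.
novels = [
    {"title": "Pamela; or, Virtue Rewarded",
     "body": "my virtue is my only portion and my virtuous resolution"},
    {"title": "The Adventures of Peregrine Pickle",
     "body": "the young gentleman resolved to travel"},
]

terms = {"virtue", "virtuous"}  # assumed list of related terms
shares = {
    novel["title"]: term_share(novel["body"], terms)
    for novel in novels
    if "virtue" in tokenize(novel["title"])  # keep only titles containing "virtue"
}
```

Running the same share calculation on novels without "virtue" in the title would give a baseline for judging whether the title word really predicts anything about the body.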

Exercise8

exercise 8

1 min read

I've been thinking about this for some time, and I'm still left scratching my head. I feel very deeply that topic modeling, data analysis, and metadata can be combined to provide interesting insight into old and new questions. But when I try to think about really good uses, I keep drawing a blank. I'm not sure what I want to know about the corpus, and even worse, I'm not sure what my possibilities are for utilizing the digital technology we have.

I fall back on my previous idea of a hyperlinked concordance, but I'm not sure what question I would try to answer with it. I've also been thinking about how it would be interesting to try to reconstruct a text or a corpus using the outputs of the tools--especially topic modeling, but also fusion tables, word clouds, maps, etc. What could we learn about trying to resee the thing we research through the researching tools? Would it be inaccurate nonsense, like when you use an online translator to translate a word into another language and then back to the original? Or might it broaden our perspectives?

Exercise 8

2 min read

I have been thinking a lot about novels’ consciousness and narration as I have been planning my final paper, and in particular I have been concerned with shifts in these areas in novels at different times throughout the rise of the novel. If we are meant to use the novel metadata file on the GitHub, I think we might be able to use topic modeling to explore these relationships. Obviously, topic modeling would not match specific years with narrative form or something like that, even though looking at specific dates, perhaps on a line graph, might be the easiest way to examine these shifts. Instead, I think you could do a topic modeling exercise where you use that metadata file and look at matchups between narrative form and the types of titles/words in titles. My research question would essentially be: “how can we look at shifts in narration and novels’ consciousness of their form and fictionality in a non-numeric way?” Like I said, I think we could examine this question by topic modeling the co-occurrence of certain narrative forms and book titles. For instance, we would probably see lists that grouped epistolary, first-person novels with long titles such as “the history of so-and-so” with lots of extra details included in the title. This would be contrasted with lists of more advanced narrative techniques with less laboriously-detailed and formulaic book titles. The difference between these various sets of co-occurring topics would hopefully reveal a shift in the way novels describe and think about themselves externally, in the title, and internally, in the narration.

Exercise 8

2 min read

The corpus metadata we have at our disposal allows us to ask a lot of interesting questions. For me, the change in the title location names seen in this dataset was rather striking. When mapping Robinson Crusoe, we saw a mostly Eurocentric distribution of countries, but now it seems a bit more widespread, particularly in the United States. But this doesn't allow for a fully exhaustive analysis of the actual places mentioned in each book. Exercise 7 allowed us to look at topics in the actual text, and we found that countries and characteristics of those countries (imperialism and England or dynasticism and China etc.) were often present. My question looks into the locations of these novels (and possibly those mentioned) and how they change over time. I predict that early 18th-century novels like Robinson Crusoe are likely to have a very Eurocentric focus and be accompanied by topics of seafaring and imperialism, whereas novels in the later 18th century would likely be more "worldly" and invoke countries and topics far beyond the reach of Europe. Also, it would be interesting to analyze how publication location fits into this. This dataset doesn't include many publication locations outside of London, but maybe with a more comprehensive list or looking further into the 19th century, we could see how novel publications outside of this area deal with location. I would imagine that other locations may focus on areas outside of British empire control, and deal with completely different topics.

To do this, I would use the Google Fusion metadata to tag title nouns, adjectives, and locations that are associated with a country or feature. Likewise, topic modeling of this corpus would reveal prevalent country related topics and what novels they correspond to. In this way, I could pinpoint which novels deal with what countries and topics and then place them in categories by their publication date. Also, adding in other publication locations into this same analysis would give us a bit more diversity. This could then be visualized in 10 or 20 year chunks to give an idea of how location and perception of location changed with respect to time and publication location.

Exercise 8

1 min read

Although the metadata provided us with a lot of quantitative information about the group of novels we studied, there were several aspects of this collection of works that went un-described by the dataset. While completing the exercise, I found myself wondering about the popularity of these novels. Additionally, during the topic modeling exercise, we saw that while some of the topics generated seemed to be random amalgamations of unrelated words, topics that were relatively cohesive and identifiable did appear.

I would like to address the popularity of different topics during the 18th century and combine both metadata and topic modeling to track what subjects people were most interested in reading about. I would first choose several well defined topics (possibilities could include exploration/travel, family, literature/fine arts etc.) and collect data on the individual novels that comprise each topic. Obtaining information on how many copies of each of these books were printed or sold would most likely require some digging, but if getting a hold of this data were possible, I could then monitor the popularity of each topic over a specified time period.

exercise8

Assignment 8: The Intended Audience

2 min read

Most of our exercises focused on understanding the novels and the literary trends of the 18th century. However, it is also important to understand the 18th century audience that the novels are trying to reach. From the metadata and through topic modeling, it is possible to better understand the intended audience or rather the point of view of the intended audience. Furthermore, as most of the novels we have read were published in 18th century Britain and during the era of the British Empire, it would be interesting to see the places that the novels mention, allowing us a better understanding of the world from the point of view of an 18th century Briton. I would create a map of “TitlePlaces” from the metadata, paying close attention to the locations that were mentioned the most in the metadata. If the novels were focusing on specific locations, it is possible that the audience could easily recognize or relate to those locations, as the authors would want their novels to relate to or influence the audience (e.g. Defoe mentions the “Brazils” and the plantations there to idealize the enslavement of the New World). Afterwards, using the Topic Modeling Tool, I would look for the names of those locations. I would focus on finding any trends or patterns of words that are associated with popular locations (perhaps England is associated with “civilization” and Paris is associated with “progress”). The purpose of this research is to better understand an average 18th-century reader, thereby giving us a better understanding of why authors chose certain techniques, locations, and themes in their novels.

Exercise8

Exercise 8

2 min read

I would love to connect Barthes’ The Reality Effect to a map of the household objects mentioned in novels, to see what objects and spaces hold more prominence. After having done so, we could look at specific examples and examine whether the reality effect is, in fact, taking place or whether Watt’s formal realism is in practice. We could consider whether the majority of these objects hold any symbolic meaning or are simply present for representing a real world. Further, we could inspect whether objects tend to be traditionally feminine household items such as apparel or jewellery, as in the case of Pamela, rather than general descriptive items, as in the case of A House for Mr. Biswas. This is closely linked to Armstrong’s Desire and Domestic Fiction. It would also be interesting to see the trend from mentioning objects for the sake of realism, as in Robinson Crusoe, to objects with cultural and social symbolism, as in Pamela, to objects for creating a new spatial and temporal realm, as in Daisy Miller and A House for Mr. Biswas.

Examining a novel such as The Moonstone would not be too helpful though as the items mentioned would mostly be for the sake of the progression of the narrative giving us no indication of the prominence of societal influences, and neither will examining Tristram Shandy, for Sterne’s “unnecessary digressions” would interfere with the results, so not all novels can be examined in this project.

Europe #1?

2 min read

Considering that many of the texts we’ve read were originally published in 18th century England, it would be interesting to revisit Eurocentrism as a potential research topic. These 18th century novels seem to place an emphasis on European exceptionalism, primarily by glorifying European social and ideological customs. Since it might be hard to explore this Eurocentrism with the guidance of a single question, I would probably turn to Ian Watt’s idea of formal realism. In the simplest of terms, Watt argues that the novel serves as a reflection of the individual human experience. Contextualizing this with the novels we’ve absorbed this semester, the human experience that Watt identifies seems to be a predominantly European one, specifically pertaining to England.

We discussed this concept briefly in our discussions of Robinson Crusoe, but I imagine it can be further contextualized by using the software from exercises 6 and 7. Using the available metadata, I would create a word cloud for the ‘TitlePlaces’ column and keep an eye out for European locations. I might also experiment with the ‘TitleNames’ and ‘TitleNouns’ categories to see if any results seem specific to Europe (European names, objects, etc.). Additionally, I would run digital facsimiles through the topic modeling software. I would probably use chunks from the novel to do this, as that seemed to be the most effective method in exercise 7. After doing this, I’d pinpoint topics that evoke European social conventions, locations, and ideologies. Given the importance of location and setting in this proposed research project, I would also be enticed to input the results of the ‘TitlePlaces’ category into Google My Maps, which is similar to what we accomplished in exercise 2.

Exercise8

Exercise 6: Metadata

4 min read

Exercise 6

Playing around with the metadata confirmed a trend we’ve talked about in class: Most of the novels in the corpus were published in London, with a smaller but significant percentage coming out of Dublin and the remainder scattered among other locations. Between 1700 and 1740, 94.9% of the novels in the sample were published in London, only 2.2% in Dublin, and none in other publishing cities such as Edinburgh or Bath. Meanwhile, in the second half of the sampled era, between 1740 and 1779, only 85.7% of the novels were published in London, with 12.3% in Dublin and small but notable percentages in Edinburgh, Glasgow, and a couple of other cities. I think it’s safe to say that this trend speaks to the increasing popularity of novels outside of centers like London and the growing tradition of reprinting and pirating books. However, this sample of 855 novels is dubiously representative of All 18th Century Novels, and it seems possible that this trend and others speak just as much to idiosyncrasies and oversampling in this particular corpus as to actual patterns. Partly because of this, and because of the unwieldy and idiosyncratic nature of categories like TitleNouns and AuthorDates, I had trouble seeing the utility of the metadata and finding anything really exciting in it when playing around with Google Fusion.

The data, and the tools we have to analyze it, are somewhat limited. I thought it would be interesting to trace the prominence of particular types of paratext in conjunction with each other over time. Specifically, I wanted to see how often pieces of paratext coded as “Preface” and pieces of paratext coded as “To the reader” occurred in the same novel over time. Their co-occurrence might be a rough proxy for the amount of hedging, snark, and/or authorial self-abasement addressed to readers and editors. However, because all the types of paratext (preface, advertisement, errata, etc.) are lumped together in one column (paratextTitleControlled), charting the rise of a couple of individual types of paratext doesn’t seem to be possible. For instance, I would want a filter to pick up on Samuel Richardson’s Clarissa as having both a preface and a “To the reader” section, as the novel is described as having “Preface, Character information, To the reader, Errata.” But from what I can figure out, a Google Fusion bar chart of publication date, filtered by “Preface” and “To the reader” in the category paratextTitleControlled, would only show novels whose paratext has been coded as exactly those two items in that order, leaving out novels whose paratext was coded in a different order or alongside other items. The search treats “Preface, Character information, To the reader, Errata” as a different value from “Preface, To the reader” rather than recognizing it as the combined occurrence of a preface AND a “To the reader” along with some other stuff (Errata and Character information) that’s irrelevant to my search.

To illustrate, Image 1 is a chart of paratext over time, filtered by “Preface, To the reader.”

Image 2 is a chart of paratext over time, filtered by “To the reader, Preface.” Even this chart -- the same two types of paratext, listed in a different order -- is totally different.

Notably, neither of the two charts above includes Clarissa at all, since the filter can’t pick out the two types of paratext in the orders listed.
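The kind of filter I wanted, one that ignores the order in which the paratext was coded, might look something like the sketch below. It is written in Python and rests on some assumptions: that the metadata can be exported with the paratextTitleControlled value as a single comma-separated string, and that no individual label contains a comma. The function names `parse_paratext` and `has_all` are my own inventions, and the two "Hypothetical Novel" rows are made up; only the Clarissa value is quoted from the metadata.

```python
def parse_paratext(value):
    """Split a comma-separated paratext string into a set of lowercase labels."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def has_all(value, wanted):
    """True if every wanted label appears in the value, in any order."""
    wanted_set = {label.strip().lower() for label in wanted}
    return wanted_set <= parse_paratext(value)


rows = [
    # Value quoted from the metadata description of Clarissa.
    {"title": "Clarissa",
     "paratextTitleControlled": "Preface, Character information, To the reader, Errata"},
    # Invented rows, for illustration only.
    {"title": "Hypothetical Novel A",
     "paratextTitleControlled": "To the reader, Preface"},
    {"title": "Hypothetical Novel B",
     "paratextTitleControlled": "Preface, Errata"},
]

matches = [
    row["title"]
    for row in rows
    if has_all(row["paratextTitleControlled"], ["Preface", "To the reader"])
]
print(matches)
```

Because each value is turned into a set of labels before the comparison, “Preface, To the reader” and “To the reader, Preface” count as the same thing, and extra labels like Errata don’t get in the way.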

Similarly, it might be interesting to look at the volume and frequency of particular title nouns (or adjectives, but I was looking at nouns) over time. The word cloud I made (Image 3) shows which nouns occur most often, as raw counts, across the whole corpus we have data on, but it doesn’t let you visualize changes over time. A bar chart would be more helpful for that, but again, if you wanted to look at, say, “French” and “amour” in conjunction over time to see what, if anything, you could learn about how novelists imagined the French, the filter would only be able to pick out titles where the coder had listed “French” and “amour” in the order you entered the terms into the filter.

The human arbitrariness of the way the novels were coded (e.g. some novels’ paratext includes “To the reader, Preface” while some includes “Preface, To the reader”), the way the categories are formatted, and the relative simplicity of Google Fusion combine to make it difficult to look at how multiple values interact. More sophisticated analysis tools, and a more sophisticated understanding of how to use them on my part, would let me get at more multidimensional ways that different values interact for the different categories of metadata.

Exercise6