I have to admit that I have yet to draw any solid conclusions from using word clouds. In this assignment, for instance, I felt that there was a lot of interesting material in the various Fusion charts with regard to gaining insight into the narrative form of the novels in this list. I was excited by these discoveries and wanted to build upon them in Step 3 of this assignment, but I felt like I was really going backwards into the realm of unsupported visual conclusions rather than the factual, numerical, and comparable takeaways I was able to draw from Fusion. Obviously, there are inherent differences between charts and word clouds and the kinds of information that they are used to provide; even kids at science fairs know to use a chart rather than a word cloud to represent their results on their tri-fold poster board. Nonetheless, for the purposes of “computational analysis” in this exercise (as with many in the past), I found the word cloud to be generally unhelpful.
With that being said, I was pretty intrigued by the visualizations provided in Fusion. Although none of them were particularly deeply revealing or completely unavailable by simply using the spreadsheet, I thought that the two charts, especially used together, provided some interesting insight into trends in narrative form throughout the 18th century. Obviously, I noted the significant spike in novels in general in 1769, followed by a decrease of 31 books the following year. I also noticed that very few novels were written prior to 1741. As I attempted to compare the two big adversaries in narrative form, at least according to our tracking of novels from Robinson Crusoe to Northanger Abbey, I was surprised to learn from the bar graph that there were more third person narratives than epistolary novels overall. The pie chart, however, reduced this discrepancy by pointing out that while third person novels made up 34.9%, epistolary novels made up 28.6% of novels published in the 18th century. What this pie chart did not represent that the bar graph was able to, however, were the dates corresponding to these two narrative forms. It was interesting to observe that in addition to the discrepancy between the two forms (although less significant when considered on a percentage basis), third person novels also appeared earlier than epistolary novels and experienced more variance in their popularity, whereas epistolary novels coincided with the general 1769 peak. These comparisons between counts, percentages, and dates were all much more revealing than the simple relative sizes of the different types of narrative forms in the word cloud.

I used ABBYY and an online OCR tool that I found through the complicated process of picking the first result that came up in my Google search. I looked at chapter 9 and also the ominous (not really) “black pages.” This online OCR was not too wrong. Of course, it didn’t understand the “s” that looks like “f.” The translation of the black pages was pretty great though. Here is a small sample:

If I were into conspiracies I’d say this was the past trying to talk to us in a now illegible code, or maybe it’s aliens. Anyway, I also put the black pages into ABBYY and the program processed the info for a few minutes, but in the end it didn’t translate anything. I guess ABBYY knows better.

I also put chapter 9 into ABBYY. It worked better than the online OCR. ABBYY is often only off by a letter or two. I think it translates based on individual letter recognition, rather than whole words. Otherwise, I think ABBYY would try to correct familiar words, much like autocorrect on our phones. Though of course autocorrect is also infamously prone to error.
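
To make the letter-versus-word distinction concrete, here is a minimal sketch of what whole-word correction of the long “s” could look like. It is not how ABBYY or the online OCR actually works; the function names and the tiny word list are hypothetical and only for illustration. The idea is that a word containing “f” is left alone if it is already a known word; otherwise the sketch tries swapping “f” for “s” and keeps the first result that is a known word.

```python
import re
from itertools import combinations

def fix_long_s(word, vocabulary):
    """Swap 'f' for 's' in `word` if that turns it into a known word.

    `vocabulary` is a set of lowercase words (a hypothetical word list,
    not anything taken from a real OCR program).
    """
    if word.lower() in vocabulary:
        return word  # already a real word, e.g. "father"
    positions = [i for i, ch in enumerate(word) if ch == "f"]
    # Try replacing one 'f', then two, and so on.
    for count in range(1, len(positions) + 1):
        for combo in combinations(positions, count):
            chars = list(word)
            for i in combo:
                chars[i] = "s"
            candidate = "".join(chars)
            if candidate.lower() in vocabulary:
                return candidate
    return word  # no known word found; leave the OCR output as it is

def correct_text(text, vocabulary):
    """Apply fix_long_s to every run of letters in `text`."""
    return re.sub(r"[A-Za-z]+",
                  lambda m: fix_long_s(m.group(0), vocabulary),
                  text)

# Tiny made-up word list, just for this example.
VOCAB = {"he", "said", "to", "himself", "first", "father", "of"}
corrected = correct_text("he faid to himfelf", VOCAB)
```

Like autocorrect, this kind of correction is only as good as its word list: a misread word that happens to be a real word would pass through unchanged.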

This assignment made me think about the strange journey of written and printed material and the reliability of that material. Before printing, people (monks and the few literate people) would transcribe or copy old texts in order to make a new copy. As I was consulting the picture of the 18th-century text to correct mistakes, I felt a bit like a monk copying old texts into new editions. I thought about how easy it would be to make a mistake and change a word or two, maybe some that change the meaning of a sentence, both then and now.

I thought about the ever-increasing interactions between people and machines and text. And I thought about this machine I saw last semester at the Berlin Jewish Museum that writes the Torah at the speed a human would. The Torah is traditionally written by a trained scribe, and if the scribe makes a mistake, that Torah has to be thrown away and the scribe has to start over.

I couldn’t help but think, when I was correcting my mistakes, about my faithfulness to the original text of Tristram Shandy as somewhat analogous to the novel’s attempt to be faithful to reality.

I think these thoughts were a bit all over the place. TLDR, nothing changes!

First, I found reading through the Table of Contents itself rather interesting. It is told by a third-person, omniscient narrator that not only portrays the events of each letter/segment but also adequately conveys the emotions and ideas that readers gather by reading Pamela's own writing. I found it to be actually quite informative, and it is interesting to consider the audience that it was targeting at the time. Was it used as a reminder of the course of events and the typical means of locating a passage of interest in the book, or more as an abridged version of the novel itself meant to be read independently?

On to the Voyant exercise, as many people have noted, most of the words that are most frequently seen in the novel (actions and titles aside) are virtues and qualities that one would expect to see in a conduct book. My immediate reaction is that the frequency of words like "good," "happy," "honor," "kind" etc. highlights this aspect of the book being a virtuous novel meant to "cultivate the principles of virtue and religion in the minds of the youth of both sexes." A vast majority of the most frequent words are positive, as listed above, thus suggesting that these are the main virtues that the novel is focusing on and attempting to cultivate.

The ease with which Voyant can thoroughly analyze an entire text is incredibly fascinating and makes this assignment rather interesting. I found that tracking the use of "Pamela" throughout the novel showed some interesting trends. Aside from addressing or signing the letters, a vast majority of the usages of "Pamela" were self-pitying remarks (e.g. "poor Pamela" or "hopeless Pamela"). This was something that I noticed while reading the novel as well, but looking through the specific usages of the word itself highlighted the self-pitying nature of her character at times. It also reminded me of scenes from Robinson Crusoe in which he pities his condition. We had talked about Robinson Crusoe being either very happy with his situation or woefully disappointed with it. Looking through the Voyant-produced list of Pamela's name, I found a similar trend in that Pamela also refers to herself (or others refer to her) as "dutiful Pamela," "grateful Pamela," or "pretty Pamela." I'm not entirely sure what to make of this, but it does seem to create this air of pity around the main character as well as harkening back to the structure we saw in Defoe.

According to Armstrong, the novel established the divisions of the world as gender-based, rather than politics-based. Gender is a proxy for establishing personal identity based on thoughts, feelings, and virtue, rather than by religious sect, class, etc. To translate this claim into a very focused study of one word in one novel, I chose to look at words related to virtue, which Pamela is very preoccupied with. I was surprised to see that “virtue” did not show up on the word cloud, even after correcting the cloud to eliminate the most common words. However, “good” and “goodness” did show up on the word cloud. “Good” is rather large and thus was used very often in the book:

I then looked through a bit to see how these words are used. Here are some examples:
-good lady (her former master who dies)
-if I was a good girl…
-Good sirs!
-you are a good girl, Pamela
-good old widow
-good families
-if we are good…(talking about God)
-rather than forfeit my good name
-good advice
-good character
-that’s my good girl! He exclaimed

Most of these are describing Pamela’s character or are in some way related to remaining a good person or virtuous person.
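
For anyone who wants to pull out a list like this without clicking through Voyant, here is a rough keyword-in-context sketch. It is not what Voyant does internally; the function name is made up for illustration, and the sample string is just two of the phrases from my list above stitched together, not a passage from the novel.

```python
import re

def keyword_in_context(text, keyword, width=2):
    """List each use of `keyword` with `width` words on either side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append(" ".join(left + [token] + right))
    return hits

# Two phrases from the list above, joined only to have something to search.
sample = "you are a good girl, Pamela ... rather than forfeit my good name"
contexts = keyword_in_context(sample, "good")
# contexts == ["are a good girl Pamela", "forfeit my good name"]
```

With the full text of the novel loaded into `sample`, the same call would return every use of “good” along with its neighbors, which makes it easier to sort the uses into categories.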

I also looked at its frequency throughout the book:

The usage of “good” fluctuates throughout the book, but it is relatively the same at the end as it is at the beginning. Could this illustrate that Pamela at the end keeps her virtue, as she is just as good at the end as she was in the beginning?
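
One way to check the “same at the beginning and the end” impression with numbers is to cut the text into equal chunks and compute how often “good” appears in each. This is only a sketch of that idea, assuming a plain-text copy of the novel; the function name and the ten-chunk default are my own choices, not Voyant's settings.

```python
import re

def relative_frequency_by_segment(text, word, segments=10):
    """Share of tokens equal to `word` in each of roughly `segments` chunks."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    if not tokens:
        return []
    # Ceiling division, so the text never splits into more than `segments` chunks.
    size = max(1, -(-len(tokens) // segments))
    freqs = []
    for start in range(0, len(tokens), size):
        chunk = tokens[start:start + size]
        freqs.append(chunk.count(word.lower()) / len(chunk))
    return freqs

# Toy input, not text from the novel: "good" is half the words in each half.
toy = "good bad good bad bad bad good good"
trend = relative_frequency_by_segment(toy, "good", segments=2)
# trend == [0.5, 0.5]
```

Comparing the first and last values of the returned list for the real text would show whether the usage really does end where it began, and the same function could be run on other novels for the comparison I mention below.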

I would probably need to look at the usage of “good” in other texts, pre-Pamela and post-Pamela, and compare them to Pamela, to really make a claim about its usage in Pamela and whether it can attest to Armstrong’s claim that the inner self (thoughts, feelings, virtue) becomes identity. But for now, I would say that its frequent usage and its similar usage at the beginning and the end can tell us that the maintenance of virtue is important to Pamela, and consequently, important to the readers of the time.

It is so interesting to me how people’s view of the world has changed throughout time. At least since Homo sapiens have been around, the geographical locations of landmasses haven’t changed much, but humans’ vision of these lands has changed immensely.

The most obvious examples I can think of are Marco Polo’s travels in Asia, and his subsequent introduction of this unknown place to Europeans through his writings, and the discovery of the New World. It reminds me of being very young, before I really understood that there were other places that existed outside the U.S.

I would love to do some kind of meta-analysis on how often texts reference other places, when these places became more varied, and how these trends changed throughout time. This would involve making a map like the one in this exercise for thousands of texts from different centuries. By looking at this particular map, it seems that by the time Robinson Crusoe is published, the average European citizen had a pretty broad awareness of the world. But just the fact that RC is about a man who is trapped on an UNKNOWN island speaks to the possibility that people were very unsure about the finite nature of the world.

This makes me think of relating global awareness with other trends of thought. I remember reading theories of cosmopolitanism by Pheng Cheah, who gives a history of cosmopolitanism. I would be interested in tracing this trend of awareness of the world to the rise of cosmopolitanism. I think we may also compare global awareness to the rise of nationhood. One theory about the rise of nationhood is that nations are partly defined as a place that is different from other places. Nations need other nations to set them apart, so increased global awareness may partly account for this.

I realize I am thinking of a very Euro-centric line of thought, i.e. I am thinking of Europe’s increasing awareness of other places outside Europe throughout time. But perhaps we could do analysis on how different regions came to think of the rest of the world by studying texts from only a specific region. Then we could also compare the different trends from different regions.

A lot of my first thoughts regarding the map we generated with the location data are similar to what was previously mentioned. For one, the map view of the locations, even without the 17 errors, really drives home the sheer number of locations mentioned in the novel. But, contrary to what I had written in my last blog post, the mentions of locations in this book are simply that, just mentions of the locations. Reading further into the novel, we see that Crusoe visits very few of these locations, and is mostly referring to them in talk. Cody's idea that this novel is not globalized but rather globally aware is not only hilarious but really spot on when looking at this map. Furthermore, we can see the colonialist aspects of this book when viewing the distribution of locations on the map. We see a geographical focus on the European, colonial superpowers and then spotty references or visits to their colonies across the West African seaboard and the Americas. Also, the Great River being placed in Thailand made me smile.


Errors and (In) Accuracies
Thankfully, this time around I only had a small amount of difficulty in convincing my computer to follow the directions of the exercise. Once I surpassed that small challenge, I found that the most interesting thing about mapping the results of the NER locations list was the differences in the errors produced by the NER and by Google My Maps. Both programs made their fair share of mistakes in identifying the locations in Robinson Crusoe, but their errors were not of the same nature. As we noted in class, the NER made plenty of mistakes based upon the kind of entities that it considered locations, such as Providence or the Bible. While Google My Maps caught these mistakes and included them in the list of locations that could not be geocoded, it still slipped up in correctly identifying the context of the locations. Google My Maps revealed an obvious US-centric bias, designating Trinidad as a city in Colorado or Amazon as a town in Montana. It intrigued me that the NER committed errors in entity while Google My Maps committed errors in context—a mistake that converged in the case of Providence, Rhode Island.
This notion of Google My Maps’ incorrect preference for US locations was also interesting compared to the World Map of Cruso(e)’s voyages. On the one hand, the Google version is obviously a more accurate map in that it accounts for the hundreds of years of geographical revisions that have occurred since the 19th century. However, in regards to the actual novel, the older map more accurately represented Crusoe’s voyages by tracking his travel routes, instead of pinning down every location mentioned in the book as if Crusoe had visited each place, and by illustrating the Caribbean and European focus of the actual setting of Robinson Crusoe. Like the contrast between the NER and Google My Maps, the contrast between the inaccuracies of these two maps revealed errors in geographical accuracy on the one hand and literary accuracy on the other.

After my long and complicated battle with my PC, I am now finally able to appreciate the interest that the rest of the class has expressed for these lists. The extensive list of locations struck me as the most thought-provoking. Offhand, it seems appropriate that a novel about seafaring adventures would be brimming with references to exciting, far-off places, as the Location list suggests. The list runs the gamut from somewhat everyday locations such as England, Europe, and York, to iconic cities such as Madrid, Jerusalem, and Paris, to a variety of exotic and wild islands and "New World" destinations. However, in spite of this impressive enumeration of foreign destinations, the majority of the novel takes place in a single location. This observation relates to something I have noticed throughout the book: whenever Crusoe ventures to other parts of the island besides his "castle," he describes his endeavors as "going abroad." The common connotation of "going abroad" involves someone traveling to a foreign country, but when Crusoe goes abroad he remains on the same small island, never wandering beyond walking distance from his settlement. The diverse list and the phrase “going abroad” create a contrast between the idea of exotic adventures and the reality of Crusoe's extremely restrained, stationary existence. This disparity is particularly interesting in the context of Robinson Crusoe as a novel that was supposedly meant to expose middle class English citizens to a life of extravagant travel and adventure. In spite of this conception, Robinson Crusoe focuses more on the story of one middle class man's incredibly mundane subsistence in a single place for 28 largely monotonous years. Rather than a revolution of middle class life, this seems to me to be an example of those in the middle class being unable to escape their mediocre station in life even in circumstances as extreme as Robinson Crusoe's.
The two opposing stories told by the Location list and the single location in which most of the novel is set reveal a contradiction in Robinson Crusoe as an exciting adventure novel and as a somewhat uneventful account of middle class life.

As a first-time user of the NER and someone completely unacquainted with the process(es) of digital data-mining of literature, this was a tedious but ultimately fascinating endeavor. The thing that caught my eye right off the bat was the disparity between the lengths of my PERSON and LOCATION lists, the latter being much longer and more varied than the former. The novel, as we’ve come to know it, is typically a very personal and social work. We usually come to know at least a few characters personally, and personal relations are often the driving force of the narrative. In Robinson Crusoe, this is certainly not the case. There are so many locations listed and described in much detail in the novel itself, and its driving force is the relationship between our protagonist and his environs rather than our protagonist and the people he interacts with, which I took as confirmation of my immediate feeling that Defoe’s tale is a rather antisocial or introspective one. I also found that these data echoed our class discussion about how the early novel may have found a role in sating the burgeoning middle class’s desire for travel and worldly excitement.

The “organization” list is the most interesting to me. I think it is the most informative as to what type of book this is, and perhaps gives hints as to what century the book was written in, but it’s also the most mysterious list. It is informative about content, because from this list I can see that many types of people and places are mentioned, such as “Spaniards” and “Mountain Tenereffe,” so either there is travel involved, or the novel takes place in a cosmopolitan city. I can also get a hint about the time in which this takes place, because words like “Moors” appear, which is a word not very much in usage today unless discussing history. It is also the most mysterious list, because unlike in the other lists, the content is very varied. I don’t think I would identify this as an “organization” list if I did not know. Some components are very interesting--such as “heaven” or “nature.” At first glance, I thought the list resembled a poem, partly because of such words. I think it would be an interesting exercise to give this list to someone who does not know that this list is from Robinson Crusoe, and ask that person to tell us what they can infer about the book based on the list.

"He told me, I might judge of the happiness of this [middle] state; by this one thing, viz., that this was the state of life which all people envied..." pg. 6