Just Google it

Combining the NER’s extraction with what MyMaps can do is pretty productive and a great way of visualizing the information the novel gives us. I appreciated our discussion in class on Monday of how, when we’re reading, we don’t notice all of the proper nouns, all of the people and places mentioned; the NER “flattens” all of this and, by creating a list, makes all of the data jump out and seem important in the first place, or all equally important. I would love to see what an actually correct MyMaps would look like: all of the data from Robinson Crusoe totally cleaned (scrubbed?) and with the correct country info. I also think it would be cool if the location markers could be scaled based on the number of times he mentions each place (so you’d have to use a list with the duplicates still in it), a kind of heat map of references in the book, or if you could size the markers based on how much time he spends in each location he actually visits. If all the data were correct, you could also use this to mark where all of the material goods he exchanges or mentions come from and get some approximation of European trade routes (or compare the Crusoe map to an actual exchange route map). Anyway, I’m seeing a lot of possibilities for this overall.

It’s funny to me that MyMaps can’t understand something like “Cape de Verde islands” or “St. Salvador” or “Havannah”, all of which are revealed as locations through a simple… Google search. I was also particularly amused by Google’s adding in of country data; locating places like Yarmouth in New England was funny. If someone who had never read Robinson Crusoe looked at the map, made as it is with imperfect data, I don’t think they’d really get a sense of the importance of locations in the novel: with whom and where he trades, where he thinks he is, where he actually is, and where he spends most of his time. The “Farther Adventures” map gives something of a better sense of this.
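A minimal Python sketch of the marker-scaling idea: count every LOCATION mention with the duplicates kept, then map each count onto a marker size. It assumes the NER output was saved in Stanford NER's inline-XML style (`<LOCATION>York</LOCATION>`) and that whatever mapping tool you use accepts a numeric marker size; the function names, the 1 to 10 size range, and the sample text are invented for illustration, not taken from the real output.

```python
import re
from collections import Counter

# Assumption: inline-XML NER output, e.g. "... born in <LOCATION>York</LOCATION> ..."
LOCATION_TAG = re.compile(r"<LOCATION>(.*?)</LOCATION>")


def location_counts(tagged_text):
    """Count every LOCATION mention, keeping duplicates."""
    return Counter(m.strip() for m in LOCATION_TAG.findall(tagged_text))


def marker_sizes(counts, min_size=1.0, max_size=10.0):
    """Scale each place's marker linearly between min_size and max_size."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when every count is equal
    return {place: min_size + (n - lo) / span * (max_size - min_size)
            for place, n in counts.items()}


# Invented sample text, not the actual NER output for Robinson Crusoe.
sample = ("<LOCATION>York</LOCATION> ... <LOCATION>Hull</LOCATION> ... "
          "<LOCATION>York</LOCATION> ... <LOCATION>Brazil</LOCATION> ... "
          "<LOCATION>York</LOCATION>")
counts = location_counts(sample)
print(counts.most_common())  # [('York', 3), ('Hull', 1), ('Brazil', 1)]
print(marker_sizes(counts))  # {'York': 10.0, 'Hull': 1.0, 'Brazil': 1.0}
```

A linear scale is the simplest choice; if one place dominates the counts, a square-root or log scale would keep the smaller markers visible.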
I’m glad I know how to use this tool now and how to link the NER with other tools. OT: I think creative data visualization through maps is especially cool, and I have lots of examples, but this map (midway through the article) of Citi Bike use from the New Yorker is my favorite interactive one. I have no idea how you would make this, but I wish I did, and that I could model novels that involve a lot of movement in a similar fashion.


Genre?; reduction vs conversion; what's so great about lists?

Sometimes the NER worked really well, and sometimes not as much. I can imagine a range of applications for the tool that would be really compelling, and I’m really looking forward to thinking more deeply about what is gained and what is lost by reducing (or maybe converting is a better term) a novel to a set of data. For me, using the NER on multiple texts, either within the same genre or across genres, to draw out particular distinctions or similarities would be really productive (obviously you’d need to clean the data pretty carefully if you wanted to draw real conclusions and advance theories about this), since I’m curious what these kinds of tools and data can tell us about the ways in which texts adhere to, break with, or manipulate generic conventions. In general, my experience with the NER was positive and made me feel a) like less of a failure when it comes to computer science after a disastrous CS21 experience, b) like I won’t fall behind in academia even though I didn’t know what digital humanities actually consists of, and c) excited about the possibilities of doing exercises like these and seeing what the tools we’re going to use can tell us about novels, their history, and the generic narratives associated with the texts.

W/r/t the actual exercise, I was particularly compelled by the NER’s failure when it came to cataloguing money in the book, since the first section of the novel is structured around, and deeply concerned with, money, status, power, etc. So if someone were looking at the lists rather than reading the book and trying to understand that facet of it, it would be difficult. For me, the relevant information that does come through appears in the lists of names (there are lots of Biblical references, which aligns with how RC takes up Bible study pretty seriously during his island time) and in the lists of dates and times, since the passage of time is basically what the novel has as a structure or plot, besides RC’s survival etc. His constant dating is reflected here.

In terms of what we can learn about lists from the book itself, beyond the NER: we noted in class and briefly discussed that Crusoe is obsessed with cataloguing and listing his belongings, and that this might be sort of the framework for his proto-psychology, etc. I would love to explore more why exactly his lists and the tale of his survival are so compelling and create narrative momentum. Is it just the satisfaction of repeated problem-raising followed by closure, or a solution, with material goods?

TL;DR: Great tool with hopefully big applications to the generic questions I’m interested in; sometimes it works, sometimes it doesn’t, but it’s an exciting new way of converting the book; RC is obsessed with cataloguing and listing, and it’s somehow interesting. Why?


After my long and complicated battle with my PC, I am now finally able to appreciate the interest that the rest of the class has expressed in these lists. The extensive list of locations struck me as the most thought-provoking. Offhand, it seems appropriate that a novel about seafaring adventures would be brimming with references to exciting, far-off places, as the Location list suggests. The list runs the gamut from somewhat everyday locations such as England, Europe, and York, to iconic cities such as Madrid, Jerusalem, and Paris, to a variety of exotic and wild islands and "New World" destinations. However, in spite of this impressive enumeration of foreign destinations, the majority of the novel takes place in a single location. This observation relates to something I have noticed throughout the book: whenever Crusoe ventures to parts of the island other than his "castle," he describes his endeavors as "going abroad." The common connotation of "going abroad" involves someone traveling to a foreign country, but when Crusoe goes abroad he remains on the same small island, never wandering beyond walking distance of his settlement. The diverse list and the phrase “going abroad” create a contrast between the idea of exotic adventures and the reality of Crusoe's extremely restrained, stationary existence. This disparity is particularly interesting given that Robinson Crusoe was supposedly meant to introduce middle-class English citizens to a life of extravagant travel and adventure. In spite of this conception, Robinson Crusoe focuses more on one middle-class man's incredibly mundane subsistence in a single place for 28 largely monotonous years. Rather than a revolution in middle-class life, this seems to me to be an example of the middle class being unable to escape their mediocre station in life even in circumstances as extreme as Robinson Crusoe's.
The two opposing stories told by the Location list and by the single location in which most of the novel is set reveal a contradiction in Robinson Crusoe: it is both an exciting adventure novel and a somewhat uneventful account of middle-class life.

As a first-time user of the NER and someone completely unacquainted with the process(es) of digitally mining literature for data, I found this a tedious but ultimately fascinating endeavor. The thing that caught my eye right off the bat was the disparity between the lengths of my PERSON and LOCATION lists, the latter being much longer and more varied than the former. The novel, as we’ve come to know it, is typically a very personal and social work. We usually come to know at least a few characters personally, and personal relations are often the driving force of the narrative. In Robinson Crusoe, this is certainly not the case. There are so many locations (listed and described in much detail in the novel itself), and the novel’s driving force is the relationship between our protagonist and his environs rather than between our protagonist and the people he interacts with. I took this as confirmation of my immediate feeling that Defoe’s tale is a rather antisocial or introspective one. I also found that these data echoed our class discussion about how the early novel may have helped sate the burgeoning middle class’s desire for travel and worldly excitement.

The “organization” list is the most interesting to me. I think it is the most informative as to what type of book this is, and perhaps gives hints as to the century in which the book was written, but it’s also the most mysterious list. It is informative about content, because from this list I can see that many types of people and places are mentioned, such as “Spaniards” and “Mountain Tenereffe,” so either there is travel involved or the novel takes place in a cosmopolitan city. I can also get a hint about when this takes place, because words like “Moors” appear, a word not much in use today except when discussing history. It is also the most mysterious list because, unlike the other lists, its content is very varied. I don’t think I would identify this as an “organization” list if I did not already know. Some components are very interesting, such as “heaven” or “nature.” At first glance, I thought the list resembled a poem, partly because of such words. I think it would be an interesting exercise to give this list to someone who does not know that it comes from Robinson Crusoe, and ask that person what they can infer about the book from the list.

I found a listicle on why we're fascinated by lists. Enjoy! #Assignment1

Assignment 1: Dates are weird.

First and foremost, I think the experience of using the NER was fascinating. I've always wanted to think about novels in this sort of broad, quantitative way and I'm glad the NER finally allows us to do that.

The list that caught my eye the most was definitely the "DATE" list, because of the way the formatting and presentation of dates, and thus the reader's sense of time, evolve over the course of the novel. As I pored through the entries (which I assume are presented in TextWrangler in the order in which they appear in Robinson Crusoe), I noticed two things: first, that the specificity of the dates mentioned in the novel gradually declines, from the full day, month, and year (such as entry 5: 1st September 1659) to just the month, the day, or even just the season by the end of the novel/list (entries 300 and 335: summer); and second, that Defoe really, really, REALLY likes Fridays for some reason. I guess we all like Fridays in a way, but it seems unusual for Defoe to mention Friday as many times as he does; I count well over 100 mentions of the word "Friday". It's also interesting to note that most of the entries occur in the latter half of the list.
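Both observations can be checked mechanically. One thing worth testing before concluding that Defoe loves Fridays: Crusoe's companion is named Friday, so many of those DATE entries may be the character's name tagged as a day of the week, which would also fit their clustering late in the list. Below is a minimal Python sketch with a made-up, rough scheme for bucketing DATE entries by precision, plus a whole-word counter; the bucket names, helper names, and sample entries are all hypothetical.

```python
import re

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
SEASONS = ("spring", "summer", "autumn", "winter")


def date_specificity(entry):
    """Sort a DATE entry into a rough precision bucket (a made-up scheme)."""
    has_year = re.search(r"\b\d{4}\b", entry) is not None
    has_month = any(re.search(rf"\b{m}\b", entry) for m in MONTHS)
    if has_year and has_month:
        return "full date"
    if has_year:
        return "year only"
    if has_month:
        return "month"
    if any(s in entry.lower() for s in SEASONS):
        return "season"
    return "other"


def count_word(text, word):
    """Count whole-word, case-sensitive occurrences of `word` in `text`."""
    return len(re.findall(rf"\b{re.escape(word)}\b", text))


# Invented sample entries, not the actual NER output.
entries = ["1st September 1659", "the 30th of September", "1659", "summer", "Friday"]
print([date_specificity(e) for e in entries])
# ['full date', 'month', 'year only', 'season', 'other']
print(count_word("Friday came. I called him Friday; Fridays passed.", "Friday"))
# 2
```

Running `date_specificity` over the list in order and plotting the buckets would show whether the decline in precision is steady or happens at a particular point in the novel. The month check is crude ("May" at the start of a sentence will count as a month).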

I was particularly drawn to the "DATE" list because, as an aspiring novelist, I often have difficulty accurately conveying a sense of time, and deciding how much to write down explicitly versus letting my readers figure out on their own what time period I'm writing in. I enjoyed being able to see how frequently Defoe bothers to tell readers what year or even what day of the week it is. I don't think I'd necessarily model my revelation of time on Defoe's in particular, but I'd love to try this exercise with novels by writers I aspire to write like, such as Chimamanda Ngozi Adichie or Jennifer Egan.

Questions I'm left with: If it's possible, how do we measure the distance between two entries on these lists? Why does Defoe like Fridays so much? What other novels can we use the NER for?
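One way to make "distance" concrete, assuming it means how far apart two entities sit in the novel's text rather than in the list: count the words between their mentions. A minimal Python sketch; the tokenizer is deliberately naive, multi-word names like "Cape de Verde" are matched as word sequences, and the helper names and sample sentence are hypothetical.

```python
import re


def tokens(text):
    """Split text into word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text)


def positions(words, phrase):
    """Start indices where a possibly multi-word phrase occurs in `words`."""
    target = tokens(phrase)
    n = len(target)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]


def min_distance(text, a, b):
    """Smallest gap, in words, between any mention of `a` and any mention of `b`.

    Returns None if either entity never appears.
    """
    words = tokens(text)
    pa, pb = positions(words, a), positions(words, b)
    if not pa or not pb:
        return None
    return min(abs(x - y) for x in pa for y in pb)


sample = "I was born at York and went to sea from Hull; later I saw Brazil."
print(min_distance(sample, "York", "Hull"))    # 6
print(min_distance(sample, "York", "Brazil"))  # 10
print(min_distance(sample, "York", "Lisbon"))  # None
```

Comparing every pair of mentions is fine for a few hundred entities; for the whole novel you could also keep all the gaps rather than just the minimum, to see how often two places come up together.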


Assignment 1: Name & Location

Before disclosing my findings from the lists that I ultimately created, I should note that the technology I used was rather uncooperative throughout the entire process. While some of the difficulties I faced were likely due to the fact that I’m using a Windows device, I must also admit that my computer skills are sub-par at best. However, after roughly two hours of errors and tedious tweaks, I was finally able to produce a series of lists that I hope bear at least some resemblance to what the exercise intends. Hopefully my future attempts go a bit more smoothly, but until then I’m just going to run with what I’ve got.

Perhaps as an appropriate segue, the “money” list caught my attention largely because I believed it to be the result of yet another technical error. This, however, turned out to be due to what I can only assume is a lapse in the NER’s coding: oddly enough, the program seems to mistake punctuation for money and tag it under the corresponding category. What also interested me were the “location” entities and, more specifically, how they form such a broad and extensive list of locales. Given that Robinson Crusoe tells the tale of an aspiring adventurer, this is to be expected. A similar theme carries over to the “person” entities, which feature a variety of non-European names such as “Friday” and “Xury”. In light of this information, it seems that Robinson Crusoe makes a conscious attempt to be as culturally rich and globally expansive as it physically can within the confines of a 250-page novel. For a reader living in 18th-century England, I imagine this ambitious globe-trotting tale was a genuine marvel of its time.
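If the stray MONEY entries really are bare punctuation, a small cleaning pass can drop them before reading the list. A minimal Python sketch, assuming the MONEY entities have already been pulled into a Python list; `clean_money` is a hypothetical name and the sample entries are invented.

```python
def clean_money(entries):
    """Drop MONEY entries that contain no letters or digits (bare punctuation)."""
    return [e for e in entries if any(ch.isalnum() for ch in e)]


# Invented sample, not the real MONEY list.
money = [";", "forty pounds", "--", "200 moidores", ","]
print(clean_money(money))  # ['forty pounds', '200 moidores']
```

Note that this rule would also drop a lone currency symbol such as "£", which may or may not be what you want.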


Assignment 1: Locations on the World Map

I marked as many of the locations mentioned in the book as I possibly could.

Assignment 1: Locations

While I was reading Robinson Crusoe, one of the things that always caught my special attention was the places and locations. I’ve always been fascinated with world geography and have traveled a decent amount. I always try to picture a place with reference to its location on the world map. After taking a look at the list of locations in RC that the NER spit out for me, I was impressed with the number of places mentioned in the book.

Robinson Crusoe, even though he is not a seasoned traveler, seems throughout the novel to be someone who knows about the world, and this list reinforces that. The reach of the global locations mentioned is quite extensive and includes places in Europe, Asia, Africa, North America, South America, and exotic locations such as Cape de Verde, the Caribbean, and the Leeward Islands. Oftentimes while reading RC, I came across phrases that weren’t lists but almost seemed like lists, since they contained a number of different locations close together. The first paragraph of the novel alone is an example: he mentions four locations in it. Also, the NER didn’t recognise Newcastle as a place, which is kind of disappointing.

An interesting trend I observed is that the list begins with locations mainly in Europe, mostly the U.K., then shifts to places in Africa, South America, and numerous others, and seems to settle back in Europe, mostly in France. This intrigues me, and I wonder if our protagonist is going to settle down somewhere in France by the end of the novel. Only the passage of time and the turning of pages will tell.

P.S. I’m also posting a map of the world in which I marked most of the locations mentioned in the book. Also, working with the NER was as easy as convincing my parents to let me be an English major i.e. not very easy.


Assignment 1: Locations and Organizations

According to the NER algorithm, locations and organizations, although different in denotation, are similar in connotation in the sense that they both represent the religious and imperialistic references in the text. Interestingly, locations seem to be more specific to the imperialistic references, while organizations seem to be more specific to the religious references. Notably, there are about five times more lines in the locations list than in the organizations list. Although the number of lines is not conclusive on its own, it may indicate that the text focuses more on imperialism than on religion.
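The locations-to-organizations comparison can also be computed straight from tagged output rather than by counting lines in each list. A minimal Python sketch, assuming inline-XML output such as `<LOCATION>Sallee</LOCATION>`; it counts every tagged mention, so its numbers will differ from a count of lines in a deduplicated list. The helper names and sample text are hypothetical.

```python
import re
from collections import Counter

# Assumption: inline-XML output; \1 makes the closing tag match the opening one.
ENTITY = re.compile(r"<([A-Z]+)>(.*?)</\1>")


def category_totals(tagged_text):
    """Count how many entities the NER tagged in each category."""
    return Counter(category for category, _ in ENTITY.findall(tagged_text))


def ratio(totals, numerator, denominator):
    """numerator/denominator, or None if the denominator category is empty."""
    if totals[denominator] == 0:
        return None
    return totals[numerator] / totals[denominator]


# Invented sample text, not the actual NER output.
sample = "<LOCATION>Guinea</LOCATION> " * 5 + "<ORGANIZATION>Heaven</ORGANIZATION>"
totals = category_totals(sample)
print(totals)                                     # Counter({'LOCATION': 5, 'ORGANIZATION': 1})
print(ratio(totals, "LOCATION", "ORGANIZATION"))  # 5.0
```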

Mentioned in about 300 lines, the locations include Europe, Africa, the Caribbean, and South and Central America. The vast number of locations conveys a sense of the enormity of the imperial world at that time. However, all of these locations are associated with monotheistic faiths. As Christian European powers colonized Africa and the Americas, they sought to spread Christianity, a further reason why religion and imperialism seem to be so closely tied in the text. This association between religion and imperialism is exemplified by the inclusion of "Providence" as a location.

Surprisingly, the NER classifies many places as organizations. Organizations include tagged words that relate to monotheism, such as "Heaven", "Solomon", or "Temple". This consistency suggests that the NER classifies religious references, including religious locations, as organizations. However, there are also references to Islam, such as "Moors of Sallee", highlighting how Europeans managed to convert much of southern Africa to Christianity yet were unable to convert northern Africa, which was mainly Islamic.

My experience with the NER was not graceful. I eventually had to add the Java SE Development Kit to the Path in my Environment Variables for the program to work. Before that revelation, I spent too much time in Windows’ cmd tool.
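Before digging through cmd, it can help to check whether Java is reachable at all. A minimal Python sketch, assuming Python 3.7 or later is installed (it is not part of the NER setup itself); `java_on_path` is a hypothetical helper. `shutil.which` searches PATH the way the shell does, and on Windows `where java` in cmd answers the same question. Environment variable changes only reach programs started afterwards, so a cmd window opened before editing Path will still not find Java.

```python
import shutil
import subprocess


def java_on_path():
    """Full path of the `java` executable PATH resolves to, or None if none is found."""
    return shutil.which("java")


if __name__ == "__main__":
    java = java_on_path()
    if java is None:
        print("java not found on PATH: add the JDK's bin folder to Path, then open a new window")
    else:
        print("Using", java)
        # `java -version` prints its version banner to stderr rather than stdout.
        result = subprocess.run([java, "-version"], capture_output=True, text=True)
        print(result.stderr.strip())
```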


Running the Stanford NER on Windows - Album on Imgur

I made a step-by-step guide to making the NER work on Windows. Instructions are in the image descriptions.


Assignment 1: List of Locations

After reading through the first good chunk of the novel, I thought I had a solid handle on the message behind the book: adventuring is bad. By this, I mean that the author is trying to reinforce the idea that a person should be happy with what they have in life. Between the author always mentioning how Crusoe should have simply stayed with his lot, how God always strikes him down whenever he tries to push his luck, and his father’s warning about going out to explore the world, Defoe almost seems to depict adventure and the desire to better your own life as negative things. I thought this was a bit ironic, considering that I had always thought of the novel as a book that romanticized exploration and survival. Until I looked at these lists, I thought I had a pretty solid idea of what the book was about. Of course, I was proved wrong. Despite the small list of names, there are nearly 100 different locations mentioned throughout the book. This contrasts very heavily with Crusoe’s father’s ideal of settling down and sticking with what you’ve got, which, as I mentioned above, seemed to be the underlying theme of the novel. Yet this massive list of locations suggests a novel that definitely doesn’t strongly oppose the desire to explore and experience the world. These two things strongly contrast with one another; it seems almost impossible to have both messages within one novel. But maybe this was intentional. At the time, novels were not seen in a very positive light, and Defoe almost certainly knew that. So perhaps he purposefully drove home the concepts of listening to your parents, accepting what you have in life, and listening to God in order to appease the moral critics of the time. Yet underneath this message, Defoe creates a character who, despite the difficulties and trauma he goes through, still desires to explore and see the world. Maybe this wasn’t at all intentional; maybe the message changes later on in the book; or maybe I’m just very wrong. Who knows?
All I can do for now is continue reading and hope that the next 150 pages or so don’t completely undermine this thought.


Assignment 1: Times in Robinson Crusoe

I thought that the 'times' list generated by the NER program was rather interesting. The list of unique times gives only "night, last night, evening, afternoon, this morning, morning, this night, midnight, and two hours before midnight". This list seems to be pretty accurate as far as I can tell, though I think it left out times like "within half-an-hour of sunset" and "a little after sunset". The program might have left out other times as well, but I'm not sure whether it did or how many. One interesting thing about this list is that the times are a bit vague except for the last two. One likely reason is that the speaker did not know the specific time; another is that he did not need to be more specific than that. An interesting next step would be to go back and see where in the text the specific times were used. The unique list does lose some information that the non-unique list has, namely the frequency with which the times are used. Looking over the full list, night and evening seem to be used quite a bit, and morning less frequently.
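To keep what the unique list throws away, you can build both views from the same non-unique list. A minimal Python sketch, assuming the TIME entries are already in a Python list in order of appearance; the helper names and the sample entries are invented, not the real NER output.

```python
from collections import Counter


def unique_in_order(entries):
    """The 'unique' list: each entry once, in order of first appearance."""
    return list(dict.fromkeys(entries))


def frequencies(entries):
    """What the unique list throws away: how often each entry occurs."""
    return Counter(entries)


# Invented sample, not the real TIME list.
times = ["night", "evening", "night", "morning", "evening", "night", "midnight"]
print(unique_in_order(times))             # ['night', 'evening', 'morning', 'midnight']
print(frequencies(times).most_common(2))  # [('night', 3), ('evening', 2)]
```

`dict.fromkeys` keeps first-appearance order, so the unique list still reflects where each time first shows up in the novel, while the `Counter` answers the "how often" question.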

Another interesting thing I noticed is that the program sorts 'Heaven' into the 'Organization' list while it does not seem to be in the 'Location' list. I suppose Heaven could be seen as an organization by referring to God and the angels, but it could also be seen as the place in which they live.