Exercise 8

The corpus metadata at our disposal lets us ask a lot of interesting questions. What struck me most was the change in the location names that appear in titles in this dataset. When mapping Robinson Crusoe, we saw a mostly Eurocentric distribution of countries, but now the distribution seems more widespread, particularly in the United States. Title metadata alone, however, doesn't allow for an exhaustive analysis of the actual places mentioned in each book. Exercise 7 let us look at topics in the actual text, and we found that countries and characteristics of those countries (imperialism and England, or dynasticism and China, etc.) were often present.

My question concerns the locations of these novels (and possibly the locations mentioned in them) and how they change over time. I predict that early 18th-century novels like Robinson Crusoe are likely to have a very Eurocentric focus and to be accompanied by topics of seafaring and imperialism, whereas novels from the later 18th century are likely to be more "worldly" and invoke countries and topics far beyond the reach of Europe. It would also be interesting to analyze how publication location fits into this. This dataset doesn't include many publication locations outside of London, but with a more comprehensive list, or by looking further into the 19th century, we could see how novels published elsewhere deal with location. I would imagine that novels published in other locations focus on areas outside of British imperial control and deal with completely different topics.

To do this, I would use the Google Fusion metadata to tag title nouns, adjectives, and locations that are associated with a country or feature. Likewise, topic modeling of this corpus would reveal prevalent country-related topics and the novels they correspond to. In this way, I could pinpoint which novels deal with which countries and topics and then group them by publication date. Adding other publication locations to the same analysis would give us a bit more diversity. The results could then be visualized in 10- or 20-year chunks to give an idea of how location and the perception of location changed with respect to time and publication location.
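
As a rough illustration of the tagging and grouping steps, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `GAZETTEER` keyword list, the function names, and the sample titles and years are invented and do not come from the corpus. It only shows one way that title words could be matched to countries and counted in 10- or 20-year chunks.

```python
from collections import Counter, defaultdict

# Hypothetical gazetteer mapping title keywords to a country label.
# These entries are assumptions for the example, not derived from the corpus.
GAZETTEER = {
    "england": "England",
    "english": "England",
    "london": "England",
    "china": "China",
    "chinese": "China",
    "america": "United States",
    "american": "United States",
}


def countries_in_title(title):
    """Return the set of countries whose keywords appear in the title."""
    words = (w.strip(".,;:!?'\"()").lower() for w in title.split())
    return {GAZETTEER[w] for w in words if w in GAZETTEER}


def counts_by_period(records, span=10):
    """Group (title, year) records into span-year bins and count countries per bin."""
    bins = defaultdict(Counter)
    for title, year in records:
        start = (year // span) * span  # e.g. 1719 falls in the 1710 bin when span=10
        bins[start].update(countries_in_title(title))
    return dict(bins)


# Invented sample records, for illustration only.
sample = [
    ("Example Adventures in England", 1719),
    ("An Example Voyage to China", 1785),
    ("The Example American Wanderer", 1783),
    ("Example Letters from London", 1712),
]

result = counts_by_period(sample, span=10)
for start in sorted(result):
    print(start, dict(result[start]))
```

Passing `span=20` would produce the 20-year version of the same table. A real analysis would need a much fuller gazetteer and would count locations from the topic-model output as well as from titles, but the grouping logic would stay much the same.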