Anna Gonzales

Exercise 8: Narrative form + artistic representation

1 min read

In thinking about topic modeling and metadata for a group of 18th-century novels, a particular challenge for me seems to be figuring out a question to explore using these methods which is neither so large that its answer would not be particularly interesting or meaningful nor so small that the work is not much different from close reading. So, in trying both to narrow down and to expand my interests in genre, form, etc., I really like the idea of combining algorithmic criticism with “middle-distance reading” to find something out about a group of novels that traditional literary criticism — with its dichotomous close and distant reading — can’t really do. I would like to take a group of novels written in the third person OR in free indirect discourse and look at the topics related to or involving artistic representation in the form of writing — mentions of books, novels, poetry, other kinds of writing — to see if any commonalities pop up and if there might be some kind of relationship to be extrapolated or theorized here about narrative form and the presence of narrative artistic representations in the novels.

Exercise8

Stuck!

6 min read

David Blei’s article in the topic modeling issue of the JDH helped me focus my thoughts after messing around with the topic modeling software for a while. I wish that I had a more complete understanding of the math at work behind the software (or math past the beginnings of AP Calculus at all, actually…) because I think that would really help me to focus the tool of topic modeling towards accomplishing my goals. This is actually what I’m grappling with in thinking through the implications of topic modeling for criticism and reading practices, and I am finding myself challenged by seeing not only the limitations but also the affordances of this software. Specifically I guess I am trying to take Blei’s article, my own feelings about the topic modeling exercise, and Stephen Ramsay’s piece, and put them into a productive conversation or make them help me reconcile some questions I have about close-reading vs. computer-assisted approaches.

Basically, I had difficulty approaching topic modeling as something I could use for anything other than “proving” my pre-existing conceptions about a book or a corpus. This feels as though it shouldn’t be the point — as Ramsay and Blei both explain, the point of using computer-assisted approaches like topic modeling in order to produce criticism is not to confirm preexisting assumptions or provide “proof” for a theory. Blei writes of the humanist scholar using topic modeling, “She discovers that her model falls short in several ways. She revises and repeats,” and Ramsay similarly points out that the end goal of literary criticism is not really an end goal at all, but merely a desire to open up further and more interesting and more productive conversations about texts, conversations which can be jump-started and supplemented by the implications of digital approaches. So, if the point isn’t to prove our pre-existing theories about corpuses or books, then how do we let topic modeling open up new questions for us?

I guess that the particular issue with which I’m struggling is the basic assumption of topic modeling, that texts are composed out of topics, when, as we discussed in class, the topics are actually created out of the texts. The loops of this are complicated enough — but it seems as though the dominant idea behind topic modeling is that counting these bags of words can actually tell us something significant about the corpus as a whole or about individual books. I can’t figure out how to reconcile this with what I feel is a basic assumption of literary theory: that close reading, by humans, is the answer to creating good criticism, that we are sort of looking for the “figure in the carpet,” which is both in the text and on the page but not plainly comprehensible or countable (though the conception of literary criticism in James’ story is odd, as it refers to Corvick’s barely-begun work on Vereker as “the last word,” as though a critic could or would want to write a “last word” on any work of fiction actually worth reading and discussing…). This exercise brought up some of the same ideas we discussed when using the NER, of a flattening of data. Since the software doesn’t know anything about what the words actually mean and treats them as numbers, it makes them all appear equally important (despite its probabilistic structure). It assumes that at least some of these collections of words are significant in some way, or that we can adjust the settings on the software enough to create significant meaning out of these topics, or that we can achieve Ramsay’s goal of having productive new discussions using these topics/that Blei’s ideal humanist can be pushed to revise.
I don’t want to fall into the intentional fallacy, but it seems like the very point of reading, at least after interpretation and basic comprehension become in some ways the same thing in novels (as we’ve marked this shift in James), is that all words — even after one eliminates stop words — in a text are not created equal and do not have equal significance. This isn’t to say that one can’t still use topic modeling in a productive way and I hope I’m not just uselessly handwringing here — it’s just to question some of the basic assumptions that go into creating the approach in order to pay attention to these as we try to build a criticism out of these technologies.

To try to pull all of what I am saying into a more coherent and final expression: my main question is if topic modeling and other approaches like these represent just a digital form of close reading, one of which humans are incapable, or if topic modeling and similar approaches are completely at odds with close reading practices and assume that meaning is actually machine-readable in some way. I guess my real concern is that the line critics must walk seems extremely fine — that is to say, distinguishing between Ramsay’s conversation-opening function of computer-assisted approaches and the more hunt-down-evidence ideas I can’t seem to free myself of while using this software/which Blei seems to somewhat reference, writing that he hopes topic modeling will “point us to such evidence” for the ideal humanist scholar. How do we/I deal with/combat the basic assumptions built into topic modeling in order to make it generative and useful as a tool? How have others done this? What is the nature of the criticism that can come out of using topic modeling? I don’t want to set up a straw man of humanists not reading and just using computer-assisted approaches, but I want to be careful that I am still human-reading and close-reading my books while using these technologies, treating them as supplementary methods rather than the end of human reading. Of course I don’t think anyone is doing this — but I am hesitant or at a total loss, really, to see what I could say about the topics I got out of this exercise having not read the books in the corpus!!

topicmodeling

Tiny crafts + GIFs + soundtrack for "Pupil of Pleasure"

4 min read

In my experimental bibliography, I hope to materialize and thematize some of the concerns I have had throughout the semester about the seemingly intractable conceptual problems of creating digital facsimiles and representations of books. I want my project to create an affective reaction in the viewer or user of the project, who will feel the same intense desire I feel to touch the physical books that bibliographic data and digital facsimiles attempt to capture; I want to touch on the inaccessibility to the uninitiated of bibliographic data; and I want my work to function in the space created by bibliographic data, which is both deeply machine-oriented and deeply humanistic.

I asked Nora Battelle, who worked on the END this summer, about her experience with the bibliographic data she collected, since I felt my project was missing something. She explained to me that all of the data is extremely personalized and that everyone’s cataloguing records look different, as our traditional descriptive bibliographies in this class vary in style. Though the records were all the same in terms of all being machine-readable and all going to serve the same purpose, Battelle explained, each cataloguer has their own style and the types of things each cataloguer chose to notate were different. Though the aim of collecting the data is to capture all of the information about the book, that is simply not possible and cataloguers will all record very different things. Battelle also mentioned that each cataloguer is so distinctive that database users can tell that certain entries were written by the same person. Though it is a database, a collection of digital information that can be read by a computer, it is also discursive and personal and irrepressibly human. That is incredibly exciting to me and something I hope my project will reflect.

My ideal and first-draft conception of the project, then, is as follows: I will create a small (fits in one hand) representation of each category of bibliographic data the traditional descriptive bibliography collects — format, title page, notes, etc. (I imagine these like little sculptures, with, for instance, the format representation a sculpture of a sheet of paper folded a certain number of times). I will then film someone’s hand playing with each small sculpture. I imagine that watching these short videos (hopefully GIFs?), the viewer will want to touch the objects, much as I strongly desire to hold, gaze upon, and flip through the real physical objects represented by digital facsimiles in the ECCO. I also hope that presenting these small sculptures of each category without any context will make the viewer feel slightly alienated or without an understanding of the categories, in order to recreate the feeling of inaccessibility I get when looking at traditional descriptive bibliographic data. A separate set of audio files, which ideally would be only playable once the viewer has watched the GIFs, or which would run out of order and at random throughout the viewer’s watching of the GIFs, would have people describing — first using the traditional bibliographic descriptive language and then speaking in personal and discursive ways — the book. This will capture the essential humanity and personalization of bibliographic description.

I am very unsure of my ability to technically make all of these things happen, since I am very bad at crafting, especially on a very small scale, and I have zero knowledge of filmmaking or editing. But, if I figured out a way to make all of these things happen, it could be extremely cool. My final project might not end up materializing all of this, but I will definitely try to make use of my friends with artistic and digital skills, if that’s allowed.

experimentalbibliography

2 min read

Descriptive Bibliography

Melmoth, Courtney. The Pupil of Pleasure. 2nd ed. London: G. Robinson & J. Bew, 1777.

[handwritten Ann Walton, ? mm] | [illegible handwriting, ? mm] | THE | PUPIL OF PLEASURE | By COURTNEY MELMOTH. [illegible handwriting, ? mm] | THE SECOND EDITION, | CORRECTED, AND IMPROVED. | [Picture, ? mm] | He left me to my God. | LONDON, | Printed for G. ROBINSON, and J. BEW, | in Pater-Noster-Row. | 1777.

Collation

2mo. Vol. I . A3, B-I6, K-L6 (is this still right if there are many blank pages in between each signature?) Vol. II B-D5, E6, F-H5, I6, K-L6, M3 In Vol. 2, some pages without signatures signed with either 2, 4, I, in same order every time. What is this? Why gatherings of 5? Is this same format as Vol. 1? Why no J signature in either volume? Not a letter printing press could handle??

Contents

Vol. I: 1r: Title. 1V-xiv: Preface, page with indistinguishable signature (verso of xiv?): Advertisement, B1r: Half-title. B1r-p. 239 (no signature, signatures end with L6 some pages before and then are blank for a while, how do I refer to this???): Text Vol. II: 1r: Title. B1r: Half-title. B1r-p. 252 (just some illegible handwriting on here): Text

Pagination

I: 239 p.; II: 252 p.

Notes

Epistolary form. Sourced from the British Library. Digital facsimile retrieved from Eighteenth Century Collections Online. “Second edition, corrected and improved.” Says “4 DE 58” in what I think is a stamp but could be handwriting right above the “FINIS” on the last page of Vol 2. Has some things that could be either illegible marginalia or splotches of ink that got into machine as digital facsimile was being made?

Unrelated bonus Moonstone video game thing


The French Adventure Travel World History Tale of Mr. Don Lady Character

Visualizing this metadata is super exciting — it seems like there are huge comparative possibilities here, in terms of working with a giant set of information about early novels. Having explored the END website, the existing data visualizations, such as the publisher network, seem like excellent uses of the data, and it also looks like you could combine all of the categories in a staggering variety of permutations to ask, and work toward answering, many different research questions. While playing with the data — specifically, visualizing formats with a pie chart — what became fascinating to me was the way in which the metadata draws together a huge variety of literary critical questions about material conditions of publication, reception history, authorial intent, generic categorization and formal analysis. The metadata collected brings up all of these questions for me: what kind of constraints or possibilities were authors working with as they wrote their books for publication, how might material conditions come to bear upon the texts themselves, what do we make of the commonality of certain words in titles, and on and on.
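The pie chart of formats is, underneath, just counting and converting to shares; a sketch in Python, with made-up format values rather than the real END export:

```python
from collections import Counter

# Hypothetical format column from a metadata export (not the actual END data).
formats = ["12mo", "12mo", "8vo", "12mo", "8vo", "4to"]

counts = Counter(formats)
total = sum(counts.values())
for fmt, n in counts.most_common():
    print(f"{fmt}: {n} ({100 * n / total:.0f}%)")  # each line is one pie slice
```

Swapping in any other metadata column (publisher, title words, claimed genre) gives a different chart from the same three lines of counting.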

Considering larger questions about visualizing and analyzing metadata, I think Ramsay’s article is quite helpful here — I was initially caught up by the fact that this is a small selection of an incomplete set of data (incomplete because we don’t know how many novels were printed, read, etc. in the eighteenth century, nor do we have access to every one of them) and that drawing conclusions (after properly calculating statistical significance) wouldn’t be possible since this isn’t a random sample or a full set of data. But Ramsay’s point, that our interest as literary critics is not to prove things 100% but rather to open up interesting possibilities and questions, and that digital approaches to texts can assist mightily with this endeavor, clarifies that this is not as big a problem and doesn’t prevent us from generating criticism, asking and beginning to answer fascinating questions about texts, having new conversations about books, etc.

What specifically excites me is the way in which the metadata provides a way to engage with my interest in genres. To pick just one question I thought of while looking at the data — making a word cloud of the TitleNouns category specifically — I’d really like to compare the information from the title pages to see if generic assumptions can be supported or challenged with it (as Ramsay does with the arguments about gender in the criticism surrounding Woolf’s To the Lighthouse). Based on that title information, one might be able to question whether our generic categories are broad or varied or specific enough to actually enhance our understanding of novels, and from there work towards an understanding of one specific dynamic of the relationship between literary and material form.

Evelina in 1778...or, EVELINA!!!!!!!

6 min read

BIBLIOGRAPHY

The implications of the bibliography in terms of enhancing our understanding of genres and the literary history of the period are really compelling for me. As stated in the introduction, the bibliography is founded on the assumption that a complete set of data (as complete as can be, I guess) can tell us what a novel was in the late eighteenth/early nineteenth centuries. What does this mean? The terms of this begin to be worked out in the introduction and become clearer looking at the actual data set/bibliography. The strengths of bibliographic data in terms of allowing users to make comparisons across novels and to “provide a basis for an assessment of the work of all novelists in the period” are particularly fascinating. What kinds of large conclusions can be drawn from this data and what are its limits? I’m curious about how much the difficult editorial decisions which are referred to in the introduction complicate the process of creating a usable and comprehensive bibliography (and what does this lead us to ask about the subjective forces that go into creating a supposedly complete database in other disciplines/formats)? I’m also unclear on how bibliographies might challenge, refine, and revise existing canons and histories or I guess what is at stake in claiming the creation of bibliographies represents an act of restoration or recovery of lost history. What also sticks out to me is a sort of flattening of all of the novels included in the bibliography — they all come to seem not equally important but, through their inclusion, on something of the same level, if that makes sense (I am thinking of our discussion of flattening of the importance of all of the references we extracted from Robinson Crusoe with the NER).

In terms of the bibliographic data on the year 1778, there is an extraordinary amount of information you can get just from using this data. Particularly notable to me was the high number of anonymous authors — this stands out because when I picture the rack at today’s bookstore, the author’s name is often the same size or larger than the title…this raises a lot of questions for me about the stakes or meanings of authorship in the novel in 1778. I also noted the way in which novels are still making claims to truth (based on a true story, a sketch from nature, founded on real facts) in 1778 and wonder what this scaffolding of sincerity might mean for these novels. The criticism noted in the bibliography is also pretty unequivocally negative and unflattering, encouraging readers for the most part not to peruse the novels reviewed. Three frequent critiques: that the writing in the books is unclear, that they provide bad moral examples, and that the incidents are not interesting/various enough (i.e. book does not provide enough entertainment value). This seems to reveal that critics had some sense of what to expect from a novel in terms of style, entertainment, and moral value and that novels were evaluated based on these standards. It seems from this data that you could break the novels in 1778 down into “examples” or “instructions,” memoirs, histories, diaries, or accounts of travel.

ECCO

After comparing Evelina to other books in ECCO from the bibliography, its title and prefatory material feel as though they stick out even more. We discussed in class briefly the way in which her name comes first and is the title or stands for the rest of the book — what is particularly notable about this as well to me is that Burney does not claim moral instruction in the book or that it will be a history or an example. She actually seems to resist this kind of categorization by sticking to the descriptive yet vague “A Young Lady’s Entrance into the World. In a Series of Letters.” This gives us the “what” and the form — we know that this will be about a young lady entering into the world and that it will be epistolary but we don’t know if it will have a more morally instructive or a more entertainment focus, whether it will be a satire…anything! This is not “The Life and Opinions” or “The Life and Strange Surprising Adventures” — it is “Evelina,” just as “Pamela” goes with its heroine’s name. To me this opens up a lot of questions and possibilities in terms of what the novel wants to say about female subjectivity and about the importance of naming and names to its narrative. It’s unclear to me where Burney thought the book fit in in terms of hierarchies of prestige, taste, etc. — the prefatory material, dedication, etc. is weirdly self-deprecating and apologetic, which feels deeply gendered to me even though the book first appeared anonymously. I’m curious if Burney was trying to fit into what she saw as a tradition of novel-writing, how she might have conceptualized the gendered dimensions of this field, whether her purpose was fame or fortune or moral instruction or entertainment in writing the novel…or, rather than providing definitive answers to these questions, I would wonder how the novel might succeed or fail at all of these things and what the implications of this are for the literary history of the period.
Also after looking more at the bibliographic data I’m still curious about why Evelina was so much more successful and well-reviewed than pretty much any of these other novels — despite the fact that the reviewers note some inconsistency in terms of content (they point up the ways in which Captain Mirvan does not seem like he belongs in the book at all, like he doesn’t know he’s in Burney’s novel almost…).

ARTEMIS

There are some clear problems/limitations with Artemis but also some really exciting implications! I am thinking of Ramsay’s article and trying to focus on what Artemis opens up. The necessarily holistic nature of computing is really on Artemis’ side here — the subject indexing can achieve so much that humans themselves could only do with a lot of time and a lot of very painstaking work. I guess I’m curious about how one would make sure the subject indexing is both useful/relevant and also inclusive enough (how you determine the scope of which words are associated, in other words). Term frequency is also interesting. The question is how one would make the jump from term frequency and term clusters into criticism — which is where Ramsay’s formulation of criticism’s function in terms of opening up further possibilities for discussion rather than providing conclusions comes in handy. These kinds of tools could easily answer the types of questions — and create new questions — Ramsay raises with regards to To The Lighthouse. Cool! (OT sort of: I’m also thinking of this extremely funny article about one of my favorite books: http://thehairpin.com/2012/04/unsettling-repetitions-in-contemporary-literature-the-art-of-fielding/)

Exercise5

Preparing the text/un-preparing methods of reading

3 min read

As someone not very well versed in computer science, it’s difficult to see a way past the technical difficulties of the OCR programs besides just hand-correcting/typing up the words! I think training a program to recognize common words in these novels — maybe using a database of words from novels from a similar time period? — would be the best technical solution. So those are some of the technical issues that we talked about as being sort of solvable, if difficult. I was thinking about autocorrect as tangentially related to this conversation and thought of this article from last year on the history of something which structures and deeply shapes our language and the way in which we relate to one another nowadays, at least in my mind! http://www.wired.com/2014/07/history-of-autocorrect/
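One rough version of “training on period words” can be sketched with the standard library’s fuzzy matching; this is my own toy illustration, with an invented mini-lexicon, not a real OCR pipeline:

```python
import difflib

# A tiny stand-in for a period-specific word list (an assumption, not a real corpus).
LEXICON = ["pleasure", "pupil", "letter", "master", "virtue", "london"]

def correct(token, cutoff=0.7):
    """Snap an OCR token to its closest lexicon entry, if any is close enough."""
    matches = difflib.get_close_matches(token.lower(), LEXICON, n=1, cutoff=cutoff)
    return matches[0] if matches else token

# The long s is routinely misread as f by OCR; a close-match lexicon recovers it.
print(correct("pleafure"))  # pleasure
print(correct("xyzq"))      # xyzq -- nothing close enough, so left alone
```

A real system would need a much larger lexicon and some care with proper names, but the principle is the same: bias the output toward words the period actually used.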

We mentioned briefly how there are conceptual problems which may be completely unsolvable in terms of creating machine-readable texts. I think a lot is lost in the conversion to a digital version — or rather there are certain things, such as Tristram’s graphs and squiggles and loops and the funny typographical quirks of his book, which can’t simply be translated, fed into a program, corrected slightly, and emerge without losing anything. I’m not sure what you can do here — can you describe the typographical stuff in the book and give it tons of hashtags or some other categorization that would make it easy to locate within the book? OCR programs raise questions of what the function actually is of the stuff, beyond words, contained within a book — or I guess they add another different sort of layer onto the distinctions we draw between content and form and ask us to develop some sort of different literacy or visuality or critical method/framework for reading these in relation to the rest of the text. So what does this model look like and what can we say about the relation between the words of the text and its more visual or typographical elements?

I found the digital facsimiles not very difficult to access but extremely frustrating to search! This seems like something it would be great to fix when you want to do big-data collection/research and clarifies to me how difficult it must have been to compile/how useful it is to have Project Gutenberg-style clean copies of these works. It seems that a lot of DH wouldn’t be possible without this digitization. So my other question would be, what is the function of these digital facsimiles before they are extracted/converted into texts? Are they just ebooks, basically? Another way of saying that might be, what is the difference between digital facsimiles and texts, or are they simply a great way to preserve books and condense a ton of physical material into a digital database which one can then draw on to do further work?

Exercise4

"And why"

4 min read

Richardson’s indexing: I would love to spend more time digging into this and figuring out what exactly Richardson thought he was doing — or I guess what kind of work the index is doing, since I don’t want to get into some kind of intentional fallacy trap. It was compelling to me to see what information or parts of the book Richardson believes are important and will contribute to an “easy and clear view” of the book. It’s almost as though he is puffing or blurbing himself by pointing to the locations in the book which might be most interesting — though it’s hard to know if he is acknowledging that some of us might not want to suffer through all of the pages or just might not have access to the entire text, or that he would be attempting to transmit some kind of vision/message to those who couldn’t access the full version? Indexing has to be considered some kind of technology here — he is doing work on the text and reducing it/transforming it into something distinct from the proper text. The difference in style stands out to me in particular — it’s impossible to imagine that Richardson could be so brief…also, is he the voice narrating this index or giving it to the reader? That could definitely account for the shift since this is no longer epistolary but describing the form and the content of his epistolary novel…complicated…it’s difficult to talk about this without more knowledge about the material conditions at play in the whole Pamela production. I also think it’s interesting that he often says “Pamela feels so and so way, or does so and so thing, ‘and why,’” — the why is kind of what’s important here, which lends credence to the idea that Pamela’s subjectivity is fully realized here, that she has a deep interiority.

Exercise: I love the word cloud feature/the general funny mid-2000s aesthetic of this website…I also think the stop word feature is incredibly useful. However, I’m having trouble drawing a lot of conclusions from the Voyant visualization — once I eliminate all of the words which don’t seem very interesting to me, how significant are the remaining words (statistically, I mean — is this even calculable?)? Does it mean anything that I can pick out “master” and “poor” and “I” and “Pamela” once I cut all the words I think are boring? Or other, even more infrequently occurring words? In other words, how applicable are the insights from this tool? What does critical work look like that uses these technologies to make arguments? In general, though, I do see some really big applications for this — I spent some time unsuccessfully looking for something we looked at in one of my Shakespeare classes, which was a sort of visualization of the similarity in word choices between Shakespeare’s plays. We discussed this in the context of authorship, and the class came to a sort of conclusion that it didn’t exactly matter whether or not Shakespeare wrote all of his plays, since we study them as though they are all by the same person…How does DH work fit into established notions of canonicity like these? I’d love to think through its applications for GSST work specifically/read some of this work. I checked out the Journal of Digital Humanities and some other related sources and they all look really cool.
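The core of what Voyant is doing with frequencies and stop words can be sketched in a few lines of Python; to be clear, the stop-word list and sample sentence below are my own illustrative stand-ins, not Voyant’s actual data.

```python
# A minimal sketch of Voyant-style word counting with stop-word removal.
# STOP_WORDS is a tiny illustrative sample, not Voyant's real list.
import re
from collections import Counter

STOP_WORDS = {"the", "and", "a", "to", "of", "in", "that", "it", "is", "was"}

def word_frequencies(text):
    """Count words, case-folded, with stop words filtered out."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

sample = "My master said that poor Pamela was the best of servants, and Pamela wept."
print(word_frequencies(sample).most_common(3))
```

As for the statistical-significance question: corpus linguists do have “keyness” measures (log-likelihood is a common one) that compare a word’s frequency in one text against a reference corpus, which is one way of asking whether a count actually means anything.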

Challenge: There are plenty of applications of (and challenges to) the Armstrong selections we read that could be made using Voyant. Her contention that the novel evolves a new female subject — that it creates a form and disseminates it into the world — and that this subject’s value is based on her “essential qualities of mind” could certainly be mapped onto the usage of “know,” “think,” “feel,” etc. in Pamela — maybe especially if charted against the usage in other books. This seems time-consuming but particularly interesting. With more time it would also be great to pick out all of the gender-related words and compare them to the instances of class-related words — this could lend support to, or push back against, Armstrong’s argument that differences of class/political issues are subsumed/subordinated to issues of gender. Maybe a collection of words could also lend credence to the ideas we’ve been discussing that these issues of gender are resolved at the end of novels…this would be fun to do with basically any novel involving a marriage plot! (Is fun the word? I don’t know.)
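That gender-vs-class comparison could be prototyped very simply. Here is a hedged sketch: the two word lists are my own ad hoc picks rather than any validated lexicon, and the sample sentence is invented.

```python
# Count how often words from hand-picked "gender" and "class" lists occur.
# Both lists are illustrative guesses, not a validated lexicon.
import re
from collections import Counter

GENDER_WORDS = {"she", "her", "woman", "wife", "lady", "mistress"}
CLASS_WORDS = {"master", "servant", "rank", "estate", "fortune", "poor"}

def category_counts(text, categories):
    """Total mentions per category of words, case-insensitively."""
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    return {name: sum(freq[w] for w in vocab) for name, vocab in categories.items()}

sample = "Her master told the poor servant that a lady of fortune values essential qualities of mind."
counts = category_counts(sample, {"gender": GENDER_WORDS, "class": CLASS_WORDS})
print(counts)
```

Run over the full text of Pamela (and against other novels for comparison), the same tally would give a rough first pass at the Armstrong question.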

indexing

Just Google it

3 min read

Combining the NER’s extraction with what MyMaps can do is pretty productive and a great way of visualizing the information the novel gives us. I appreciated our discussion in class on Monday of how, when we’re reading, we don’t notice all of the proper nouns, all of the people and places mentioned — but the NER “flattens” all of this and, by creating a list, makes all of the data jump out and seem important in the first place/all equally important. I would love to see what an actually correct MyMaps would look like — so, all of the data from Robinson Crusoe totally cleaned (scrubbed?) and with the correct country info. I also think it would be cool if the location markers could be scaled based on the number of times each place is mentioned (so you’d have to use some kind of list with the duplicates still in it) — kind of a heat map of references in the book — or if you could size the markers based on how much time he spends in each location he actually visits. You could also use this — if all the data was correct — to mark where all of the material goods he exchanges or mentions come from and get some kind of approximation of European trade routes (or compare the Crusoe map to an actual exchange route map). Anyway, I’m seeing a lot of possibilities for this overall. It’s funny to me that MyMaps can’t understand something like “Cape de Verde islands” or “St. Salvador” or “Havannah” — all of which are revealed as locations through a simple…Google search. I was also particularly amused by Google’s adding in of country data — locating places like Yarmouth in New England was funny. If someone looked at the map — made with imperfect data — and had never read Robinson Crusoe, I don’t think they’d really get a sense of the importance of locations in the novel in terms of with whom/where he trades, where he thinks he is, where he actually is, and where he spends most of his time. The “Farther Adventures” map gives something of a better sense of this.
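The “heat map” idea (scaling each marker by its mention count) only needs the un-deduplicated list the NER spits out. A small sketch of the tallying step; the place names below are invented examples, not real extracted output.

```python
# Tally repeated place names from a (hypothetical) NER output list;
# the count for each place could then drive its marker size on a map.
from collections import Counter

ner_places = ["Brazil", "Lisbon", "Brazil", "Yarmouth", "Brazil", "Lisbon"]

mention_counts = Counter(ner_places)
for place, n in mention_counts.most_common():
    print(place, n)
```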
I’m glad I know how to use this tool now and how to link the NER with other tools. OT: I think creative data visualization through maps is especially cool and have lots of examples, but this map (midway through the article) of Citi Bike use from the New Yorker is my favorite interactive one. I have no idea how you would make this, but I wish I did/that I could model novels that involve a lot of movement in a similar fashion. http://www.newyorker.com/news/news-desk/interactive-a-month-of-citi-bike

RobinsonCrusoe

Genre?; reduction vs conversion; what's so great about lists?

3 min read

Sometimes the NER worked really well, and sometimes not as much. I can imagine a range of applications for the tool that would be really compelling, and I’m really looking forward to thinking more deeply about the question of what is gained and what is lost by reducing — or maybe converting is a better term — a novel to a set of data. For me, using the NER on multiple texts, either within the same genre or across genres, to draw out particular distinctions or similarities would be really productive (obviously you’d need to clean the data pretty carefully if you wanted to draw real conclusions and advance theories about this), since I’m curious what these kinds of tools/data can tell us about the ways in which texts adhere to, break with, or manipulate generic conventions. In general my experience with the NER was positive and made me feel a) like less of a failure when it comes to computer science after a disastrous CS21 experience, b) like I won’t fall behind in academia just because I didn’t know what digital humanities actually consists of, and c) excited about the possibilities of doing exercises like these and seeing what the tools we’re going to use can tell us about novels/their history/the generic narratives associated with the texts.

W/r/t the actual exercise — I was particularly struck by the NER’s failure when it came to cataloguing money in the book, since the first section of the novel is structured around/deeply concerned with money, status, power, etc. So, if someone who was just looking at the lists rather than reading the book were trying to understand that facet of it, it would be difficult. The relevant information that does come through appeared for me in the lists of names — there are lots of Biblical references, which aligns with how RC takes up Bible study pretty seriously during his island time — and in the lists of dates and times, since the passage of time is basically what the novel has as a structure or plot, besides RC’s survival, etc. His constant dating is reflected here.
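To make the “lists of dates” point concrete: even a naive regex (a crude stand-in for what a real NER model does) can pull out the kinds of date strings Crusoe’s journal uses. The pattern and the passage below are my own illustration, not the tool’s actual behavior.

```python
# A naive stand-in for NER date extraction: match "September 30"-style
# and "the 30th of September"-style date strings. A real NER model is
# far more sophisticated; this just illustrates the list-making.
import re

DATE_PATTERN = re.compile(
    r"\b(?:the\s+\d{1,2}(?:st|nd|rd|th)\s+of\s+[A-Z][a-z]+"
    r"|[A-Z][a-z]+\s+\d{1,2})\b"
)

passage = ("September 30, 1659. I, poor miserable Robinson Crusoe, came on shore. "
           "It was the 30th of September when I first set foot upon this island.")
print(DATE_PATTERN.findall(passage))
```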

In terms of what we can learn about lists from the book itself — beyond the NER, we noted in class and briefly discussed that Crusoe is obsessed with cataloguing and listing his belongings, and that this might be sort of the framework for his proto-psychology, etc. I would love to explore more why exactly his lists and the tale of his survival are so compelling and create narrative momentum — is it just the satisfaction of repeated problem-raising followed by closure/solution with material goods?

TL;DR: Great tool with hopefully big applications to generic questions I’m interested in; sometimes works, sometimes doesn’t, but is an exciting new way of converting the book; RC is obsessed with cataloguing and listing and it’s somehow interesting, why?

robinsoncrusoe