
S = F: In my head and in the computer's

3 min read

By analyzing chapter twenty of Tristram Shandy, I wanted to see how the OCR would react when converting French words and accents to text, because this chapter contains two letters written in French. I was not, and am still not, certain of the process an OCR program uses to recognize alphabetical letters in an image; I did not know whether it analyzes the image by word context or by individual letters. Using both ABBYY and Google Docs to perform my OCR analysis, I was actually quite impressed. While ABBYY did not always use correct accent marks, the words themselves were generally spelled correctly (besides, of course, those words with the odd f-shaped s). There were other small mistakes too, such as a consistent mix-up between letters and numbers, but I am mainly interested in how OCR would work with texts in other languages. Is it largely English-based? Would it even work with character-based languages like Chinese, or with non-Roman alphabets such as Korean Hangul?

For novels (or even just books in general) like Tristram Shandy, where there are additional decorations, symbols, pictures, and/or oddities, a digital facsimile can flatten out these special characteristics of the book, much as NER and My Maps flatten out the content by presenting to the reader what is literally written rather than the ideas and concepts that are subliminally inserted into our reading. This is flattened again in the process of returning the digital text to print. While this process allows the novel to be easily distributed, it also subtracts the quirkiness, the hidden details of the novel. Yes, technology now allows amazing graphics and symbols to be easily added to a novel, but carrying these eighteenth-century graphics from print to digital and back to print again is not yet completely possible for a computer program. It is not easy for a program to interpret abstract symbols as text. How much are we losing in the transition from working with an image to working with text? What exactly are we losing? Creativity? Individualism? A humanistic quality? OCR programs generally try to find textual content they can translate, but what if the thing they are trying to interpret textually has a greater purpose? Is it another thing that is lost in translation?
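To make the multilingual question concrete, here is a minimal sketch using the open-source Tesseract engine (via the pytesseract Python wrapper) rather than ABBYY or Google Docs, whose internals aren't public. The file name and the installed language packs are assumptions:

```python
# Minimal sketch: probing OCR language support with Tesseract via pytesseract.
# Assumptions: "page.png" exists and the 'fra' language pack is installed
# alongside the default 'eng' model.
from PIL import Image
import pytesseract

page = Image.open("page.png")

# Tesseract ships a separate trained model per language or script:
# 'eng' (English), 'fra' (French), 'chi_sim' (Simplified Chinese),
# 'kor' (Korean Hangul). Joining codes with '+' lets the engine
# consider several languages on the same page.
english_only = pytesseract.image_to_string(page, lang="eng")
english_and_french = pytesseract.image_to_string(page, lang="eng+fra")

print(english_and_french)
```

Comparing the two outputs on the French letters would be one rough way to test whether recognition leans on word-level context or on individual letter shapes.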

Exercise4

Prizmo, and the Meaning of Mistakes

2 min read

I used Prizmo (after much difficulty trying to download ABBYY FineReader Pro onto both a PC and a Mac), and right off the bat, the program warned me that the image I had uploaded was not high enough resolution to decipher with the utmost accuracy. Prizmo ran some mysterious code that increased the resolution and then proceeded to recognize maybe every tenth word. One of my favorite mistakes included "Blefling$ on his head," which sounds like a remix of a Fiddler on the Roof song.

One of the most interesting mistakes Prizmo made was failing to translate line by horizontal line; instead (due to, I think, irregularities in the spaces between words), it translated some chunks of text vertically, I guess because those words were physically closer together.

Prizmo also had difficulty with italicized text, especially capital letters, and words that were fully capitalized and spaced out, like GEORGE the Second (the G E O R G E being stretched to imply importance?). I was also surprised to see it completely miss some words that had neither spatial differences nor font changes, like the word "gloriously," which it substituted with a string of dots, the number of which matched the correct number of letters in the word. I wonder if it didn't recognize this because the s might have been coded as an "f."

Certainly Prizmo's errors tell us something about the difficulty of translating textual conventions across centuries (although spacing out words to emphasize them is still done on Tumblr). It also brings up questions about the true substance of a text: how many shadows of shadows am I looking at when I read the OCR text, and to what degree is an original copy of the novel one of Plato's "real" puppets? What other acts of translation occur between the world, the author's mind, his pen, or voice, and each subsequent reprinting? What purpose do conventions like spacing, or italics, serve in animating the characters or etching their voices into our minds? For me, the narrator's voice is overly inflected with enthusiasm, like a roller coaster, because of the long sentences that contain within them high (italicized) and low (parenthetical) phrases, and because I read them to myself in a human voice; someone else could parse that differently.

Exercise4

Exercise 4

3 min read

So I chose to OCR chapter XXI from volume 1, as it was the most interesting chapter for me to read. The findings from this exercise were interesting, though certainly not in the same regard. The OCR I used was Prizmo, as I had difficulties getting ABBYY to work on my computer. The first interesting thing I noted about Prizmo is that when it has identified the number of characters in a word but has trouble figuring out what those characters are, it just prints dots to indicate the word. For example, “consummation” (printed “confummation”) was OCRed as ••••••••••••. Other than a few humorous recognitions, such as changing “Homunculi” to “H0MUNCVtl” and “Mr. Shandy” to “Mr. 8handy”, Prizmo did a decent job of identifying words. What was intriguing was that sometimes it would pick up words that were carried forward to the next line of text, and sometimes it wouldn’t. For example, “Ho-munculi” was identified as “H0MUNCVtl”, but “cha-racters” was picked up as two different words, “ch” and “ra&ers”.

Thinking about the second question, I can’t help but be a little perplexed. I used to have a very ardent view about e-books (digital facsimiles): that they are a terrible way to read a book, and the only right way is through a physical copy. Recently, however, I’ve changed my mind and become more open to the idea of reading e-books. But there are always drawbacks. This exercise reminded me that an electronic version of a book isn’t a completely accurate version of it. The way the book is supposed to be indexed, the calculation of text per page, editing styles, etc., cannot be presented properly to the reader through an e-book. For example, the famous black pages of Tristram Shandy are sure to be distorted in some way in an e-book format. The impact just isn’t the same. The exercise can also be linked to Ramsay’s article. Books have been analysed in a non-computational way for thousands of years, but only now do we have the technical means to perform computational tests that would otherwise be arduous and tedious. But I still can’t help but wonder if that is the way to go. I’m still not sure what I think about it, and I’m going to need some time to figure it out.

Exercise4

OCRs and I don't mix.

4 min read

I hit some roadblocks when trying to find a good OCR software to use for this assignment. Starting with the 1760 version of Vol. 10 chapter 1, I attempted to upload the PDF of the chapter into Google Docs, but to no avail. The only software applications I was able to access (or that Google would let me access) were the ones directly related to viewing strictly PDFs. I guess Google didn’t think it was a document of text in disguise like I had hoped. I also tried to download the trial version of ABBYY FineReader Pro, and it insisted that the software was only for Windows computers. But then, miraculously, Prizmo worked for me and I successfully uploaded the text.

While doing this exercise, I did notice that searching the images on ECCO returned surprisingly accurate results. It recognizes the typographically archaic long S when searching for things, at least in some cases. Searching for “state” accurately returns the typographically abnormal “ſtate” in the images of Tristram Shandy, which I found quite impressive. It was also able to return accurate results for the word “loss”. However, searching for “stead” in ECCO did not return any results, even though it is very clearly legible on image 43. Even more interestingly, searching for both “Yorick” and “Torick” will return the name “Yorick” in ECCO, but the two searches produce entirely different lists of reference locations. For example, searching “Torick” may return the textual “Yorick” on (a hypothetical) image 57, but searching “Yorick” will return “Yorick” on (a hypothetical) image 43. I have no idea what causes ECCO to differentiate between the different occurrences of the archaic “S” character or the different printings of “Yorick”, but I’d be curious to learn why.
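One hedged guess at what might be happening: a search layer sitting over noisy OCR could treat commonly confused characters as interchangeable. This toy sketch is not how ECCO actually works, just an illustration of the idea; the sample text is invented:

```python
# Toy illustration (not ECCO's real implementation): search noisy OCR text
# while treating commonly confused characters as interchangeable.
import re

ocr_text = "Torick replied, and Yorick bowed, whatever the ftate of things."

# One character class per confusion: Y/T for "Yorick"/"Torick",
# s/f for the long-s problem.
yorick = re.compile(r"[YT]orick")
state = re.compile(r"[sf]tate")

print(yorick.findall(ocr_text))  # ['Torick', 'Yorick']
print(state.findall(ocr_text))   # ['ftate']
```

If the underlying OCR layer sometimes read “Yorick” one way and sometimes the other, two queries expanded like this could easily hit two different sets of pages.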

When it came to actually using Prizmo, I was actually pretty disappointed in its firepower. First, it broke up the pages I chose to use (in this case, page 35) in seemingly random and arbitrary places, sometimes in the middle of sentences, making the OCR’d text very difficult to read. I had to manually adjust and readjust the text boxes so it read as one full page instead of a bunch of awkward blocks of text. Even as a full page, there were a ton of problems with the text. Some of the most noticeable misfires deal with the text in italics, and understandably so, I guess, since those look the least like standard English characters. But in one instance, “Rosinante” becomes “t ofi;,.a. te”, which just isn’t even recognizable. Maybe I’m being a little too much like Simon Cowell here, but I honestly didn’t like using the OCR on Tristram Shandy. The time it takes to go back and fix and decipher all of its little bugs and errors is greater than the time it would have taken me to get through the original text and type it by hand.

Despite my negative experience with an OCR, I can see why they are a pretty useful tool for thinking about almost all novels in ways we previously thought were impossible. I’m sure there are better OCRs out there that can and have gotten through Tristram Shandy and others and made them into super-awesome squeaky-clean digital versions of texts, which we can then use for cool digital humanities stuff like we’re doing here. But in creating these digital texts of early novels, we sort of steamroll them into a format that they weren’t really intended to be consumed in, and we run the risk of losing some of the novel’s meaning that is meant to be conveyed by the physical structure of the pages it’s printed on. I think digital facsimiles are a good intermediary, because they retain some of the key parts of the novel’s physicality that are essential to its meaning while presenting it in an easily accessible, digitized format.

Exercise4

Exercise 4 - Tristram Shandy in the OCR

2 min read

I used ABBYY Pro 12 to look at a couple pages of Chapter VIII, which I chose because of the various odd typographical things going on there. As expected, the software consistently mistook the old-fashioned character for "s" as an "f" and stripped down some of the formatting of the original text (the word "HOBBY-HORSES" in the text becomes a confused "HOBBY HoRses"). The dashes are also a mess in the file I extracted, although I can't say much for the original text either... Overall, though, the OCR did a good job of capturing the words on the page. I get the impression that the OCR doesn't use any sort of word-predictive algorithm like Google Search or iPhones use, which would definitely help catch those words that are clearly not correct.
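As a sketch of what even a crude word-level check could catch, here is a minimal dictionary filter; the tiny word list and the sample line are stand-ins, not anything ABBYY actually does:

```python
# Minimal word-level sanity check: flag OCR tokens missing from a dictionary.
# The word list and sample line below are invented for illustration.
KNOWN_WORDS = {"what", "a", "confused", "mess", "of", "dashes", "the"}

def flag_suspect_tokens(line: str) -> list[str]:
    """Return the tokens that don't appear in the dictionary."""
    tokens = (t.strip(".,;:!?-").lower() for t in line.split())
    return [t for t in tokens if t and t not in KNOWN_WORDS]

print(flag_suspect_tokens("what a confufed mefs of dafhes"))
# -> ['confufed', 'mefs', 'dafhes']
```

Flagging is the easy half; choosing the right replacement is where the real word-predictive work would come in.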

I must admit now that seeing Tristram Shandy reduced to an unformatted and misspelt word document in Times New Roman, font size 55, made me sad. It just seemed so out of context. Even if all the errors were corrected, I think something wouldn't feel right. The broader question here, I think, is how do we experience a novel like Tristram Shandy? The obvious answer is that we read it: we see words on the page, make connections between them and the story and our lives, and come to some sort of understanding about their meaning. But the experience of reading is as much visual and tactile as it is textual. The way words are presented to us has a profound impact on the way in which we interpret them, and any sort of translation technology will affect the way we think about whatever it is that we're reading. Just consider for a moment the two black pages in the Oxford Edition of Tristram Shandy. Obviously, turning the page unsuspectingly into that black abyss is a much different experience than scrolling down casually into those same pages. The effect is lost on the reader and it just looks silly. Any transformation of a text from one medium to another must keep this in mind; the same tricks simply won't work, and the work has to be adapted to make sense in the new medium.

Exercise4

Preparing the text/un-preparing methods of reading

3 min read

As someone not very well versed in computer science, it’s difficult for me to see a way past the technical difficulties of the OCR programs besides just hand-correcting/typing up the words! I think training a program to recognize common words in these novels — maybe using a database of words from novels of the same time period? — would be the best technical solution; a minimal version of that idea is sketched below. So those are some of the technical issues that we talked about as being sort of solvable, if difficult. I was thinking about autocorrect as tangentially related to this conversation and thought of this article from last year on the history of something which structures and deeply shapes our language and the way in which we relate to one another nowadays, at least in my mind! http://www.wired.com/2014/07/history-of-autocorrect/
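To sketch that training idea in the simplest possible form: build a word-frequency table from a period corpus, then prefer the most frequent known word within one letter-substitution of each OCR token. The one-sentence "corpus" here is a placeholder assumption standing in for a real database of eighteenth-century novel text:

```python
# Rough sketch of period-specific correction: prefer the most frequent
# known word within one single-letter substitution of each OCR token.
from collections import Counter
import string

corpus = ("my father and my mother were in duty bound to mind "
          "what they were about when they begot me")
freq = Counter(corpus.split())

def substitutions(word: str) -> set[str]:
    """All strings one single-letter substitution away from word."""
    return {word[:i] + c + word[i + 1:]
            for i in range(len(word))
            for c in string.ascii_lowercase}

def correct(word: str) -> str:
    """Pick the most frequent in-corpus candidate, else keep the word."""
    candidates = [w for w in substitutions(word) | {word} if w in freq]
    return max(candidates, key=freq.get) if candidates else word

print(correct("mothcr"))  # -> 'mother'
print(correct("begot"))   # -> 'begot' (already known)
```

This is essentially a stripped-down spellchecker pointed at an old corpus; a real version would handle insertions and deletions too.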

We mentioned briefly how there are conceptual problems which may be completely unsolvable in terms of creating machine-readable texts. I think a lot is lost in the conversion to a digital version — or rather, there are certain things, such as Tristram’s graphs and squiggles and loops and the funny typographical quirks of his book, which can’t simply be translated, fed into a program, corrected slightly, and emerge without losing anything. I’m not sure what you can do here — can you describe the typographical stuff in the book and give it tons of hashtags or some other categorization that would make it easy to locate within the book? OCR programs raise questions of what the function actually is of the stuff, beyond words, contained within a book — or I guess they add another different sort of layer onto the distinctions we draw between content and form and ask us to develop some sort of different literacy or visuality or critical method/framework for reading these in relation to the rest of the text. So what does this model look like, and what can we say about the relation between the words of the text and its more visual or typographical elements?

I found the digital facsimiles not very difficult to access but extremely frustrating to search! This seems like something it would be great to fix when you want to do big-data collection/research and clarifies to me how difficult it must have been to compile/how useful it is to have Project Gutenberg-style clean copies of these works. It seems that a lot of DH wouldn’t be possible without this digitization. So my other question would be, what is the function of these digital facsimiles before they are extracted/converted into texts? Are they just ebooks, basically? Another way of saying that might be, what is the difference between digital facsimiles and texts, or are they simply a great way to preserve books and condense a ton of physical material into a digital database which one can then draw on to do further work?

Exercise4

Exercise 4: OCR

2 min read

I found this assignment very entertaining. The first OCR program I tried (it got this honor by virtue of being the first Google result—oh, all-powerful Google), www.onlineocr.net, failed in pretty impressive fashion. Here are some small excerpts from the first chapter:

4%.

F

N S

TRISTRY rid_ SH.4A ND 17, Gent.

II4 I t 44;14084 Ir.1... 74 • % r-4 9.11 illr• mkt 47; A

For some reason it began to catch on for a bit:

With either my father or my mothcr, or indeed both of them, as they. were in duty both equally bound to it, had minded what they were about -when they begot me; had they duly confalciA. how much depended upon what thcv -were then doing; §--- that not: crIly thc procludic Et of a rational Ecincr was con-k:, , cern'd in it, Dut that pofnbly the

before falling apart again:

• .-1:,7 .11:1: 0 1 4 ,. . .1 F' iTY11:1 n :-‘11 Z.111C1 t.',..:;:fir Cr.li

I wonder if its success has to do with the amount of random surrounding shapes. The first page, which the program struggled with, is dotted with stray ink that probably contributed to the confusion. Gdocs was actually fairly successful; beyond a few very strange series of characters (“སྐབླླ་མཟ † : : } -г fё;н А в, г”) and what looks like an attempt at an emoji (2- :), it was more or less spot-on.

I transcribed a lot of interviews this summer for my internship, and this exercise brought back memories of that tedium (believe it or not, recordings of Israeli authors being interviewed in crowded sidewalk cafés are not easy to decipher). Doing that job, I was struck by the subjectivity of my task; there were so many different ways to transcribe speech while staying faithful to the words, and how I punctuated the conversation made a big difference in the tone conveyed. Though the OCR should theoretically stay faithful to the text and punctuation, it seems like a similar situation of subjective transcription/translation in which an aspect of the original has been lost. It feels to me more like a summary, even though the words are nearly identical; it can stand in for the plot, but not necessarily the intangibles.

Exercise4

Exercise 4

3 min read

I chose a relatively straightforward couple of pages – Volume I Chapter IX (pages 15 and 16) of the 1761 edition – to OCR with ABBYY Pro 12. The OCR program consistently made some mistakes in deciphering old-fashioned typography and seemed a bit oversensitive to various dots or smudges on the page, so that ABBYY Pro has Tristram signing off his dedication as his Lordship’s “moji bumble fervant.” Some of the program’s misreadings were pretty consistent; for instance, the program consistently misread the old-fashioned S as an F, and the lower-case C with a curlicue as an upper-case C. I would guess that some OCR programs, if not this one, have tools that would allow you to create custom settings like “read all those F-like thingies as S-es,” sort of like Voyant allows you to add custom stopwords, although I confess I couldn’t figure out how to create those settings on ABBYY Pro.
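ABBYY may or may not expose such a setting, but as a stand-in, a post-processing rule of the kind described is easy to sketch: swap an "f" for an "s" only when the swap produces a dictionary word and the original doesn't. The mini-dictionary is an assumption:

```python
# Stand-in for a "read those F-like thingies as S-es" rule: flip 'f' to 's'
# only when the swap yields a dictionary word and the original word doesn't.
from itertools import combinations

DICTIONARY = {"servant", "humble", "most", "father", "first"}

def unlong_s(word: str) -> str:
    """Try every combination of f->s swaps; return the first dictionary hit."""
    if word.lower() in DICTIONARY:
        return word  # already a real word: leave genuine f's alone
    spots = [i for i, ch in enumerate(word) if ch == "f"]
    for n in range(1, len(spots) + 1):
        for combo in combinations(spots, n):
            fixed = "".join("s" if i in combo else ch
                            for i, ch in enumerate(word))
            if fixed.lower() in DICTIONARY:
                return fixed
    return word

print(unlong_s("fervant"))  # -> 'servant'
print(unlong_s("firft"))    # -> 'first'
print(unlong_s("father"))   # -> 'father' (untouched)
```

The dictionary check is what keeps the rule from mangling legitimate f-words, which is presumably why a blanket find-and-replace wouldn't do.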

The OCR program also gets confused by Tristram’s extensive use of long dashes, and converts them into short dashes, blank spaces, and bizarre indentations. In cleaning up the text, I tried to replicate the length of the original dashes, but I was only able to achieve an approximation. This conversion and the subsequent arbitrary refurbishing of the punctuation mean that the effect of the original haphazard dashes gets compromised. A punctilious editor might decide to standardize the dashes, losing the slapdash effect of those in the 1761 version, where they seem to be almost a mimesis of Tristram Shandy’s broader halting, digressive structure. On the other hand, faithfully trying to replicate the length of the 1761 dashes for my modern clean version encodes the particular way they appeared in one version, implying that exact transcription of the novel at a particular historical moment is more authentic or correct. Considering the proliferation of later reprints and bowdlerized editions, which people read throughout time and which all constitute the cultural phenomenon that is Tristram Shandy, it seems as arbitrary to exactly replicate the 1761 text as to not. This exercise shines a light on the tension between the novel as a physical, written form and the novel as a living cultural event. Since the figure of the author was often hidden, and novels like Pamela and Robinson Crusoe were published under the pretense of being authored by their title characters, my sense is that the authors of proto-novels were more shadowy figures than authors are nowadays. The success of response novels such as Shamela or The Clockmaker’s Outcry against the Author of the Life and Opinions of Tristram Shandy suggests that readers saw the novel’s truth as being mobile with the cast of characters rather than lying with the original authors. In that tradition, a haphazard transcription of the 1761 text does not so much “lose something in translation” as augment the rich history of rewritten, retold, and resold versions of the original novel.


Exercise 4 and the Adventure of OCR Technology

3 min read

I chose to look at Chapter 24, the extensive description of Uncle Toby’s “Hobby-Horse.” My OCR experience using Adobe Acrobat was initially very frustrating--the first edition of Tristram Shandy that I found in ECCO did not translate well using the software, and my end result was a complete mess. I ended up with a mixed bag of text: there were a few converted words and sentences on each page that I was able to edit, but blocks of text turned into small images of the old Tristram Shandy page that I couldn't change, and some background letters weren’t even registered (like the s that looks like an f). When I saved the attempt as a Word document, most of it was unreadable; when I looked at it with Google Docs, I ended up with a 27-page monster of mostly blank space, with the same strange mix of Tristram Shandy text and editable letters thrown in for fun.

Luckily, I was able to find a different edition of Tristram Shandy which worked so, so much better (possibly because the pictures of the pages weren’t as slanted? Theorizing exactly why the second version of TS actually worked is an interesting question that I don't know the answer to). This time, Acrobat was able to acknowledge all the text, so my Word document was in readable English and in a form that I could edit. The Google Docs version was only 5 pages, and the mistakes were humorous instead of infuriating. In both versions, I noticed that the OCR didn’t really know what to make of Tristram Shandy’s capitalization of “HOBBY-HORSE” or his italicization of “Toby,” and the s’s were still f’s. The new versions sometimes threw in random symbols before the T in Toby or wrote it like ‘I’, but the word was mostly understandable. “Hobby-Horse” also went through some interesting changes, the most humorous being “Ho n n - v - H oa s”.

While using technology to analyze our text is pretty exciting, extracting text from a digital facsimile of a book loses the level of human interpretation that allows us to correct for the s’s that look like f’s and the differing fonts and sizing of letters. With the various translations -- from novel to digital facsimile to representation in a database to our final level of text extraction -- pieces of the physical whole of the novel are gradually lost, preserving only the words themselves (and sometimes not even that, as our OCR experiments have shown). This can present a reader with a very different reading experience. But since the words are essentially the same, does this different reading experience then influence our interpretation of the text itself?

Exercise4

I used ABBYY and an online OCR that I found through the complicated process of picking the first result that came up in my Google search, called onlineocr.net. I looked at chapter 9 and also the ominous (not really) “black pages.” This online OCR was not too far off. Of course, it didn’t understand the “s” that looks like “f.” The translation of the black pages was pretty great, though. Here is a small sample:

I A
1.4 - ler .4 .. .. -= 1 4 .4_24 , _ .44Nn a 411. ill 11. 4.11 -PI - ...r. . r..-%-.... 10.4
.. .
,frt iii-IIPF7-. . tor■ , I.1 le
a -'re. '.11,611._ 4.. - p 1- .. I' . A ikri! 197 a * ' 7 ., d w -. .1 wihrI •' 416APItilF.1*. .014 6 I. : * 1: .4,..
..4.1...4 ■ -roce
.61-1 1 *I v d: di °III 't `I 4* 44 le 1r-04461. ' l'e11 it a t .2 11 11 1'4 , niN P 1r0
". e • . . II ate' ., A I A inn i # Ipg .,p jefr b ii E-

If I were into conspiracies I’d say this was the past trying to talk to us in a now illegible code, or maybe it’s aliens. Anyway, I also put the black pages into ABBYY and the program processed the info for a few minutes, but in the end it didn’t translate anything. I guess ABBYY knows better.

I also put chapter 9 into ABBYY. It worked better than the online OCR; ABBYY is often only off by a letter or two. I think it translates based on individual letter recognition, rather than whole words. Otherwise, I think ABBYY would try to correct familiar words, much like autocorrect on our phones. Though of course autocorrect is also infamously prone to error.

This assignment made me think about the strange journey of written and printed material and the reliability of that material. Before printing, people (monks and the few other literate people) would transcribe or copy old texts in order to make a new copy. As I was consulting the picture of the 18th-century text to correct mistakes, I felt a bit like a monk copying old texts into new editions. I thought about how easy it would be to make a mistake and change a word or two, maybe some that change the meaning of a sentence, both then and now.

I thought about the ever-increasing interactions between people and machines and text. And I thought about this machine I saw last semester (http://www.jmberlin.de/main/EN/01-Exhibitions/02-Special-Exhibitions/2014/bios_torah.php) at the Berlin Jewish Museum that writes the Torah at the speed a human would. The Torah is traditionally written by a trained scribe, and if the scribe makes a mistake, that Torah has to be thrown away and started over.

I couldn’t help but think, when I was correcting my mistakes, about my faithfulness to the original text of Tristram Shandy as somewhat analogous to the novel’s attempt to be faithful to reality.

I think these thoughts were a bit all over the place. TLDR, nothing changes!

Exercise 4

2 min read

I OCR-ed Tristram Shandy with both Google Docs and FineReader, and was surprised to discover that Google did a much better job. Primarily, it was able to interpret the long "s" correctly about 95% of the time. I looked at Chapters II and III, and the only problems I saw were that it added spaces in HOMUNCULUS, probably because the letters were unevenly spaced in the original text, occasionally used "f" for "s" as in "minuteft philofophers", and ignored long dashes. FineReader, on the other hand, didn't correctly identify any of the long "s"s in those two chapters, resulting in sentences like "becaufe it fcattered and dilperfed tlie animal fpirits". In one instance, an "s" became an "l", and in another, an "h" became an "li". It seems possible that Google Docs is using some sort of dictionary to autocorrect words like "fpirits" into "spirits"; either that, or it was just more carefully programmed to interpret older texts.

A digital facsimile like this 1759 edition of Tristram Shandy seems to be about as close as one can get to having a book that looks the way it did when it was printed over two hundred years ago. Of course, this isn't quite accurate, since every copy of the book looked a little different, and it is impossible to understand what the book was made out of by looking at it on the internet. But seeing as it seems so close to being authentic, it's interesting that once you extract the text with Google or FineReader, a lot of that authenticity is gone. The letter "s" goes back to the way we expect it to be, because there's no such thing as a long "s" in novels today. The long dashes disappear, as do the slight irregularities of spacing and alignment, the italicized proper nouns, the bracketed numbers. Only a human could really reproduce the text in a digitized format that would keep it true to the style of the original.

Exercise4

Assignment 4: Mass Production and Profit vs. Style and Authenticity

2 min read

I initially tried to use a PDF-to-text converter on a random website. Yet this attempt was useless, as the converter only copied and pasted the image of the PDF into a .docx file. Next, I used Adobe Acrobat DC to convert the PDF into text. However, there were a plethora of errors, to the point of the text being unreadable. For example, the s’s were being read as f’s. This error is excusable, since the s does look like an f even to the human eye. Additionally, the n’s were being read as 11’s, and the m’s were read as 111’s. Obviously, these mistakes are due to the font and style of 18th-century novels. As technology evolved and books became mass-produced, readability was favored over style, resulting in modern fonts. OCR programs are geared toward modern fonts, leading to simple misinterpretations. Although these mistakes aren’t hard to correct, it is very tedious to fix them by hand. The ideal solution would be to use a self-learning artificial intelligence to correct the errors. As more files are processed, this ideal A.I. would increasingly “learn” to OCR the files as accurately as possible, even assimilating the font and style of 18th-century novels. Nevertheless, as the current prototypes of A.I. aren’t advanced enough, the next best solution would be to use another program or a specialized plugin to aid in the conversion process.
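The “self-learning” idea, reduced to its barest sketch: tally a character confusion table from pairs of OCR output and hand-corrected text, which could then drive automatic fixes. The sample pairs are invented for illustration, and a real system would need proper alignment rather than the same-length shortcut used here:

```python
# Barest sketch of "learning" from corrections: count character-level
# confusions between OCR output and hand-corrected text.
from collections import Counter

pairs = [
    ("confummation", "consummation"),
    ("fpirits", "spirits"),
    ("mothcr", "mother"),
]

confusions = Counter()
for ocr, truth in pairs:
    if len(ocr) == len(truth):  # naive alignment: same-length pairs only
        for got, expected in zip(ocr, truth):
            if got != expected:
                confusions[(got, expected)] += 1

for (got, expected), count in confusions.most_common():
    print(f"read '{expected}' as '{got}' {count} time(s)")
# -> read 's' as 'f' 2 time(s)
#    read 'e' as 'c' 1 time(s)
```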

A digital facsimile is more like a “memory” of the original copy, preserving the plot but discarding the specific details that make the novel unique. Although digital copies are more convenient, they are not tangible, just husks of physical copies. In order to better understand the relation between digital facsimiles and physical copies of books, think of photos. Although they capture the moment of a person’s past, they are lifeless. In the same sense, the font and style of the original book give a unique atmosphere to the audience; the contemporary versions of the text are only revisions of the original and lack authenticity.

Exercise4

Exercise 4

2 min read

Using ABBYY FineReader, I decided to look specifically at Tristram Shandy’s first chapter. After converting the PDF by running it through ABBYY, certain mistakes immediately garnered my attention. Most notable of these errors was the program’s incorrect interpretation of the long s which, as we discussed in class, looks strikingly similar to a lowercase f. Considering that even the human eye may have difficulty making this distinction at first, I’m not shocked that ABBYY mistook the long s for another character. Another striking error I discovered is the program’s interpretation of certain punctuation--specifically Tristram Shandy’s use of semicolons. ABBYY converted one semicolon into an asterisk followed by a comma. In another instance, a semicolon was mistaken for a bullet point followed by a comma. Some other mistakes included numbers being interpreted as letters and the letter o being replaced with the letter c. While these errors are certainly abundant, they weren’t particularly surprising or at all difficult to correct. The cause looming behind almost all of these mistakes is a simple one: the 1760 volume of Tristram Shandy features a font and style that differs from modern counterparts. As a result, it’s unrealistic to expect 21st-century software to provide us with a clean interpretation of such an old text.

Taking a step back from the minor errors produced in the digital facsimile, it’s interesting to think about the process as a whole. Conducting this exercise shows how the authenticity of an old text can be lost when creating a digital copy. The font of the copy produced by ABBYY looks very modern and crisp compared to the faded and almost parched-looking font of the genuine 1760 version. These findings with ABBYY serve as a reminder that contemporary versions of Tristram Shandy--the one we’re reading in class being no exception--are products of refurbishments and revisions that have accumulated over the centuries.

Exercise4