By analyzing chapter twenty of Tristram Shandy, I wanted to see how OCR would handle converting French words and accents to text, since this chapter contains two letters written in French. I was not, and am still not, certain of the process an OCR program uses to recognize letters in an image, so I was not sure whether it analyzes the image by word context or letter by letter. Using both ABBYY and Google Docs to perform my OCR analysis, I was actually quite impressed. While ABBYY did not always use the correct accent marks, the words themselves were generally spelled correctly (besides, of course, those words with the odd f-shaped s). The OCRs made other small mistakes, such as consistently mixing up letters and numbers, but I am mainly interested in how OCR would work with texts in other languages: is it largely English-based, and would it even work with character-based languages like Chinese or non-Roman alphabets such as Korean Hangul?
For novels (or even just books in general) like Tristram Shandy, where there are additional decorations, symbols, pictures, and/or oddities, a digital facsimile can flatten out these special characteristics of the book, similar to how NER and My Maps flatten out the content by presenting to the reader what is literally written rather than the ideas and concepts that are subliminally inserted into our reading. The book is flattened again in the process of returning the digital text to print. While this process allows the novel to be easily distributed, it also subtracts the quirkiness, the hidden details of the novel. Yes, technology now allows amazing graphics and symbols to be easily added to a novel today, but carrying the graphics of an eighteenth-century book from print to digital to print again is not yet completely possible for a computer program. It is not easy for a program to interpret abstract symbols as text.
How much are we losing in the process of transitioning from working with an image to working with text? What exactly are we losing? Creativity? Individualism? A humanistic quality? OCR programs generally try to find textual context that they can translate, but what if what they are trying to interpret textually has a greater purpose? Is it another thing that is lost in translation?