Assignment 4

For this assignment, I chose to OCR chapter 20 of Tristram Shandy. I wanted to see how the software would deal with images that contained French text as well as English. I used a few different OCR programs for this assignment: the demo versions of FineReader and Prizmo and a couple of free ones I found online. Unfortunately, FineReader only let me export the first three pages of chapter 20, and the free online ones only did the first page or so. Prizmo did OCR the French text, but I have nothing to compare it to.

The most common mistake in the OCR versions of the text was that the letter s was sometimes replaced with the letter f or the letter l. I think it's because of the typeface used for s in the images. The FineReader version also messed up the punctuation a couple of times, inserting ^ and superscripted cc and ee. The Prizmo version, when converting the French text, sometimes replaced letters with symbols like ~ and <<. It also sometimes left out individual letters. This was done when the language was set to French, as the program wasn't good at catching accents when the language was set to English.

One mistake that could definitely be autocorrected is the use of f where s should have been used. It shouldn't be hard for a program to check each word it doesn't recognize and see whether swapping an f for an s turns it into a real word.
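
To make that idea concrete, here is a minimal sketch in Python of how such a check might work. It is not how FineReader or Prizmo actually behave; the tiny word list, the function names, and the choice to treat both f and l as possible misreadings of s are all assumptions made for illustration.

```python
import re
from itertools import product

# Tiny stand-in vocabulary (an assumption for this sketch);
# a real run would load a full English word list instead.
KNOWN_WORDS = {"his", "sister", "said", "the", "first", "self", "father", "of"}


def candidates(word, confusable="fl"):
    """Yield every spelling made by swapping some confusable letters for 's'."""
    positions = [i for i, ch in enumerate(word) if ch in confusable]
    # Try every combination of "swap" / "keep" for the confusable letters.
    for choice in product([False, True], repeat=len(positions)):
        chars = list(word)
        for pos, swap in zip(positions, choice):
            if swap:
                chars[pos] = "s"
        yield "".join(chars)


def fix_word(word, vocab=KNOWN_WORDS):
    """Return a corrected word only when exactly one known word matches."""
    lower = word.lower()
    if lower in vocab:
        return word  # already a real word, so leave it alone
    matches = {c for c in candidates(lower) if c in vocab}
    if len(matches) == 1:
        fixed = matches.pop()
        return fixed.capitalize() if word[0].isupper() else fixed
    return word  # no match, or too ambiguous to guess


def fix_text(text, vocab=KNOWN_WORDS):
    """Apply fix_word to every alphabetic run, leaving punctuation untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: fix_word(m.group(0), vocab), text)


print(fix_text("his fifter faid the firft"))
```

A check like this only catches misreadings that produce non-words. If the OCR output is itself a real word (for example "fat" where the page has "sat"), the word list alone can't tell, and the surrounding context would have to decide.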

FineReader did a pretty good job of preserving the spacing and font of the original text, but with the online free OCR software and Prizmo, these aspects of the text were lost.

Even if the spacing and font of the original text are preserved, there's a notable difference between looking at an electronic version of a text (whether it's machine-readable or a digital facsimile) and looking at a physical copy of the book. The former has to be looked at on a screen, while the latter can be held, paged through, and physically interacted with. I think the most significant thing lost when a physical text is converted to an electronic text is the ability to write on it. Obviously one wouldn't want to write on an original version of the text, but I find that writing on a physical modern edition of the text (underlining parts, making comments, and noting questions) helps me to better understand it. Sometimes I think writing on a text is pretty vital in helping me comprehend it. When a text is converted to an electronic copy, this process is lost or becomes more difficult to do. Another difference between the two is that some people may not want to stare at a screen for long periods of time.
