Introduction

Tesseract is an Optical Character Recognition (OCR) engine. OCR is a technology that converts text contained in images into plain text. This post introduces OCR via a simple Shiny application.

Shiny Application

This application converts a selected PNG, JPEG, TIFF or PDF file to plain text. The result can then be highlighted and copied, or downloaded. Note that the result is often imperfect; its accuracy depends on the quality of the image.
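To give a rough idea of the conversion step, here is a minimal sketch in Python that calls the Tesseract command-line tool. It is not the code behind the application; it assumes the tesseract executable is installed and on the PATH, and the function names build_tesseract_command and image_to_text are made up for this illustration. The command-line tool reads image files, so a PDF would typically need to be rendered to an image first.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, language="eng"):
    """Return the argument list for a Tesseract call that prints text to stdout."""
    # Using "stdout" as the output base makes Tesseract print the recognised
    # text instead of writing it to a .txt file.
    return ["tesseract", image_path, "stdout", "-l", language]


def image_to_text(image_path, language="eng"):
    """Run Tesseract on an image file and return the recognised plain text.

    Assumes the tesseract executable is installed (hypothetical helper).
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract executable was not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, language),
        capture_output=True,  # collect the recognised text from stdout
        text=True,            # decode the output as text rather than bytes
        check=True,           # raise CalledProcessError if Tesseract fails
    )
    return result.stdout
```

Calling image_to_text("scan.png") would return whatever text Tesseract recognises in that file, where scan.png is a placeholder name.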


Conclusion

Tesseract is a powerful OCR engine. While it is far from perfect, there are steps that can improve the result, such as pre-scaling, enhancing and trimming the image.
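To make two of these steps concrete, here is a small Python sketch of trimming and scaling. It is only an illustration under simplifying assumptions: the image is represented as a list of rows of grayscale values (0 is black, 255 is white), the background is assumed to be near white, and the helper names trim_margins and scale_up are hypothetical. In practice an image library would do this work on real files.

```python
def trim_margins(pixels, background=255, tolerance=10):
    """Crop a grayscale image to the bounding box of its non-background pixels."""

    def is_ink(value):
        # A pixel counts as content if it differs clearly from the background.
        return abs(value - background) > tolerance

    rows = [i for i, row in enumerate(pixels) if any(is_ink(v) for v in row)]
    if not rows:
        return []  # the image is blank, nothing to keep
    cols = [
        j for j in range(len(pixels[0]))
        if any(is_ink(row[j]) for row in pixels)
    ]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]


def scale_up(pixels, factor=2):
    """Enlarge the image by an integer factor using nearest-neighbour repetition."""
    scaled = []
    for row in pixels:
        # Repeat each pixel horizontally, then repeat the widened row vertically.
        wide_row = [value for value in row for _ in range(factor)]
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled
```

Trimming first and scaling second keeps the enlarged image small, since the empty margins are discarded before any pixels are duplicated.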