We’ll use optical character recognition (OCR), the process of converting written text in images into machine-encoded text. The tesseract package brings Tesseract, one of the best open-source OCR engines, to R.

1-Defining a function to check whether a package is installed

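# Returns TRUE if mypkg is among the names of the installed packages
is.installed <- function(mypkg) is.element(mypkg, installed.packages()[, 1])

# Install tesseract only if it is not installed yet
if (!is.installed("tesseract"))
   install.packages("tesseract")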
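The R help page for `installed.packages()` notes that it can be slow when many packages are installed and suggests `system.file()`, `find.package()`, or `requireNamespace()` for this kind of check. A lighter sketch of the same idea as `is.installed` above could look like this; the helper names `ensure_package` and `is_installed` are hypothetical, not part of any package:

```r
# Hypothetical helper: install a package only if it cannot be loaded.
# requireNamespace() returns TRUE/FALSE instead of raising an error,
# and quietly = TRUE suppresses its messages.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Hypothetical check without installing anything:
# system.file(package = pkg) returns "" when the package is not found.
is_installed <- function(pkg) nzchar(system.file(package = pkg))

# Usage:
# ensure_package("tesseract")
```

Using `requireNamespace()` also confirms the package can actually be loaded, not just that its folder exists.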


Load the library

library("tesseract")


2-Loading images and running OCR

The First Image

Figure-1 - Image with low-resolution text

Load the first image and extract its characters with OCR. Note that the image quality is low.

text <- ocr("data/imagemExemplo.png")
cat(text)
## esle é um lexlo exepmlo para Ser
## usado em ocn nz unguzgem n


Take a look at the results: some words are recognized incorrectly because the image resolution is low.
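Since the errors come from low resolution, one thing worth trying is upscaling the image and converting it to grayscale before OCR. The sketch below assumes the magick package is installed (it is not used in the original post); the helper name `ocr_preprocessed` and the 2000-pixel target width are illustrative choices, not tuned values, and results will vary by image:

```r
# Hypothetical helper: upscale and grayscale an image with magick, then run OCR.
# Assumes the 'magick' and 'tesseract' packages are installed.
ocr_preprocessed <- function(path, width = "2000x") {
  img <- magick::image_read(path)
  # "2000x" resizes to 2000 px wide and keeps the aspect ratio
  img <- magick::image_resize(img, width)
  img <- magick::image_convert(img, type = "Grayscale")
  # With path = NULL, image_write() returns the encoded PNG as a raw vector,
  # which tesseract::ocr() accepts as input
  png_bytes <- magick::image_write(img, path = NULL, format = "png")
  tesseract::ocr(png_bytes)
}

# Usage, on the same file as above:
# cat(ocr_preprocessed("data/imagemExemplo.png"))
```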

The Second Image

Figure-2 - Image with low-resolution text

Load the second image and extract its characters with OCR.

text <- ocr("data/imagemExemplo2.png")
cat(text)
## Este é um texto exemplo
## de OCR na Linguagem R
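The sample text is in Portuguese, while `ocr()` uses the English engine by default. A sketch of switching to Portuguese ("por") training data follows; it assumes an internet connection for the one-time `tesseract_download()` call, and on some Linux setups the language data is installed through the system package manager instead. The helper name `ocr_por` is hypothetical:

```r
# Hypothetical helper: run OCR with Portuguese training data.
# Assumes the 'tesseract' package is installed.
ocr_por <- function(path) {
  # Download the Portuguese model only if it is not already available
  if (!"por" %in% tesseract::tesseract_info()$available) {
    tesseract::tesseract_download("por")
  }
  engine <- tesseract::tesseract("por")
  tesseract::ocr(path, engine = engine)
}

# Usage:
# cat(ocr_por("data/imagemExemplo2.png"))
```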


This time the text is recognized correctly. Next, let's load a handwritten image. I drew this in Paintbrush with my finger on a touch screen.

Load the handwritten image

Figure-3 - Image with handwritten text


text <- ocr("data/imagemExemplo3.png")
cat(text)
## ESCREVENDO
## Com 0 dedO
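Handwriting is harder for Tesseract: in the output above, what is probably the letter "o" comes out as the digit "0". To see which words the engine was unsure about, the tesseract package's `ocr_data()` returns one row per recognized word, including a `confidence` column (0 to 100). In this sketch, the helper names and the threshold of 80 are illustrative assumptions:

```r
# Hypothetical helper: keep only words whose confidence is below a threshold.
filter_low_confidence <- function(words, threshold = 80) {
  words[words$confidence < threshold, c("word", "confidence"), drop = FALSE]
}

# Hypothetical wrapper: run ocr_data() on an image and flag uncertain words.
# Assumes the 'tesseract' package is installed.
low_confidence_words <- function(path, threshold = 80) {
  filter_low_confidence(tesseract::ocr_data(path), threshold)
}

# Usage:
# low_confidence_words("data/imagemExemplo3.png")
```

Splitting the filter from the OCR call keeps the filtering logic usable on any data frame with `word` and `confidence` columns.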


The Scientist