Character Recognition using R

We'll use optical character recognition (OCR), the process of converting written text in images into machine-encoded text. The tesseract package brings one of the best open-source OCR engines to R.

1-Defining a function to check if a package is installed

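# TRUE if the named package appears in the list of installed packages
is.installed <- function(mypkg) is.element(mypkg, installed.packages()[,1])

# Install tesseract only when it is missing
if(!is.installed("tesseract"))
   install.packages("tesseract")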
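
As an alternative to scanning installed.packages(), the same check can be sketched with base R's requireNamespace(), which reports whether a package can be loaded. The helper name is_installed is an assumption made for this illustration, and the sketch only prints a hint instead of installing anything.

```r
# Hypothetical helper: TRUE when the package namespace can be loaded.
# requireNamespace() is part of base R; quietly = TRUE suppresses messages.
is_installed <- function(pkg) {
  isTRUE(suppressWarnings(requireNamespace(pkg, quietly = TRUE)))
}

if (!is_installed("tesseract")) {
  message("tesseract is missing; install it with install.packages(\"tesseract\")")
}
```

This avoids building the full table of installed packages just to look up one name.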

Load the library

library("tesseract")

2-Loading Images and processing OCR

The First Image

Load the first image, which has low quality, and extract its characters with OCR

text <- ocr("data/imagemExemplo.png")
cat(text)

## esle é um lexlo exepmlo para Ser
## usado em ocn nz unguzgem n

Take a look at the results. Several words are misrecognized because the image resolution is low.
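
One way to work around a low-resolution input is to enlarge and clean the image before passing it to the OCR engine. The sketch below assumes the magick package is installed (it is not used elsewhere in this post) and follows the preprocessing pattern shown in the tesseract package vignette. The function name ocr_upscaled and the 2000-pixel target width are illustrative choices, and the result may or may not improve on this particular image.

```r
# Hypothetical helper: upscale and convert to grayscale, then run OCR.
# Assumes the magick and tesseract packages are installed.
ocr_upscaled <- function(path, width = 2000) {
  img <- magick::image_read(path)
  img <- magick::image_resize(img, paste0(width, "x"))   # enlarge to the target width
  img <- magick::image_convert(img, type = "Grayscale")  # drop color information
  # With no path given, image_write() returns the encoded PNG as a raw vector
  png <- magick::image_write(img, format = "png", density = "300x300")
  tesseract::ocr(png)
}

# Run only when both packages and the example file are available.
if (requireNamespace("magick", quietly = TRUE) &&
    requireNamespace("tesseract", quietly = TRUE) &&
    file.exists("data/imagemExemplo.png")) {
  cat(ocr_upscaled("data/imagemExemplo.png"))
}
```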

The Second Image

Load the second image and extract its characters with OCR

text <- ocr("data/imagemExemplo2.png")
cat(text)

## Este é um texto exemplo
## de OCR na Linguagem R

Now you can load an image of handwritten text. I drew this one in Paintbrush with my finger on a touch screen.

Load the handwritten image

Figure-3 - Image with handwritten text

text <- ocr("data/imagemExemplo3.png")
cat(text)

## ESCREVENDO
## Com 0 dedO
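
The handwriting result mixes up similar glyphs (the digit 0 in place of the letter o). To see how sure the engine was about each word, the tesseract package provides ocr_data(), which returns one row per word with a confidence column. A minimal sketch follows. The helper name low_confidence and the threshold of 80 are assumptions for illustration.

```r
# Hypothetical helper: keep the rows whose confidence is below a threshold.
# Expects a data frame shaped like the result of tesseract::ocr_data(),
# i.e. with `word` and `confidence` columns.
low_confidence <- function(words, threshold = 80) {
  words[words$confidence < threshold, , drop = FALSE]
}

# Run only when the package and the example file are available.
if (requireNamespace("tesseract", quietly = TRUE) &&
    file.exists("data/imagemExemplo3.png")) {
  words <- tesseract::ocr_data("data/imagemExemplo3.png")
  print(low_confidence(words))
}
```

Words that fall below the threshold are good candidates for manual review.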

The Scientist

Delermando Branquinho Filho

February 9, 2017
