We'll use optical character recognition (OCR), the process of converting written text in images into machine-encoded text. The tesseract package brings Tesseract, one of the best open-source OCR engines, to R.
# Return TRUE if a package is already installed
is.installed <- function(mypkg) is.element(mypkg, installed.packages()[, 1])
# Install tesseract only when it is missing
if (!is.installed("tesseract")) {
  install.packages("tesseract")
}
Load the library
library("tesseract")
The First Image
Load the first image, which has low resolution, and extract its characters with OCR
text <- ocr("data/imagemExemplo.png")
cat(text)
## esle é um lexlo exepmlo para Ser
## usado em ocn nz unguzgem n
Take a look at the results: some words are misrecognized because the image resolution is low.
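The errors come from low resolution, so one thing worth trying is to upscale and clean the image before passing it to Tesseract. The sketch below is similar to the magick preprocessing example in the tesseract package vignette. It assumes the magick package is installed, although this tutorial does not otherwise use it. The helper name `ocr_upscaled` and the default scale of `"200%"` are hypothetical choices. We have not verified that this fixes the errors shown above.

```r
# Hypothetical helper: upscale and grayscale an image before OCR.
# Assumes the magick and tesseract packages are installed.
ocr_upscaled <- function(path, scale = "200%") {
  img <- magick::image_read(path)
  # Enlarge the image; "200%" is an ImageMagick geometry string
  img <- magick::image_resize(img, scale)
  # Drop colour information, which the OCR step does not need
  img <- magick::image_convert(img, type = "Grayscale")
  # With no path given, image_write() returns the PNG as a raw vector
  png_raw <- magick::image_write(img, format = "png", density = "300x300")
  # ocr() accepts a raw vector of image data
  tesseract::ocr(png_raw)
}
```

Usage: `cat(ocr_upscaled("data/imagemExemplo.png"))`.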
The Second Image
Load the second, higher-quality image and extract its characters with OCR
text <- ocr("data/imagemExemplo2.png")
cat(text)
## Este é um texto exemplo
## de OCR na Linguagem R
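The second image, with better quality, is read correctly. One common way to put a number on the difference between the two results is the character error rate (CER). CER is the edit distance between the OCR output and the true text, divided by the length of the true text. The sketch below computes Levenshtein distance with base R's `adist()`. The name `char_error_rate` is hypothetical. The tutorial does not state the reference sentence for the first image, so the one used below is our assumption.

```r
# Hypothetical helper: character error rate of OCR output vs. a reference text.
char_error_rate <- function(ocr_text, reference) {
  # Collapse line breaks and repeated spaces so layout differences are not counted
  normalize <- function(x) gsub("\\s+", " ", trimws(paste(x, collapse = " ")))
  ocr_text <- normalize(ocr_text)
  reference <- normalize(reference)
  # adist() returns a 1 x 1 matrix holding the Levenshtein distance
  dist <- adist(ocr_text, reference)[1, 1]
  dist / nchar(reference)
}

# First line of the first image's output vs. an assumed reference text
char_error_rate("esle é um lexlo exepmlo para Ser",
                "este é um texto exemplo para ser")
```

A CER of 0 means a perfect match. Lower values after preprocessing would suggest the image changes helped.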
Now let's load a handwritten image. I drew it in Paintbrush with my finger on a touch screen.
Load the handwritten image
Figure 3: Image with handwritten text
text <- ocr("data/imagemExemplo3.png")
cat(text)
## ESCREVENDO
## Com 0 dedO
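The handwritten text is largely recognized. Still, "Com 0 dedO" mixes upper and lower case and contains the digit 0, probably where the letter O was intended. Recent versions of the tesseract package also provide `ocr_data()`, which returns one row per word with a `confidence` score from 0 to 100. You can use it to flag doubtful words for manual review. The sketch below assumes a tesseract version that includes `ocr_data()`. The helper name `low_confidence_words` and the threshold of 80 are arbitrary illustrative choices.

```r
# Hypothetical helper: list the words Tesseract is least sure about.
# Assumes a tesseract package version that provides ocr_data().
low_confidence_words <- function(path, threshold = 80) {
  words <- tesseract::ocr_data(path)
  # Keep only words whose confidence falls below the threshold
  words[words$confidence < threshold, c("word", "confidence")]
}
```

Usage: `low_confidence_words("data/imagemExemplo3.png")`.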