Digitize Notebooks

I need to import sections of the notebook as JPGs. The notebook also contains sketches of maps. I will use other methods for other media.

I will try to read the text with the tesseract package, but I might have trouble with that because the handwriting is messy.

Begin Trial

Load Packages:

library(tesseract)

## Warning: package 'tesseract' was built under R version 3.6.3

library(magick)

## Warning: package 'magick' was built under R version 3.6.2

## Linking to ImageMagick 6.9.9.14
## Enabled features: cairo, freetype, fftw, ghostscript, lcms, pango, rsvg, webp
## Disabled features: fontconfig, x11

Import the image of the text:

NoteBook1Pg2 <- image_read(path = "C:/Users/Owner/Pictures/BoothBookPg1And2.jpg")

View the image:

plot(NoteBook1Pg2)

I want to isolate page 2, which is to the right of the book’s spine.

NoteBook1Pg2

croppedImage <- image_crop(NoteBook1Pg2, geometry = geometry_area(x_off = 870))
plot(croppedImage)
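
The offset of 870 pixels is specific to this scan. As a minimal sketch, the offset could instead be derived from the image width, assuming the spine sits near the horizontal middle of the scan (an assumption, not something measured here). The helper name `crop_right_of` and the blank placeholder image are hypothetical and only there so the block runs on its own.

```r
library(magick)

# Hypothetical helper: keep everything to the right of a given
# fraction of the image width (0.5 assumes a centered spine).
crop_right_of <- function(img, fraction = 0.5) {
  info <- image_info(img)
  x_off <- as.integer(round(info$width[1] * fraction))
  image_crop(img, geometry = geometry_area(
    width = info$width[1] - x_off,
    height = info$height[1],
    x_off = x_off
  ))
}

# Blank placeholder standing in for the two-page scan.
scan <- image_blank(width = 1740, height = 1200, color = "white")
page2 <- crop_right_of(scan)
image_info(page2)[, c("width", "height")]
```

If the spine is off-center, a different `fraction` can be passed after checking the plot.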

I have page 2. Now I want to read its text.

Clean and enhance the image:

myImage <- image_convert(image = croppedImage, type = "grayscale")
plot(myImage)

myImage2 <- image_trim(image = myImage, fuzz = 30)
plot(myImage2)
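
A few more cleaning steps might help before OCR. The sketch below upscales the image and stretches its contrast in addition to the grayscale and trim steps above. Whether this improves recognition of handwriting is not known here. ImageMagick's built-in `logo:` image is used as a stand-in for the notebook page, and the `2000x` width is an assumed value.

```r
library(magick)

# Built-in ImageMagick test image, standing in for the cropped page.
page <- image_read("logo:")

# Upscale to an assumed width of 2000 pixels (height scales to match).
cleaned <- image_resize(page, "2000x")
# Drop color information.
cleaned <- image_convert(cleaned, type = "Grayscale")
# Stretch contrast across the full intensity range.
cleaned <- image_normalize(cleaned)
# Trim near-uniform borders, as above.
cleaned <- image_trim(cleaned, fuzz = 30)

image_info(cleaned)
```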

Now attempt to read the text:

MyText <- ocr(image = myImage2, engine = tesseract())
cat(MyText)

## f [Shek Hatha mprarel nS"
## ik II l e neler ec th) >
## eet wer
## | Rrambrtatn tht Sur Giles Cre Pile “gale
## ee He ee ec . H, D2
## t. Ath, Wel the Incl krel
## Y Safer iit
## Bele Wak Fie, Rod’:
## ' Wet fy Nennaher. tar RA
## | rh eA
## Satin. . ba ckaok.. 4
## | he 6
## | Fh. Dah Pal
## ficuny Int ton KeWorbteck ©)
## als thgh fiat poplar te eel
## frdice Colt in Ke Uret Sua dnl
## | ee ihe. fh isl,
## | tt dre, Arka , etetrkd Mey
## pet! of [oper
## \ Sedo oe re bebe Hi Behe Re on ctr
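
To put a number on how unreliable a result like this is, `ocr_data()` from the tesseract package returns one row per recognized word with a confidence score. This is a minimal sketch, assuming the English training data is installed. The built-in `logo:` image stands in for the notebook page, and the cutoff of 60 is an arbitrary, assumed threshold.

```r
library(magick)
library(tesseract)

# Built-in ImageMagick test image, standing in for the notebook page.
sample_img <- image_read("logo:")

# Write to a temporary PNG so ocr_data() receives a file path.
tmp <- tempfile(fileext = ".png")
image_write(sample_img, path = tmp, format = "png")

# One row per word, with its confidence and bounding box.
words <- ocr_data(tmp, engine = tesseract("eng"))

# Average confidence, and the words below the assumed cutoff of 60.
mean(words$confidence)
low_confidence <- words[words$confidence < 60, ]
low_confidence
```

A low average confidence on the notebook page would support the suspicion that the handwriting is too messy for this approach.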


M.G. Barclay

5/26/2020
