Leonardo Cavalcanti (cavalcanti.indg@gmail.com)
April 22nd, 2015
The main goal of this project is to create a Shiny app that suggests the next word in a phrase, especially for mobile devices, where it helps the user save typing time. To accomplish this task, I followed the steps below.
The dataset is itself a sample of text collected from the web, so we tried several strategies to process it in full, without sampling it further to reduce its size. It is important to note that for this project I used a MacBook Pro with 8 GB of RAM, which made this really challenging.
I used three main packages in R to do this: tm, RWeka and slam. It is worth saying that some functions are better suited for reading files and help manage this task; for example, the DirSource function gives direct access to the files in a directory when creating a corpus. We created a corpus with the entire dataset and cleaned it without sampling.
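The report does not list the exact cleaning transformations, so the following is only a minimal Python sketch of a typical cleaning pass, assuming lowercasing plus removal of digits and punctuation (apostrophes kept) and whitespace collapsing, roughly analogous to tm transformations such as tolower, removeNumbers, removePunctuation and stripWhitespace. The function names `clean_line` and `iter_clean_lines` are hypothetical, and the generator mimics the DirSource idea of reading every file in a directory, one line at a time, so the full dataset never has to sit in memory at once.

```python
import re
from pathlib import Path
from typing import Iterator


def clean_line(text: str) -> str:
    """Normalize one line of raw text (assumed cleaning steps, not the author's exact ones)."""
    text = text.lower()                      # lowercase everything
    text = re.sub(r"[^a-z'\s]", " ", text)   # drop digits and punctuation, keep apostrophes (ASCII-only assumption)
    text = re.sub(r"\s+", " ", text)         # collapse runs of whitespace
    return text.strip()


def iter_clean_lines(directory: str) -> Iterator[str]:
    """Stream cleaned, non-empty lines from every .txt file in a directory (hypothetical helper)."""
    for path in sorted(Path(directory).glob("*.txt")):
        with path.open(encoding="utf-8", errors="ignore") as fh:
            for line in fh:
                cleaned = clean_line(line)
                if cleaned:
                    yield cleaned
```

For example, `clean_line("Hello, World! It's 2015.")` gives `"hello world it's"`. Streaming lines instead of loading whole files is one way to keep memory use manageable on a machine with 8 GB of RAM.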
Finally, I created 4 databases, one for each n-gram order (1 up to 4). Each database has one column per word of the sequence and a last column with the frequency of that sequence in the dataset.
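To make the layout of these tables concrete, here is a small Python sketch (not the author's R code, which used tm, RWeka and slam) that counts every n-gram of order 1 to 4 and turns a table into rows of words followed by a frequency column. The names `build_ngram_tables`, `to_rows` and `top_continuations` are hypothetical, and the single-order lookup is only an illustration of how such tables can answer "which word most often follows this context"; it is not necessarily the algorithm used in the app.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def build_ngram_tables(lines: Iterable[str], max_order: int = 4) -> Dict[int, Counter]:
    """Count every n-gram of order 1..max_order across already-cleaned lines."""
    tables: Dict[int, Counter] = {n: Counter() for n in range(1, max_order + 1)}
    for line in lines:
        words = line.split()
        for n in range(1, max_order + 1):
            for i in range(len(words) - n + 1):
                tables[n][tuple(words[i:i + n])] += 1
    return tables


def to_rows(table: Counter) -> List[tuple]:
    """Rows shaped like the databases: one column per word, then the frequency, most frequent first."""
    return [(*gram, freq) for gram, freq in table.most_common()]


def top_continuations(tables: Dict[int, Counter], context: Sequence[str], k: int = 3) -> List[str]:
    """Most frequent next words after `context`, using only the table of order len(context) + 1."""
    n = len(context) + 1
    if n not in tables:
        return []
    ctx = tuple(context)
    matches = Counter({gram[-1]: freq for gram, freq in tables[n].items() if gram[:-1] == ctx})
    return [word for word, _ in matches.most_common(k)]
```

With the lines `"i love you"`, `"i love data"` and `"i love you too"`, the first row of the bigram table is `("i", "love", 3)` and `top_continuations(tables, ["i", "love"])` returns `["you", "data"]`. Keeping every n-gram in memory grows quickly with the full dataset; pruning rare n-grams (for example, those seen only once) is one common way to shrink the tables.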
The algorithm I developed followed these main steps:
This project helped me a lot: I had never worked with natural language processing or text analytics of any kind. Even though this app is not the state of the art in the NLP field, it was very challenging for me.