This project was part of the Coursera Data Science Capstone from Johns Hopkins University. In it, a Natural Language Processing (NLP) app capable of next-word prediction and auto-completion was developed (app here).
- A Katz back-off model based on N-grams was implemented.
- The NLP model was trained and evaluated on English text from news articles, blogs, and tweets.
- Only a subset of the original dataset was used to train and evaluate the app (original dataset here).
- Two models were deployed in the app ("Smaller" and "Larger"), each with a different size and coverage.
- The app makes use of the R packages tm, RWeka, dplyr, ngram, and doParallel.
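The back-off idea behind the model can be illustrated with a minimal Python sketch. It is a simplification and not the app's R implementation. It backs off only from bigrams to unigrams, and it uses a fixed absolute discount `d` (an assumed value of 0.5) in place of the Good-Turing discounting used in classic Katz back-off. The names `train`, `katz_bigram_prob`, and `predict_next` are hypothetical.

```python
from collections import Counter


def train(tokens):
    """Count unigrams and bigrams in a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def katz_bigram_prob(word, prev, unigrams, bigrams, d=0.5):
    """Simplified Katz back-off P(word | prev); the fixed discount d is an assumption."""
    total = sum(unigrams.values())

    def p_uni(w):
        # Maximum-likelihood unigram probability
        return unigrams[w] / total

    # Continuations observed after `prev`, with their bigram counts
    seen = {b: c for (a, b), c in bigrams.items() if a == prev}
    hist_count = sum(seen.values())
    if hist_count == 0:
        # History never seen: back off completely to the unigram model
        return p_uni(word)
    if word in seen:
        # Observed bigram: discounted relative frequency
        return (seen[word] - d) / hist_count
    # Mass freed by discounting, spread over unseen words by unigram probability
    alpha = d * len(seen) / hist_count
    unseen_mass = sum(p_uni(w) for w in unigrams if w not in seen)
    if unseen_mass == 0:
        return 0.0
    return alpha * p_uni(word) / unseen_mass


def predict_next(prev, unigrams, bigrams, k=3, d=0.5):
    """Return the k most probable next words after `prev`."""
    scores = {w: katz_bigram_prob(w, prev, unigrams, bigrams, d) for w in unigrams}
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


tokens = "the cat sat on the mat the cat ran".split()
unigrams, bigrams = train(tokens)
print(predict_next("the", unigrams, bigrams, k=1))  # ['cat']
```

Observed bigrams keep a discounted relative frequency. The probability mass freed by discounting (`alpha`) is redistributed over unseen continuations in proportion to their unigram probabilities, so each conditional distribution still sums to 1. A full N-gram version would apply the same step recursively, from the highest order down to unigrams.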