Donato Scarano
23/07/2018
Build a predictive algorithm based on NLP (Natural Language Processing).
Build a Shiny app that integrates the predictive text model and algorithm. The app will accept a text input and predict the next word.
The project is modelled on SwiftKey technology, emulating the way prediction works on SwiftKey's smart keyboards on smartphones.
We familiarize ourselves with the dataset and with the concepts of NLP and text mining.
Some useful sources:
Natural Language Processing Wikipedia page: https://en.wikipedia.org/wiki/Natural_language_processing
Text mining infrastructure in R: http://www.jstatsoft.org/v25/i05/
CRAN Task View: Natural Language Processing: http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
Stanford University course on NLP (not in R): https://www.coursera.org/course/nlp
Cleaning the Data. After loading the data, we sample it to reduce the memory footprint and speed up processing.
Exploratory Analysis. We analyze the data to understand the distribution of words and the relationships between them.
Frequency Analysis. We analyze and visualize how frequencies vary across words and word pairs.
Modelling. We build n-gram models to predict the next word from the previous one, two, or three words.
Prediction. We build a predictive model on top of the n-gram models from the modelling step and evaluate its efficiency and accuracy.
Create a Data Product. We build a Shiny app that implements the prediction model.
Create a Presentation. We build a “pitch” for our Shiny app to promote it.
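The sampling, modelling and prediction steps above can be illustrated with a minimal sketch. It is written in Python for illustration only; the project itself is built in R, and this is not the author's code. The function names (sample_lines, tokenize, build_ngram_counts, predict_next), the toy corpus, and the simple backoff rule (try the longest matching context first, then shorter ones) are all assumptions, since the summary does not say how the n-gram models are combined.

```python
import random
import re
from collections import Counter, defaultdict


def sample_lines(lines, fraction, seed=42):
    """Keep roughly `fraction` of the lines to reduce memory use (hypothetical helper)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_counts(lines, max_context=3):
    """Map every context of 1..max_context words to a Counter of following words."""
    counts = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        for i in range(1, len(tokens)):
            for k in range(1, max_context + 1):
                if i - k < 0:
                    break  # not enough preceding words for a longer context
                context = tuple(tokens[i - k:i])
                counts[context][tokens[i]] += 1
    return counts


def predict_next(counts, text, max_context=3):
    """Assumed backoff rule: use the longest known context, then shorter ones."""
    tokens = tokenize(text)
    for k in range(min(max_context, len(tokens)), 0, -1):
        context = tuple(tokens[-k:])
        if context in counts:
            return counts[context].most_common(1)[0][0]
    return None  # no context seen in the training data


# Toy corpus, made up for this example.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat ran away",
]
counts = build_ngram_counts(corpus)
print(predict_next(counts, "the cat"))       # sat
print(predict_next(counts, "a dog sat on"))  # the
```

In the second call the three-word context "dog sat on" never occurs in the toy corpus, so the sketch falls back to the two-word context "sat on". A real model would also need a strategy for ties and for contexts that never appear at any length, which this sketch handles only by returning None.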
The application is available here: https://donscara.shinyapps.io/SwiftKeyNLPAPP/
This presentation can be found here: http://rpubs.com/donscara/406829