Gustavo Seifer
08.August.2021
Background and rationale
Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities.
The main objective of this project was to develop a text prediction model based on Natural Language Processing (NLP).
APP
Through a simple user interface the App predicts the next word.
Understanding the problem
Data acquisition and cleaning
Exploratory analysis
Statistical modeling & Predictive modeling
Creating a data product (Shiny App)
Creating a short slide deck pitching the product
The text from different sources (News, Blogs, Twitter) was analyzed, cleaned and transformed into a tidy format through tokenization (one word per row, each word being a token).
The words were randomly sampled to reduce computation time.
The words were filtered to remove stop words and words without meaning.
The words were counted globally and by source.
N-grams (bi-grams and tri-grams) were generated, and a predictive model was built from them.
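The steps above can be sketched in R with the tidytext approach described in Text Mining with R. This is a minimal illustration rather than the project's actual code: the toy corpus, the sampling fraction and all object names are assumptions made for the example.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy corpus standing in for the News, Blogs and Twitter files (hypothetical data).
corpus <- tibble(
  source = c("news", "blogs", "twitter", "twitter"),
  text = c(
    "The new phone was released today",
    "I love my new phone",
    "my new phone is great",
    "love the new phone"
  )
)

# Random sample of lines to cut computation time (the 75% fraction is an assumption).
set.seed(123)
sampled <- slice_sample(corpus, prop = 0.75)

# Tokenization: one word per row.
words <- sampled %>% unnest_tokens(word, text)

# Remove stop words using the lexicon shipped with tidytext.
clean_words <- words %>% anti_join(stop_words, by = "word")

# Word counts, globally and by source.
word_counts <- clean_words %>% count(word, sort = TRUE)
word_counts_by_source <- clean_words %>% count(source, word, sort = TRUE)

# Bi-grams, split into the preceding word and the word that follows it, then counted.
bigrams <- sampled %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram)) %>%
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  count(word1, word2, sort = TRUE)
```

Tri-grams follow the same pattern with `n = 3` and three target columns in `separate()`.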
The next word is predicted quickly once the user enters text.
This is a first mock-up that can be improved and fed with more sources to increase its predictive power.
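A prediction step of this kind could look roughly like the sketch below. It assumes a bi-gram frequency table and simply returns the most frequent word seen after the last word typed. The table contents and the helper `predict_next()` are hypothetical and are not taken from the app.

```r
library(dplyr)

# Hypothetical bi-gram frequency table; in practice it would be built from the corpus.
bigram_counts <- tibble(
  word1 = c("new", "new", "my", "love"),
  word2 = c("phone", "car", "new", "my"),
  n     = c(5L, 2L, 3L, 1L)
)

# predict_next() is an illustrative helper, not the app's actual function.
predict_next <- function(input, ngrams = bigram_counts) {
  # Lower-case the input and split it on whitespace.
  tokens <- strsplit(tolower(trimws(input)), "\\s+")[[1]]
  if (length(tokens) == 0) return(NA_character_)
  last_word <- tail(tokens, 1)

  # Keep the bi-grams that start with the last typed word, most frequent first.
  candidates <- ngrams %>%
    filter(word1 == last_word) %>%
    arrange(desc(n))

  # No match means no prediction.
  if (nrow(candidates) == 0) return(NA_character_)
  candidates$word2[1]
}

predict_next("I love my new")
```

A lookup on a pre-computed table keeps the response fast, which fits an interactive app. Adding more sources would mainly enlarge the table.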
Main advantages
Main disadvantages
Next steps: extend it to other languages and to other operating systems.
ShinyApp
https://gus079.shinyapps.io/shiny_app/
R for Data Science
Text Mining with R
https://www.tidytextmining.com/index.html
Supervised Machine Learning for Text Analysis in R