Hung Dinh
July 2, 2018
This project builds a text prediction app inspired by the SwiftKey app. It was the final project of the Data Science Specialization by Johns Hopkins University.
The app suggests the next word based on the text typed by the user.
In this presentation, I summarize the key points of the data processing, the prediction model, and the app.
The data comes from three sources (English news articles, blog posts, and Twitter posts), totaling more than 4 million lines and more than 100 million words.
With the training set (90% of the total data), I:
For more details, please visit: http://rpubs.com/nhohung/NLP_processing
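The exact preprocessing steps are described at the link above. As a rough illustration only, the training set can be turned into n-gram lookup tables. The sketch below assumes simple lowercase tokenization and a seeded random 90/10 split of lines; both are assumptions, not necessarily the author's cleaning pipeline. The helpers `tokenize`, `split_lines`, and `build_tables` are hypothetical names. For n = 2, 3, 4, each table maps a 1-, 2-, or 3-word prefix to counts of the words that follow it.

```python
import random
import re
from collections import Counter, defaultdict


def tokenize(line):
    """Lowercase a line and keep runs of letters and apostrophes (a simplifying assumption)."""
    return re.findall(r"[a-z']+", line.lower())


def split_lines(lines, train_frac=0.9, seed=42):
    """Shuffle the lines reproducibly and split them into training and test sets."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def build_tables(lines, max_n=4):
    """For n = 2..max_n, map each (n-1)-word prefix to a Counter of the words that follow it."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in range(2, max_n + 1):
            # Slide a window of n tokens over the line; n-grams never cross lines.
            for i in range(len(tokens) - n + 1):
                prefix = tuple(tokens[i:i + n - 1])
                next_word = tokens[i + n - 1]
                tables[n][prefix][next_word] += 1
    return tables
```

For example, `build_tables(["I like green tea", "I like green apples"])[4][("i", "like", "green")]` counts `tea` and `apples` once each. Storing prefix-to-next-word counts, rather than raw n-gram strings, makes the later lookup a single dictionary access per n-gram order.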
I use a back-off 4-gram prediction model, described below:
NA is returned if there is no 2-gram match.
Validation: from the test set (10% of the total data), I select random words (the input) and the word that follows each of them (the ground truth). The model predicts 5 words from each input, and a prediction counts as correct if any of the 5 matches the ground truth. My model achieves 25% accuracy on this test.
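The back-off lookup and the top-5 validation can be sketched as follows. This is a hypothetical Python version, not the app's code, which is not shown here. `predict` first looks up the last three words in the 4-gram table. If that fails, it backs off to the last two words (3-grams) and then to the last word (2-grams). It returns `None` in place of `NA` when nothing matches. As a simplifying assumption, it ranks candidates by raw counts and takes them only from the highest-order table that matches. `top_k_accuracy` applies the "any of the 5 predictions matches" rule described above. The toy tables use made-up counts purely for illustration.

```python
from collections import Counter


def predict(tables, words, k=5, max_n=4):
    """Back-off lookup (hypothetical sketch, not the app's actual code).

    tables[n] maps an (n-1)-word prefix tuple to a Counter of next words.
    Try the longest prefix first (4-grams), then back off to 3-grams and
    2-grams. Return up to k words from the first table with a match, or
    None (the "NA" case) if even the 2-gram table has no match.
    """
    words = [w.lower() for w in words]
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # input too short for this n-gram order
        prefix = tuple(words[len(words) - (n - 1):])
        followers = tables.get(n, {}).get(prefix)
        if followers:
            return [w for w, _ in followers.most_common(k)]
    return None


def top_k_accuracy(tables, samples, k=5):
    """Fraction of (input words, ground-truth next word) pairs whose truth is among the k predictions."""
    if not samples:
        return 0.0
    hits = 0
    for words, truth in samples:
        guesses = predict(tables, words, k) or []
        if truth in guesses:
            hits += 1
    return hits / len(samples)


# Toy tables for illustration only (made-up counts, not from the real corpus).
tables = {
    4: {("i", "like", "green"): Counter({"tea": 3, "apples": 1})},
    3: {("like", "green"): Counter({"tea": 3, "beans": 2, "apples": 1})},
    2: {("green",): Counter({"tea": 5, "light": 4})},
}
print(predict(tables, ["I", "like", "green"]))  # ['tea', 'apples']
print(predict(tables, ["dark", "green"]))       # ['tea', 'light']
print(predict(tables, ["blue"]))                # None
```

A fuller model might fill the remaining slots from lower-order tables or apply back-off weights. This sketch keeps only the basic "longest match first" rule.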
For more details, please visit: http://rpubs.com/nhohung/NLP_prediction
The final model consists of 3.7 MB of data, which is loaded when the app starts. Predictions are returned almost instantly.
You can see the app here: https://nhohung.shinyapps.io/TextPrediction/.
Once there, just start typing into the top-left text box, and 5 suggestions are displayed immediately.