Yuan Hu
November 15, 2014
This data science capstone is run in partnership with SwiftKey, which builds a smart keyboard that predicts words so that people can type more easily on their mobile devices.
In this capstone we will work on understanding and building predictive text models like those used by SwiftKey.
The data set, the Capstone Dataset, is downloaded from the Coursera site.
To predict words through text mining, I will acquire and clean the data sets and then build a plausible model.
The initial training data set is about 560 MB in total. Surprisingly, the final smoothed 3-gram model RData file is only 22 MB.
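The source reports the drop from about 560 MB of raw text to a 22 MB model but does not say how it was achieved. One common technique, assumed here and not confirmed by the source, is to keep only n-grams seen at least a few times, since many distinct n-grams in a large corpus occur only once. The Python sketch below counts trigrams and prunes rare ones; `count_trigrams`, `prune`, and the threshold `min_count=2` are hypothetical names and values chosen for illustration.

```python
from collections import Counter


def count_trigrams(sentences):
    """Count every run of three consecutive tokens in each tokenized sentence."""
    counts = Counter()
    for tokens in sentences:
        # zip stops at the shortest slice, so only complete trigrams are produced
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts


def prune(counts, min_count=2):
    """Keep only n-grams seen at least min_count times (hypothetical threshold)."""
    return Counter({gram: c for gram, c in counts.items() if c >= min_count})


sentences = [
    ["thanks", "for", "the", "follow"],
    ["thanks", "for", "the", "rt"],
    ["see", "you", "soon"],
]
trigrams = count_trigrams(sentences)
print(len(trigrams))         # prints: 4
print(len(prune(trigrams)))  # prints: 1
```

Raising `min_count` trades coverage of rare phrases for a smaller table; where that balance lies for a real corpus would have to be measured.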
The user input sentence will be cleaned and tokenized.
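The source does not list its cleaning rules. As a minimal sketch, assuming that cleaning means lowercasing, dropping URLs, digits and punctuation, and keeping in-word apostrophes, the same function could be applied to both the training text and the user's input so that the two are tokenized consistently. `clean_and_tokenize` and every rule in it are hypothetical.

```python
import re


def clean_and_tokenize(text: str) -> list[str]:
    """Lowercase text, strip URLs and non-letters, and split into word tokens.

    The specific rules are illustrative assumptions, not the original app's.
    """
    text = text.lower()
    # Drop URLs first so their pieces do not survive as stray tokens.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Replace digits and punctuation (everything except letters, apostrophes, whitespace).
    text = re.sub(r"[^a-z'\s]", " ", text)
    # Trim apostrophes at token edges, e.g. from quoted 'words', then drop empties.
    tokens = [tok.strip("'") for tok in text.split()]
    return [tok for tok in tokens if tok]


print(clean_and_tokenize("Hello, World! It's 2014."))  # prints: ['hello', 'world', "it's"]
```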
The algorithm then searches the model and returns the single predicted word.
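The source does not say which smoothing or search strategy the app uses. As an illustration only, here is a Python sketch (the original app is built in R, shipping an RData model on shinyapps.io) that stores, for each history of up to two words, a `Counter` of the words that followed it, and ranks candidates with Stupid Backoff, a simple scheme that falls back to shorter histories with a fixed penalty `alpha`. Stupid Backoff is a swapped-in technique, not necessarily the author's smoothing; `build_tables`, `predict_next`, and `alpha=0.4` are hypothetical.

```python
from collections import Counter, defaultdict


def build_tables(sentences, max_history=2):
    """For k = 0..max_history, map each k-word history to a Counter of next words."""
    tables = [defaultdict(Counter) for _ in range(max_history + 1)]
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for k in range(max_history + 1):
                if i >= k:  # only histories that fit inside the sentence
                    tables[k][tuple(tokens[i - k:i])][word] += 1
    return tables


def predict_next(context, tables, alpha=0.4):
    """Return the highest-scoring next word under Stupid Backoff, or None.

    A word's score is its relative frequency after the longest matching
    history, multiplied by alpha once for every backoff step taken.
    """
    scores = {}
    weight = 1.0
    for k in range(len(tables) - 1, -1, -1):  # longest history first
        if len(context) < k:
            continue  # input too short for this history length
        history = tuple(context[len(context) - k:])
        followers = tables[k].get(history)  # .get avoids inserting empty entries
        if followers:
            total = sum(followers.values())
            for word, c in followers.items():
                # setdefault keeps the score from the longest history that saw the word
                scores.setdefault(word, weight * c / total)
        weight *= alpha  # each backoff step applies the fixed penalty
    return max(scores, key=scores.get) if scores else None


sentences = [["i", "love", "you"], ["i", "love", "you"],
             ["i", "love", "it"], ["we", "love", "pizza"]]
tables = build_tables(sentences)
print(predict_next(["i", "love"], tables))     # prints: you
print(predict_next(["we", "love"], tables))    # prints: pizza
print(predict_next(["they", "love"], tables))  # prints: you (backs off to "love")
```

Indexing by history means a prediction only looks at the followers of a few histories rather than scanning every stored n-gram, which is one way to keep lookups fast.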
Prediction takes less than 1 second.
# typical prediction time cost (seconds)
   user  system elapsed
   0.27    0.01    0.30
Try the app here:
https://yuanhu.shinyapps.io/SinglewordPrediction/