Anh Nguyen
Mar 8, 2017
Coursera Data Science Capstone Project
Project Goal:
On mobile devices, text prediction helps users type faster and more accurately. With this challenge in mind, we set out to build a fast and accurate app that predicts the next word in a given sentence.
Features:
The first prototype of our product used a complex design whose logic depended on the presence of stopwords. That initial model had a low prediction accuracy of 10.1%. Through research, we found that the best improvements in accuracy came from:
This release of the prediction model uses a simple backoff model. Based on the number of words provided, we first try to match the last 3 words against a 4-gram data set. If no matches are found, we repeat the process with the last 2 words and the 3-gram data set, then with the last word and the 2-gram data set. The n-gram data sets were also trimmed to increase prediction speed: we keep no more than 3 predictions for any word sequence, since the app only shows 3 predictions and any excess is unnecessary.
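The backoff lookup described above can be sketched in a few lines. This is a minimal illustration in Python, not the app's own code: the function names `build_tables` and `predict`, the whitespace tokenization, and the choice to rank candidates by raw frequency are all assumptions made for the example.

```python
from collections import Counter, defaultdict

MAX_ORDER = 4  # longest n-gram used (4-grams, per the text)
TOP_K = 3      # number of predictions kept per context (per the text)


def build_tables(tokens, max_order=MAX_ORDER, top_k=TOP_K):
    """Map each (n-1)-word context to its top_k most frequent next words.

    Hypothetical helper: ranking by raw count is an assumption.
    """
    counts = {n: defaultdict(Counter) for n in range(2, max_order + 1)}
    for n in counts:
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            counts[n][context][tokens[i + n - 1]] += 1
    # Trim each context down to at most top_k candidate words.
    return {
        n: {ctx: [w for w, _ in c.most_common(top_k)] for ctx, c in table.items()}
        for n, table in counts.items()
    }


def predict(tables, text, max_order=MAX_ORDER):
    """Try the longest usable context first, then back off to shorter ones."""
    words = text.lower().split()
    for n in range(min(max_order, len(words) + 1), 1, -1):
        context = tuple(words[-(n - 1):])
        if context in tables[n]:
            return tables[n][context]
    return []  # no n-gram of any order matched


corpus = "the cat sat on the mat the cat sat on the sofa the cat ran".split()
tables = build_tables(corpus)
print(predict(tables, "the cat sat"))  # matched by a 4-gram
print(predict(tables, "zebra cat"))    # 3-gram lookup fails, backs off to 2-grams
```

Because every context stores at most three candidates, a prediction is a handful of dictionary lookups regardless of how large the original corpus was.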
Our accuracy of 25.1% was achieved by using 50% of the training set and then cutting back dramatically on single-occurrence tokens. This keeps the data set accurate while keeping the file small enough for fast loading and prediction. Accuracy would most likely go up with a larger training set, but we were constrained by time and resources. All accuracy measurements use out-of-sample error (OOSE) validation data sets of 10k observations.
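The pruning and validation steps above could look roughly like the following Python sketch. It rests on assumptions the text does not confirm: that "cutting back" means dropping entries seen only once (`min_count=2`), and that a prediction counts as correct when the true next word appears among the 3 suggestions. The names `prune_singletons` and `top_k_accuracy` are hypothetical.

```python
from collections import Counter


def prune_singletons(ngram_counts, min_count=2):
    """Drop n-grams seen fewer than min_count times (assumed threshold)."""
    return Counter({g: c for g, c in ngram_counts.items() if c >= min_count})


def top_k_accuracy(predict, samples, k=3):
    """Fraction of (context, next_word) pairs whose next_word is among
    the first k predictions. The scoring rule is an assumption."""
    if not samples:
        return 0.0
    hits = sum(1 for context, target in samples if target in predict(context)[:k])
    return hits / len(samples)


# Toy data standing in for real n-gram counts and a held-out validation set.
counts = Counter({("of", "the"): 5, ("the", "zebra"): 1})
pruned = prune_singletons(counts)


def toy_predict(context):
    # Stand-in predictor keyed on the last word only.
    return {"of": ["the", "a", "course"]}.get(context.split()[-1], [])


held_out = [("one of", "the"), ("two of", "course"), ("on top", "of"), ("kind of", "blue")]
print(pruned)
print(top_k_accuracy(toy_predict, held_out))
```

Scoring on sentences the model never saw during training is what makes the reported figure an out-of-sample estimate rather than a measure of memorization.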
Link to app:
https://tudinhhuong303.shinyapps.io/swiftkey/
Instructions:
After the app loads, simply enter a sentence into the input box. The app displays what you entered along with a suggested completion of the word you are currently typing. Enjoy!