Christopher Han
March 18, 2019
This data product takes a word or a sentence as input and predicts the next word. The model is trained on 70% of the data and uses a Stupid Backoff model with n-grams of order 1 to 5. The application is deployed at https://chrishan.shinyapps.io/finalwordprediction/
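The preparation step described above (a 70/30 split, then n-gram counts of order 1 to 5) can be sketched in Python as follows. This is a minimal illustration, not the application's code. It assumes the corpus is a list of text lines, uses a seeded random split, and uses a deliberately simple lowercase-and-whitespace tokenizer. The names `split_corpus`, `tokenize` and `build_ngram_counts` are hypothetical.

```python
import random
from collections import Counter


def split_corpus(lines, train_frac=0.7, seed=42):
    """Shuffle the lines reproducibly and split them into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def tokenize(line):
    """Lowercase a line and split it on whitespace (a deliberately simple tokenizer)."""
    return line.lower().split()


def build_ngram_counts(lines, max_n=5):
    """Count every n-gram of order 1..max_n, keyed by tuples of words."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        # In this sketch, n-grams never cross a line boundary.
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    corpus = [f"line number {i} of the corpus" for i in range(10)]
    train, test = split_corpus(corpus)
    counts = build_ngram_counts(train)
    print(len(train), len(test))  # 7 3
    print(counts[2][("line", "number")])  # 7
```

Seeding the shuffle keeps the train/test split reproducible, so benchmark numbers can be regenerated from the same partition.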
Method
The algorithm uses a Stupid Backoff model. It first looks for a 5-gram match (the last four words of the input followed by a candidate word), provided the input is long enough. If there is a match, the word's score is calculated from the 5-gram counts. If there is no match, the model backs off to 4-grams, then 3-grams, and so on down to single words.
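The back-off procedure can be sketched in Python as below. This is an illustrative sketch, not the deployed model. It assumes n-gram counts stored in dictionaries keyed by word tuples. It also uses the back-off penalty of 0.4 from the paper that introduced Stupid Backoff (Brants et al., 2007); the description above does not say which penalty, if any, the application applies. Each candidate word takes its score from the longest context it was observed after, multiplied by 0.4 once per back-off step. These scores are relative frequencies, not normalized probabilities. The names `build_ngram_counts`, `stupid_backoff` and `ALPHA` are hypothetical.

```python
from collections import Counter

ALPHA = 0.4  # back-off penalty from Brants et al. (2007); assumed, not taken from the app


def build_ngram_counts(tokens, max_n=5):
    """Count all n-grams of order 1..max_n in a token list, keyed by word tuples."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(context, counts, max_n=5, k=3, alpha=ALPHA):
    """Return the k highest-scoring next words for `context` (a list of words)."""
    scores = {}
    total_words = sum(counts[1].values())
    # Start from the longest n-gram the input allows: at most max_n - 1 context words.
    start_n = min(max_n, len(context) + 1)
    for depth, n in enumerate(range(start_n, 0, -1)):
        penalty = alpha ** depth  # one factor of alpha per back-off step
        prefix = tuple(context[-(n - 1):]) if n > 1 else ()
        denom = counts[n - 1][prefix] if n > 1 else total_words
        if denom == 0:
            continue  # context never seen at this order: back off further
        for ngram, count in counts[n].items():
            word = ngram[-1]
            # A word keeps the score from the longest context it was seen after.
            if ngram[:-1] == prefix and word not in scores:
                scores[word] = penalty * count / denom
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    tokens = "the cat sat on the mat the cat ran".split()
    counts = build_ngram_counts(tokens)
    print(stupid_backoff(["on", "the"], counts))  # ['mat', 'cat', 'the']
```

In the usage example, "mat" comes from the 3-gram "on the mat", "cat" only from the 2-gram "the cat" (penalized once), and "the" only from unigram counts (penalized twice). The linear scan over each count table keeps the sketch short. A practical version would index n-grams by prefix, for example a dictionary from prefix tuple to a `Counter` of following words.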
The Shiny application consists of the following elements:
Using the provided benchmark, we measured how the model performs on a test set.
| Metric | 3-gram | 4-gram | 5-gram |
|---|---|---|---|
| Overall top-3 score | 17.18% | 17.57% | 17.56% |
| Overall top-1 precision | 12.77% | 13.41% | 13.45% |
| Overall top-3 precision | 20.92% | 21.09% | 21.02% |
| Average runtime | 18.20 ms | 20.08 ms | 23.84 ms |
| Total memory used | 105.32 MB | 106.51 MB | 106.88 MB |
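The 5-gram model gives the best overall top-1 precision: its first guess is the correct next word 13.45% of the time. Its lead over the 4-gram model (13.41%) is marginal. The 4-gram model scores slightly higher on both top-3 metrics, and the 5-gram model is somewhat slower (23.84 ms vs. 20.08 ms average runtime). The final deployed application uses the 5-gram model on the basis of its top-1 precision.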
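The two precision rows can be computed with the small Python sketch below. It assumes that top-k precision is the share of test cases whose actual next word appears among the model's first k suggestions. This is an assumed definition, not the benchmark's own code, and the "Overall top-3 score" row is not reproduced here. The function name `top_k_precision` is hypothetical.

```python
def top_k_precision(predictions, actuals, k):
    """Fraction of test cases whose true next word is among the first k predictions."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must have the same length")
    if not actuals:
        return 0.0
    # A case counts as a hit if the true word appears in the model's top k guesses.
    hits = sum(1 for preds, truth in zip(predictions, actuals) if truth in preds[:k])
    return hits / len(actuals)


if __name__ == "__main__":
    predictions = [["the", "a", "to"], ["of", "in", "on"], ["is", "was", "be"], ["and", "or", "but"]]
    actuals = ["the", "on", "are", "or"]
    print(top_k_precision(predictions, actuals, 1))  # 0.25
    print(top_k_precision(predictions, actuals, 3))  # 0.75
```

Under this definition, a top-1 precision of 13.45% means the first suggestion matched the actual next word in roughly 13 of every 100 test cases.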