Terry Scantlebury
2017.12.25
NextWord is a Shiny App that takes as input a sequence of words and predicts the next word. Predictions are based on a trigram language model, using Modified Kneser Ney smoothing.
In an independent benchmark the app recorded the following scores:
Overall top-3 score: 17.75 %
Overall top-1 precision: 13.19 %
Overall top-3 precision: 21.77 %
Average runtime: 27.47 msec
Number of predictions: 28464
Total memory used: 443.24 MB
This project fulfils the requirements of the Capstone Project in the Johns Hopkins Data Science course (via Coursera) and demonstrates a way to alleviate some of the frustration and pain felt by mobile device users.
People are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities.
Most mobile devices have a smart keyboard that makes it easier for people to type. One cornerstone of smart keyboards on mobile devices is the predictive text module.
This project demonstrates how such an app predicts the next word.
Use the first two words of the trigram (the last two words typed) to do a lookup. Then interpolate scores from the bigram and unigram to arrive at a final score. Scores are ranked and the top 3 words are returned. If no trigram is found, back off to the bigram. If no bigram is found, back off to the unigram.
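The lookup and back-off order described above can be sketched in a few lines of Python. This is only an illustration, not the app's own code: the function names `build_tables` and `predict_next` are hypothetical, and raw counts stand in for the interpolated Kneser Ney scores that the app ranks by.

```python
from collections import Counter, defaultdict

def build_tables(tokens):
    """Count unigrams, bigrams and trigrams from a list of tokens."""
    uni = Counter(tokens)
    bi = defaultdict(Counter)    # w1 -> Counter of next words
    tri = defaultdict(Counter)   # (w1, w2) -> Counter of next words
    for a, b in zip(tokens, tokens[1:]):
        bi[a][b] += 1
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        tri[(a, b)][c] += 1
    return uni, bi, tri

def predict_next(words, uni, bi, tri, k=3):
    """Return up to k candidates, backing off trigram -> bigram -> unigram.

    Assumption: candidates are ranked by raw count here, as a stand-in
    for the smoothed scores.
    """
    words = [w.lower() for w in words]
    if len(words) >= 2 and tuple(words[-2:]) in tri:
        table = tri[tuple(words[-2:])]   # trigram context found
    elif words and words[-1] in bi:
        table = bi[words[-1]]            # back off to the bigram
    else:
        table = uni                      # back off to the unigram
    return [w for w, _ in table.most_common(k)]

tokens = "i love new york i love new shoes i love you".split()
uni, bi, tri = build_tables(tokens)
print(predict_next(["I", "love"], uni, bi, tri))
```

With this toy corpus, the context "i love" is found in the trigram table, so no back-off is needed; an unseen first word such as "hello love" falls through to the bigram table for "love".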
The trigram formula for Modified Kneser Ney Interpolated Smoothing
\( P_{KN}(w_3|w_1,w_2) = \frac{\max(C(w_1,w_2,w_3) - D,0)}{C(w_1,w_2)} + D \times \frac{N(w_1,w_2,\cdot)}{C(w_1,w_2)} \times \left( \frac{\max(N(\cdot,w_2,w_3) - D,0)}{N(\cdot,w_2,\cdot)} + D \times \frac{N(w_2,\cdot)}{N(\cdot,w_2,\cdot)} \times \frac{N(\cdot,w_3)}{N(\cdot,\cdot)} \right) \)
\( C \) denotes the actual frequency counts, \( N \) denotes the continuation counts as defined by Kneser Ney, and \( D \) is the discount. Modified Kneser Ney uses three (3) distinct discount values, one each for n-grams seen once, twice, and three or more times, calculated from the frequency of frequencies.
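To make the formula concrete, here is a minimal Python sketch that evaluates it on a toy corpus. It is an illustration under stated assumptions rather than the app's implementation: the names `kn_model` and `modified_discounts` are hypothetical, a single discount `D = 0.75` is assumed as in the formula as written, and the three-discount estimates follow the commonly cited Chen and Goodman closed form.

```python
from collections import Counter, defaultdict

def modified_discounts(n1, n2, n3, n4):
    """Three discounts from frequency-of-frequency counts n1..n4
    (number of n-grams seen exactly 1, 2, 3 and 4 times)."""
    y = n1 / (n1 + 2 * n2)
    return (1 - 2 * y * n2 / n1,   # n-grams seen once
            2 - 3 * y * n3 / n2,   # n-grams seen twice
            3 - 4 * y * n4 / n3)   # n-grams seen three or more times

def kn_model(tokens, D=0.75):
    """Return prob(w1, w2, w3) following the trigram formula above.
    Assumption: one discount D is used at every level."""
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))   # C(w1,w2,w3)
    bigram_types = set(zip(tokens, tokens[1:]))

    ctx_count = Counter()            # C(w1,w2), counted as trigram context
    followers = defaultdict(set)     # (w1,w2) -> {w3}, size is N(w1,w2,.)
    left_of = defaultdict(set)       # (w2,w3) -> {w1}, size is N(.,w2,w3)
    for (w1, w2, w3), c in tri.items():
        ctx_count[(w1, w2)] += c
        followers[(w1, w2)].add(w3)
        left_of[(w2, w3)].add(w1)

    mid_total = Counter()              # N(.,w2,.)
    mid_followers = defaultdict(set)   # w2 -> {w3}, size is N(w2,.)
    for (w2, w3), lefts in left_of.items():
        mid_total[w2] += len(lefts)
        mid_followers[w2].add(w3)

    preceders = Counter(w3 for _, w3 in bigram_types)   # N(.,w3)
    n_bigram_types = len(bigram_types)                  # N(.,.)

    def prob(w1, w2, w3):
        p_uni = preceders[w3] / n_bigram_types
        if mid_total[w2] == 0:          # unseen w2: unigram only
            return p_uni
        p_bi = (max(len(left_of.get((w2, w3), ())) - D, 0) / mid_total[w2]
                + D * len(mid_followers[w2]) / mid_total[w2] * p_uni)
        if ctx_count[(w1, w2)] == 0:    # unseen (w1,w2): bigram level
            return p_bi
        return (max(tri[(w1, w2, w3)] - D, 0) / ctx_count[(w1, w2)]
                + D * len(followers[(w1, w2)]) / ctx_count[(w1, w2)] * p_bi)

    return prob

tokens = "i love new york i love new shoes i love you".split()
prob = kn_model(tokens)
ranked = sorted(set(tokens), key=lambda w: prob("i", "love", w), reverse=True)
print(ranked[:3])
```

A useful sanity check on any such implementation is that the probabilities for a seen context sum to 1 over the vocabulary, which holds here because the mass removed by the discount is exactly what the interpolation weight redistributes.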
Usage & Description
This application consists of two sections: the next word predictor and a documentation page. You type an input phrase into the box provided and the app predicts the next word you will type.
The app can be found at https://terryscantlebury.shinyapps.io/nextword/
For the best browsing experience while running NextWord, use Chrome or Firefox rather than Edge.