Scott Semel
April 23, 2015
You can find the source code in my GitLab repository here.
How Do You Use the App?
Just enter a phrase, and the app predicts the next word.
You can run the app here.
Katz's backoff
\( P_{Katz}(w_{i}\mid w_{i-n+1}^{i-1}) = \) \[ \begin{cases} d \cdot P_{MLE}(w_{i}\mid w_{i-n+1}^{i-1}) & \text{if } C(w_{i-n+1}^{i})>0 \\ \lambda \cdot P_{Katz}(w_{i}\mid w_{i-n+2}^{i-1}) & \text{otherwise} \end{cases} \]
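To make the backoff concrete, here is a minimal Python sketch for the bigram/unigram case, assuming a toy count table. The fixed discount `d` and backoff weight `lam` are illustrative placeholders; a full Katz model derives the discount from Good-Turing counts and computes \( \lambda \) per history so probabilities sum to one. This is not the app's actual code.

```python
from collections import Counter

def katz_bigram_prob(w_prev, w, bigram_counts, unigram_counts, d=0.9, lam=0.4):
    """P_Katz(w | w_prev): discounted MLE for seen bigrams,
    otherwise back off to a lambda-weighted unigram estimate."""
    big = bigram_counts.get((w_prev, w), 0)
    prev_count = unigram_counts.get(w_prev, 0)
    if big > 0 and prev_count > 0:
        # Seen n-gram: discounted maximum-likelihood estimate
        # (a real model would use a Good-Turing discount here)
        return d * big / prev_count
    # Unseen n-gram: back off to the lower-order (unigram) model
    total = sum(unigram_counts.values())
    return lam * unigram_counts.get(w, 0) / total

# Toy usage with hypothetical counts:
bigrams = Counter({("the", "cat"): 3, ("the", "dog"): 1})
unigrams = Counter({"the": 4, "cat": 3, "dog": 2, "fish": 1})
print(katz_bigram_prob("the", "cat", bigrams, unigrams))   # seen bigram
print(katz_bigram_prob("the", "fish", bigrams, unigrams))  # backed off
```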
Kneser-Ney (Interpolation) \[ \begin{align*} P_{KN}(w_{i}\mid w_{i-n+1}^{i-1}) &= \frac{C(w_{i-n+1}^{i})-D}{\sum_{w_{i}}C(w_{i-n+1}^{i})} \\ &+ \lambda \cdot P_{KN}(w_{i}\mid w_{i-n+2}^{i-1}) \end{align*} \] \( \text{where } C(w_{i-n+1}^{i}) = \) \[ \begin{cases} \text{frequency count} & \text{highest order} \\ N(\text{unique histories}) & \text{lower orders} \end{cases} \]
The count is discounted by \( D \) to shift some probability mass to the lower-order model for interpolation.
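The bigram sketch below shows how the discounted higher-order term, the interpolation weight \( \lambda \), and the unique-history (continuation) counts fit together. The counts and the discount `D = 0.75` are toy assumptions for illustration; this is not the app's implementation.

```python
from collections import Counter, defaultdict

def kn_bigram_prob(w_prev, w, bigram_counts, D=0.75):
    """Interpolated Kneser-Ney probability P_KN(w | w_prev) for bigrams."""
    continuations = defaultdict(set)  # word -> set of distinct left contexts
    prev_totals = Counter()           # total count of each history w_prev
    for (a, b), c in bigram_counts.items():
        continuations[b].add(a)
        prev_totals[a] += c
    n_bigram_types = sum(len(s) for s in continuations.values())

    prev_total = prev_totals[w_prev]
    if prev_total == 0:
        return 0.0
    # Discounted higher-order term: max(C(w_prev, w) - D, 0) / C(w_prev)
    higher = max(bigram_counts.get((w_prev, w), 0) - D, 0) / prev_total
    # Interpolation weight lambda: the mass freed up by discounting
    distinct_followers = sum(1 for (a, _) in bigram_counts if a == w_prev)
    lam = D * distinct_followers / prev_total
    # Lower-order term uses unique histories, not raw frequency counts
    p_continuation = len(continuations[w]) / n_bigram_types
    return higher + lam * p_continuation
```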
We're restricted to 100 MB on ShinyApps, but in practice some models have been trained on billions or even trillions of words.
We can try Kneser-Ney or one of the backoff models as we increase the amount of data we train on.
We use only 1% of the data provided by SwiftKey and Coursera in order to fit within the 100 MB limit.
There are many ways in which this model can be improved.