Tinniam V Ganesh
22 Aug 2015
This presentation highlights the steps in creating a Word Predict Shiny App
Kneser-Ney smoothing is based on the 'continuation probability' of the next word, that is, how many distinct words it has been seen to follow.
The Kneser-Ney formula is given below \( P_{\mathit{KN}}(w_i \mid w_{i-1}) = \dfrac{\max(c(w_{i-1}, w_i) - \delta, 0)}{\sum_{w'} c(w_{i-1}, w')} + \lambda(w_{i-1}) \dfrac{\left| \{ w' : c(w', w_i) > 0 \} \right|}{\left| \{ (w_{j-1}, w_j) : c(w_{j-1}, w_j) > 0 \} \right|} \) where \( \delta \) is the 'discount' and \( \lambda(w_{i-1}) \) is a normalizing constant that depends on the preceding word:
\( \lambda(w_{i-1}) = \dfrac{\delta}{\sum_{w'} c(w_{i-1}, w')} \left| \{w' : c(w_{i-1}, w') > 0\} \right|. \)
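To make the formula concrete, here is a minimal Python sketch of the interpolated bigram estimate. It is not the app's own code (the app is built with Shiny); the function name `kneser_ney_bigram`, the default discount of 0.75, and the fallback for unseen contexts are assumptions made for illustration.

```python
from collections import Counter


def kneser_ney_bigram(tokens, delta=0.75):
    """Return prob(word, prev), an interpolated Kneser-Ney bigram estimate.

    delta is the discount; 0.75 is an assumed default, not a value from the slides.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_total = Counter()  # sum over w' of c(prev, w')
    followers = Counter()      # number of distinct words seen after prev
    preceders = Counter()      # number of distinct words seen before word
    for (prev, word), count in bigrams.items():
        context_total[prev] += count
        followers[prev] += 1
        preceders[word] += 1
    distinct_bigrams = len(bigrams)

    def prob(word, prev):
        # Continuation probability: share of distinct bigram types ending in word.
        p_cont = preceders[word] / distinct_bigrams
        total = context_total[prev]
        if total == 0:
            # Assumption: for a context never seen, use the continuation probability alone.
            return p_cont
        discounted = max(bigrams[(prev, word)] - delta, 0) / total
        lam = delta / total * followers[prev]
        return discounted + lam * p_cont

    return prob
```

Calling `kneser_ney_bigram(tokens)("cat", "the")` returns the smoothed estimate for "cat" following "the". For a context that appears in the training tokens, the estimates over all words that occur as the second element of some bigram sum to 1, which is what the normalizing constant is for.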
Create an n-gram CSV file containing the (n-1)-gram, the next word and the continuation probability
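One way such a file could be produced is sketched below in Python. The column names, the function name `write_ngram_table`, and the choice to store the continuation probability of the next word (distinct preceding contexts divided by distinct n-grams) are assumptions; the slides do not specify the exact layout.

```python
import csv
import io
from collections import Counter


def write_ngram_table(tokens, n, out):
    """Write one CSV row per distinct n-gram: context, next word, continuation probability."""
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    # Each distinct n-gram ending in a word contributes one distinct preceding context.
    preceders = Counter(gram[-1] for gram in ngrams)
    distinct = len(ngrams)
    writer = csv.writer(out)
    writer.writerow(["context", "next_word", "continuation_prob"])  # hypothetical column names
    for gram in sorted(ngrams):
        word = gram[-1]
        writer.writerow([" ".join(gram[:-1]), word, preceders[word] / distinct])


# Usage: write the bigram table for a toy corpus to an in-memory buffer.
buffer = io.StringIO()
write_ngram_table("the cat sat on the mat the cat ran".split(), 2, buffer)
print(buffer.getvalue())
```

Passing an open file instead of the `io.StringIO` buffer writes the same table to disk, where a prediction app could load it once at start-up and look up candidates by context.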
a) Additive smoothing + Katz backoff b) Kneser-Ney smoothing processed as follows
Thank You!