Sufia Khatun
September, 2016
It is a user interface word predictor app. This app performs the following task:
This app will be found here
Different algorithms were investigated for this app such as Katz's back-off model, Stupid backoff model and Kneser-Ney smoothing. It is found from the study that stupid backoff model is much easier and simpler for the web app.
N-gram model was used to predict next words. This app uses 5-grams to 2-grams to predict words based on Stupid backoff model. The algorithm defines following scoring function:
\[ \begin{aligned} S(w^{i}|w^{i−1}_{i-k+1}) = \begin{cases} \frac{freq\left ( w^{i}_{i-k+1} \right )}{freq\left ( w^{i-1}_{i-k+1} \right )}, & \text{ if } freq\left ( w^{i}_{i-k+1} \right ) >0 \\ \alpha*s(w^{i}|w^{i-1}_{i-k+2}), & \text{} Otherwise \end{cases} \end{aligned} \]
Where, S( ) = score of canditate \( w^{i} \) for a given condition \( w^{i-1}_{i-k+1} \) in n-gram. \( s(w^{i}|w^{i-1}_{i-k+2}) \) is the score of canditate \( w^{i} \) for a given condition \( w^{i-1}_{i-k+2} \) in (n-1)-gram and \( \alpha \) is equal to 0.4.
\[
Score\left (world\right ) = \frac{freq\left (\text{world}\right )_{n=5}}{freq(\text{it would mean the})_{n=4}},\ \ \text{if}\ freq( \text{it would mean the world})_{n=5} >0\\
\]
The score of possible next three words of “it would mean the” are:
| Possible Next Word | Freq-5gram | Freq-4gram | Freq-3gram | Score |
|---|---|---|---|---|
| world | 187 | 202 | 301 | 0.94 |
| difference | 0 | 0 | 40 | 0.03 |
| same | 0 | 0 | 35 | 0.03 |