Dinara Mukhtarova
This Shiny app takes a phrase of one or more words as input and, when the user clicks Submit, predicts the next word. I used English texts from blogs, news and Twitter to compute n-gram frequencies. Based on those frequencies, the app finds the most likely continuation of the phrase.
The algorithm I'm using is called Stupid Backoff. It assigns a score to every candidate word \( w_i \), given the preceding words \( w^{i-1}_{i-k+1} \), as follows:
\[ S(w_i|w^{i-1}_{i-k+1}) = \begin{cases} \frac{freq(w^i_{i-k+1})}{freq(w^{i-1}_{i-k+1})} & \quad \text{if } freq(w^i_{i-k+1}) > 0\\ \alpha S(w_i|w^{i-1}_{i-k+2}) & \quad \text{otherwise}\\ \end{cases} \]
Here, I'm using \( \alpha = 0.4 \), so each time the algorithm backs off to a shorter context the score is multiplied by 0.4.
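Stupid Backoff is inexpensive to compute in a distributed environment because it works directly with relative frequencies and does not normalize the scores into probabilities. It still gives high-quality predictions when trained on large amounts of data.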
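As a rough illustration, the scoring rule above can be sketched in Python. This is not the app's own code (the app is written for Shiny); the function names `count_ngrams` and `stupid_backoff_score`, the tuple-keyed `Counter`, and the whitespace tokenization are all assumptions made for the example. The sketch also assumes that, once the context is exhausted, the score falls back to the unigram relative frequency, which is how the recursion is usually terminated.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor, as stated in the text


def count_ngrams(sentences, max_n=3):
    """Count every n-gram of length 1..max_n, keyed by a tuple of words.

    Hypothetical helper: tokenization here is a plain lowercase split.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def stupid_backoff_score(word, context, counts, total_words, alpha=ALPHA):
    """Score `word` given `context`, a tuple of the preceding words."""
    factor = 1.0
    context = tuple(context)
    while context:
        ngram_count = counts[context + (word,)]
        if ngram_count > 0:
            # The full n-gram was seen: use its relative frequency.
            return factor * ngram_count / counts[context]
        # Otherwise drop the earliest context word and apply the penalty.
        context = context[1:]
        factor *= alpha
    # Assumed base case: unigram relative frequency.
    return factor * counts[(word,)] / total_words


corpus = ["the cat sat", "the cat ran", "the dog sat"]
counts = count_ngrams(corpus)
total_words = sum(c for ngram, c in counts.items() if len(ngram) == 1)

# "the cat sat" was seen once and "the cat" twice, so no backoff is needed.
print(stupid_backoff_score("sat", ("the", "cat"), counts, total_words))
# "a dog sat" was never seen, so the score backs off once to "dog sat".
print(stupid_backoff_score("sat", ("a", "dog"), counts, total_words))
```

The loop is just the recursion from the formula unrolled: each pass through it corresponds to one application of the "otherwise" branch.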
If the user enters a phrase that has no matches in the database, the app returns the word “and” (since the phrase is possibly the name of something). If the user enters an empty string (or a string that contains no words), the app returns the word “the” (since it is the most frequent word in English and works well at the beginning of a phrase).
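The two fallback rules can be sketched in Python along the same lines. Again this is only an illustration under stated assumptions, not the app's code: `predict_next_word` is a hypothetical name, the n-gram table is assumed to be a `Counter` keyed by word tuples, words are extracted with a simple regular expression, and "no matches" is taken to mean that no stored n-gram starts with any suffix of the input (so there is no backoff to bare unigrams).

```python
import re
from collections import Counter

ALPHA = 0.4  # backoff factor, as stated in the text


def predict_next_word(phrase, counts, max_context=2, alpha=ALPHA):
    """Return the highest-scoring next word, with the two fallbacks."""
    words = re.findall(r"[a-z']+", phrase.lower())
    if not words:
        return "the"  # empty input, or input that contains no words

    longest = min(max_context, len(words))
    scores = {}
    # Try the longest context first, then progressively shorter ones.
    for k in range(longest, 0, -1):
        context = tuple(words[-k:])
        factor = alpha ** (longest - k)  # one penalty per backoff step
        # Linear scan for brevity; a real table would be indexed by context.
        for ngram, count in counts.items():
            if len(ngram) == k + 1 and ngram[:-1] == context:
                # Keep the score from the longest context that matched.
                scores.setdefault(ngram[-1], factor * count / counts[context])

    if not scores:
        return "and"  # nothing in the table continues this phrase
    return max(scores, key=scores.get)


# Tiny hand-made table, for illustration only.
counts = Counter({
    ("the",): 3, ("cat",): 2, ("dog",): 1,
    ("the", "cat"): 2, ("the", "dog"): 1,
})
print(predict_next_word("See the", counts))  # backs off from "see the" to "the"
print(predict_next_word("zebra", counts))    # no match in the table
print(predict_next_word("", counts))         # empty input
```

Scoring candidates from every context length, rather than stopping at the first context that matches, matters because a frequent continuation of a shorter context can still outscore a rare continuation of a longer one after the 0.4 penalty.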