Ricardo Rios
April 15th, 2016
Data Science Capstone
Johns Hopkins University | Coursera
Text Predictor is a Shiny application that uses Markov chains to predict the next word in a sequence of words, based on n-gram frequencies from the HC Corpora corpus.
If the sequence of words given to Text Predictor does not appear in the corpus, it falls back to the Stupid Backoff model, which uses the following scheme:
\[ S\left(w_{i}|w_{i-n+1}^{i-1}\right)=\begin{cases} {\frac{f\left(w_{i-n+1}^{i}\right)}{f\left(w_{i-n+1}^{i-1}\right)}} & \textrm{if }f\left(w_{i-n+1}^{i}\right)>0\\ {\alpha}S{\left(w_{i}|w_{i-n+2}^{i-1}\right)} & \textrm{otherwise} \end{cases} \]
where the context is the sequence of the preceding \(n-1\) words:
\[ w_{i-n+1}^{i-1}=w_{i-n+1}\ldots w_{i-2}w_{i-1} \]
The recursion ends at the unigram score, where \(N\) is the total number of words in the corpus:
\[ S(w_{i})=\frac{f\left(w_{i}\right)}{N} \]
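The scheme above can be sketched in a few lines of Python. This is an illustrative sketch only, not the application's actual code: the function names (`count_ngrams`, `score`, `predict`), the toy corpus, and the back-off factor `ALPHA = 0.4` are all assumptions, since the source does not state the value of \(\alpha\) it uses.

```python
from collections import Counter

ALPHA = 0.4  # assumed back-off factor; the source does not give a value


def count_ngrams(tokens, max_n):
    """Count every n-gram of order 1..max_n, keyed by a tuple of words."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            counts[tuple(tokens[start:start + n])] += 1
    return counts


def score(word, context, counts, total, alpha=ALPHA):
    """Stupid Backoff score S(word | context); context is a tuple of words."""
    if not context:
        # Base case: relative unigram frequency f(w_i) / N.
        return counts[(word,)] / total
    ngram = context + (word,)
    if counts[ngram] > 0:
        # The full n-gram was seen: use its relative frequency.
        return counts[ngram] / counts[context]
    # Otherwise drop the oldest context word and discount by alpha.
    return alpha * score(word, context[1:], counts, total, alpha)


def predict(context, counts, total, vocab):
    """Return the vocabulary word with the highest score for this context."""
    return max(sorted(vocab), key=lambda w: score(w, context, counts, total))


# Hypothetical toy corpus, used only to exercise the functions.
tokens = "the cat sat on the mat the cat ran".split()
counts = count_ngrams(tokens, max_n=3)
total = len(tokens)
vocab = set(tokens)

seen = score("mat", ("on", "the"), counts, total)        # trigram was seen
backed_off = score("cat", ("on", "the"), counts, total)  # falls back to the bigram
best = predict(("on", "the"), counts, total, vocab)
```

Because the scores are not normalized, they are not probabilities; they are only meant to rank candidate next words, which is all that `predict` needs.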
| Unigram | Bigram | Trigram |
|---|---|---|
| 0.036 | 0.054 | 0.035 |