C. Euler
2017-03-19
This presentation outlines the methodology behind the submitted next-word prediction model.
The corpus used for modeling is based on 5% of the Twitter, news and blog corpora available from the Coursera assignment page. The following steps were carried out to prepare it:
The model is based on Bayes' theorem, which combines prior knowledge about different aspects of a problem to obtain a solution. Specifically, it gives the probability of an event B given that an event A has occurred, \( P(B|A) \), in terms of the reverse conditional probability \( P(A|B) \) and the separate probabilities \( P(A) \) and \( P(B) \):
\( P(B|A) = \frac{P(A|B)\cdot P(B)}{P(A)} \).
In this context, \( A \) is the occurrence of a specific n-gram, \( B \) is the occurrence of a specific word and, thus, \( B|A \) is the occurrence of that word given that n-gram.
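To make the formula concrete, here is a minimal sketch in Python that estimates \( P(B|A) \) from raw counts on a toy sentence. It is an illustration only, not the submitted model's code: the function name `next_word_probabilities`, the toy tokens and the choice to treat each corpus position as one trial are all assumptions made for this example.

```python
from collections import Counter
from fractions import Fraction


def next_word_probabilities(tokens, context):
    """Return {word: P(word | context)} computed via Bayes' theorem.

    Hypothetical helper for illustration; A is "the preceding words equal
    `context`" and B is "the current word equals `word`".
    """
    k = len(context)
    context = tuple(context)
    # Each position i >= k is one trial: a k-word window followed by a word.
    trials = [(tuple(tokens[i - k:i]), tokens[i]) for i in range(k, len(tokens))]
    n = len(trials)

    count_a = sum(1 for ctx, _ in trials if ctx == context)
    if count_a == 0:
        return {}  # unseen context: this sketch does no smoothing or backoff

    count_b = Counter(word for _, word in trials)
    count_ab = Counter(word for ctx, word in trials if ctx == context)

    p_a = Fraction(count_a, n)
    probs = {}
    for word, joint in count_ab.items():
        p_b = Fraction(count_b[word], n)
        p_a_given_b = Fraction(joint, count_b[word])
        # Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)
        probs[word] = p_a_given_b * p_b / p_a
    return probs


# Toy data, assumed for this example only.
tokens = "i like green tea and i like black tea but i like green apples".split()
probs = next_word_probabilities(tokens, ["i", "like"])
best = max(probs, key=probs.get)
print(best, probs[best])  # green 2/3
```

With counts like these, the Bayes expression reduces to the number of times the word follows the context divided by the number of times the context occurs, which is why "green" (2 of 3 occurrences) wins here.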
To use the app, simply type in a word or phrase. The model's prediction is printed in blue.
The Shiny app is available at shinyapps.io.
The underlying code is available on GitHub.