Sonia Sharma
June 16, 2016
Aim: Use data from the HC Corpora corpus, which consists of news, blog and Twitter text, to build a language model that predicts the next word in a sentence.
Here is an overview of the steps involved in building and implementing the model:
Our prediction model is given below, where \( V \) = vocabulary size, count() or \( N \) = number of tokens, and \( \lambda \) = weight.
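To make the roles of \( V \), \( N \) and \( \lambda \) concrete, here is a minimal Python sketch of one common model that uses these symbols: a bigram estimate and a unigram estimate, each with add-one smoothing, combined by linear interpolation with weight \( \lambda \). This is an assumption for illustration and not necessarily the exact formula used in this project. The function names (`train`, `score`, `predict_next`), the default `lam=0.7` and the toy sentence are all hypothetical.

```python
from collections import Counter

def train(tokens):
    """Count unigrams and bigrams from a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def score(word, prev, unigrams, bigrams, lam=0.7):
    """Interpolated, add-one smoothed probability of `word` following `prev`.

    lam is the interpolation weight (assumed value, not taken from the slides).
    """
    V = len(unigrams)            # vocabulary size
    N = sum(unigrams.values())   # total number of tokens
    p_bigram = (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)
    p_unigram = (unigrams[word] + 1) / (N + V)
    return lam * p_bigram + (1 - lam) * p_unigram

def predict_next(prev, unigrams, bigrams, k=3, lam=0.7):
    """Return the k highest-scoring candidate next words after `prev`."""
    ranked = sorted(
        unigrams,
        key=lambda w: score(w, prev, unigrams, bigrams, lam),
        reverse=True,
    )
    return ranked[:k]

# Toy corpus, purely illustrative.
tokens = "the cat sat on the mat and the cat ran".split()
unigrams, bigrams = train(tokens)
top = predict_next("the", unigrams, bigrams)
print(top)
```

A larger \( \lambda \) trusts the bigram context more, while a smaller one falls back toward overall word frequency. The same pattern extends to trigrams and higher orders by adding one weighted term per order.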
How the algorithm works and its main features
Our model performs fairly well even when it is built on only \( 5\% \) of the sample. Using a bigger sample, say \( 10\% \), \( 20\% \) or \( 30\% \), would likely improve the accuracy of the model further.
The Shiny application is simple to use, and instructions are provided on the webpage. The user types in a string of English words and, as soon as they stop typing, the next-word predictions appear in the tab below the input, as can be seen in the snapshot below.
THANK YOU!