Shiny App for Text Prediction
Shouman Das
01/31/2017
In this project, we make a shiny app which can predict the next word while writing. Basically we two types of model:
In N-gram model, we first create our N-gram(for our app we used bigrams, trigrams) and save it in a data file. This is a Markov Chain model. For example, if a bigram A-B has the highest probability in the training set, then for A we predict the next word to be B. In Katz backoff model, we discount the probability of highly available n-grams to rare n-grams. In our example we used a simplified version of backoff model.
Three types of Data:
This dataset is highly uncleaned, so we used several cleaning procedure to have cleaned usable data. Some well known r package such as “tm”, “stringi” etc have very useful. At the end of cleaning, we tokenized the data to have our unigrams, bigrams, trigrams.
Katz Backoff Model (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.7219)
N-gram model (https://en.wikipedia.org/wiki/N-gram)