WordPredict.Rmd

Tinniam V Ganesh
27 Jul 2015

Ingest the data

This presentation highlights the steps in creating a Word Predict Shiny App.

The steps taken were:

  • Ingest the data from the Tweets, Blogs and News
  • Sample 10% of the data and split it into training and test sets
  • Store the training and test sets as separate files (see the sketch below)
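
A minimal sketch of the ingest and sampling step is shown below. The input file names (en_US.twitter.txt etc.) and the 80/20 train/test split are assumptions, as the text does not give them.

```r
# Sketch of ingest + 10% sample; file names and the 80/20 split are assumptions
set.seed(1234)
tweets <- readLines("en_US.twitter.txt", encoding = "UTF-8", skipNul = TRUE)
blogs  <- readLines("en_US.blogs.txt",   encoding = "UTF-8", skipNul = TRUE)
news   <- readLines("en_US.news.txt",    encoding = "UTF-8", skipNul = TRUE)

# Take a 10% sample of each source
sampleLines <- function(x, pct = 0.10) x[sample(length(x), round(pct * length(x)))]
sampled <- c(sampleLines(tweets), sampleLines(blogs), sampleLines(news))

# Split into training and test sets and store as separate files
idx <- sample(length(sampled), round(0.8 * length(sampled)))
writeLines(sampled[idx],  "train.txt")
writeLines(sampled[-idx], "test.txt")
```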

Create and clean the Corpus

  • Create a Corpus from the tweets, blogs and news items
  • Clean the Corpus to remove punctuation, special characters, stopwords, etc.
  • Remove profanity from the training and test sets (see the sketch below)
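
A sketch of this step with the tm package might look like the following; the profanity word list (profanity.txt) is an assumed input file.

```r
library(tm)

corpus <- VCorpus(VectorSource(readLines("train.txt")))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)           # punctuation and special characters
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, removeWords, readLines("profanity.txt"))  # assumed word list
corpus <- tm_map(corpus, stripWhitespace)
```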

Create N-grams

  1. Use the RWeka package to create n-grams
  2. Remove sparse terms
  3. Convert to a data frame and compute the frequency of each n-gram
  4. Use Markov chains to calculate the conditional probability P(C|AB) = Count(ABC)/Count(AB)
  5. Apply a smoothing algorithm for the case where the count of the (n-1)-gram is 0
  6. Arrange the counts in descending order of conditional probability
  7. Write the term, next word and conditional probability to a CSV file (see the sketch below)
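
For the trigram case, these steps could look like the sketch below; `bigramCount`, a named vector of Count(AB) built the same way from a bigram term-document matrix, is an assumed helper.

```r
library(tm)
library(RWeka)

# Tokenize the cleaned corpus into trigrams and drop sparse terms
triTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 3, max = 3))
tdm <- TermDocumentMatrix(corpus, control = list(tokenize = triTokenizer))
tdm <- removeSparseTerms(tdm, 0.99)

# Frequency of each trigram as a data frame
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
trigrams <- data.frame(term = names(freq), count = freq,
                       row.names = NULL, stringsAsFactors = FALSE)

# P(C|AB) = Count(ABC) / Count(AB)
trigrams$prefix   <- sub("\\s+\\S+$", "", trigrams$term)  # "A B"
trigrams$nextword <- sub("^.*\\s", "", trigrams$term)     # "C"
# bigramCount: assumed named vector of Count(AB) from a bigram TDM
trigrams$prob <- trigrams$count / bigramCount[trigrams$prefix]

# Sort by conditional probability and write term, next word, probability
trigrams <- trigrams[order(-trigrams$prob), ]
write.csv(trigrams[, c("prefix", "nextword", "prob")],
          "trigram.csv", row.names = FALSE)
```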

Katz backoff algorithm

The backoff algorithm, given a phrase such as “This is so”, is as follows (a code sketch follows the list):

  1. Start with the quadgram table for the given phrase, e.g. “This is so”. If there are 10 next words, stop.
  2. Otherwise, sum the probabilities of the next words found in the quadgram table, Pq
  3. Compute alpha = 1 - Pq
  4. Search for the last 2 words, “is so”, in the trigram table.
  5. Multiply the trigram probabilities Pt by alpha: Pt' = alpha * Pt
  6. If the total number of quadgram and trigram next words is 10, stop.
  7. Otherwise compute a new alpha = 1 - Pt'
  8. Continue in the same way with the bigram and unigram tables
  9. Store only the (n-1)-gram, next word and conditional probability as CSV files
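
A simplified sketch of this backoff lookup is given below; `quad`, `tri` and `bi` are assumed data frames with columns prefix, nextword and prob, read from the CSV files.

```r
# Simplified Katz-style backoff over the precomputed n-gram tables
backoff <- function(phrase, quad, tri, bi, n = 10) {
  words <- tail(strsplit(tolower(phrase), "\\s+")[[1]], 3)

  # 1. Look up the last 3 words in the quadgram table
  hits <- quad[quad$prefix == paste(words, collapse = " "), ]
  if (nrow(hits) >= n) return(head(hits[order(-hits$prob), ], n))

  # 2-5. Back off to the trigram table, discounting by alpha = 1 - Pq
  alpha <- 1 - sum(hits$prob)
  tr <- tri[tri$prefix == paste(tail(words, 2), collapse = " "), ]
  tr$prob <- alpha * tr$prob
  hits <- rbind(hits, tr[!tr$nextword %in% hits$nextword, ])
  if (nrow(hits) >= n) return(head(hits[order(-hits$prob), ], n))

  # 7-8. Back off again to the bigram table with a new alpha
  alpha <- 1 - sum(hits$prob)
  b <- bi[bi$prefix == tail(words, 1), ]
  b$prob <- alpha * b$prob
  hits <- rbind(hits, b[!b$nextword %in% hits$nextword, ])
  head(hits[order(-hits$prob), ], n)
}
```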

The Next Word Shiny app

  1. Read all the CSV files; each contains the (n-1)-gram, the next word and the conditional probability
  2. Read the last 3 words of the typed phrase.
  3. Search in the n-gram table and back off to the (n-1)-gram table, e.g. search the quadgram table and back off to the trigram table, etc.
  4. Display the top 10 next words in a table when the user presses the Submit button (see the sketch below)
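
A minimal Shiny sketch of the app might look like the following; the CSV file names and the `backoff()` function from the previous sketch are assumptions, not the app's actual code.

```r
library(shiny)

# Assumed file names for the precomputed n-gram tables
quad <- read.csv("quadgram.csv", stringsAsFactors = FALSE)
tri  <- read.csv("trigram.csv",  stringsAsFactors = FALSE)
bi   <- read.csv("bigram.csv",   stringsAsFactors = FALSE)

ui <- fluidPage(
  textInput("phrase", "Enter a phrase"),
  submitButton("Submit"),
  tableOutput("predictions")
)

server <- function(input, output) {
  output$predictions <- renderTable({
    if (is.null(input$phrase) || input$phrase == "") return(NULL)
    backoff(input$phrase, quad, tri, bi)  # top 10 next words
  })
}

shinyApp(ui, server)
```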