Word Prediction Application

Lynna Jirpongopas
Sat Aug 22 12:07:13 2015

The prediction algorithm (part 1)

These are the general steps taken to predict the next word:

A 5% sample of the Twitter data and a 1% sample of the news data were used to build the model
The model takes the user's input text and counts its words; call this number “n”
The sampled data is then tokenized into (n+1)-grams
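The word-counting and tokenizing steps above can be sketched in base R. This is a minimal sketch, not the app's actual code: the helpers `count_words` and `tokenize_ngrams` and the sample lines are hypothetical, and punctuation is stripped with a simple regex on the assumption that the model ignores it.

```r
# Hypothetical helper: count the words in the user's input ("n")
count_words <- function(text) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  length(words[words != ""])
}

# Hypothetical helper: split each sampled line into k-word grams.
# Assumption: lower-casing and dropping punctuation, as the app ignores it.
tokenize_ngrams <- function(lines, k) {
  cleaned <- gsub("[[:punct:]]", "", tolower(lines))
  grams <- lapply(strsplit(cleaned, "\\s+"), function(w) {
    w <- w[w != ""]
    if (length(w) < k) return(character(0))  # line too short for a k-gram
    vapply(seq_len(length(w) - k + 1),
           function(i) paste(w[i:(i + k - 1)], collapse = " "),
           character(1))
  })
  unlist(grams)
}

# Made-up sample lines for illustration
sample_lines <- c("Thanks for the follow", "Happy to see you")
n <- count_words("thanks for")
tokenize_ngrams(sample_lines, n + 1)
```

With a two-word input, each sampled line is cut into overlapping three-word grams such as "thanks for the" and "for the follow".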

The prediction algorithm (part 2)

The model then searches the (n+1)-gram tokenized data for lines that contain the input text as whole words, like this:

# Keep the (n+1)-grams containing inputText; \\< and \\> are word boundaries
linesWithTheText <- tokenizedData[grepl(paste0("\\<", inputText, "\\>"), tokenizedData, ignore.case = TRUE)]
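A toy run of the matching step is shown below. It assumes a stricter variant of the match above: the pattern is anchored with `^` so that only (n+1)-grams that begin with the input are kept, which makes the last word of each match a candidate next word. The `tokenizedData` vector here is made up for illustration.

```r
# Made-up (n+1)-grams; in the app these come from the sampled data
tokenizedData <- c("thanks for the", "for the follow", "thanks for the",
                   "thanks for coming", "see you soon")
inputText <- "thanks for"

# Assumption: anchor at the start ("^") so the input is a prefix of the gram
pattern <- paste0("^", inputText, "\\>")
linesWithTheText <- tokenizedData[grepl(pattern, tokenizedData, ignore.case = TRUE)]

# Candidate next word = everything after the last space in each match
candidates <- sub(".*\\s", "", linesWithTheText)
candidates
```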

Once matches are found, the model selects the candidate that appeared in the sampled data with the highest frequency
If there are ties, the model selects one of them at random!
Stop words were not excluded, because they are good indicators for predicting the next word in common phrases!
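The selection rule above (highest frequency wins, ties broken at random) can be sketched as follows. `predict_next` is a hypothetical helper, and its input is assumed to be the vector of candidate next words produced by the matching step.

```r
# Hypothetical helper: most frequent candidate, random choice among ties
predict_next <- function(candidates) {
  if (length(candidates) == 0) return(NA_character_)  # no match found
  counts <- table(candidates)
  best <- names(counts)[counts == max(counts)]
  if (length(best) == 1) best else sample(best, 1)
}

predict_next(c("the", "the", "coming"))  # "the" is the clear winner
predict_next(c("a", "b"))                # tie: either word may be returned
```

Because `sample()` draws at random, repeated calls with a tie can return different words, which matches the app's behaviour of giving different answers for the same input.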

The app instructions & features

Enter your text in the box next to “Your text:”
Please avoid punctuation marks, as these are ignored by the model
Please allow about 2 minutes for the app to process each submission
Longer text takes longer to process, so try a two-word input first
Feature: the same input text can be submitted more than once, and the app may yield different answers because ties are broken at random!

Example screenshots of the app
