Word Prediction Application

Lynna Jirpongopas
Sat Aug 22 12:07:13 2015

The prediction algorithm (part 1)

These are the general steps taken to predict the next word:

  • 5% of Twitter data & 1% of news data were used to build the model

  • The model takes the user's input text and counts the number of words in it; call this count “n”

  • The sampled data is then tokenized into (n+1)-grams
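
The steps above can be sketched in base R. This is only an illustration under stated assumptions, not the app's actual code: the helper names `tokenize_words` and `make_ngrams` and the three sample lines are hypothetical, and punctuation is dropped because the app ignores it.

```r
# Hypothetical helper: split a line into lowercase words, dropping punctuation.
tokenize_words <- function(line) {
  cleaned <- gsub("[^a-z' ]", " ", tolower(line))
  words <- strsplit(cleaned, " +")[[1]]
  words[nzchar(words)]
}

# Hypothetical helper: build all k-grams from a character vector of lines.
make_ngrams <- function(lines, k) {
  grams <- lapply(lines, function(line) {
    words <- tokenize_words(line)
    if (length(words) < k) return(character(0))
    starts <- seq_len(length(words) - k + 1)
    vapply(starts,
           function(i) paste(words[i:(i + k - 1)], collapse = " "),
           character(1))
  })
  unlist(grams)
}

input_text <- "thanks for"
n <- length(tokenize_words(input_text))  # number of words in the input

# Made-up stand-in for the sampled Twitter and news lines.
sampled_lines <- c("Thanks for the follow!",
                   "thanks for the help",
                   "Thanks for coming")
ngrams <- make_ngrams(sampled_lines, n + 1)  # (n+1)-grams, here trigrams
```

Here the input has n = 2 words, so the sample lines are cut into trigrams such as "thanks for the" and "for the follow".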

The prediction algorithm (part 2)

  • The model then looks for lines in the (n+1)-gram tokenized data that match the input text, like this:
linesWithTheText <- tokenizedData[grepl(paste0("\\<", inputText, "\\>"), text, ignore.case=TRUE)]
  • Once matches are found, the model selects the ones that appear in the sampled data with the highest frequency

  • If there are ties, the model randomly selects one of them!

  • Stop words were not excluded. These are good indicators for predicting the next word in common phrases!
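
The matching, frequency ranking, and tie-breaking described above could look roughly like the following. This is a minimal sketch, not the app's implementation: `predict_next` and the toy (n+1)-grams are hypothetical, and a plain prefix match with `startsWith` is swapped in for the word-boundary `grepl` pattern.

```r
# Hypothetical helper: predict the next word from a character vector
# of (n+1)-grams.
predict_next <- function(input_text, ngrams) {
  prefix <- paste0(tolower(trimws(input_text)), " ")
  # Keep only the (n+1)-grams that begin with the input text.
  matches <- ngrams[startsWith(ngrams, prefix)]
  if (length(matches) == 0) return(NA_character_)
  # Count how often each matching (n+1)-gram appears.
  counts <- table(matches)
  top <- names(counts)[counts == max(counts)]
  # Break ties at random; indexing by position avoids sample()'s
  # special handling of length-one input.
  chosen <- top[sample.int(length(top), 1)]
  # The prediction is the last word of the chosen (n+1)-gram.
  words <- strsplit(chosen, " ", fixed = TRUE)[[1]]
  words[length(words)]
}

# Made-up (n+1)-grams standing in for the tokenized sample.
ngrams <- c("thanks for the", "for the follow", "thanks for the",
            "for the help", "thanks for coming")
predict_next("thanks for", ngrams)
```

With this toy data the call returns "the", since "thanks for the" occurs twice and "thanks for coming" only once. For the input "for the", the two candidates are tied, so repeated calls may return either "follow" or "help".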

The app instructions & features

  • Input text next to “Your text:”
  • Please avoid using punctuation marks, as these were ignored in the model
  • Please wait about 2 minutes for the app to process each submitted job
  • Longer text takes a longer time. Try a 2-word text input first
  • Feature: the same input text can be submitted more than once, and the app may yield different answers!

Example screen shots of the app

[Two screenshots of the app]