Next Word Prediction Application

Bahareh Eghtesadi
Jan 30, 2016

  • In this project, I developed an application that predicts the next word from the text the user enters.
  • The application includes a text box where the user types the text.
  • The last 1 to 3 words of the input are then used to predict the next word.
  • The most probable next word is returned as the prediction.

Data Processing

  • The data is from HC Corpora. It is all in US English and comes from blogs, news, and Twitter.
  • The data from the different sources is combined and then prepared using the 'tm' package. The preparation involves converting to lower case and removing punctuation, numbers, extra white space, and stop words.
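The preparation steps above can be sketched with 'tm' roughly as follows. This is a minimal illustration, not the project's actual script: the two sample strings, the variable names, and the order of the steps are assumptions, and the stop-word list is assumed to be the package's default English list.

```r
library(tm)

# Toy input standing in for the combined blogs/news/Twitter text (assumed data).
raw <- c("Happy New Year 2016!!!", "The  weather is   nice today.")

corpus <- VCorpus(VectorSource(raw))
corpus <- tm_map(corpus, content_transformer(tolower))       # lower case
corpus <- tm_map(corpus, removePunctuation)                  # punctuation
corpus <- tm_map(corpus, removeNumbers)                      # numbers
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # stop words
corpus <- tm_map(corpus, stripWhitespace)                    # collapse repeated spaces

# Pull the cleaned text back out as a plain character vector.
cleaned <- trimws(unname(sapply(corpus, as.character)))
print(cleaned)
```

Note that the order matters: removing punctuation before stop words turns contractions such as "isn't" into "isnt", which the stop-word list no longer matches.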

Algorithm

  • I used n-gram language models to predict the next word.
  • First, I tokenized the data. Then, I created 1-, 2-, and 3-gram models.
  • Given the text input, the algorithm first checks the 3-grams, then the 2-grams.
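The lookup order described above can be sketched in base R as follows. This is a hedged illustration on toy data, not the contents of PredictNextWord.R: the helper names build_ngrams and predict_next are hypothetical, and the final fall-back to the most frequent single word is an assumption about how the 1-gram model might be used.

```r
# Count all n-grams of size n in a vector of cleaned, space-separated sentences.
build_ngrams <- function(sentences, n) {
  grams <- unlist(lapply(strsplit(sentences, " +"), function(words) {
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)
}

# Try the 3-gram table with the last two words, then the 2-gram table with the
# last word; fall back to the most frequent single word (assumed behaviour).
predict_next <- function(input, trigrams, bigrams, unigrams) {
  words <- strsplit(tolower(trimws(input)), " +")[[1]]
  words <- words[nzchar(words)]
  for (k in c(2, 1)) {
    if (length(words) >= k) {
      prefix <- paste0(paste(tail(words, k), collapse = " "), " ")
      model <- if (k == 2) trigrams else bigrams
      hits <- model[startsWith(names(model), prefix)]
      if (length(hits) > 0) {
        best <- names(hits)[which.max(hits)]
        return(sub(".* ", "", best))  # keep only the last word of the n-gram
      }
    }
  }
  names(unigrams)[which.max(unigrams)]
}

# Toy corpus (assumed data) to exercise the sketch.
toy <- c("happy new year", "happy new year everyone",
         "happy new home", "brand new car")
uni <- build_ngrams(toy, 1)
bi  <- build_ngrams(toy, 2)
tri <- build_ngrams(toy, 3)

print(predict_next("happy new", tri, bi, uni))
```

On this toy corpus, "happy new" matches two 3-grams and the more frequent one wins; an input such as "a new" finds no 3-gram and is resolved by the 2-grams instead.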

Instructions

  • The user enters some text. The predicted word is then displayed in the right sidebar.
  • For example, suppose the user enters “happy new”:
setwd("/home/bahar/projects/NextWord/PredictNextWord")
source("./PredictNextWord.R")
wordPrediction <- NextWord("happy new")
year