Predict the next word in the sentence

Vamshideep Devershetty
12/13/2014

Problem Statement

Typing on mobile devices can be a serious pain.

I have created a model that reduces this pain by automatically predicting the next word in a sentence.

For this project, I:

  • Used a data set drawn from Twitter, blogs and news
  • Explored the data set
  • Cleaned the data set
  • Built a prediction model
  • Deployed as a Shiny App

Data Exploration and Cleaning

The raw data set contained a lot of unnecessary characters. I loaded a random sample of lines to explore the data set, then cleaned it:

  • Removed non-standard characters and numbers
  • Transformed the text to lower case
  • Split the sentences into individual words
  • Removed punctuation apart from - and '
  • Removed unnecessary words
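
The cleaning steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the code behind the app: the function name `clean_line` and the `UNWANTED_WORDS` list are assumptions (the slides do not say which words were dropped), and the steps are condensed into a single regular-expression pass.

```python
import re

# Hypothetical list of words to drop; the slides do not name them.
UNWANTED_WORDS = {"rt", "lol"}

def clean_line(line, unwanted=UNWANTED_WORDS):
    """Return a list of cleaned word tokens for one line of raw text."""
    text = line.lower()
    # Keep only letters, whitespace, hyphens and apostrophes. This drops
    # digits, other punctuation and non-ASCII symbols in one pass.
    text = re.sub(r"[^a-z\s'-]", " ", text)
    tokens = text.split()
    # Strip hyphens and apostrophes left dangling at the edges of a token.
    tokens = [t.strip("-'") for t in tokens]
    return [t for t in tokens if t and t not in unwanted]
```

For example, `clean_line("RT I can't wait 4 the well-known show!!! :)")` keeps the apostrophe in "can't" and the hyphen in "well-known" while dropping the digit, the emoticon and "rt".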

Building the Prediction Model

  • Used 70% of the data set to build the model
  • Built a table of word (1-gram) frequencies for that data set
  • Built the same frequency tables for 2-grams, 3-grams and 4-grams
  • Sorted all the n-grams in descending order of probability
  • Stored the n-grams on disk to save memory
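
A minimal Python sketch of the counting, sorting and storing steps above, assuming the sentences are already cleaned and split into words. The function names and the JSON file format are assumptions made for illustration; the slides do not say how the tables were stored. Within one n-gram order, sorting by raw count gives the same ordering as sorting by relative frequency.

```python
from collections import Counter
import json

def ngram_counts(sentences, n):
    """Count every run of n consecutive words across tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

def build_tables(sentences, max_n=4):
    """Return {n: [(ngram, count), ...]}, each list sorted by descending count."""
    return {n: ngram_counts(sentences, n).most_common()
            for n in range(1, max_n + 1)}

def save_tables(tables, path):
    """Write the tables to disk as JSON (the file format is an assumption)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({str(n): rows for n, rows in tables.items()}, f)
```

Calling `build_tables([["i", "love", "you"], ["i", "love", "it"]])` yields one sorted list per n-gram order, with "i love" as the most frequent 2-gram.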

Building the algorithm

  • Load all the n-grams from disk
  • Get the input sentence from the user
  • Select the last few words (max 3) from the sentence
  • Look those words up in the n-grams, longest first (4, 3, 2, 1)
  • Display the predicted next word
  • Word Cloud
    [Figure: Word Cloud]
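
The lookup steps above amount to a simple back-off: try the longest n-gram first and fall back to shorter ones. The Python sketch below is one way to write that, not the app's actual code. It assumes the tables have the shape `{n: [(ngram, count), ...]}` sorted by descending count, and the names `build_lookup` and `predict_next` are hypothetical.

```python
def build_lookup(tables):
    """Map each n-gram prefix to its most frequent next word.

    Rows are sorted by descending count, so the first row seen for a
    prefix is its most frequent continuation.
    """
    lookup = {}
    for n, rows in tables.items():
        best = lookup.setdefault(n, {})
        for ngram, _count in rows:
            *prefix, last = ngram.split()
            best.setdefault(" ".join(prefix), last)
    return lookup

def predict_next(sentence, lookup, max_n=4):
    """Back off from the longest usable n-gram down to the top 1-gram."""
    words = sentence.lower().split()
    for n in range(max_n, 0, -1):
        context = words[-(n - 1):] if n > 1 else []
        if len(context) < n - 1:
            continue  # sentence is too short for this n-gram order
        guess = lookup.get(n, {}).get(" ".join(context))
        if guess is not None:
            return guess
    return None  # no tables loaded
```

With 1-gram to 4-gram tables loaded, `predict_next("I love", lookup)` first tries the 3-gram table (the sentence is too short for a 4-gram), then the 2-gram table, and finally returns the most frequent single word if nothing matches.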

Application Link

Thanks for checking out my prediction app.