NLP Prediction Presentation

Wesley Engers
4/24/2016

Purpose of NLP Text Prediction App

  • In today's fast-paced environment, users want to send messages quickly. Text prediction and auto-complete functionality help them do so.
  • The algorithm takes the previous 3 words of a sentence or phrase as input and predicts the next word.

Cleaning the data

  1. 3 sets of documents were used

    a) Twitter, Blogs, News

  2. The tm and quanteda packages were used to parse and clean the data

  3. The data was tokenized and n-grams were built

    a) 1-grams, 2-grams, 3-grams, and 4-grams were constructed

    b) The last word of each n-gram was used as the “Predicted Word”

    c) Only 5% of the documents were used, to reduce computation time
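
The tokenizing and n-gram steps above can be sketched roughly as follows. This is an illustration in Python, not the original implementation (which used the tm and quanteda packages); the function names `tokenize` and `build_ngram_counts` and the simple regular-expression tokenizer are assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_ngram_counts(documents, max_n=4):
    """Count n-grams for n = 1..max_n.

    Each n-gram is stored as (context, predicted_word), where the
    predicted word is the last word and the context is everything before it.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for doc in documents:
        tokens = tokenize(doc)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                context, predicted = tuple(gram[:-1]), gram[-1]
                counts[n][(context, predicted)] += 1
    return counts

# Tiny hypothetical corpus standing in for the Twitter, Blogs, and News samples.
docs = ["I want to go home", "I want to go out"]
counts = build_ngram_counts(docs)
```

Here `counts[4]` holds entries such as `(("want", "to", "go"), "home")`, so a 3-word context can be looked up directly when predicting.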

Creating the Prediction Algorithm

  1. For each set of n-grams, the last word is treated as the predicted word
  2. The algorithm searches the 2-grams for those whose first word matches the previous word in the sentence. It then counts the predicted words that follow and assigns each one a probability equal to the proportion of matches in which it appears.
  3. A similar process is applied to the 3-grams and 4-grams, matching on the previous 2 and 3 words respectively.
  4. The prediction algorithm is based on a weighted average of the probabilities for each of the n-grams

    a) (1*1-gram + 2*2-gram + 3*3-gram + 4*4-gram) / 10

    b) The 1-gram term is simply the most common words in the dataset

  5. The Predicted Word with the highest probability is returned to the user as the best guess
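
A minimal sketch of the weighted average described above, written in Python for illustration only. The names `ngram_counts` and `predict` are hypothetical, and the handling of a context with no match (it simply contributes nothing to the score) is an assumption; the slides do not say how the original app treats that case.

```python
from collections import Counter, defaultdict

# Weights from the slide: (1*1-gram + 2*2-gram + 3*3-gram + 4*4-gram) / 10
WEIGHTS = {1: 1, 2: 2, 3: 3, 4: 4}

def ngram_counts(tokens, max_n=4):
    """Map each n to {context tuple: Counter of the words that follow it}."""
    counts = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            counts[n][tuple(gram[:-1])][gram[-1]] += 1
    return counts

def predict(counts, previous_words):
    """Return {word: weighted-average probability} for every candidate word."""
    scores = Counter()
    for n, weight in WEIGHTS.items():
        if len(previous_words) < n - 1:
            continue  # not enough context for this n-gram order
        # Use the last n-1 words as context (empty tuple for 1-grams).
        context = tuple(previous_words[len(previous_words) - (n - 1):])
        followers = counts[n].get(context)
        if not followers:
            continue  # assumption: an unmatched context adds nothing
        total = sum(followers.values())
        for word, count in followers.items():
            # Proportion of times this word follows the context, times the weight.
            scores[word] += weight * count / total
    total_weight = sum(WEIGHTS.values())  # 10
    return {word: score / total_weight for word, score in scores.items()}

# Hypothetical toy corpus, already tokenized.
tokens = "i want to go home i want to go out i want to go home".split()
counts = ngram_counts(tokens)
probs = predict(counts, ["want", "to", "go"])
best = max(probs, key=probs.get)
```

With this toy corpus, "home" follows the context twice and "out" once at every n-gram order, so `best` is "home". Weighting longer n-grams more heavily lets a specific 3-word match dominate the generic word frequencies of the 1-gram term.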

How to use Text Prediction and Challenges

  1. The user enters the previous 3 words of the sentence, one in each of the 3 text boxes
  2. The user presses the submit button to run the Text Prediction Algorithm
  3. After a short delay, the algorithm returns a prediction for the next word.
  4. It was quite difficult to balance prediction quality against a response time fast enough to meet user demands