Natural Language Processing

bretlaisy
Sat Aug 15 22:10:33 2015

An interactive app that predicts the next word

Introduction

The Capstone Project builds an interactive Shiny app that reads in the words a user types.

The user types into a text box; the input can be a single word or a phrase.

The program then predicts the most likely next word using an n-gram model.

The Shiny App


Click here to try it

Prediction Algorithm

The prediction uses the statistical properties of an n-gram model to predict the next word based on the words in the text box.
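For reference, the standard maximum-likelihood estimate behind an n-gram model is shown below. The slides do not state the exact formula the app uses, so treat this as an assumption. Here C(·) is the number of times a word sequence occurs in the corpus:

$$
P(w_i \mid w_{i-n+1}, \dots, w_{i-1}) \approx \frac{C(w_{i-n+1}, \dots, w_{i-1}, w_i)}{C(w_{i-n+1}, \dots, w_{i-1})}
$$

Because the denominator is the same for every candidate word, choosing the most probable next word reduces to choosing the continuation with the highest count. For example, with hypothetical counts where "want to" occurs 3 times and "want to eat" occurs 2 times, the estimate for "eat" after "want to" is 2/3.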

The algorithm below shows how the prediction works:

  • If there is only 1 word, it uses the unigram probabilities to return the most frequent word

  • If there are 2 words, it uses the bigram probabilities to predict the third word

  • If there are 3 or more words, it uses only the last 3 words to predict the next word
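A minimal Python sketch of this lookup, under stated assumptions: each table is keyed by the last k input words (k = 1, 2, 3) and counts the words that followed that context in the corpus, so the k = 1 table holds word-pair counts. When the longest context is unseen, the sketch backs off to shorter ones. The tokenizer, the table layout, the backoff step and the names `build_tables` and `predict_next` are hypothetical illustrations, not the app's actual code.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed cleaning step: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def build_tables(sentences, max_context=3):
    # tables[k][context] is a Counter of the words seen right after
    # the k-word tuple `context` (hypothetical layout).
    tables = {k: defaultdict(Counter) for k in range(1, max_context + 1)}
    for sentence in sentences:
        words = tokenize(sentence)
        for i in range(1, len(words)):
            for k in range(1, max_context + 1):
                if i - k < 0:
                    break
                context = tuple(words[i - k:i])
                tables[k][context][words[i]] += 1
    return tables


def predict_next(text, tables, max_context=3):
    words = tokenize(text)
    # Use at most the last 3 words, then back off to shorter contexts (assumed).
    for k in range(min(len(words), max_context), 0, -1):
        followers = tables[k].get(tuple(words[-k:]))
        if followers:
            return followers.most_common(1)[0][0]
    return None  # nothing matched; the caller decides what to show
```

With a toy corpus such as `["I want to eat pizza", "I want to go home", "I want to eat pizza tonight"]`, `predict_next("I want to", tables)` returns `"eat"`, since "eat" follows "i want to" twice and "go" only once.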

How it works

If the next word cannot be predicted, the app displays the 3 most common words: “the”, “be” and “to”.
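The fallback rule can be sketched as follows. This assumes a lookup table that maps a tuple of context words to a `Counter` of following words; the constant `DEFAULT_WORDS` and the function `suggest` are hypothetical names, and only the three fallback words come from the slide.

```python
from collections import Counter

# Fallback words listed on the slide; the constant name is hypothetical.
DEFAULT_WORDS = ["the", "be", "to"]


def suggest(context_words, table, k=3):
    """Return up to k next-word suggestions for the given context.

    `table` maps a tuple of context words to a Counter of following words
    (an assumed layout). Unseen contexts fall back to DEFAULT_WORDS.
    """
    followers = table.get(tuple(context_words))
    if not followers:
        return list(DEFAULT_WORDS[:k])
    return [word for word, _ in followers.most_common(k)]
```

For example, with `table = {("want", "to"): Counter({"eat": 2, "go": 1})}`, `suggest(["want", "to"], table)` gives `["eat", "go"]`, while an unseen context such as `["purple", "monkey"]` gives `["the", "be", "to"]`.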

The text corpus is built by combining these files from the Coursera Capstone Dataset:

  • en_US.blogs.txt
  • en_US.news.txt
  • en_US.twitter.txt

which can be downloaded from the Capstone Dataset page.
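One way to turn the three files into n-gram counts is sketched below. The local file paths, the optional line sampling (the files are fairly large) and the names `read_corpus` and `count_ngrams` are assumptions for illustration; n-grams are counted within each line so they do not span unrelated texts.

```python
import re
from collections import Counter

# File names from the slide; their local location is assumed.
CORPUS_FILES = ["en_US.blogs.txt", "en_US.news.txt", "en_US.twitter.txt"]


def read_corpus(paths, sample_every=1):
    """Yield lines from each file in turn, keeping every `sample_every`-th line."""
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as f:
            for i, line in enumerate(f):
                if i % sample_every == 0:
                    yield line


def count_ngrams(lines, n):
    """Count n-grams (tuples of lowercase words) occurring within each line."""
    counts = Counter()
    for line in lines:
        words = re.findall(r"[a-z']+", line.lower())
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts
```

A possible usage is `trigrams = count_ngrams(read_corpus(CORPUS_FILES, sample_every=100), 3)` followed by `trigrams.most_common(10)` to inspect the most frequent three-word sequences.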

Thank you

The application can be found here:

Happy trying, and I hope it matches your prediction!