Capstone Project:

April 2, 2016

Next-Word Predictor App

Coursera Data Science Specialization

Johns Hopkins University

About the App

The app takes a word (or multiple words) from the user. It is an exercise in using training data from a preprocessed corpus together with some Natural Language Processing (NLP) algorithms. As output, a predicted next word is displayed back to the user. The app was created in RStudio as a Shiny app.

User Instructions:

  • Enter a few words in the text area box
  • Use the clear button to reset the text area box

About the Algorithm

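Probability of an upcoming word:
  • $P(w_4 \mid w_1, w_2, w_3)$
  • Maximum Likelihood Estimation (MLE) of a bigram: $P(w_i \mid w_{i-1}) = \frac{\text{count}(w_{i-1}, w_i)}{\text{count}(w_{i-1})}$
  • Chain rule: $P(\text{its water is so transparent}) = P(\text{its}) \times P(\text{water} \mid \text{its}) \times P(\text{is} \mid \text{its water}) \times P(\text{so} \mid \text{its water is}) \times P(\text{transparent} \mid \text{its water is so})$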
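The bigram MLE and the chain rule above can be sketched in Python on a toy corpus. This is an illustration, not the app's R code: the corpus and the function names `train_bigram_mle` and `sentence_prob` are hypothetical, and each chain-rule factor is approximated by a bigram (a Markov assumption) instead of being conditioned on the full history.

```python
from collections import Counter

def train_bigram_mle(tokens):
    """Return p(word, prev), the MLE estimate count(prev, word) / count(prev)."""
    history_counts = Counter(tokens[:-1])             # count(w_{i-1}) as a history
    bigram_counts = Counter(zip(tokens, tokens[1:]))  # count(w_{i-1}, w_i)

    def p(word, prev):
        if history_counts[prev] == 0:
            return 0.0  # MLE has no estimate for a history never seen in training
        return bigram_counts[(prev, word)] / history_counts[prev]

    return p

def sentence_prob(sentence, tokens):
    """Chain rule, with each factor approximated by a bigram MLE (hypothetical helper)."""
    words = sentence.split()
    p = train_bigram_mle(tokens)
    prob = Counter(tokens)[words[0]] / len(tokens)  # unigram MLE for P(w_1)
    for prev, word in zip(words, words[1:]):
        prob *= p(word, prev)  # P(w_i | w_1..w_{i-1}) approximated by P(w_i | w_{i-1})
    return prob

corpus = "its water is so transparent its water is clear".split()
p = train_bigram_mle(corpus)
print(p("so", "is"))                                         # 0.5
print(sentence_prob("its water is so transparent", corpus))  # 1/9, about 0.111
```

Any bigram absent from the training data gets probability 0 under MLE, which zeroes out the whole chain-rule product; this is the problem that smoothing addresses.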

Instead of raw MLE probabilities, which assign zero to any N-gram missing from the training data, Kneser-Ney smoothing over N-grams is used to rank candidate next words. The ranking is based on training data from the corpus or dataset using
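A minimal sketch of how Kneser-Ney smoothing can rank next-word candidates, assuming interpolated Kneser-Ney on bigrams only, with a fixed discount d = 0.75 (a common default, not a value taken from the app). The functions `train_kneser_ney` and `rank_next_words` and the toy corpus are hypothetical; the app itself may use higher-order N-grams and a different formulation.

```python
from collections import Counter, defaultdict

def train_kneser_ney(tokens, d=0.75):
    """Interpolated Kneser-Ney for bigrams; returns (prob, vocab)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    history_counts = Counter()       # c(v): total count of bigrams starting with v
    followers = defaultdict(set)     # distinct words w seen after v
    predecessors = defaultdict(set)  # distinct words v seen before w
    for (v, w), c in bigrams.items():
        history_counts[v] += c
        followers[v].add(w)
        predecessors[w].add(v)
    n_bigram_types = len(bigrams)

    def p_continuation(w):
        # Share of distinct bigram types that end in w
        return len(predecessors.get(w, ())) / n_bigram_types

    def prob(w, v):
        c_v = history_counts[v]
        if c_v == 0:
            return p_continuation(w)  # unseen history: continuation probability alone
        discounted = max(bigrams[(v, w)] - d, 0) / c_v
        lam = d * len(followers[v]) / c_v  # probability mass freed by discounting
        return discounted + lam * p_continuation(w)

    return prob, sorted(predecessors)

def rank_next_words(prev, prob, vocab, k=3):
    """Return the k candidates with the highest smoothed probability after prev."""
    return sorted(vocab, key=lambda w: prob(w, prev), reverse=True)[:k]

corpus = "its water is so transparent its water is clear".split()
prob, vocab = train_kneser_ney(corpus)
print(rank_next_words("its", prob, vocab, k=1))  # ['water']
```

The continuation probability rewards words that follow many different histories rather than words that are merely frequent, which is the distinguishing idea of Kneser-Ney; the discounted mass lets unseen bigrams receive a nonzero score.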

App Demo

Check out the app HERE