Predict the next word with FingerNext!

Coursera Data Science Capstone Project Slides

Roy Chen
Student/Medlab tech

Introduction

As part of the final project for the Coursera Data Science Capstone, I worked on text mining and word prediction.

The text mining data comes from English news, Twitter, and blog files.

The text is then tokenized, filtered for profanity, and stripped of punctuation.
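The cleaning step can be illustrated with a minimal Python sketch. This is not the project's actual code: the `clean_tokens` function and the tiny `PROFANITY` list are hypothetical placeholders, and the real profanity list and tokenizer are not described in these slides.

```python
import re

# Hypothetical placeholder list; the real profanity list is not given in the slides.
PROFANITY = {"darn", "heck"}

def clean_tokens(text, banned=PROFANITY):
    """Lowercase the text, strip punctuation, split into words, drop banned words."""
    # Replace anything that is not a letter, apostrophe, or whitespace with a space.
    text = re.sub(r"[^a-z'\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in banned]

tokens = clean_tokens("Well, heck! The weather is nice today.")
print(tokens)
```

Apostrophes are kept here so that contractions such as "don't" stay in one piece; that is a design choice of this sketch, not something the slides state.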

An n-gram model is then built from frequency tables of the words and phrases found in the text.
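One way to picture an n-gram frequency table is the sketch below. It assumes a simple layout in which each context of n-1 words maps to counts of the words that follow it; the function name `build_ngram_table` and the toy sentence are illustrative only, and the slides do not say how the actual tables are stored.

```python
from collections import Counter, defaultdict

def build_ngram_table(tokens, n=2):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])   # the first n-1 words of the n-gram
        next_word = tokens[i + n - 1]          # the word that completes the n-gram
        table[context][next_word] += 1
    return table

# Toy corpus for illustration only.
tokens = "i love new york and i love new ideas".split()
bigrams = build_ngram_table(tokens, n=2)
trigrams = build_ngram_table(tokens, n=3)

print(bigrams[("love",)])       # counts of words seen after "love"
print(trigrams[("i", "love")])  # counts of words seen after "i love"
```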

Performance of prediction algorithm

The model's prediction accuracy is about 5.2%, measured as the share of words predicted correctly on test phrases.
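The slides do not spell out the exact test procedure, so the following is only one plausible way to compute such an accuracy figure: hold out the last word of each test phrase and count a hit when it appears among the predictions. The `accuracy` function and the `toy_predict` stand-in are hypothetical.

```python
def accuracy(predict, test_phrases):
    """Share of phrases whose held-out last word is among the predicted words."""
    hits = 0
    total = 0
    for phrase in test_phrases:
        words = phrase.lower().split()
        if len(words) < 2:
            continue  # need at least one context word and one target word
        context, target = words[:-1], words[-1]
        total += 1
        if target in predict(context):
            hits += 1
    return hits / total if total else 0.0

# Stand-in predictor for illustration; a real one would consult the n-gram tables.
def toy_predict(context):
    return ["you", "the"]

score = accuracy(toy_predict, ["thank you", "see you", "good morning", "at the"])
print(f"{score:.1%}")
```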

The predicted words take approximately 45 seconds to display.

The database and the app algorithm will be updated in future versions.

FingerNext App

FingerNext is a Shiny app in which the user simply enters a word or a phrase into the text box.

Once the words are entered, the app passes them to the word prediction algorithm behind the scenes.

The app then displays the two best-matching word predictions below the text box!
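How a lookup for the two best matches might work can be sketched as follows. This is an assumption, not the app's confirmed method: it tries the longest available context first and backs off to shorter ones, and the names `build_tables` and `predict_top_two` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

def build_tables(tokens, max_n=3):
    """Build one context -> next-word Counter table for each n from 2 to max_n."""
    tables = {}
    for n in range(2, max_n + 1):
        table = defaultdict(Counter)
        for i in range(len(tokens) - n + 1):
            table[tuple(tokens[i:i + n - 1])][tokens[i + n - 1]] += 1
        tables[n] = table
    return tables

def predict_top_two(phrase, tables):
    """Return up to two most frequent next words, backing off to shorter contexts."""
    words = phrase.lower().split()
    for n in sorted(tables, reverse=True):      # longest context first
        context = tuple(words[-(n - 1):])
        if len(context) == n - 1 and context in tables[n]:
            return [word for word, _ in tables[n][context].most_common(2)]
    return []                                   # nothing matched at any length

# Toy corpus for illustration only.
corpus = "i love new york and i love new ideas and i love you".split()
tables = build_tables(corpus)

print(predict_top_two("I love", tables))
print(predict_top_two("really love", tables))
```

In this sketch the second call finds no trigram context for "really love", so it falls back to the bigram table and uses "love" alone.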

Come check out my FingerNext!