NextWordPredictor (NWP) ShinyApp Presentation

Devi
Aug 23 2015

Coursera Data Science Specialization: Capstone SwiftKey Project


Introduction

The goal of this project is to “Create a Shiny app that accepts as input a phrase (multiple words) in a text box input and outputs a prediction of the next word.” The data comes from a corpus called HC Corpora. The training dataset was downloaded from Capstone Dataset.


Application Summary

This model was developed using 500K randomly sampled lines from blog postings, news articles, and Twitter feeds. A modified Katz Back-Off model was developed using n-word sequences (n-grams) ranging from 2 to 6 words. Frequent n-grams were identified and used to calculate probabilities. Numbers, punctuation, capitalization, and profanity were removed. In addition to the next word, the application displays a prediction data table and a word cloud.
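The cleaning and n-gram counting described above can be sketched roughly as follows. This is a Python illustration only, not the app's actual code (the app is a Shiny application). The names `clean_tokens`, `count_ngrams`, and the tiny `PROFANITY` set are hypothetical, and the exact cleaning rules used in the project are not given here.

```python
import re
from collections import Counter

# Hypothetical placeholder list; the project's real profanity list is not given.
PROFANITY = {"darn", "heck"}

def clean_tokens(line, banned=PROFANITY):
    """Lowercase a line, strip digits and punctuation, and drop banned words."""
    line = line.lower()
    # Keep only letters, whitespace, and apostrophes; digits and punctuation become spaces.
    line = re.sub(r"[^a-z\s']", " ", line)
    tokens = [t.strip("'") for t in line.split()]
    return [t for t in tokens if t and t not in banned]

def count_ngrams(lines, n_min=2, n_max=6):
    """Return {n: Counter of n-word tuples} for every n from n_min to n_max."""
    counts = {n: Counter() for n in range(n_min, n_max + 1)}
    for line in lines:
        tokens = clean_tokens(line)
        for n in counts:
            # Slide a window of width n across the cleaned tokens.
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts
```

Keeping one frequency table per n-gram order makes it easy to drop rare n-grams afterwards and to fall back from a higher-order table to a lower-order one at prediction time.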

Application Algorithm

(1.) Process text input from user (separate/tokenize into n words)
(2.) Search (n+1)-gram frequency table for matches
(3.) Calculate probabilities of each match (frequency/total)
(4.) If no matches, search the next lower-order n-gram table
(5.) If no match in 2-gram table, use most frequent 1-grams
(6.) Return word with the highest probability score (0-1, 1=best)
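
The steps above can be sketched as a simple back-off lookup. This is a minimal Python illustration under stated assumptions, not the app's implementation: it scores matches by plain relative frequency (frequency/total) as in step 3 and omits the discounting of a full Katz Back-Off model. The function name `predict_next` and the table layout (a dict from n to a `Counter` of word tuples, plus a non-empty `Counter` of single words) are hypothetical.

```python
from collections import Counter

def predict_next(phrase, ngram_counts, unigram_counts, max_n=6):
    """Return (word, score) from the highest-order n-gram table that matches."""
    tokens = phrase.lower().split()
    # Start with the longest usable context: the last (n-1) words index the n-gram table.
    for n in range(min(max_n, len(tokens) + 1), 1, -1):
        context = tuple(tokens[-(n - 1):])
        table = ngram_counts.get(n, {})
        # Collect candidate next words whose preceding words equal the context.
        matches = {gram[-1]: c for gram, c in table.items() if gram[:-1] == context}
        if matches:
            total = sum(matches.values())
            word = max(matches, key=matches.get)
            return word, matches[word] / total
    # No match down to the 2-gram table: use the most frequent 1-gram.
    # Assumes unigram_counts is a non-empty Counter.
    total = sum(unigram_counts.values())
    word, count = unigram_counts.most_common(1)[0]
    return word, count / total
```

The linear scan over each table keeps the sketch short; a real application would more likely index each table by context so that lookups stay fast on large frequency tables.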

How to use the Application

Enter an English word or phrase in the text box, or select one of the phrases from the Quizzes. Then click the “Predict” button. The best single next-word prediction is displayed on the “Prediction Results” tab or the “Quiz Prediction Results” tab, depending on the selection made in the app's sidebar panel. Note: the app may take a few seconds to load initially.


Acknowledgements and Links

Special thanks to Jeff Leek, PhD, Roger D. Peng, PhD, and Brian Caffo, PhD, of the Coursera Data Science faculty.

ShinyApp Server: https://devi.shinyapps.io/NWPShinyApp

Slide Deck: http://rpubs.com/Devi/