Capstone: Predict Next Word

                        NLP SwiftKey Word Prediction
                     Coursera/Johns Hopkins University 
                             Data Science Specialization

N.Halici

December 19, 2019


Overview

The objective of this capstone project is to build a Shiny app to allow a user to enter a phrase and have the application predict what the next word will be.

Input for this project was provided as text data from Twitter, blogs and news feeds. An exploratory data analysis phase was completed first. The data was then sampled and cleaned by converting it to lowercase and removing punctuation, extra white space, numbers and special characters (such as quotes and hyphens).
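The cleaning steps described above can be sketched in a few lines. This is a minimal illustration in Python, not the project's own code (the project itself was built as a Shiny app); the function name clean_text is hypothetical, and dropping apostrophes so that contractions stay together is an assumption rather than something stated in the original.

```python
import re

def clean_text(text: str) -> str:
    """Lowercase the text and strip punctuation, numbers and extra white space."""
    text = text.lower()
    # Assumption: drop apostrophes so contractions stay one token ("it's" -> "its").
    text = re.sub(r"['\u2019]", "", text)
    # Replace everything that is not a letter or white space (digits,
    # punctuation, quotes, hyphens) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of white space into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

cleaned = clean_text("Hello,  World! It's 2019 -- great.")
```

Replacing removed characters with a space rather than deleting them keeps hyphenated words from fusing together; either choice is defensible for next-word prediction.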

Models

Once the sample data was prepared, a predictive model was built to estimate the next word to be typed for any phrase.

The data from the three input sources was combined to create a single corpus. The corpus was tokenized into n-grams of 1, 2, 3 and 4 words (unigrams, bigrams, trigrams and quadgrams), and the most common n-grams of each length were kept.
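As a rough illustration of the tokenization step, the sketch below counts n-grams of 1 to 4 words over a tiny made-up token list. It is written in Python for illustration only; the helper name ngram_counts and the sample sentence are hypothetical and do not come from the project.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical sample standing in for the cleaned corpus.
tokens = "thanks for the follow thanks for the help".split()

# One frequency table per n-gram length: unigrams through quadgrams.
tables = {n: ngram_counts(tokens, n) for n in (1, 2, 3, 4)}

# The most frequent trigram in the sample, with its count.
top_trigram = tables[3].most_common(1)
```

Keeping only the most common entries of each table (for example with most_common) is what keeps the lookup data small enough to ship with an app.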

A backoff strategy was then used to match the user's input against the common 4-word phrases. If no match was found, the model would back off and look for matching 3-word phrases, and then 2-word phrases.
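The backoff lookup can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the app's actual implementation: the names build_tables and predict_next are hypothetical, the sample tokens are made up, and the sketch simply returns the candidates from the longest matching context without weighting or merging results across n-gram lengths.

```python
from collections import Counter, defaultdict

def build_tables(tokens, max_n=4):
    """Map each (n-1)-word context to a Counter of the words that followed it."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for n in tables:
        for i in range(len(tokens) - n + 1):
            *context, nxt = tokens[i:i + n]
            tables[n][tuple(context)][nxt] += 1
    return tables

def predict_next(phrase, tables, k=4):
    """Try the longest context first and back off to shorter ones on a miss."""
    words = phrase.lower().split()
    for n in sorted(tables, reverse=True):  # 4-grams, then 3-grams, then 2-grams
        if len(words) < n - 1:
            continue  # not enough input words for this context length
        context = tuple(words[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return [word for word, _ in followers.most_common(k)]
    return []  # no context matched at any level

# Hypothetical sample standing in for the cleaned corpus.
tokens = "thanks for the follow thanks for the help thanks for the follow".split()
tables = build_tables(tokens)

full_match = predict_next("Thanks for the", tables)   # matched at the 4-gram level
backed_off = predict_next("cheers for the", tables)   # falls back to the 3-gram level
```

The default k=4 mirrors the app's behavior of showing up to four candidate words.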

Finally, a Shiny application was built to allow reviewers to test the project code.

The Application - I

The app is kept as straightforward as possible. The user enters one or more words in the input field, and the app predicts the next word. The app can be accessed at this link: https://halici.shinyapps.io/DS-Capstone-Project/

Simply start typing in the text field, and up to 4 possible next words will automatically appear below it. Each predicted word is clickable; clicking a word adds it to your phrase and triggers a prediction for the next word.

The Application - II

The Prediction tab (the default) displays the predicted next word. The About tab describes the app.

Screenshot of Application

Resources

References