Data Science Capstone final project

Mary0523
1-13-2019

Objective

This presentation briefly introduces an application that predicts the next word based on the words you enter.
The application is the final project of the Coursera course Data Science Capstone.

Methods

This application used three text files as training data (blogs, news, and Twitter), available at: https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip

An n-gram model is used to predict the next word from the previous one, two, or three words.
The three text files were cleaned by removing numbers, punctuation, symbols, and separators, and by converting the text to lower case. The cleaned texts were then tokenized into n-grams (unigrams, bigrams, trigrams, and fourgrams).
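The cleaning and tokenizing steps above can be sketched in a few lines. This is a minimal Python illustration, not the R code behind the application: the function names `clean_text` and `ngrams` are hypothetical, and the exact cleaning rule (keep only the letters a-z and whitespace) is an assumption that is cruder than what a text-mining package would do.

```python
import re

def clean_text(text):
    """Lower-case the text and drop numbers, punctuation, and symbols."""
    text = text.lower()
    # Assumed rule: anything that is not a-z or whitespace becomes a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of whitespace (the separators) into single spaces.
    return re.sub(r"\s+", " ", text).strip()

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = clean_text("I have 2 dogs, and I love them!").split()
bigrams = ngrams(tokens, 2)    # first element is ('i', 'have')
fourgrams = ngrams(tokens, 4)  # 7 tokens give 4 fourgrams
```

Calling `ngrams` with n from 1 to 4 yields the unigrams through fourgrams described above. A sentence shorter than n words simply yields an empty list.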
The tokenized n-grams were then sorted and saved as data frames in R, which the application uses for next-word prediction.
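One way such sorted n-gram tables can drive a prediction is sketched below in Python. It rests on two assumptions that the slides do not state: that the tables are sorted by frequency, and that the lookup tries the longest context first (three words, then two, then one). The names `build_tables` and `predict_next` and the toy corpus are hypothetical, and plain dictionaries stand in for the R data frames.

```python
from collections import Counter, defaultdict

def build_tables(tokens, max_n=4):
    """Map each context (tuple of preceding words) to next words, most frequent first."""
    counts = defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            counts[context][tokens[i + n - 1]] += 1
    # Sort once up front so a prediction is a single lookup.
    return {ctx: [w for w, _ in c.most_common()] for ctx, c in counts.items()}

def predict_next(tables, words, max_context=3):
    """Try the last 3 words, then 2, then 1; return None if nothing matches."""
    for k in range(min(max_context, len(words)), 0, -1):
        candidates = tables.get(tuple(words[-k:]))
        if candidates:
            return candidates[0]
    return None

corpus = "the cat sat on the mat and the cat sat on the sofa and the cat ran".split()
tables = build_tables(corpus)
print(predict_next(tables, "the cat".split()))  # prints "sat"
```

With the unseen context "dog the", the two-word lookup fails and the sketch falls back to the single word "the", returning "cat". Doing the counting and sorting ahead of time keeps the work at prediction time small, which suits an interactive application.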

The Application

On the left side of the page, enter the words or sentence for which you want to predict the next word, then click the button “Click to predict”. The predicted next word appears in the right panel.

https://mary0523.shinyapps.io/CapstoneProject/


Links to data and R scripts

The application is hosted on shinyapps.io: https://mary0523.shinyapps.io/CapstoneProject/
The code for this application and related files are stored on GitHub: https://github.com/Mary0523/DataScienceCapstoneProject