Data Science Capstone Presentation

12/20/2016

This presentation introduces the word prediction app written as part of the Johns Hopkins Data Science Capstone project.

The app attempts to predict the next English word when the user types any English word or sentence. The predictions are based on the training data set provided at https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip.

Overview

Using the English training data set, a corpus was created from the cleaned data and then used to build a prediction model.

The app is deliberately simple, with no jazzed-up UI. Its purpose is to demonstrate next-word prediction while the user types in the text field, and the UI is basic enough for a layperson to use.

Prediction Model

Once the training data set was cleaned, bigrams, trigrams and quadgrams were created from the corpus.
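
As an illustration of this step, the sketch below counts n-grams over a tokenized text. It is written in Python for brevity and is not the code behind the app; the helper name ngram_counts and the sample sentence are hypothetical stand-ins for the real corpus.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical cleaned text standing in for the real corpus.
tokens = "i love data science and i love data analysis".split()

bigrams = ngram_counts(tokens, 2)    # 2-word sequences
trigrams = ngram_counts(tokens, 3)   # 3-word sequences
quadgrams = ngram_counts(tokens, 4)  # 4-word sequences

# The most frequent bigrams come first.
print(bigrams.most_common(2))
```

Each counter maps a tuple of words to the number of times that sequence occurs, which is the raw material a frequency-based predictor needs.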

The data was explored for high-frequency words using word clouds and histograms, and it was discovered that computing n-grams from the whole data set in real time was computationally expensive.

Hence, the resulting bigrams, trigrams and quadgrams were stored to improve performance.
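
One way to picture the stored n-grams is as lookup tables built once and loaded at startup. The sketch below, in Python, is an assumption about the general approach rather than the app's actual storage: the function build_lookup, the JSON file name, and the sample text are all hypothetical.

```python
import json
import os
import tempfile
from collections import Counter, defaultdict

def build_lookup(tokens, n):
    """Map each (n-1)-word prefix to its most frequent next word."""
    followers = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        prefix = " ".join(tokens[i:i + n - 1])
        followers[prefix][tokens[i + n - 1]] += 1
    return {p: counts.most_common(1)[0][0] for p, counts in followers.items()}

# Hypothetical cleaned text standing in for the real corpus.
tokens = "the cat sat on the mat and the cat slept".split()

# One table per n-gram order, keyed by the order as a string for JSON.
tables = {str(n): build_lookup(tokens, n) for n in (2, 3, 4)}

# Write the tables once (hypothetical file name) ...
path = os.path.join(tempfile.gettempdir(), "ngram_tables.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(tables, f)

# ... and reload them later without touching the corpus again.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
```

With the tables on disk, a prediction becomes a dictionary lookup instead of a pass over the whole data set.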

App Usage

The application requires the user to enter text into the text field, and it displays the predicted next word below the field.
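
The presentation does not state how the three n-gram orders are combined. A common choice, assumed here purely for illustration, is to try the longest context first and fall back to shorter ones. The Python sketch below uses a hypothetical predict_next function and made-up table entries.

```python
def predict_next(text, tables, max_order=4):
    """Return a next-word guess, trying the longest n-gram context first."""
    words = text.lower().split()
    for n in range(max_order, 1, -1):
        if len(words) < n - 1:
            continue  # not enough typed words for this order
        prefix = tuple(words[-(n - 1):])
        word = tables.get(n, {}).get(prefix)
        if word is not None:
            return word
    return None  # no stored n-gram matches the input

# Hypothetical tables: n-gram order -> {prefix words: predicted next word}.
tables = {
    2: {("the",): "cat", ("of",): "the"},
    3: {("one", "of"): "the"},
    4: {("thanks", "for", "the"): "follow"},
}

print(predict_next("Thanks for the", tables))  # matched by the quadgram table
print(predict_next("feed the", tables))        # falls back to the bigram table
```

Under this assumption, a longer input gives the model more context, while a single typed word can still be matched against the bigram table.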

Conclusion

The application can be found at https://gauravsri.shinyapps.io/Data_Science_Capstone_Project/
Only the English language is supported by the app.
The training data set is available at https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip
Bigrams, trigrams and quadgrams are used to predict the next word.
The data exploration report can be found at https://rpubs.com/gauravsri/milestone