Christine Arsenault
September 1, 2018
This presentation was created as the final step in the Capstone project for the Data Science Specialization offered by Johns Hopkins University through Coursera.
The project goal was to build a predictive model of English text. The skills needed to complete this task include natural language processing and text mining. The model is delivered as a Shiny application built in RStudio.
The source data for this project can be found at: https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip
My Shiny App can be found at: https://ctarsenault.shinyapps.io/word_prediction_application
If you would like to review my source code, it is located on GitHub at: https://github.com/CArsenault/DS_CourseWork
To build the prediction algorithm, data scraped from blogs, Twitter and the news was used. This data was provided as part of the assignment. Several processing steps must be completed before the model can be built.
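As an illustration only, the kind of processing such a corpus usually needs can be sketched as below. The project's own code is in R and its exact steps are not listed here, so the choices in this sketch (lowercasing, removing URLs, keeping only letters and apostrophes, counting n-grams) and the helper names `clean_and_tokenize` and `count_ngrams` are assumptions, written in Python for brevity.

```python
import re
from collections import Counter


def clean_and_tokenize(line):
    """Lowercase a line, drop URLs, and keep only letters, apostrophes and spaces."""
    line = line.lower()
    line = re.sub(r"https?://\S+", " ", line)   # remove URLs (assumed cleaning step)
    line = re.sub(r"[^a-z' ]+", " ", line)      # remove digits and punctuation
    return line.split()


def count_ngrams(lines, n):
    """Count every run of n consecutive tokens across all lines."""
    counts = Counter()
    for line in lines:
        tokens = clean_and_tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Toy input standing in for the blog, Twitter and news text
sample = ["I love data", "I love R!"]
bigrams = count_ngrams(sample, 2)
```

Counting n-grams per line keeps a phrase from one tweet or sentence from running into the next, which matters for short texts such as Twitter posts.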
The model for the next word prediction was based on the Katz Back-off algorithm. This process works as follows:
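The back-off idea can be sketched as follows: look for the longest recent context that was seen in the training text, and fall back to shorter contexts when it was not. This is a simplified illustration, not the app's implementation. It omits the discounting and back-off weights of full Katz back-off and simply returns the most frequent continuation, it is written in Python rather than the project's R, and the names `build_ngram_tables` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def build_ngram_tables(tokens, max_order=3):
    """Map each context (0 to max_order-1 preceding words) to counts of the next word."""
    tables = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for n in range(max_order):
            if i - n < 0:
                break  # not enough preceding words for this context length
            context = tuple(tokens[i - n:i])
            tables[context][word] += 1
    return tables


def predict_next(tables, history, max_order=3):
    """Try the longest context first, then back off to shorter ones."""
    words = history.lower().split()
    for n in range(max_order - 1, -1, -1):
        if n > len(words):
            continue  # history is shorter than this context length
        context = tuple(words[len(words) - n:]) if n else ()
        candidates = tables.get(context)
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # nothing was learned at all


# Toy corpus; the real model would be trained on the SwiftKey data
corpus = "the cat sat on the mat the cat sat on the sofa the cat ran".split()
tables = build_ngram_tables(corpus)
print(predict_next(tables, "the cat"))     # trigram context is found
print(predict_next(tables, "purple cat"))  # backs off to the bigram context
```

With `max_order=3` the predictor uses at most the last two words typed. When neither the two-word nor the one-word context has been seen, it falls back to the most frequent single word in the corpus.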
C Arsenault Word Predictor