jbassard
March 2018
Final project of Data Science Specialization at Coursera in partnership with JHU and SwiftKey
This R-based presentation will briefly but comprehensively pitch a Shiny-application for predicting the next word of a sentence.
This project is part of the Data Science Specialization on Coursera. It is the Capstone project, whose main objective is to carry out a complete data science project.
The final goal is to create a Shiny application that predicts the next word based on a sentence entered by the user.
The application is available at Shiny.
This exercise was divided into several sub-tasks such as data cleaning, exploratory data analysis, creation of a predictive model and more.
Analyzing and exploring the data required natural language processing (NLP) and text mining concepts, which were applied using common R packages.
The source data used to create a frequency dictionary, and thus to predict the next words, come from the publicly available HC Corpora (English texts from news, tweets and blogs, about 2.4 million records in total).
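To illustrate how a frequency dictionary can drive next-word prediction, here is a minimal sketch in base R. It is not the model used in the application: the toy corpus, the bigram-only lookup, and the function names (tokenize, build_bigram_table, predict_next) are all assumptions made for this example.

```r
# Toy corpus standing in for the HC Corpora texts (hypothetical data).
corpus <- c(
  "I love data science",
  "I love data analysis",
  "I love R",
  "data science is fun"
)

# Lowercase, replace everything except letters and spaces, split into words.
tokenize <- function(text) {
  text <- tolower(text)
  text <- gsub("[^a-z ]", " ", text)
  words <- strsplit(text, " +")[[1]]
  words[words != ""]
}

# Count every (previous word, next word) pair across the corpus.
build_bigram_table <- function(texts) {
  pairs <- lapply(texts, function(t) {
    w <- tokenize(t)
    if (length(w) < 2) return(NULL)
    data.frame(prev = head(w, -1), nxt = tail(w, -1), stringsAsFactors = FALSE)
  })
  pairs <- do.call(rbind, Filter(Negate(is.null), pairs))
  pairs$count <- 1
  freq <- aggregate(count ~ prev + nxt, data = pairs, FUN = sum)
  # Most frequent continuation first within each previous word.
  freq[order(freq$prev, -freq$count, freq$nxt), ]
}

# Return up to n candidate next words for the last word of the sentence.
predict_next <- function(freq, sentence, n = 10) {
  w <- tokenize(sentence)
  if (length(w) == 0) return(character(0))
  hits <- freq[freq$prev == w[length(w)], ]
  as.character(head(hits$nxt, n))
}

bigrams <- build_bigram_table(corpus)
print(predict_next(bigrams, "I love"))
```

A real model would typically combine longer n-grams with a fallback to shorter ones when no match is found; this sketch only looks at the last word typed.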
The Word Predict App is simple and easy to use, yet powerful!
(1) Simply start typing in the text input field and (2) up to 10 possible next words are automatically displayed below it. Then (3) you can click on a predicted word to add it to the input, so that the following words are predicted in turn, and so on. See the screenshot below of the main application panel.

The application also briefly presents the project and the building of the prediction model.
This capstone project is considerably more challenging than the other exercises of this Data Science Specialization. Both the project and the Data Science Program were interesting, and I learned a lot from this project.
This project only scratches the surface of natural language processing and predictive analytics applications. There are many possibilities for improvement including:
An error message sometimes appears after a long user input: “An error has occurred. Check your logs or contact the app author for clarification” (code to improve). I was not able to identify and correct this error, since no error appears in RStudio, where the code was written. It might be a limitation of the Shiny app.
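One defensive pattern that may help with long inputs is sketched below in base R. It is an assumption, not a confirmed fix: the cause of the error is unknown, and last_words, safe_predict, toy_predict and the max_words setting are hypothetical names introduced for this example. The idea is to pass only the last few words to the model, since an n-gram lookup needs no more context than that, and to catch any remaining error so that it does not reach the user interface.

```r
# Keep only the last `max_words` words of the input.
# `max_words` is a hypothetical setting chosen for this sketch.
last_words <- function(sentence, max_words = 3) {
  words <- strsplit(trimws(tolower(sentence)), "[[:space:]]+")[[1]]
  words <- words[words != ""]
  tail(words, max_words)
}

# Call a prediction function on the truncated input; on error, log a
# message and return an empty result instead of failing.
safe_predict <- function(predict_fun, sentence, max_words = 3) {
  context <- paste(last_words(sentence, max_words), collapse = " ")
  tryCatch(
    predict_fun(context),
    error = function(e) {
      message("Prediction failed: ", conditionMessage(e))
      character(0)
    }
  )
}

# Hypothetical stand-in for the real model: fails on inputs over 3 words.
toy_predict <- function(context) {
  if (length(strsplit(context, " ")[[1]]) > 3) stop("input too long")
  c("the", "a")
}

long_input <- paste(rep("word", 500), collapse = " ")
print(safe_predict(toy_predict, long_input))
```

In a Shiny server function, a wrapper of this kind would sit between the text input and the rendering of the predicted words.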