Ran Tang
July 7th 2021
This is the slide deck for the final project of the Coursera Data Science Specialization Capstone course.
The goal of the project is to build a predictive model that uses natural language processing (NLP) to recommend the most likely next word for a given input string.
The model itself was built using R and developed into a Shiny Application.
The model was trained on English-language text from blogs, news articles, and tweets, provided by SwiftKey.
The data can be found here: Data
The prediction process begins by building n-grams from a sample of the data and storing their frequencies, so that they do not have to be recomputed for every prediction. Given the user's input, the application looks up the matching n-grams and predicts the next word with the highest occurrence frequency.
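The n-gram lookup described above can be sketched in base R as follows. This is a minimal illustration, not the application's actual code: the toy `corpus`, the helper names `tokenize`, `build_ngrams`, and `predict_next`, and the choice to back off from trigrams to bigrams when no match is found are all assumptions made for the example.

```r
# Hypothetical toy corpus standing in for a sample of the blog/news/tweet text.
corpus <- c(
  "thank you for the follow",
  "thank you for the help",
  "thank you so much"
)

# Lowercase a string and split it into word tokens.
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Count n-grams (n >= 2), stored as a prefix of n - 1 words plus the next word.
build_ngrams <- function(texts, n) {
  stopifnot(n >= 2)
  rows <- lapply(texts, function(text) {
    words <- tokenize(text)
    if (length(words) < n) return(NULL)
    starts <- seq_len(length(words) - n + 1)
    data.frame(
      prefix = vapply(
        starts,
        function(i) paste(words[i:(i + n - 2)], collapse = " "),
        character(1)
      ),
      nxt = words[starts + n - 1],
      stringsAsFactors = FALSE
    )
  })
  grams <- do.call(rbind, rows)
  # Sum a column of ones per (prefix, next word) pair to get frequencies.
  aggregate(freq ~ prefix + nxt, data = cbind(grams, freq = 1), FUN = sum)
}

# Predict the next word from the most frequent matching n-gram.
# Assumption: try the longest n-gram first, then back off to shorter ones.
predict_next <- function(input, tables) {
  words <- tokenize(input)
  for (n in sort(as.integer(names(tables)), decreasing = TRUE)) {
    if (length(words) < n - 1) next
    prefix <- paste(tail(words, n - 1), collapse = " ")
    hits <- tables[[as.character(n)]]
    hits <- hits[hits$prefix == prefix, ]
    if (nrow(hits) > 0) {
      return(as.character(hits$nxt[which.max(hits$freq)]))
    }
  }
  NA_character_  # no matching n-gram at any order
}

# Build the tables once; they could be stored with saveRDS() and reloaded later.
tables <- list("2" = build_ngrams(corpus, 2), "3" = build_ngrams(corpus, 3))
predict_next("thank you", tables)
```

In this toy corpus "thank you for" occurs twice and "thank you so" once, so the call returns "for". Building the tables once and reloading them (for example with `saveRDS()` and `readRDS()`) is one way to avoid recomputing n-grams on every prediction.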
We created a simple, easy-to-use interface for testing the application.
See a screenshot of it below: