author: Ran Tang
date: July 7th 2021
font-family: 'Helvetica'
autosize: true
transition: fade
This is the slide deck for the final project of the Coursera Data Science Specialization Capstone course. The project is to build a predictive model using NLP to recommend the most likely next word given a string input.
The model was built in R and deployed as a Shiny application.
The model was trained on English-language text from blogs, news articles, and tweets, provided by SwiftKey.
The data can be found here: Data
The prediction process begins by building n-grams from a sample of the data and storing the resulting frequency tables, so they do not have to be recomputed at prediction time. Using the stored n-gram data, we predict the next word based on occurrence frequency: the word that most often follows the user's input is recommended.
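As a rough illustration of this idea, here is a minimal base-R sketch. It is not the application's actual code: the toy corpus, the function names (`tokenize`, `ngrams`, `build_table`, `predict_next`), and the choice to fall back from trigrams to bigrams when no match is found are all assumptions made for the example.

```r
# Toy corpus standing in for the sampled blog/news/tweet text (made-up data).
corpus <- c("i love data science", "i love r", "i love data")

# Split a line into lowercase word tokens.
tokenize <- function(line) {
  words <- strsplit(tolower(line), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Return all n-grams of a token vector as space-joined strings.
ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count n-grams across the corpus and split each one into its leading
# words (prefix) and its final word (nextword).
build_table <- function(corpus, n) {
  grams <- unlist(lapply(corpus, function(line) ngrams(tokenize(line), n)))
  counts <- table(grams)
  parts <- strsplit(names(counts), " ")
  data.frame(
    prefix = vapply(parts, function(p) paste(p[-n], collapse = " "), character(1)),
    nextword = vapply(parts, function(p) p[n], character(1)),
    freq = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Build the tables once and store them on disk so later sessions can
# load them instead of recomputing (a temp file is used here).
tables <- list("2" = build_table(corpus, 2), "3" = build_table(corpus, 3))
path <- tempfile(fileext = ".rds")
saveRDS(tables, path)
tables <- readRDS(path)

# Predict the most frequent next word. Assumption: try the longest
# context first, then fall back to a shorter one if nothing matches.
predict_next <- function(input, tables) {
  words <- tokenize(input)
  for (n in sort(as.integer(names(tables)), decreasing = TRUE)) {
    if (length(words) < n - 1) next
    prefix <- paste(tail(words, n - 1), collapse = " ")
    tbl <- tables[[as.character(n)]]
    hits <- tbl[tbl$prefix == prefix, ]
    if (nrow(hits) > 0) return(hits$nextword[which.max(hits$freq)])
  }
  NA_character_  # no matching n-gram found
}

predict_next("I love", tables)  # "data" (seen twice after "i love", vs. "r" once)
```

Storing the tables as prefix/next-word pairs keeps each prediction to a simple lookup, which is the main reason to precompute them.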
We built a simple, easy-to-use interface for testing the application.
See a screenshot of it below: