Coursera Data Science: Final Project

Ran Tang
July 7th 2021

NLP Text Prediction using SwiftKey Data

Introduction - Coursera Data Science Specialization Final Course Project

This is the slide deck for the final project of the Coursera Data Science Specialization Capstone course. The goal of the project is to build an NLP model that predicts the most likely next word for a given string of input text.

The model was built in R and packaged as a Shiny application.

Background Information and Methodology

The model was trained on English-language text provided by SwiftKey, drawn from blogs, news, and tweets.

The data can be found here: Data

The application first builds n-grams from a sample of the data and stores them ahead of time, which saves computation when a prediction is requested. Using the stored n-gram data, we predict the next word based on occurrence frequency.
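
To illustrate the idea, here is a minimal sketch of frequency-based next-word prediction from precomputed n-gram tables. It is written in Python for brevity and is not the project's R code. The toy corpus, the function names (tokenize, build_ngram_table, predict_next), the choice of bigrams and trigrams, and the fallback from a longer context to a shorter one are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_table(lines, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            table[context][tokens[i + n - 1]] += 1
    return table


def predict_next(text, tables):
    """Return the most frequent follower of the longest known context, else None."""
    tokens = tokenize(text)
    # Try the largest n first, then fall back to shorter contexts (assumed strategy).
    for n in sorted(tables, reverse=True):
        if len(tokens) < n - 1:
            continue
        context = tuple(tokens[len(tokens) - (n - 1):])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None


# Hypothetical toy corpus standing in for a sample of the blogs, news, and tweets text.
sample = [
    "I love data science",
    "I love data analysis",
    "I love data science projects",
    "we love science",
]

# Built once and reused, mirroring the idea of storing n-grams ahead of time.
tables = {n: build_ngram_table(sample, n) for n in (2, 3)}
```

With this toy corpus, predict_next("I love data", tables) returns "science", because "science" follows "love data" twice and "analysis" only once. An input whose words never appear in the tables returns None.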

Application Example

We've created a simple, easy-to-use interface for testing the application.

See a screenshot of it below:

Links

The application can be found here:

The GitHub repository with the app's source code can be found here:
