Coursera Data Science: Final Project

Ran Tang
July 7th 2021



NLP Text Prediction using SwiftKey Data

Introduction - Coursera Data Science Specialization Final Course Project


This slide deck presents the final project of the Coursera Data Science Specialization Capstone course. The goal of the project is to build a predictive NLP model that recommends the most likely next word for a given input string.

The model was built in R and deployed as a Shiny application.

Background Information and Methodology


The model was trained on English-language text data provided by SwiftKey, drawn from blogs, news, and tweets.

The data can be found here: Data

The prediction process begins by building n-grams from a sample of the data and storing them, which saves computation time when the application runs. Using the stored n-gram data, we predict the next word as the one that most frequently follows the input.
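
The approach described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the R code behind the application. The function names (`tokenize`, `build_ngram_table`, `predict_next`), the toy corpus, and the choice to fall back from trigrams to bigrams when a context has not been seen are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplistic, assumed cleaning step)."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_table(lines, n):
    """Map each (n-1)-word context to a Counter of the words observed after it."""
    table = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            table[context][tokens[i + n - 1]] += 1
    return table


def predict_next(tables, text):
    """Return the most frequent next word, trying the longest context first.

    Falling back to a shorter context is an assumption of this sketch,
    not something the slides specify.
    """
    tokens = tokenize(text)
    for n in sorted(tables, reverse=True):
        if len(tokens) < n - 1:
            continue  # not enough input words for this n-gram order
        context = tuple(tokens[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # no stored n-gram matches the input


# Toy corpus standing in for the sampled blog, news, and tweet data.
sample = [
    "I love data science",
    "I love data analysis",
    "I love data science projects",
    "we love pizza",
]

# Build the tables once and reuse them for every prediction.
tables = {n: build_ngram_table(sample, n) for n in (2, 3)}

print(predict_next(tables, "I love"))     # data
print(predict_next(tables, "love data"))  # science
print(predict_next(tables, "they love"))  # data (bigram fallback)
```

Building the tables once and reusing them mirrors the idea of storing the n-grams ahead of time, so each prediction reduces to a dictionary lookup.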

Links

The application can be found here:

Text Prediction Application

The Github Repository and source code for the app can be found here:

Git Repo

Application Example

We've created a simple, easy-to-use interface for testing the application.

See a screenshot of it below: