Text Prediction

Data Science SwiftKey Capstone

Ken Whaler
Mar 2019

Introduction

This application is the capstone project for the Coursera Data Science Specialization, taught by professors at Johns Hopkins University in cooperation with SwiftKey.

The goal of this capstone is to mimic the experience of being a data scientist. Practicing data scientists commonly receive a messy data set, a vague question, and very little instruction on exactly how to analyze the data.

Objective

This project takes messy data, processes it for analysis, builds a predictive model, and then predicts the next words as text is entered.

The n-grams are computed in advance, outside the Shiny app, so the app only has to look them up. This keeps response times short.

The application predicts the next two words most likely to be used.
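The idea of building n-gram counts ahead of time and only looking them up at prediction time can be sketched as follows. This is a minimal illustration in Python, not the app's actual code: the function names `build_table` and `predict_next`, the use of bigrams only, and the parameter `k` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_table(tokens):
    """Offline step: count which word follows each word (bigram counts)."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table

def predict_next(table, text, k=2):
    """Online step: look up the k most frequent followers of the last word."""
    words = text.lower().split()
    if not words:
        return []
    counts = table.get(words[-1])  # .get avoids creating empty entries
    if not counts:
        return []  # last word was never seen in the sample
    return [word for word, _ in counts.most_common(k)]

# Hypothetical toy corpus standing in for the cleaned data sample.
tokens = "blue sky blue sky blue moon red sky".split()
table = build_table(tokens)
suggestions = predict_next(table, "The Blue")
```

Because the counting happens once, before the app starts, each prediction is a dictionary lookup followed by a small sort, which is what makes the response time short.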

Method

A data sample was drawn from the HC Corpora data and cleaned by converting it to lowercase and removing punctuation, links, extra white space, numbers, and other special characters. The cleaned sample was then tokenized.
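The cleaning and tokenization steps described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the project's own implementation: the regular expressions, the choice to delete apostrophes rather than replace them with spaces, and the helper names `clean_text`, `tokenize`, and `ngrams` are all hypothetical.

```python
import re

def clean_text(text):
    """Lowercase the text and strip links, numbers, punctuation, and extra spaces."""
    text = text.lower()
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # drop links
    text = re.sub(r"['’]", "", text)                   # assumption: "it's" -> "its"
    text = re.sub(r"[^a-z\s]", " ", text)              # drop digits, punctuation, symbols
    return re.sub(r"\s+", " ", text).strip()           # collapse white space

def tokenize(text):
    """Split cleaned text into word tokens."""
    return clean_text(text).split()

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

cleaned = clean_text("Visit https://example.com NOW!!  It's 2019...")
bigrams = ngrams(tokenize("The quick brown fox"), 2)
```

Note that the character filter keeps only the letters a to z, so accented characters are removed along with symbols; a real pipeline might handle them differently.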

Usage

Using the app is straightforward. The input defaults to “Blue”, so the user sees predicted text as soon as the app loads. The user can then change the text and increase or decrease the number of predictions.

[Figure: app user interface]

References and Source Code