Predictive Typing Application

Zach Colburn
March 25, 2017

Project Overview

My objective in developing this application was to enable software-assisted typing: the application uses the text a user has already entered to predict the word that user will type next. The project has three parts: data collection, model development, and model evaluation.

Data Collection

Data was acquired from HC Corpora (https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip). The data consists of text scraped from:

  • Twitter
  • Blogs
  • News

Model Development

Model development consisted of four key steps:

  • Data cleaning: Numbers, symbols, and profanity were stripped from the data.
  • n-gram generation: The cleaned text was split into n-grams of one to five words.
  • Model assembly: The frequency of each unique n-gram was counted. Each n-gram was then split into a prefix (every word but the last) and a suffix (the last word), and the split n-grams were assembled into a database.
  • Predictive function generation: The predictor function takes the user's input text and matches it against the prefixes in the database. The suffixes of the highest-frequency matches are returned as predictions.
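
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the application's actual code: the function names, the placeholder blocked-word list, and the rule of falling back to shorter prefixes when a longer one has no match are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical placeholder; the profanity list used by the application is not given.
BLOCKED_WORDS = {"darn", "heck"}

def clean(text):
    """Lowercase the text, strip numbers and symbols, and drop blocked words."""
    text = re.sub(r"[^a-z' ]+", " ", text.lower())
    return [w for w in text.split() if w not in BLOCKED_WORDS]

def ngrams(tokens, n):
    """Return every run of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_model(lines, max_n=5):
    """Map each prefix (0 to max_n - 1 words) to a Counter of one-word suffixes."""
    model = defaultdict(Counter)
    for line in lines:
        tokens = clean(line)
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                # Prefix is every word but the last; suffix is the last word.
                model[gram[:-1]][gram[-1]] += 1
    return model

def predict(model, text, k=3, max_n=5):
    """Return up to k (suffix, count, share) tuples for the longest matching prefix."""
    tokens = clean(text)
    # Assumed fallback: try the longest usable prefix first, then shorter ones.
    for size in range(min(len(tokens), max_n - 1), -1, -1):
        prefix = tuple(tokens[len(tokens) - size:])
        suffixes = model.get(prefix)
        if suffixes:
            total = sum(suffixes.values())
            return [(w, c, c / total) for w, c in suffixes.most_common(k)]
    return []

corpus = [
    "I like green tea",
    "I like green apples",
    "I like green tea a lot",
    "You like black tea",
]
model = build_model(corpus)
print(predict(model, "I like green", k=2))
```

Each prediction is returned with its raw count and its share of all suffixes seen after the matched prefix, which is one plausible form for frequency statistics reported alongside the predictions.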

The application shown on the next slide takes the user's input text and returns the requested number of predictions, along with frequency statistics for the matched prefixes and the predicted suffixes.

The Application

[Image: appImage]

Model Evaluation

Input text of the indicated length (in words) was used to predict the next word. The table reports the percentage of successful predictions when the indicated number of guesses was allowed.

Prefix length    Prediction 1 (%)    Prediction 2 (%)    Prediction 3 (%)
1                5.91                50.20               82.46
2                8.04                12.96               19.24
3                5.40                8.76                12.26
4                5.35                8.41                11.76
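
The success rates above are top-k hit rates: a prediction counts as successful when the true next word is among the allowed guesses. The sketch below shows one way such a rate could be computed in Python. It is an illustration only: the function names, the toy training and test sentences, and the choice to score every position in each test sentence are assumptions, since the slides do not describe how the test cases were selected.

```python
from collections import Counter, defaultdict

def build_table(sentences, prefix_len):
    """Count one-word suffixes for every prefix of exactly prefix_len words."""
    table = defaultdict(Counter)
    for words in sentences:
        for i in range(len(words) - prefix_len):
            prefix = tuple(words[i:i + prefix_len])
            table[prefix][words[i + prefix_len]] += 1
    return table

def hit_rate(table, test_sentences, prefix_len, guesses):
    """Percent of test cases whose true next word is among the top guesses."""
    hits = trials = 0
    for words in test_sentences:
        for i in range(len(words) - prefix_len):
            prefix = tuple(words[i:i + prefix_len])
            ranked = table.get(prefix, Counter()).most_common(guesses)
            top = [word for word, _ in ranked]
            hits += words[i + prefix_len] in top
            trials += 1
    return 100.0 * hits / trials if trials else 0.0

# Toy data, already tokenized; real evaluation would use held-out corpus text.
train = [
    ["i", "like", "tea"],
    ["i", "like", "tea"],
    ["i", "like", "coffee"],
    ["you", "like", "tea"],
]
held_out = [["i", "like", "coffee"], ["you", "like", "milk"]]

table = build_table(train, prefix_len=1)
for guesses in (1, 2, 3):
    print(guesses, hit_rate(table, held_out, prefix_len=1, guesses=guesses))
```

Because a larger guess budget can only add candidates, the hit rate for a fixed prefix length never decreases as the number of allowed guesses grows.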

Application: https://zcolburn.shinyapps.io/predictive_typing_application/

Documentation: https://github.com/zcolburn/predictiveTypingApplication