22 September 2018

Introduction

The goal of this project was to develop an algorithm that predicts the next word from the preceding text, and to create a user interface that others can access.

This work is the capstone project of the Data Science Specialization offered by Johns Hopkins University in partnership with SwiftKey.

SwiftKey builds a smart keyboard app that makes it easier for people to type on their mobile devices. It relies on predictive text models to suggest what the user will type next.

How it works

A dataset of texts from blogs, news sites and Twitter was downloaded and imported into R.

The dataset was then:

  1. Cleaned
  2. Randomly sampled
  3. Transformed into a term-document matrix
  4. Tokenized into 4-, 3-, 2- and 1-grams
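
The cleaning, sampling and tokenizing steps above can be sketched as follows. This is only an illustration in Python, not the project's R code: the function names (clean, sample_lines, ngrams), the cleaning rules, the sampling fraction and the three example lines are all assumptions, and the term-document matrix step is left out.

```python
import random
import re

def clean(line):
    """Lowercase and keep only letters, apostrophes and spaces (assumed rules)."""
    line = line.lower()
    line = re.sub(r"[^a-z' ]+", " ", line)
    return re.sub(r"\s+", " ", line).strip()

def sample_lines(lines, fraction, seed=42):
    """Keep a random fraction of the lines; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    k = max(1, int(len(lines) * fraction))
    return rng.sample(lines, k)

def ngrams(text, n):
    """Return every run of n consecutive words as a tuple."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Made-up lines standing in for the blog, news and Twitter texts.
corpus = ["I love New York!", "See you at 5pm :)", "I love you."]
cleaned = [clean(line) for line in corpus]
subset = sample_lines(cleaned, fraction=0.67)
bigrams = [gram for line in subset for gram in ngrams(line, 2)]
```

Sampling matters because the full corpus is large; working on a random subset keeps the n-gram tables small enough to build and query quickly.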

The tokenized data was then used to calculate n-gram frequencies, which in turn were used to predict the word most likely to follow the preceding text.
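
A minimal sketch of frequency-based prediction is shown below, written in Python for illustration rather than taken from the project's R code. It assumes a simple back-off rule (try the longest context first, then shorter ones), which the text above does not spell out; build_model, predict and the training sentences are hypothetical. The parameter k plays the role of the number of predicted words to display.

```python
from collections import Counter, defaultdict

def build_model(sentences, max_n=4):
    """Map each context (0 to max_n-1 words) to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                *context, nxt = words[i:i + n]
                model[tuple(context)][nxt] += 1
    return model

def predict(model, text, k=3, max_n=4):
    """Return up to k candidate next words, trying the longest context first."""
    words = text.lower().split()
    for size in range(max_n - 1, -1, -1):
        if size > len(words):
            continue
        context = tuple(words[-size:]) if size else ()
        if context in model:
            # Assumed back-off: use the first (longest) context seen in training.
            return [w for w, _ in model[context].most_common(k)]
    return []

# Made-up training sentences.
model = build_model(["i love new york", "i love new shoes", "i love new york city"])
print(predict(model, "I love new", k=2))  # prints ['york', 'shoes']
```

When the last three words were never seen together, the sketch falls back to the last two, then the last one, and finally to the most frequent single words, so it returns an empty list only when the model itself is empty.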

How to use the App

To use the app, the user needs to:

  1. Enter the text in the text box
  2. Set the number of predicted words to display

Additional Information