"SwiftKey" Typing Prediction

Charles Yinchu Dong
10/18/2017

Coursera Data Science Capstone Project

Introduction

  • The goal of this capstone project is to build an interactive web application that predicts the next word from the user's typed input.

  • The project has six parts: data collection, data cleaning, exploratory analysis, tokenization, building the prediction model, and developing the data product.

  • The starting point is collecting text data from blogs, news, and Twitter. The corpus is then processed with the tm package and tokenized with the RWeka package, a prediction algorithm is built on the resulting N-grams, and the application is developed with the Shiny package.
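The cleaning and tokenization steps can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's actual R code (which uses tm and RWeka); the function names `clean` and `ngram_counts` and the cleaning rules (lowercasing, keeping only letters and apostrophes) are assumptions made for this example.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and split it into word tokens.

    Assumed cleaning rule: anything that is not a letter, an
    apostrophe, or a space is replaced by a space.
    """
    text = text.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


tokens = clean("I love data. I love R!")
bigrams = ngram_counts(tokens, 2)
print(tokens)                      # ['i', 'love', 'data', 'i', 'love', 'r']
print(bigrams[("i", "love")])      # 2
```

Note that this simple version lets N-grams span sentence boundaries; a fuller pipeline might split on sentences first.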

Algorithms

  • First, we build the N-gram models (N from 1 to 4): unigram, bigram, trigram, and quadgram, together with the frequency of each N-gram.

  • The idea is to search the N-gram tables in order according to the user's input. For example, search the quadgrams first and store the results. If the results are not satisfactory, search the trigrams next, and so on. Finally, compare the frequencies of the candidates and make the final decision.
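The search described above can be sketched as a simple back-off lookup. This is an illustrative Python example, not the project's R implementation. The names `build_tables` and `predict` are hypothetical, and the stopping and ranking rules are assumptions: the search backs off to a shorter context until three candidates are found, candidates from longer contexts are ranked first, and within one context they are ordered by frequency.

```python
from collections import Counter, defaultdict


def build_tables(tokens, max_n=4):
    """For n = 1..max_n, map each context (the first n-1 words of an
    n-gram) to a Counter of the words that follow it."""
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            tables[n][tuple(gram[:-1])][gram[-1]] += 1
    return tables


def predict(tables, text, k=3, max_n=4):
    """Return up to k predicted next words, most likely first."""
    words = text.lower().split()
    picks = []
    # Start with the longest context and back off to shorter ones.
    for n in range(max_n, 0, -1):
        if n - 1 > len(words):
            continue  # not enough typed words for this context length
        context = tuple(words[len(words) - (n - 1):])
        followers = tables[n].get(context, Counter())
        for word, _count in followers.most_common():
            if word not in picks:
                picks.append(word)
            if len(picks) == k:
                return picks
    return picks


corpus = ("i love data science i love data analysis "
          "i love r i like data science").split()
tables = build_tables(corpus)
print(predict(tables, "I love"))   # ['data', 'r', 'i']
```

In this toy corpus the trigram table supplies "data" and "r" after "i love"; the bigram table adds nothing new, so the third slot falls back to the most frequent unigram, "i". A real application would probably filter such fallback words or apply a smoothing scheme instead.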

Brief Look

This is what the app looks like!

All you need to do is type whatever you want in the “Text Input” box, and the three boxes below will show the predicted words. The likelihood decreases from left to right.

[Image: screenshot of the app]

Additional Information