Data Science Capstone Project

Shree
06/17/2017

Text Prediction Algorithm Using the SwiftKey Dataset

Assignment

The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others. For this project you must submit:

  • A Shiny app that takes as input a phrase (multiple words) in a text box and outputs a prediction of the next word.
  • A slide deck of no more than 5 slides, created with RStudio Presenter, pitching your algorithm and app as if you were presenting to your boss or an investor.

Algorithm Flowchart

  • Step 1: Remove punctuation, numbers, common words, profanity, etc.
  • Step 2: Search for matches; if enough matches are found, skip to Step 4
  • Step 3: Otherwise, shorten the input until enough matches are found, and calculate a penalty value
  • Step 4: Calculate probability scores for the matches
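
Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration, not the app's actual code: the stop-word and profanity lists, the `NGRAMS` lookup table, and the names `clean` and `find_matches` are all hypothetical, and shortening is assumed to mean dropping the leading word of the input.

```python
import re

# Hypothetical word lists; the lists used by the real app are not given.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "i", "am"}
PROFANITY = {"darn"}

def clean(phrase):
    """Step 1: lowercase, drop punctuation and numbers, then drop common words and profanity."""
    text = re.sub(r"[^a-z\s]", " ", phrase.lower())
    blocked = STOP_WORDS | PROFANITY
    return [w for w in text.split() if w not in blocked]

def find_matches(words, ngrams, min_matches=1):
    """Steps 2-3: look up the context, dropping the leading word until enough matches exist.

    Returns (candidates, dropped); `dropped` counts the words removed and
    can be used to compute a penalty.
    """
    context = tuple(words)
    dropped = 0
    while context:
        candidates = ngrams.get(context, {})
        if len(candidates) >= min_matches:
            return candidates, dropped
        context = context[1:]  # shorten the input by one word
        dropped += 1
    return {}, dropped

# Toy lookup table (invented counts): context -> {next word: count}
NGRAMS = {
    ("forward", "playing"): {"game": 3, "music": 1},
    ("playing",): {"game": 5, "cards": 2, "music": 2},
}

words = clean("I am looking forward to playing the")
candidates, dropped = find_matches(words, NGRAMS)
print(words, candidates, dropped)
```

In this toy table the full three-word context has no entry, so the search falls back to the last two words after one shortening step.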

Probability Model

Score(game | looking + forward + playing) = log(P(looking + forward) / P(looking)) + log(P(forward + playing) / P(forward)) + log(P(playing + game) / P(playing))

The formula above shows how the algorithm scores a candidate word (here, "game") under the Markov assumption: each word depends only on the word before it, so the score is a sum of log bigram probabilities.
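
As a worked example, the score can be computed from raw counts. The sketch below assumes each term is the conditional probability of a word given the word before it, estimated as count(w1 w2) / count(w1). The counts and the name `bigram_log_score` are invented for illustration, and unseen bigrams are not handled.

```python
import math

# Toy counts, invented for illustration only.
UNIGRAMS = {"looking": 10, "forward": 8, "playing": 4, "game": 6}
BIGRAMS = {
    ("looking", "forward"): 5,
    ("forward", "playing"): 2,
    ("playing", "game"): 1,
}

def bigram_log_score(words, unigrams, bigrams):
    """Sum log(count(w1 w2) / count(w1)) over consecutive word pairs."""
    score = 0.0
    for w1, w2 in zip(words, words[1:]):
        score += math.log(bigrams[(w1, w2)] / unigrams[w1])
    return score

score = bigram_log_score(["looking", "forward", "playing", "game"], UNIGRAMS, BIGRAMS)
print(round(score, 4))
```

With these counts the three terms are log(5/10), log(2/8), and log(1/4), which sum to log(1/32). Working in logs turns the product of probabilities into a sum and avoids numeric underflow on long phrases.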

  • This model was chosen because it returns predicted words quickly and fits within memory restrictions
  • When the input has to be shortened to find matches, we apply a penalty to the probability score
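
One possible way to apply such a penalty is sketched below, assuming a fixed log-scale penalty for every word dropped from the input. The value 0.4 and the name `rank_candidates` are illustrative assumptions; the penalty used by the app itself is not specified here.

```python
import math

def rank_candidates(candidates, dropped, penalty_per_drop=math.log(0.4)):
    """Rank next-word candidates by log relative frequency plus a shortening penalty.

    candidates: {word: count} for the matched context
    dropped: number of words removed from the input to find the match
    penalty_per_drop: assumed (negative) log penalty per removed word
    """
    total = sum(candidates.values())
    scored = {
        word: math.log(count / total) + dropped * penalty_per_drop
        for word, count in candidates.items()
    }
    # Highest (least negative) score first
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

ranked = rank_candidates({"game": 3, "music": 1}, dropped=1)
for word, score in ranked:
    print(word, round(score, 4))
```

Because the penalty grows with the number of dropped words, a match found on a longer context outranks an equally frequent match found only after shortening.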

Shiny Application

To use the application, enter a phrase to be analyzed, select the maximum number of results to return, and press the “Predict” button.

Output: The app shows the original input phrase, the filtered phrase that the algorithm analyzes, and a table of predicted words.

Click here to use the application.