Swift - Text Prediction (Capstone Project)

Vedant Mane
July 27, 2020

This presentation outlines a text prediction app designed by Vedant Mane.


The GitHub repository is available at https://github.com/vedantmane/SwiftKey-PredictiveTextAnalyis

Overview

The objective of this capstone project is to build a text prediction application in Shiny that predicts the next word given a phrase. The application balances prediction accuracy against the computational time required to produce the output.

This application is built using data obtained from US blogs, news, and Twitter. The summary of the dataset is as follows:

                         Blogs     News   Twitter
Size                    200 MB   200 MB    200 MB
Number of lines         899288    77259   2360148
Number of characters 206824505 15639408 162096241
Max Characters           40833     5760       140

The dataset is available here

Prediction Algorithm

Algorithm

  • Read User Input
  • Clean & tokenize the input
  • Predict using n-gram model
  • Back off to lower-order n-grams (stupid backoff) when no match is found
  • Rank the predictions
  • Display predictions to user
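The first two steps above can be sketched as follows. This is a minimal illustration in Python, not the app's actual code: the function names `clean_and_tokenize` and `context_for` are hypothetical, and the cleaning rules (lowercasing, keeping only letters and apostrophes) are assumptions about what "clean" means here.

```python
import re

def clean_and_tokenize(text):
    """Lowercase the input, replace anything that is not a letter,
    apostrophe, or space with a space, then split on whitespace.
    (Assumed cleaning rules; the app may use different ones.)"""
    text = text.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()

def context_for(tokens, max_order=5):
    """Keep only the last (max_order - 1) words, since a 5-gram model
    can use at most four words of context."""
    n = max_order - 1
    return tokens[-n:] if n > 0 else []

tokens = clean_and_tokenize("Hello, World! It's 2020.")
print(tokens)
print(context_for(["a", "b", "c", "d", "e", "f"]))
```

Trimming the context up front keeps each look-up bounded by the highest n-gram order, regardless of how long the user's phrase is.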

N-GRAM Prediction Model

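The algorithm predicts the next word using n-gram models (5-grams, 4-grams, 3-grams, and 2-grams) built on the Markov chain assumption that the next word depends only on the few words before it. When the highest-order n-gram is unobserved, it uses stupid backoff, falling back to progressively shorter n-grams, to predict the next word.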
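To make the idea concrete, here is a small sketch of n-gram prediction with stupid backoff in Python. It is an illustration under stated assumptions, not the app's implementation: the names `build_model` and `predict` are hypothetical, the toy corpus is made up, and the back-off factor of 0.4 is the value commonly used with stupid backoff rather than one confirmed by this project.

```python
from collections import Counter, defaultdict

ALPHA = 0.4  # assumed back-off factor; commonly used with stupid backoff

def build_model(sentences, max_order=5):
    """Map each context (a tuple of 0 to max_order-1 words) to a Counter
    of the words observed right after it. The empty context holds
    plain unigram counts."""
    followers = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for k in range(max_order):  # k = context length
                if i - k < 0:
                    break
                followers[tuple(tokens[i - k:i])][word] += 1
    return followers

def predict(followers, context, max_order=5, top_k=3):
    """Score candidates with the longest context first; each back-off
    step to a shorter context multiplies the score by ALPHA."""
    context = tuple(context)[-(max_order - 1):] if max_order > 1 else ()
    scores = {}
    weight = 1.0
    while True:
        nexts = followers.get(context)
        if nexts:
            # Relative frequency among observed continuations of this context.
            total = sum(nexts.values())
            for word, count in nexts.items():
                # Keep the score from the highest order that saw this word.
                scores.setdefault(word, weight * count / total)
        if not context:
            break
        context = context[1:]  # drop the oldest word
        weight *= ALPHA
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]

corpus = [
    ["i", "love", "new", "york"],
    ["i", "love", "new", "music"],
    ["i", "love", "new", "york"],
    ["we", "love", "you"],
]
model = build_model(corpus)
print(predict(model, ["i", "love", "new"]))
print(predict(model, ["they", "love"]))
```

The second call shows the back-off path: the context "they love" was never observed, so the prediction comes from the shorter context "love" with a discounted score.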

Shiny App


The application is deployed at https://vedantmane.shinyapps.io/swift/

Strengths of the App and Future Plans

  • The app is relatively fast and returns a list of the top 10 suggested words in less than a second. The n-gram look-up tables are lean because they are pre-sorted and contain only the word(s) to match and the word to predict.
  • Currently, the app does not perform “sentiment analysis” on user input. Improvements could include building n-gram tables from a user's frequent input and analysing the sentiment of that input.
  • Also, the app could make more accurate predictions by determining parts of speech; however, this would negatively affect the app's computational performance (speed).
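The lean, pre-sorted look-up tables mentioned in the first point could be built along these lines. This is a hedged sketch in Python: `build_lookup` is a hypothetical helper, the counts are invented for illustration, and the app's real table layout may differ.

```python
from collections import defaultdict

def build_lookup(ngram_counts, keep=10):
    """Turn raw n-gram counts into a compact table: each context maps to
    at most `keep` next words, already sorted by descending count."""
    by_context = defaultdict(list)
    for ngram, count in ngram_counts.items():
        *context, word = ngram  # last word is the prediction target
        by_context[tuple(context)].append((count, word))
    table = {}
    for context, pairs in by_context.items():
        # Sort once at build time so look-ups need no sorting at all.
        pairs.sort(key=lambda p: (-p[0], p[1]))
        table[context] = [word for _, word in pairs[:keep]]
    return table

# Made-up counts, for illustration only.
counts = {
    ("new", "york"): 5,
    ("new", "music"): 2,
    ("new", "car"): 7,
    ("love", "you"): 1,
}
table = build_lookup(counts, keep=2)
print(table[("new",)])
```

Because the counts are dropped and the ordering is fixed at build time, a prediction is a single dictionary look-up, which is consistent with the sub-second response described above.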