Text Prediction App: Coursera JHU Capstone Project Presentation

Dipanjan Sarkar
22nd August, 2015

Introduction

Objective

This application was created to fulfil the requirements of the Coursera JHU Capstone project, run in association with SwiftKey. It performs text prediction: the user enters free text in English, and the application predicts the most likely next word for that text.

Tools and Libraries used

  • R language for development and for this presentation
  • tm, dplyr and data.table for the backend algorithms
  • shiny for web app development

Application Description

Backend System Architecture

How It Works: Algorithm and System Details

System Components

  • Datasets - Contains the input data for model building.
  • Text Processing Engine - Deals with text pre-processing, profanity filtering and corpus creation.
  • Model Builder - Uses n-gram tokenizers for n = 2, 3 and 4 to build bigrams, trigrams and 4-grams, and stores them as data frames.
  • Prediction Engine - Uses a backoff model with probability estimators to predict the top 5 next words.
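The Text Processing Engine and Model Builder steps can be sketched in Python (the app itself uses R with tm, dplyr and data.table). This is a minimal illustration under stated assumptions: the profanity word list, the cleaning regular expression, the toy corpus, and the use of `collections.Counter` in place of data frames are all hypothetical choices, not the app's actual code.

```python
import re
from collections import Counter

# Hypothetical profanity list for illustration; the app's real list is not given.
PROFANITY = {"damn", "hell"}


def preprocess(text):
    """Lowercase the text, replace anything but letters/apostrophes with spaces,
    split on whitespace and drop profane tokens."""
    text = re.sub(r"[^a-z' ]+", " ", text.lower())
    return [word for word in text.split() if word not in PROFANITY]


def build_ngrams(tokens, n):
    """Count every contiguous n-word sequence (sentence boundaries are ignored)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


corpus = "The cat sat on the mat. The cat sat on the sofa! Damn, the cat ran."
tokens = preprocess(corpus)
# One frequency table per n, mirroring the n = 2, 3, 4 models described above.
models = {n: build_ngrams(tokens, n) for n in (2, 3, 4)}

print(models[2][("the", "cat")])               # 3
print(models[4][("cat", "sat", "on", "the")])  # 2
```

Each table maps an n-word tuple to its frequency, which plays the same role as a data frame with one row per n-gram and a count column.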

Prediction Engine Algorithm

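The prediction engine uses a backoff model with probability estimators to determine the top 5 next words from the history (the preceding words) of the input text. It draws on the generated n-gram models: when the longest available history yields no match, the model backs off to the next shorter one.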
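The backoff lookup can be sketched in Python (the app itself is built in R). The presentation does not name its probability estimator, so this sketch assumes the Stupid Backoff scheme with a backoff factor `alpha = 0.4`; the names `predict_next` and `build_ngrams` and the toy corpus are hypothetical and for illustration only.

```python
from collections import Counter


def build_ngrams(tokens, n):
    """Count every contiguous n-word sequence in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def predict_next(history, models, k=5, alpha=0.4):
    """Rank next-word candidates with Stupid Backoff (an assumed estimator).

    Candidates from the longest usable history score count / context_count;
    each back-off step to a shorter history multiplies the score by alpha.
    A word keeps the score from the longest history in which it was found.
    """
    scores = {}
    weight = 1.0
    for n in sorted(models, reverse=True):       # e.g. 4, 3, 2
        if len(history) < n - 1:
            continue                             # history too short for this order
        context = tuple(history[-(n - 1):])
        # Linear scan for clarity; a real system would index tables by prefix.
        matches = {g[-1]: c for g, c in models[n].items() if g[:-1] == context}
        total = sum(matches.values())
        for word, count in matches.items():
            scores.setdefault(word, weight * count / total)
        weight *= alpha
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


tokens = "the cat sat on the mat the cat sat on the sofa the cat ran".split()
models = {n: build_ngrams(tokens, n) for n in (2, 3, 4)}

print(predict_next(["on", "the"], models))   # ['mat', 'sofa', 'cat']
print(predict_next(["the"], models))         # ['cat', 'mat', 'sofa']
```

In the first call, "mat" and "sofa" come from the trigram table with full weight, while "cat" is only found after backing off to bigrams and is scored 0.4 × 3/5. When no history matches at all, the sketch returns fewer than k words; falling back to the most frequent single words is one common way to fill the list.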