Capstone presentation of word prediction app

author: Jamieon

date: January 24th, 2016

Overview

Go to https://jamieon.shinyapps.io/word-predictor/ to try out the presented shiny app.

The application:

  • predicts next word after word inputs by the user

  • shows top 5 words with their probablities

  • is fast after loading the data

Data processing

The following steps are taken in data processing phase:

  • tokenize: a get_tokens function is written, files taken and tokens returned

  • to get n-grams freqency: data.table library is used to processes the data

Algorithm

n-gram freqency data is used and:

  • Good-Turning discounting for freq<10 1,2,3-gram is applied

  • using Katz-back off the p_kz(w3|w1,w2) and p_kz(w1|w2) are calculated

  • the model is stored using ARPA format

Application screenshot

app image