Text Prediction App

Honeydukes
Sat Apr 16 23:16:00 2016

A capstone project for the Coursera Data Science Specialization, in collaboration with Johns Hopkins University and SwiftKey

Why This App

Typing on mobile = Considerable time = Frustration.
-> Need a smart keyboard that predicts the next word.

How This App Was Built

Build Corpus
- Use Random Sampling
- Split into train (60%), devtest (20%) and test (20%) sets.
Tokenization
- The train set is cleaned (lower-cased, punctuation removed, etc.).
- 1-, 2-, 3- and 4-gram tokens are built.
n-Gram Look-up Table - This is an (n x 6) matrix with
- the 4-gram token (w1-w2-w3-w4) as anchor, plus its frequency
- the corresponding trigram (w1-w2-w3), bigram (w2-w3) and unigram (w3)
- the corresponding next word (w4).
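
The build steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the code behind the app: the function names (`split_corpus`, `tokenize`, `ngrams`, `build_lookup`), the fixed random seed, and the exact cleaning rule (keep only letters and apostrophes) are all assumptions made for the example.

```python
import random
import re
from collections import Counter

def split_corpus(lines, seed=42):
    """Shuffle the lines and split them 60/20/20 into train, devtest, test."""
    rng = random.Random(seed)          # seed is an assumption, for repeatability
    shuffled = list(lines)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_dev = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    devtest = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, devtest, test

def tokenize(line):
    """Lower-case a line and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", line.lower())

def ngrams(tokens, n):
    """Return all consecutive n-token tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_lookup(train_lines):
    """Build one six-column row per distinct 4-gram:
    (4-gram, frequency, trigram, bigram, unigram, next word)."""
    counts = Counter()
    for line in train_lines:
        counts.update(ngrams(tokenize(line), 4))
    rows = []
    for (w1, w2, w3, w4), freq in counts.most_common():
        rows.append((f"{w1} {w2} {w3} {w4}", freq,
                     f"{w1} {w2} {w3}", f"{w2} {w3}", w3, w4))
    return rows
```

With a table of this shape, a prediction for typed text ending in w1 w2 w3 can look for rows whose trigram column matches, then fall back to the bigram and unigram columns when no match is found.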

Katz Back-Off

Good-Turing Smoothing

Probabilities are discounted for seen n-grams with freq of freq <= 5.
The excess probability mass is redistributed to unseen n-grams.
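
To make the discounting step concrete, here is a minimal Python sketch of Good-Turing discounting with a cutoff. It is a simplified illustration, not the app's implementation: the function names are hypothetical, the cutoff of 5 is read here as applying to n-gram counts r = 1..5, discounts are capped at 1, and the extra renormalization used in full Katz back-off is left out.

```python
from collections import Counter

def good_turing_discounts(ngram_counts, k=5):
    """Return {r: d_r} for counts r = 1..k, where d_r = r*/r and
    r* = (r + 1) * N_{r+1} / N_r, with N_r the number of n-grams seen r times."""
    freq_of_freq = Counter(ngram_counts.values())
    discounts = {}
    for r in range(1, k + 1):
        n_r = freq_of_freq.get(r, 0)
        n_r1 = freq_of_freq.get(r + 1, 0)
        if n_r == 0 or n_r1 == 0:
            discounts[r] = 1.0   # too little data: leave this count undiscounted
        else:
            # Cap at 1 so a "discount" never increases a count (assumption)
            discounts[r] = min(1.0, (r + 1) * n_r1 / (r * n_r))
    return discounts

def discounted_probs(context_counts, discounts):
    """For one context, return the discounted probability of each seen next
    word and the leftover mass reserved for unseen next words."""
    total = sum(context_counts.values())
    probs = {w: discounts.get(c, 1.0) * c / total
             for w, c in context_counts.items()}
    leftover = 1.0 - sum(probs.values())
    return probs, leftover
```

Counts above the cutoff are treated as reliable and keep their maximum-likelihood estimate; the `leftover` value is the mass that a back-off step would spread over next words never seen after the context.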

What Shiny App Does

What the User Needs To Do

Go to https://honeydukes.shinyapps.io/Project/