Parag Sengupta
May 7, 2018
Build an English text prediction model using Natural Language Processing and text mining.
Goal: Predict the word a user would “most likely” want to type next, given the initial part of a sentence
Primary Use Environment: Handheld or mobile device - speed up user typing by suggesting the next word or autocompleting the user's search query
Data Source: 3 different corpora comprising tweets, blog posts and news articles in English. Sourced from SwiftKey: https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip
Methodology: 4-gram probabilistic model
Project Links:
Shiny App : https://paragsengupta.shinyapps.io/NextWordPrediction
Pitch Slide Deck:
Preprocessing
N-Grams
Probability Formulae Used
| Quadgram ML Estimate | Trigram ML Estimate | Bigram ML Estimate | Unigram ML Estimate |
|---|---|---|---|
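
Under the standard maximum-likelihood (relative-frequency) definition, the four estimates can be written as follows. The notation is an assumption introduced here for illustration: $C(\cdot)$ is the number of times a word sequence occurs in the training corpora, and $N$ is the total number of word tokens.

$$
\begin{aligned}
P(w_4 \mid w_1 w_2 w_3) &= \frac{C(w_1 w_2 w_3 w_4)}{C(w_1 w_2 w_3)} \\
P(w_3 \mid w_1 w_2) &= \frac{C(w_1 w_2 w_3)}{C(w_1 w_2)} \\
P(w_2 \mid w_1) &= \frac{C(w_1 w_2)}{C(w_1)} \\
P(w_1) &= \frac{C(w_1)}{N}
\end{aligned}
$$

For a fixed context the denominator is the same for every candidate word, so choosing the n-gram with maximum probability is equivalent to choosing the one with the highest count.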
Prediction
| If a match | If no match |
|---|---|
| 4-gram with maximum probability is returned | Last 2 words extracted and matched with 3-gram table |
| 3-gram with maximum probability is returned | Last word extracted and matched with 2-gram table |
| 2-gram with maximum probability is returned | Unigram with maximum probability is returned |
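
The back-off logic in the table above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the app's actual code: Python is used only for illustration, the function names `build_tables` and `predict_next` are hypothetical, tokenization is a plain lowercase whitespace split, and the toy corpus is made up.

```python
from collections import Counter, defaultdict


def build_tables(sentences, max_n=4):
    """Count, for every n from 1 to max_n, how often each word follows
    each (n-1)-word context. tables[n][context][word] is a raw count."""
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: whitespace tokenizer
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])  # empty tuple for unigrams
                tables[n][context][tokens[i + n - 1]] += 1
    return tables


def predict_next(tables, text, max_n=4):
    """Try the 4-gram table first, then back off to 3-gram, 2-gram, unigram."""
    tokens = text.lower().split()
    for n in range(max_n, 0, -1):
        if len(tokens) < n - 1:
            continue  # input too short to form this context
        context = tuple(tokens[len(tokens) - (n - 1):]) if n > 1 else ()
        candidates = tables[n].get(context)
        if candidates:
            # Highest count == highest ML probability within a fixed context.
            return candidates.most_common(1)[0][0]
    return None  # only reached when the tables are empty


# Toy corpus, invented for this example.
corpus = [
    "thank you for the follow",
    "thank you for the support",
    "thank you for the support",
    "see you at the game",
]
tables = build_tables(corpus)
print(predict_next(tables, "thank you for the"))  # matched in the 4-gram table
print(predict_next(tables, "looking at the"))     # backs off to the 3-gram table
print(predict_next(tables, "read the"))           # backs off to the 2-gram table
```

Each back-off step simply drops the oldest word of the context, so an input that matches no longer n-gram still receives the most frequent unigram as a suggestion.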
Shiny App Screenshot
How the App works
Performance Notes