Sunil Kumar (@sunil4data; sunil_iitb96@yahoo.co.in)
11 Aug 2018
Goal of this Capstone course: -
Goal of this final project: -
Create an algorithm that predicts the next word, given 2 or more words as input, using an n-gram language model
A large corpus of blog, news and Twitter data was loaded and analyzed
N-grams were extracted from 10% of the corpus and then used to build the predictive model
Various methods of improving the prediction accuracy and speed were explored (refer to 'NLP Background study notes & findings' in https://www.kaggle.com/suniliitb96/tryswiftkeyinr?scriptVersionId=5037782)
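The sampling and n-gram extraction steps above can be sketched as follows. This is a minimal illustration in Python, not the project's own code: the helper names (`tokenize`, `sample_lines`, `count_ngrams`), the letters-and-apostrophes tokenization rule, and the toy corpus are all assumptions made for the example.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (assumed cleaning rule)."""
    return re.findall(r"[a-z']+", text.lower())


def sample_lines(lines, fraction=0.10, seed=42):
    """Keep each line with probability `fraction`; the fixed seed makes the sample repeatable."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def count_ngrams(lines, n):
    """Count n-grams of order n, never crossing line boundaries."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Toy corpus standing in for the blog, news and Twitter text.
corpus = ["I love my dog", "I love my cat", "You love my dog"]
unigrams = count_ngrams(corpus, 1)
bigrams = count_ngrams(corpus, 2)
trigrams = count_ngrams(corpus, 3)
```

On a real corpus, `sample_lines(all_lines, fraction=0.10)` would be applied first, and the resulting counts would then be normalized into probabilities.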
Challenges of n-gram language modeling
n-gram Language Models
Next Word Prediction
A pre-computed language model containing the probabilities of 1-, 2- and 3-grams is available to the Shiny app for serving next-word predictions
The user enters an incomplete sentence of 2 or more words, whose next word is to be predicted
The same data cleaning and tokenization steps used on the 'training' data are applied to this input sentence
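A lookup against such a pre-computed model might look like the sketch below. It is written in Python for illustration only. The simple back-off order (3-gram, then 2-gram, then 1-gram), the `MODEL` layout, its toy probabilities, and the `predict_next` helper are assumptions; the source does not say which back-off or smoothing scheme the app uses.

```python
import re


def tokenize(text):
    """Same assumed cleaning as for training text: lowercase, letters and apostrophes only."""
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical pre-computed model: context tuple -> {next word: probability}.
# A 2-word context is a 3-gram entry, a 1-word context a 2-gram entry,
# and the empty context holds the 1-gram probabilities.
MODEL = {
    ("love", "my"): {"dog": 0.67, "cat": 0.33},
    ("my",): {"dog": 0.6, "cat": 0.3, "car": 0.1},
    (): {"the": 0.5, "my": 0.3, "dog": 0.2},
}


def predict_next(sentence, model, top_k=3):
    """Return up to top_k candidates, backing off from the longest context to the shortest."""
    tokens = tokenize(sentence)
    for size in (2, 1, 0):
        if len(tokens) < size:
            continue  # not enough input words for this context length
        context = tuple(tokens[len(tokens) - size:]) if size else ()
        candidates = model.get(context)
        if candidates:
            ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
            return [word for word, _ in ranked[:top_k]]
    return []


print(predict_next("I LOVE my", MODEL))    # 3-gram context ("love", "my") is found
print(predict_next("I see my", MODEL))     # backs off to the 2-gram context ("my",)
print(predict_next("hello world", MODEL))  # backs off to the 1-gram table
```

Because the input passes through the same `tokenize` function as the training text, differences in case or punctuation in what the user types do not prevent a context match.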
Input parameters of prediction algorithms
Results