SwiftKey: Data Science Capstone Project for Coursera

Sandeep Anand
9/11/2017

For more details on my Shiny app, please visit https://sananand007.shinyapps.io/swiftkey/.

The goal of the project is to predict the next word.
The project meets the requirements set out in the final capstone guidelines.
Most of the concepts come from natural language processing (NLP).
The predictive text model was built using data from a corpus called HC Corpora.

This project uses classical n-gram modelling to develop the final algorithm.
Words were tokenized with both the tidytext and ngram packages to see which one sped up the process.
Training sets of different sample sizes were also tried, again to speed up the process.
Unigrams, bigrams, and trigrams are used for the Katz back-off (KBO) model.
Predictions for the quizzes were made with a Markov chain model, using the chain rule of probability and Bayes' theorem.
For the final app, Katz back-off was used: observed bigram and trigram counts are discounted, and the probability mass freed by discounting is assigned to unobserved cases.
Understanding and implementing Katz back-off was extremely time-consuming and a real challenge.
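The sampling, n-gram counting and Katz back-off steps above can be sketched in Python. This is not the author's R/Shiny implementation. It is a minimal illustration that assumes whitespace tokenization (in place of tidytext or ngram), a fixed absolute discount of d = 0.5 for both bigrams and trigrams (Katz's original formulation uses Good-Turing discounting instead), and hypothetical helper names (sample_lines, KatzBackoff). Observed n-grams receive their discounted relative frequency. The mass freed by discounting, alpha, is shared among unobserved words in proportion to the lower-order model's probabilities.

```python
import random
from collections import Counter, defaultdict


def sample_lines(lines, fraction, seed=42):
    """Hypothetical helper: keep each line with probability `fraction`."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


class KatzBackoff:
    """Trigram Katz back-off with an assumed fixed absolute discount."""

    def __init__(self, sentences, discount=0.5):
        self.d = discount
        self.uni = Counter()
        self.bi = defaultdict(Counter)   # bi[w1][w2] = count of "w1 w2"
        self.tri = defaultdict(Counter)  # tri[(w1, w2)][w3] = count of "w1 w2 w3"
        for sentence in sentences:
            toks = sentence.lower().split()  # naive whitespace tokenizer (assumption)
            self.uni.update(toks)
            for a, b in zip(toks, toks[1:]):
                self.bi[a][b] += 1
            for a, b, c in zip(toks, toks[1:], toks[2:]):
                self.tri[(a, b)][c] += 1
        self.total = sum(self.uni.values())

    def p_uni(self, w):
        # Maximum-likelihood unigram probability.
        return self.uni[w] / self.total

    def p_bi(self, w, prev):
        followers = self.bi.get(prev)
        if not followers:  # unseen history: fall back to the unigram model
            return self.p_uni(w)
        hist = sum(followers.values())
        if w in followers:  # observed bigram: discounted relative frequency
            return (followers[w] - self.d) / hist
        # alpha = probability mass removed by discounting the observed bigrams,
        # spread over unobserved words in proportion to their unigram counts.
        alpha = self.d * len(followers) / hist
        unseen = self.total - sum(self.uni[v] for v in followers)
        return alpha * self.uni[w] / unseen if unseen else 0.0

    def p_tri(self, w, w1, w2):
        followers = self.tri.get((w1, w2))
        if not followers:  # unseen bigram history: back off entirely
            return self.p_bi(w, w2)
        hist = sum(followers.values())
        if w in followers:  # observed trigram: discounted relative frequency
            return (followers[w] - self.d) / hist
        # Freed mass, renormalized over the bigram probabilities of unobserved words.
        alpha = self.d * len(followers) / hist
        unseen = 1.0 - sum(self.p_bi(v, w2) for v in followers)
        return alpha * self.p_bi(w, w2) / unseen if unseen > 0 else 0.0

    def predict(self, w1, w2, k=3):
        """Return the k most probable next words after "w1 w2"."""
        w1, w2 = w1.lower(), w2.lower()
        scores = {w: self.p_tri(w, w1, w2) for w in self.uni}
        return sorted(scores, key=lambda w: (-scores[w], w))[:k]


corpus = ["i love new york", "i love new jersey",
          "you love new york", "new york is big"]
model = KatzBackoff(sample_lines(corpus, fraction=1.0))
print(model.predict("love", "new"))  # ['york', 'jersey', 'new']
```

Because alpha is exactly the mass removed by discounting, each conditional distribution still sums to 1 over the vocabulary. Scoring every vocabulary word per query is fine for a toy corpus; a real app would likely precompute and prune the n-gram tables.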