SwiftKey: Data Science Capstone Project for Coursera

Sandeep Anand
9/11/2017

This is an outline of the application for predicting the next word given a set of words.

For more details on my Shiny app, please visit https://sananand007.shinyapps.io/swiftkey/.

  • The goal of the project is to predict the next word
  • The project meets the requirements of the final capstone guidelines
  • Most of the concepts come from natural language processing (NLP)
  • The predictive text model for this project is built on data from a corpus called HC Corpora

Algorithm Used

  • This project uses classical n-gram modelling to develop the final algorithm
  • Words were tokenized using both the tidytext and the ngrams packages to see whether either could speed up the process
  • Training sets of different sample sizes were tried, also to speed up the process
  • Unigrams, bigrams and trigrams are used for the Katz backoff (KBO) model
  • Predictions for the quizzes were made with the Markov chain rule and Bayes' theorem
  • For the final app, Katz backoff was used: observed bigrams and trigrams are discounted, and the freed probability mass is assigned to unobserved ones
  • Understanding and implementing Katz backoff was extremely time-consuming and a real challenge
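
The steps above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the code behind the app, which was built with R packages. It assumes whitespace tokenization and a fixed absolute discount of 0.5 for both bigrams and trigrams; the function names (build_counts, bigram_probs, trigram_probs, predict_next) and the toy corpus are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_counts(sentences):
    """Count unigrams, bigrams and trigrams (assumes whitespace tokens)."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for s in sentences:
        tokens = s.lower().split()
        uni.update(ngrams(tokens, 1))
        bi.update(ngrams(tokens, 2))
        tri.update(ngrams(tokens, 3))
    return uni, bi, tri

def bigram_probs(w2, uni, bi, d2=0.5):
    """Backoff estimate of P(w | w2) for every word w in the vocabulary."""
    c_w2 = uni[(w2,)]
    if c_w2 == 0:
        # Unseen history: fall back to unigram relative frequencies.
        total = sum(uni.values())
        return {w[0]: c / total for w, c in uni.items()}
    seen = {b[1]: c for b, c in bi.items() if b[0] == w2}
    # Discount each observed bigram count by d2 (assumed value).
    probs = {w: (c - d2) / c_w2 for w, c in seen.items()}
    alpha = 1.0 - sum(probs.values())  # mass left for unobserved bigrams
    unseen = {w[0]: c for w, c in uni.items() if w[0] not in seen}
    unseen_total = sum(unseen.values())
    for w, c in unseen.items():
        probs[w] = alpha * c / unseen_total
    return probs

def trigram_probs(w1, w2, uni, bi, tri, d2=0.5, d3=0.5):
    """Backoff estimate of P(w | w1 w2) for every word w in the vocabulary."""
    backoff = bigram_probs(w2, uni, bi, d2)
    c_hist = bi[(w1, w2)]
    if c_hist == 0:
        return backoff  # history never seen: use the bigram level directly
    seen = {t[2]: c for t, c in tri.items() if t[:2] == (w1, w2)}
    # Discount each observed trigram count by d3 (assumed value).
    probs = {w: (c - d3) / c_hist for w, c in seen.items()}
    alpha = 1.0 - sum(probs.values())  # mass left for unobserved trigrams
    unseen_mass = sum(p for w, p in backoff.items() if w not in seen)
    for w, p in backoff.items():
        if w not in seen:
            probs[w] = alpha * p / unseen_mass
    return probs

def predict_next(text, uni, bi, tri, k=3):
    """Return the k most probable next words for the last two words of text."""
    tokens = text.lower().split()
    w1, w2 = (["", ""] + tokens)[-2:]
    probs = trigram_probs(w1, w2, uni, bi, tri)
    return sorted(probs, key=probs.get, reverse=True)[:k]

# Toy corpus, made up for this example.
corpus = ["i love new york", "i love new shoes",
          "i love new york pizza", "we love old books"]
uni, bi, tri = build_counts(corpus)
print(predict_next("i love new", uni, bi, tri))  # "york" ranks first
```

With this toy corpus, "love new york" is observed twice out of three occurrences of "love new", so its discounted probability is (2 - 0.5) / 3 = 0.5. The mass removed by discounting is shared among words never seen after "love new", in proportion to their bigram-level estimates.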

App Usage

  • The app is extremely simple and ordinary
  • After entering a sentence, please wait at least 1 minute for the prediction to appear

Sources