June 17, 2019

Problem Statement

The goal of this project was to build a word prediction system similar to the ones found on modern smartphone keyboards. The project requirements were:

  • Develop an algorithm to predict the next word based on an input string of words
  • Deploy that algorithm as an R Shiny app
  • Create this presentation using the R Presentation system

We were given access to a corpus of text sampled from Twitter, blogs, and news articles. This would help us get started on building and training our prediction model.

Approach

  • Data
      • To get a richer set of text, I augmented the provided corpus with the Brown Corpus and the Corpus of Contemporary American English
  • Implemented a simple back-off algorithm
      • Tokenized the corpus into n-grams for n = 1, 2, 3, and 4 using the R quanteda package
      • Found the most frequent word (token) following each n-gram
      • Saved the predictions in a file for easy lookup in the app
      • The app looks for a prediction matching the last 4 words of the input. If it finds one, it returns it. If not, it repeats the search with the last 3 words, then the last 2, then the last word alone.
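
The back-off lookup described above can be sketched in a few lines. This is only an illustration, written in Python rather than the R and quanteda code used in the project. The function names `build_table` and `predict`, the toy corpus, and the whitespace tokenization are all assumptions made for the example, and the in-memory dictionary stands in for the saved predictions file.

```python
from collections import Counter, defaultdict

def build_table(tokens, max_n=4):
    """Map every context of 1..max_n words to its most frequent next word.

    Hypothetical helper: stands in for the project's saved predictions file.
    """
    counts = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n):
            context = tuple(tokens[i:i + n])
            counts[context][tokens[i + n]] += 1
    # Keep only the single most frequent follower for each context.
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def predict(table, text, max_n=4):
    """Try the longest context first, then back off to shorter ones."""
    words = text.lower().split()
    for n in range(min(max_n, len(words)), 0, -1):
        guess = table.get(tuple(words[-n:]))
        if guess is not None:
            return guess
    return None  # no context of any length was seen in the corpus

# Toy corpus, assumed for illustration only.
corpus = "the cat sat on the mat and the cat sat on the sofa".split()
table = build_table(corpus)
```

With this toy corpus, `predict(table, "a dog sat on")` finds no match for the 4-word or 3-word context, backs off to `("sat", "on")`, and returns `"the"`. Storing only the top follower per context keeps the lookup table small, at the cost of never offering a second suggestion.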

Link to the Shiny web UI:

Thank You

Thanks!!!