The objective of this Capstone project is to build a next-word prediction algorithm and an interface that others can access. The user types a phrase (multiple words) into a text box, and the app outputs a prediction of the next word.
Steps taken:
- Create a corpus of texts from Twitter, news and blogs, and clean it.
- Tokenize the cleaned corpus into words and convert the tokens into N-grams (sequences of N words).
- Generate lists of bigrams, trigrams, quadgrams and quintgrams, and calculate the relative frequency (Score) of each candidate next word according to the Stupid Backoff model.
- Develop the prediction algorithm to be used in the Shiny App.
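The pipeline above can be sketched in Python as shown below. This is a minimal illustration and not the project's actual implementation. The cleaning rules are simplifying assumptions: lowercase the text and keep only letters and apostrophes. The function names (`clean`, `build_counts`, `predict`) are hypothetical, and alphabetical tie-breaking is an arbitrary choice. The backoff factor of 0.4 is the value suggested in the original Stupid Backoff paper (Brants et al., 2007).

Stupid Backoff scores a candidate word by its relative frequency after the longest matching context. If the word was never seen after that context, the model falls back to the next shorter context and multiplies the score by 0.4 for each step it backs off.

```python
import re
from collections import Counter

ALPHA = 0.4  # backoff factor from Brants et al. (2007)
MAX_N = 5    # bigrams up to quintgrams, plus unigrams for the final fallback


def clean(text: str) -> str:
    """Assumed cleaning: lowercase, keep letters/apostrophes, collapse spaces."""
    text = re.sub(r"[^a-z' ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    return clean(text).split()


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous sequences of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_counts(corpus: list[str], max_n: int = MAX_N) -> dict[int, Counter]:
    """counts[n][gram] = frequency of each n-gram, for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for doc in corpus:
        tokens = tokenize(doc)
        for n in range(1, max_n + 1):
            counts[n].update(ngrams(tokens, n))
    return counts


def predict(phrase: str, counts: dict[int, Counter],
            max_n: int = MAX_N, k: int = 3) -> list[str]:
    """Return the top-k next words ranked by Stupid Backoff score."""
    history = tokenize(phrase)[-(max_n - 1):]  # at most max_n - 1 words of context
    scores: dict[str, float] = {}
    penalty = 1.0
    # Start from the longest context and back off one word at a time.
    for ctx_len in range(len(history), 0, -1):
        context = tuple(history[-ctx_len:])
        ctx_count = counts[ctx_len][context]
        if ctx_count:
            for gram, c in counts[ctx_len + 1].items():
                # Keep the highest-order score for each word; only unseen words back off.
                if gram[:-1] == context and gram[-1] not in scores:
                    scores[gram[-1]] = penalty * c / ctx_count
        penalty *= ALPHA  # each backoff step multiplies the score by ALPHA
    # Final fallback: unigram relative frequency.
    total = sum(counts[1].values())
    for (word,), c in counts[1].items():
        if word not in scores:
            scores[word] = penalty * c / total
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Take a toy corpus of `["I love data science", "I love R", "I love data", "you love data science"]`. For this corpus, `predict("I love", build_counts(corpus))` returns `["data", "r", "love"]`. Both "data" and "r" are scored from trigrams, while "love" is reached only through the unigram fallback. The scores are not normalized probabilities, which is the simplification that makes Stupid Backoff cheap enough for an interactive app. A deployed app would likely precompute and prune the N-gram tables, and index them by prefix, rather than scan every N-gram on each query.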