Yoav Pridor
February 2018
Objective: Predict the next word, in context, given some input text.
Method: The initial data came from news, blogs, and Twitter. All texts were cleaned (punctuation and profanity removed). N-gram tables (lists of tokens and their frequencies) were derived from the corpora using the R package quanteda. The n-gram tables were trimmed to include 90% of tokens, to improve app performance. A prediction model (function) was created based on the Katz back-off algorithm. A Shiny app loads the n-gram tables, takes text input, and offers a next-word prediction.
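The back-off idea behind the prediction step can be sketched as follows. This is a minimal illustration in Python, not the project's R/quanteda code. It assumes only unigram and bigram tables, and it swaps in a fixed absolute discount of 0.5 in place of the Good-Turing discounting used in the original Katz formulation. The names build_tables, backoff_probs, and predict_next are hypothetical.

```python
from collections import Counter


def build_tables(tokens):
    """Build unigram and bigram frequency tables from a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def backoff_probs(prev, unigrams, bigrams, discount=0.5):
    """Return {word: probability} for the word following `prev`.

    Seen bigrams get a discounted relative frequency; the probability
    mass freed by the discount is shared among unseen words in
    proportion to their unigram counts (the back-off step).
    The fixed discount is an assumption made for this sketch.
    """
    total = sum(unigrams.values())
    # Counts of every word observed directly after `prev`.
    seen = {w2: c for (w1, w2), c in bigrams.items() if w1 == prev}
    if not seen:
        # Unknown history: fall back entirely to unigram frequencies.
        return {w: c / total for w, c in unigrams.items()}

    prev_count = sum(seen.values())
    unseen_total = sum(c for w, c in unigrams.items() if w not in seen)
    if unseen_total == 0:
        # Nothing to back off to, so no discounting is needed.
        return {w: c / prev_count for w, c in seen.items()}

    probs = {w: (c - discount) / prev_count for w, c in seen.items()}
    leftover = discount * len(seen) / prev_count
    for w, c in unigrams.items():
        if w not in seen:
            probs[w] = leftover * c / unseen_total
    return probs


def predict_next(text, unigrams, bigrams):
    """Predict the most probable next word given some input text."""
    words = text.lower().split()
    prev = words[-1] if words else None
    probs = backoff_probs(prev, unigrams, bigrams)
    return max(probs, key=probs.get)


# Toy corpus, for illustration only.
corpus = "the cat sat on the mat the cat ate".split()
unigrams, bigrams = build_tables(corpus)
print(predict_next("I saw the", unigrams, bigrams))
```

On this toy corpus the call prints "cat", the most frequent word after "the". A fuller version would start from the longest available n-gram and back off one order at a time, which is where trimming the tables helps keep lookups fast.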