The goal of this final project is to implement a learning model for text prediction and to build a Shiny product around it that gives users an easily accessible interface to the prediction algorithm.
Development (Part I)
Preprocessing the data: clean the data by removing profanity, numbers, punctuation, extra spaces and other noise, then tokenize the text into words.
Exploratory data analysis: calculate the frequencies of words and word pairs
Modeling: build 2-gram to 7-gram models to support next-word prediction
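The Part I steps above can be sketched as follows. This is a minimal illustration in Python, not the project's own code: the function names `tokenize` and `ngram_counts`, the tiny `PROFANITY` set, and the choice to keep apostrophes are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical placeholder list; a real profanity list would be much longer.
PROFANITY = {"darn", "heck"}

def tokenize(text, banned=PROFANITY):
    """Lowercase the text, drop everything except letters, apostrophes and
    spaces, split on whitespace, and remove banned words."""
    text = text.lower()
    # Digits, punctuation, tabs and newlines all become single spaces.
    text = re.sub(r"[^a-z' ]+", " ", text)
    # split() with no argument also collapses runs of spaces.
    return [w for w in text.split() if w not in banned]

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (n=1 gives word frequencies,
    n=2 gives word-pair frequencies, and so on up to n=7)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
```

Calling `ngram_counts(tokens, n)` for each n from 1 to 7 would give the frequency tables used in the exploratory analysis and the modeling step.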
Development (Part II)
Prediction model: Katz's back-off model was used to predict the next word. It steps from the 7-gram model down to the 2-gram model, at each order n looking for n-grams whose first n-1 words match the last n-1 words of the input. If no order yields a match, the most frequent word ('the') is returned.
Application: a Shiny app was developed to make the tool easy to use
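The back-off lookup described under "Prediction model" can be sketched as below. This is a simplified illustration in Python under stated assumptions: it returns the most frequent continuation at the longest matching order and omits the probability discounting that full Katz back-off applies. The names `build_tables` and `predict_next` are hypothetical and not taken from the project.

```python
from collections import Counter, defaultdict

def build_tables(tokens, max_n=7):
    """Map each context of 1..max_n-1 words to a Counter of the words that
    followed it in the training tokens (i.e. the 2-gram to max_n-gram counts)."""
    tables = defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            tables[context][tokens[i + n - 1]] += 1
    return tables

def predict_next(words, tables, max_n=7, default="the"):
    """Try the longest context first, then back off to shorter ones.
    Falls back to `default` when no context matches."""
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # not enough input words for this order
        context = tuple(words[-(n - 1):])
        if context in tables:
            # Most frequent word seen after this context.
            return tables[context].most_common(1)[0][0]
    return default
```

For example, with tables built from the tokens of "i like green eggs and i like ham and i like green tea", the input `["we", "like"]` has no 3-gram match, so the lookup backs off to the single-word context `("like",)`.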