Next Word Prediction Application

JV
18/11/2020

Introduction

The goal of this final project assignment is to implement a language model for text prediction and, using that prediction algorithm, to build a Shiny application with a user interface that users can access easily.

Development (Part I)

  1. Preprocessing the data: clean the data by removing profanity, numbers, punctuation, extra spaces and other noise, then tokenize the text into words.
  2. Exploratory data analysis: calculate the frequencies of words and word pairs
  3. Modeling: build 2-gram to 7-gram models to support next-word prediction.
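The Part I steps (cleaning, tokenizing, counting n-grams) can be sketched in Python as below. This is a minimal sketch, not the project's own code: the regular expressions, the tiny `PROFANITY` set and the function names `clean_and_tokenize`, `ngram_counts` and `build_models` are hypothetical stand-ins chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical placeholder; the project's actual profanity list is not given.
PROFANITY = {"damn", "hell"}


def clean_and_tokenize(text: str) -> list[str]:
    """Lowercase, strip numbers and punctuation, collapse spaces, drop profanity."""
    text = text.lower()
    text = re.sub(r"[0-9]+", " ", text)      # remove numbers
    text = re.sub(r"[^a-z'\s]", " ", text)   # remove punctuation, keeping apostrophes
    tokens = text.split()                    # split() also collapses extra whitespace
    return [t for t in tokens if t not in PROFANITY]


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every n-gram, stored as a tuple of n words."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def build_models(tokens: list[str], max_n: int = 7) -> dict:
    """Build frequency tables for 1-grams up to max_n-grams."""
    return {n: ngram_counts(tokens, n) for n in range(1, max_n + 1)}


tokens = clean_and_tokenize("I love New York! I love 2 eat pizza in New York.")
print(tokens)  # ['i', 'love', 'new', 'york', 'i', 'love', 'eat', 'pizza', 'in', 'new', 'york']
models = build_models(tokens)
print(models[2][("new", "york")])  # 2
```

The unigram and bigram tables double as the word and word-pair frequencies used in the exploratory analysis.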

Development (Part II)

  1. Prediction model: Katz's back-off model is used to predict the next word. It searches from the 7-gram model down to the 2-gram model for a match on the last n-1 words of the input. If no match is found, the most frequent word ('the') is returned.
  2. Application: a Shiny app that makes the tool easy to use.
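The back-off search described above can be sketched as follows. It is a simplified illustration under stated assumptions: the names `build_index` and `predict_next` are hypothetical, and unlike a full Katz back-off model it does not discount counts or redistribute probability mass. It only tries the longest usable context first, drops to shorter contexts when nothing matches, and ranks candidates by raw frequency.

```python
from collections import Counter, defaultdict


def build_index(tokens, max_n=7):
    """For n = 2..max_n, map each (n-1)-word prefix to a Counter of the words that follow it."""
    index = {}
    for n in range(2, max_n + 1):
        table = defaultdict(Counter)
        for i in range(len(tokens) - n + 1):
            prefix = tuple(tokens[i:i + n - 1])
            table[prefix][tokens[i + n - 1]] += 1
        index[n] = table
    return index


def predict_next(words, index, max_results=3, max_n=7, fallback="the"):
    """Try the longest usable context first, then back off one word at a time."""
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # the phrase is too short to supply an (n-1)-word context
        context = tuple(words[-(n - 1):])
        followers = index.get(n, {}).get(context)  # .get() does not create entries
        if followers:
            # most_common() orders candidates from most to least frequent
            return [word for word, _ in followers.most_common(max_results)]
    return [fallback]  # no n-gram matched at any order


corpus = "i love new york i love new jersey i love pizza".split()
idx = build_index(corpus)
print(predict_next("I love".lower().split(), idx))  # ['new', 'pizza']
print(predict_next(["we", "love"], idx))            # ['new', 'pizza'] via the bigram model
print(predict_next(["zebra"], idx))                 # ['the']
```

Indexing each prefix to a `Counter` of following words turns each lookup into a single dictionary access rather than a scan over every stored n-gram. The `max_results` parameter corresponds to the app's maximum number of predictions.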

Results

  • The Shiny app for prediction can be found here
  • The application needs two inputs:
    1. A word or phrase from which to predict the next word
    2. The maximum number of next-word predictions to return
  • The predicted next words are ordered from most to least frequent