NEXT WORD PREDICTION

E Reddy Diwakar
29/03/2023

Project Instructions

The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others. For this project you must submit:

A Shiny app that takes a phrase (multiple words) typed into a text box and outputs a prediction of the next word.

A slide deck of no more than 5 slides, created with RStudio Presenter (https://support.rstudio.com/hc/en-us/articles/200486468-Authoring-R-Presentations), pitching your algorithm and app as if you were presenting to your boss or an investor.

Overview:

This project involves Natural Language Processing. The critical task is to take a user’s input phrase (a group of words) and output a predicted next word.
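For illustration, here is a minimal Python sketch (the original project is written in R) of how cleaned text breaks into the overlapping 2-, 3-, and 4-word n-grams that the prediction model is trained on. The cleaning rule (lowercase, keep only letters and apostrophes) and the helper names `clean` and `ngrams` are hypothetical assumptions, not the author's code.

```python
import re


def clean(text: str) -> list[str]:
    """Lowercase the text and keep only runs of letters/apostrophes (assumed cleaning rule)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(words: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive words, in order of appearance."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = clean("Thanks for the follow, see you at the game!")
for n in (2, 3, 4):
    # Show the first three n-grams of each size
    print(n, ngrams(words, n)[:3])
```

A sentence of k words yields k - n + 1 n-grams of size n, so longer n-grams are fewer and rarer, which is why a model built on them needs a fallback to shorter contexts.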

Project Deliverables

  1. Next Word Prediction Model, as the basis for an app
  2. Next Word Prediction App, hosted at shinyapps.io
  3. This presentation, hosted at RPubs

Next Word Prediction Model

The next word prediction model uses the principles of “tidy data” applied to text mining in R. Key model steps:

  1. Input: raw text files for model training
  2. Clean the training data; split it into 2-word, 3-word, and 4-word n-grams and save them as tibbles
  3. Sort the n-gram tibbles by frequency
  4. N-gram function:
     - the user supplies an input phrase
     - the model uses the last 3, 2, or 1 words of the phrase to find the best-matching 4-gram, 3-gram, or 2-gram and predict its 4th, 3rd, or 2nd word
  5. Output: next word prediction
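The steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the original R implementation: plain dictionaries stand in for the frequency-sorted n-gram tibbles, the cleaning rule, the tiny corpus, and the function names `build_tables` and `predict_next` are hypothetical, the model is assumed to try the last 3 words first and back off to 2 and then 1 when no match exists, and frequency ties are broken alphabetically.

```python
import re
from collections import Counter, defaultdict


def clean(text):
    """Assumed cleaning rule: lowercase, keep only letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_tables(corpus, max_n=4):
    """Steps 1-3 (hypothetical): count 2..max_n-grams, then map each
    (n-1)-word context to its next words, most frequent first."""
    tables = {}
    for n in range(2, max_n + 1):
        counts = Counter()
        for line in corpus:
            words = clean(line)
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
        table = defaultdict(list)
        # Sort by descending count, then alphabetically for deterministic ties
        for gram, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            table[gram[:-1]].append(gram[-1])
        tables[n] = dict(table)
    return tables


def predict_next(phrase, tables, max_n=4):
    """Step 4 (hypothetical): back off from the last 3 words to 2, then 1."""
    words = clean(phrase)
    for n in range(max_n, 1, -1):
        context = tuple(words[-(n - 1):])
        # Skip contexts shorter than n-1 words (phrase too short)
        if len(context) == n - 1 and context in tables[n]:
            return tables[n][context][0]  # most frequent continuation
    return None  # no n-gram matched


corpus = [
    "thanks for the follow",
    "thanks for the help",
    "thanks for the follow and the help",
    "see you at the game",
]
tables = build_tables(corpus)
print(predict_next("Thanks for the", tables))  # → follow
print(predict_next("and the", tables))         # → help (3-gram match)
```

Because the tables are sorted once at build time, each prediction only needs a lookup and the first entry, which is one way a model like this can keep response time low.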

Next Word Prediction App

The next word prediction app provides a simple user interface to the next word prediction model.

Key Features:

  1. Text box for user input
  2. Predicted next word is displayed dynamically below the user input
  3. Side panel with user instructions

Key Benefits:

  1. Fast response
  2. The method supports large training sets, which lead to better next word predictions

Final Product - Next Word Predictor

This link takes you to the Word Predictor interface: https://mblackmo.shinyapps.io/ngram_match/