Project presentation

Project description

This project applies Natural Language Processing to next-word prediction: given an input word or sentence, it returns the predicted next word.

The project has two deliverables:

  • Text Prediction App hosted at shinyapps.io
  • This presentation hosted at RPubs

Prediction Model

The prediction model applies the principles of “tidy data” to text mining in R to predict the next word. This approach was chosen because it allows fast processing of a large volume of training data. Key model steps:

  1. Input: raw text files for model training
  2. Clean the training data; split it into 2-word, 3-word, and 4-word n-grams
  3. Sort the n-grams by frequency
  4. N-gram function: uses a “back-off” type prediction model
    • the user supplies an input phrase
    • the model uses the last 3, 2, or 1 words of the phrase to find the best-matching 4-gram, 3-gram, or 2-gram, and predicts that n-gram's final (4th, 3rd, or 2nd) word
  5. Output: next word prediction
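
The steps above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the project's R code: the function names `build_ngram_counts` and `predict_next_word`, the toy corpus, and the whitespace tokenization are all assumptions. The "?" fallback mirrors the app's behavior when no prediction is available.

```python
from collections import Counter


def build_ngram_counts(text, n):
    """Count every run of n consecutive words in the lower-cased text.

    Assumption: words are separated by whitespace; a real pipeline
    would also clean punctuation, numbers, and so on.
    """
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def predict_next_word(phrase, tables):
    """Back off from the longest available context to the shortest.

    tables maps n (here 2, 3, 4) to a Counter of n-gram tuples.
    Returns "?" when no n-gram matches the input.
    """
    words = phrase.lower().split()
    for n in sorted(tables, reverse=True):  # try 4-grams, then 3-grams, then 2-grams
        context_len = n - 1
        if len(words) < context_len:
            continue  # input too short for this n-gram order
        context = tuple(words[-context_len:])
        # Keep only the n-grams whose leading words equal the context.
        candidates = {g: c for g, c in tables[n].items() if g[:-1] == context}
        if candidates:
            best = max(candidates, key=candidates.get)  # most frequent match
            return best[-1]  # the final word of the n-gram is the prediction
    return "?"


# Hypothetical toy corpus, invented for this example.
corpus = "the cat sat on the mat and the cat sat on the sofa and the cat ran away"
tables = {n: build_ngram_counts(corpus, n) for n in (2, 3, 4)}

print(predict_next_word("the cat", tables))    # matched by a 3-gram
print(predict_next_word("zebra cat", tables))  # backs off to a 2-gram
print(predict_next_word("zebra", tables))      # no match
```

In this sketch the frequency sort from step 3 is replaced by picking the maximum count at lookup time; pre-sorting the tables, as the model does, gives the same answer and avoids scanning every n-gram on each query.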

Next Word Prediction App

The next word prediction app provides a simple user interface to the next word prediction model.

  1. Text box for user input
  2. Predicted next word, displayed dynamically below the user input
  3. Tabs with plots of the most frequent n-grams in the dataset
  4. Side panel with user instructions

Shiny App Link

App Instructions

  1. Enter a word or sentence in the text box.
  2. The predicted next word will appear in blue below.
  3. There is no need to press Enter or click submit.
  4. A question mark means no prediction is available.
  5. Additional tabs show plots of the top bigrams, trigrams, and quadgrams in the original dataset.

App Screenshot