Project presentation

Project description

This project applies Natural Language Processing (NLP). Its objective is to take an input word or sentence and predict the next word.

The project has the following two deliverables:

Text Prediction App, hosted at shinyapps.io
This presentation, hosted at RPubs

Prediction Model

The prediction model applies the principles of “tidy data” to text mining in R to predict the next word. This approach was chosen to process a large volume of training data quickly. Key model steps:

Input: raw text files for model training
Clean the training data and split it into 2-word, 3-word, and 4-word n-grams
Sort the n-grams by frequency
N-gram function: uses a “back-off” prediction model
- user supplies an input phrase
- model uses the last 3, 2, or 1 words to predict the 4th, 3rd, or 2nd word of the best-matching 4-gram, 3-gram, or 2-gram, backing off to a shorter n-gram when no longer match is found
Output: next word prediction
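The steps above can be illustrated with a short sketch. The original model is written in R. This Python version only mirrors the described pipeline and is not the project's code. It assumes a simple cleaning rule (lowercase, keep letters and apostrophes) and assumes each context keeps only its single most frequent next word. The function names `clean`, `ngrams`, `build_tables`, and `predict` are hypothetical. Returning "?" when nothing matches follows the app's own convention.

```python
import re
from collections import Counter


def clean(text):
    """Assumed cleaning rule: lowercase and keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """All consecutive n-word tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_tables(corpus, sizes=(2, 3, 4)):
    """Count n-grams of each size; map each (n-1)-word context to its most frequent next word."""
    tables = {}
    for n in sizes:
        counts = Counter()
        for doc in corpus:
            counts.update(ngrams(clean(doc), n))
        best = {}
        # most_common() yields n-grams in descending frequency order, so the
        # first next word stored for a context is the most frequent one.
        for gram, _count in counts.most_common():
            best.setdefault(gram[:-1], gram[-1])
        tables[n] = best
    return tables


def predict(phrase, tables):
    """Back off from 4-grams (last 3 words) to 3-grams to 2-grams; '?' if no match."""
    words = clean(phrase)
    for n in (4, 3, 2):
        context = tuple(words[-(n - 1):])
        if len(context) == n - 1 and context in tables[n]:
            return tables[n][context]
    return "?"


# Usage on a tiny made-up corpus (illustrative data, not the project's training set).
corpus = [
    "thanks for the follow",
    "thanks for the follow",
    "thanks for the help",
    "see you at the game",
]
tables = build_tables(corpus)
print(predict("Thanks for the", tables))  # 4-gram match -> follow
print(predict("at the", tables))          # backs off to 3-grams -> game
print(predict("I love the", tables))      # backs off to 2-grams -> follow
print(predict("hello", tables))           # no match -> ?
```

Storing only the top next word per context keeps each lookup a single dictionary access, which fits the app's need to update its prediction as the user types.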

Next Word Prediction App

The Next Word Prediction App provides a simple user interface to the prediction model.

Text box for user input
Predicted next word displayed dynamically below the user input
Tabs with plots of the most frequent n-grams in the dataset
Side panel with user instructions
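The plot tabs rank n-grams by how often they occur. The sketch below computes the kind of ranked table that such a bar chart could be drawn from. It is written in Python for illustration only, while the actual app is built with Shiny in R. The function name `top_ngrams`, its cleaning rule, and the sample corpus are assumptions, and the plotting step itself is not shown.

```python
import re
from collections import Counter


def top_ngrams(corpus, n, k=10):
    """Hypothetical helper: the k most frequent n-grams as (phrase, count), most frequent first."""
    counts = Counter()
    for doc in corpus:
        # Assumed cleaning rule: lowercase, keep letters and apostrophes.
        tokens = re.findall(r"[a-z']+", doc.lower())
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts.most_common(k)


# Illustrative data only.
corpus = ["thanks for the follow", "thanks for the follow", "thanks for the help"]
for phrase, count in top_ngrams(corpus, 2, k=3):
    print(phrase, count)
```

Calling the same function with `n=2`, `n=3`, and `n=4` would supply one tab each for bigrams, trigrams, and quadgrams.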

Shiny App Link

App Instructions

Enter a word or sentence in the text box.
The predicted next word will appear in blue below.
There is no need to press Enter or Submit.
A question mark means no prediction is available.
Additional tabs show plots of the top bigrams, trigrams, and quadgrams in the original dataset.

App Screenshot