N-Gram Word Predictor

Sue Lynn

First Slide

The goal of this exercise is to create a product that highlights the prediction algorithm that has been built and to provide an interface that others can access.

  • The Shiny app takes a phrase (multiple words) as input in a text box and outputs a prediction of the next word.
  • The exercise was divided into 7 sub-tasks, including data cleansing, exploratory analysis, the creation of a predictive model and more.
  • All text mining and natural language processing was done using a variety of well-known R packages.

Product Details

A simple program in which you enter a phrase and the software searches the database for the word that most commonly follows that phrase.
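The lookup described above can be sketched as follows. This is a minimal illustration in Python, not the app's R code: the function names `build_ngram_table` and `most_common_next` and the toy corpus are assumptions made up for the example, and the real database layout is not described in these slides.

```python
from collections import Counter, defaultdict


def build_ngram_table(tokens, n):
    """Map each (n-1)-word prefix to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        prefix = tuple(tokens[i:i + n - 1])
        next_word = tokens[i + n - 1]
        table[prefix][next_word] += 1
    return table


def most_common_next(table, phrase):
    """Return the most frequent follower of `phrase`, or None if it is unseen."""
    prefix = tuple(phrase.lower().split())
    followers = table.get(prefix)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for this example only.
corpus = "i want to go home . i want to sleep . i want to go out".split()
trigrams = build_ngram_table(corpus, 3)
print(most_common_next(trigrams, "want to"))  # go
```

In this toy corpus "want to" is followed by "go" twice and "sleep" once, so "go" is returned.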

How the program works

After the sentence has been entered, one of three major steps occurs, depending on the number of words:

  1. If 2 words have been entered, the software searches a database composed of 3-grams.

  2. If 3 words have been entered, the software searches a database composed of 4-grams. If fewer than 3 words are found for the prediction, the software transforms the input into a 2-word phrase and executes step 1.

  3. If 4 or more words have been entered, the software selects the last 4 words and searches a database composed of 5-grams. If fewer than 3 words are found for the prediction, the software transforms the input into a 3-word phrase and executes step 2.
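The back-off between the n-gram databases might look roughly like the sketch below. It is written in Python for illustration and is not the app's R code. Several points are assumptions: the name `predict_with_backoff`, the layout of `tables`, reading "fewer than 3 words for the prediction" as "fewer than 3 candidate next words", and shortening the phrase by keeping its last words. Inputs of fewer than 2 words are not covered by the steps above, so the sketch returns an empty list for them.

```python
from collections import Counter

MIN_CANDIDATES = 3  # assumed meaning of "fewer than 3 words for the prediction"
MAX_CONTEXT = 4     # the 5-gram database uses the last 4 words as context


def predict_with_backoff(phrase, tables):
    """Return up to MIN_CANDIDATES next-word candidates for `phrase`.

    `tables` maps a context length (2, 3 or 4) to a dict of
    {context tuple: Counter of next words}. This layout is hypothetical.
    """
    words = phrase.lower().split()
    context_len = min(len(words), MAX_CONTEXT)
    candidates = []
    while context_len >= 2:
        # Assumption: the shorter phrase is formed from the last words.
        context = tuple(words[-context_len:])
        followers = tables.get(context_len, {}).get(context, Counter())
        candidates = [word for word, _ in followers.most_common()]
        if len(candidates) >= MIN_CANDIDATES:
            break  # enough candidates at this n-gram order
        context_len -= 1  # back off to the next shorter context
    return candidates[:MIN_CANDIDATES]


# Toy tables, invented for this example only.
tables = {
    4: {("i", "want", "to", "go"): Counter({"home": 1})},
    3: {("want", "to", "go"): Counter({"home": 2, "out": 1})},
    2: {("to", "go"): Counter({"home": 3, "out": 2, "back": 1})},
}
print(predict_with_backoff("I want to go", tables))  # ['home', 'out', 'back']
```

Here the 5-gram and 4-gram lookups yield only one and two candidates, so the search falls through to the 3-gram database, which supplies three.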

User Interface

The user only needs to type words into the text box and press Enter, or click the predict button, to predict the next word. An example sentence has already been entered in the text box.

The Shiny app can be accessed at https://suelynnk.shinyapps.io/Prediction/