Next Word Predictor - Shiny App

Data Science Capstone - Final Project

Felipe Ruiz

Coursera & Johns Hopkins Data Science Specialization

2024-04-12

Next Word Predictor: Shiny App

Next Word Predictor is a Shiny app that uses a text prediction algorithm to predict the next word(s) based on text entered by a user.

The application will suggest the next word in a sentence using an n-gram algorithm. An n-gram is a contiguous sequence of n words from a given sequence of text.
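As a small illustration of the definition above, the sketch below lists the 2-grams and 3-grams of a short phrase. It is written in Python for brevity and is not the app's own R code; the helper name `ngrams` and the sample phrase are assumptions made for this example.

```python
def ngrams(words, n):
    """Return every contiguous run of n words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Hypothetical sample phrase, already split into words.
words = "thanks for the follow".split()

bigrams = ngrams(words, 2)   # pairs of adjacent words
trigrams = ngrams(words, 3)  # triples of adjacent words

print(bigrams)
print(trigrams)
```

A phrase with fewer than n words simply yields no n-grams.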

The text used to build the predictive text model came from a large corpus of blogs, news and Twitter data. N-grams were extracted from the corpus and then used to build the model.

Modelling of the data

The predictive text model was built from a sample of 800,000 lines extracted from the corpus of blogs, news and Twitter data.

The sample was then cleaned and tokenized using the tm package, together with a number of regular expressions applied through the gsub function.
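The exact cleaning rules are not listed here, so the following is only a rough sketch of what regular-expression cleaning of this kind can look like. It uses Python's `re.sub` as a stand-in for R's gsub; the function names `clean_line` and `tokenize`, and the three patterns themselves, are assumptions rather than the rules the app actually applies.

```python
import re

def clean_line(line):
    """Lowercase a line and strip URLs, digits and punctuation (assumed rules)."""
    line = line.lower()
    line = re.sub(r"https?://\S+", " ", line)  # drop URLs
    line = re.sub(r"[^a-z' ]+", " ", line)     # keep only letters, apostrophes, spaces
    line = re.sub(r"\s+", " ", line).strip()   # collapse repeated whitespace
    return line

def tokenize(line):
    """Split a cleaned line into word tokens."""
    return clean_line(line).split()

print(tokenize("Check THIS out: http://t.co/abc 123 times!!"))
```

Apostrophes are kept so that contractions such as "don't" stay a single token.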

The algorithm iterates from the longest n-gram (4-gram) to the shortest (2-gram) until it finds a match. The predicted next word is the most frequent continuation of the longest matching n-gram. This is a simple back-off strategy.
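The back-off lookup described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's R implementation: the names `build_model` and `predict`, the dictionary-of-counters layout, and the four-sentence toy corpus are all hypothetical, and no smoothing or discounting is applied.

```python
from collections import Counter, defaultdict

def build_model(sentences, max_n=4):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for words in sentences:
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                model[context][words[i + n - 1]] += 1
    return model

def predict(model, text, max_n=4):
    """Try the longest context first, then back off to shorter ones."""
    words = text.lower().split()
    for n in range(max_n, 1, -1):
        context = tuple(words[-(n - 1):])
        # Skip this order if the input has fewer than n-1 words.
        if len(context) == n - 1 and context in model:
            return model[context].most_common(1)[0][0]
    return None  # no n-gram of any order matched

# Hypothetical toy corpus standing in for the 800,000-line sample.
corpus = [
    "thanks for the follow".split(),
    "thanks for the follow".split(),
    "thanks for the help".split(),
    "one of the best".split(),
]
model = build_model(corpus)

print(predict(model, "thanks for the"))  # matched at the 4-gram level
print(predict(model, "all of the"))      # backs off to the 3-gram level
print(predict(model, "see the"))         # backs off to the 2-gram level
```

Input that matches no stored context at any order returns `None`, which a real app would typically replace with a default suggestion.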

User Interface

The Next Word Predictor app offers a simple user interface.

The user types one or more words and clicks submit, which runs the algorithm. Please allow a few seconds for the output to appear.