Next Word Predictor - Shiny App

Data Science Capstone - Final Project

Felipe Ruiz

Coursera & Johns Hopkins Data Science Specialization

2024-04-12

Next Word Predictor: Shiny App

Next Word Predictor is a Shiny app that uses a text prediction algorithm to predict the next word(s) based on text entered by a user.

The application will suggest the next word in a sentence using an n-gram algorithm. An n-gram is a contiguous sequence of n words from a given sequence of text.
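To make the definition concrete, here is a minimal sketch of n-gram extraction. It is written in Python purely for illustration (the app itself is built in R), and the function name `ngrams` and the sample sentence are assumptions, not taken from the project code.

```python
def ngrams(text, n):
    """Return the contiguous n-word sequences found in text, in order."""
    words = text.lower().split()
    # Slide a window of width n across the word list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "thank you for the follow"
bigrams = ngrams(sentence, 2)   # sequences of 2 words
trigrams = ngrams(sentence, 3)  # sequences of 3 words

print(bigrams)
print(trigrams)
```

A sentence of k words yields k - n + 1 n-grams, so text shorter than n words yields none.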

The text used to build the predictive text model came from a large corpus of blog, news and Twitter data. N-grams were extracted from the corpus and then used to build the model.

Modelling of the data

The predictive text model was built from a sample of 800,000 lines extracted from the large corpus of blog, news and Twitter data.

The sample data was then tokenized and cleaned using the tm package, together with a number of regular expressions applied through the gsub function.
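The kind of cleaning described above can be sketched as follows. This is an illustration only: the project does this in R with tm and gsub, while the sketch uses Python's `re.sub` as the closest equivalent. The specific rules (lower-casing, dropping URLs, keeping only letters and apostrophes) and the names `clean_line` and `tokenize` are assumptions; the actual rules used by the app are in its source code.

```python
import re


def clean_line(line):
    """Normalise one raw line of text before n-gram extraction."""
    line = line.lower()
    line = re.sub(r"https?://\S+", " ", line)  # drop URLs (assumed rule)
    line = re.sub(r"[^a-z' ]", " ", line)      # keep letters, apostrophes, spaces
    line = re.sub(r"\s+", " ", line)           # collapse repeated whitespace
    return line.strip()


def tokenize(line):
    """Split a cleaned line into word tokens."""
    cleaned = clean_line(line)
    return cleaned.split() if cleaned else []


print(clean_line("Check THIS out: http://t.co/abc 123!!"))
print(tokenize("I can't wait"))
```

The order of the substitutions matters: URLs are removed before punctuation is stripped, otherwise fragments of the address would survive as words.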

The algorithm iterates from the longest n-gram (4-gram) to the shortest (2-gram) until it finds a match. The predicted next word is taken from the longest matching n-gram and, among the candidates at that length, the most frequent one. This is a simple back-off strategy.
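The back-off lookup described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's R implementation: the table layout, the names `build_tables` and `predict_next`, and the toy corpus are all hypothetical. Only the order of the search (4-gram, then 3-gram, then 2-gram) and the choice of the most frequent continuation come from the description.

```python
from collections import Counter, defaultdict

MAX_N = 4  # longest n-gram order, as stated in the text


def build_tables(lines, max_n=MAX_N):
    """For each order n, map a context of n-1 words to next-word counts."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for line in lines:
        words = line.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                tables[n][context][words[i + n - 1]] += 1
    return tables


def predict_next(text, tables, max_n=MAX_N):
    """Try the longest context first and back off to shorter ones."""
    words = text.lower().split()
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # not enough typed words for this order
        context = tuple(words[-(n - 1):])
        candidates = tables[n].get(context)
        if candidates:
            # Most frequent continuation at the longest matching order.
            return candidates.most_common(1)[0][0]
    return None  # no n-gram of any order matched


# Toy corpus, invented for illustration only.
corpus = ["i love new york", "i love new music",
          "i love new york city", "we love you"]
tables = build_tables(corpus)

print(predict_next("i love new", tables))  # matched at the 4-gram level
print(predict_next("they love", tables))   # backs off to the 2-gram level
```

In the second call no 3-gram starts with "they love", so the lookup falls back to the single-word context "love". A production model would typically also decide what to return when nothing matches, for example a list of the most common words overall.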

User Interface

The Next Word Predictor app offers a simple user interface.

The user types one or more words and clicks submit, which runs the algorithm. Please allow a few seconds for the output to appear.

Useful links

Please feel free to explore the development of this application in detail.

All the technical details that have been omitted from this presentation can be found in the extra material.

Github repo with the code that generates the app.
Direct access to Shiny Server where the app is deployed.
My personal website: Felipe Ruiz.