Coursera Data Science Specialization Capstone Project

author: Sang Myong Lee
date: 2025-02-01

The Project

This project uses Natural Language Processing (NLP).

The critical task is to take a user’s input phrase and output the predicted next word (or words).

This presentation features the NLP Next Word Predict application, including an instruction tab for the application user interface, and details about the text prediction algorithm.

Project deliverables:

NLP (Next Word) Prediction Model, the basis for this application
NLP Prediction App, hosted at shinyapps.io
This presentation, hosted at RPubs

NLP Next Word Prediction Model

The next word prediction model uses the principles of “tidy data” applied to text mining in R. Key model steps:

Input: raw text files for model training
Clean the training data and split it into 2-word, 3-word, and 4-word n-grams
Sort the n-grams by frequency and save them as data repositories (repos)
N-grams function: uses a “back-off” type prediction model

the user supplies an input phrase
the model uses the last 3, 2, or 1 words of the phrase to find the best-matching 4th, 3rd, or 2nd word in the repos, backing off to the next shorter n-gram when no match is found

Output: next word prediction
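The steps above can be sketched in a few lines of code. This is a Python analogue written for illustration, not the author's R code: the cleaning rule (lowercase, keep letters and apostrophes), the in-memory dictionaries standing in for the saved repos, and the helper names clean, ngrams, build_repos, and predict are all assumptions. It counts 2-, 3-, and 4-grams, orders each table by frequency, and backs off from the last 3 words to the last 2 and then the last 1 until a match is found.

```python
import re
from collections import Counter

# Assumed cleaning rule: lowercase, then keep runs of letters and apostrophes.
def clean(text):
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    # Consecutive n-word tuples, e.g. n=2: ("i", "love"), ("love", "you"), ...
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_repos(lines):
    """Hypothetical helper: count 2-, 3-, and 4-grams and build lookup tables.

    Returns {n: {prefix_tuple: [next words, most frequent first]}}.
    In-memory dicts stand in for the saved repos.
    """
    counts = {n: Counter() for n in (2, 3, 4)}
    for line in lines:
        tokens = clean(line)
        for n in (2, 3, 4):
            counts[n].update(ngrams(tokens, n))
    repos = {}
    for n, counter in counts.items():
        table = {}
        # most_common() lists n-grams by descending count, so each
        # prefix's list of next words ends up sorted by frequency.
        for gram, _count in counter.most_common():
            table.setdefault(gram[:-1], []).append(gram[-1])
        repos[n] = table
    return repos

def predict(phrase, repos, k=1):
    """Back off from the last 3 words to the last 2, then the last 1."""
    tokens = clean(phrase)
    for n in (4, 3, 2):
        context = tuple(tokens[-(n - 1):])
        # Skip this table if the phrase is too short to fill the context.
        if len(context) == n - 1 and context in repos[n]:
            return repos[n][context][:k]
    return []  # no match at any order

# Toy corpus, for illustration only.
corpus = [
    "I love you so much",
    "I love you too",
    "I love data science",
    "thanks for all the help",
]
repos = build_repos(corpus)
print(predict("and I love", repos))   # ['you']
print(predict("I love", repos, k=2))  # ['you', 'data']
```

With this toy corpus, "and I love" has no 4-gram match, so the sketch backs off to the 3-gram table and returns "you", the most frequent word seen after "i love". Returning an empty list when even the 2-gram table has no match is a simplifying choice made here; a fuller model might fall back to the most common single words instead.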

Benefits: easy-to-read code using “pipes”; fast processing of training data; able to sample up to 25% of the original corpus; relatively small output repos
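One way to take such a sample is sketched below, assuming the corpus is already loaded as a list of lines and that a fixed random seed is acceptable for reproducibility. The helper sample_lines, its default fraction of 0.25, and the seed value are hypothetical; the author's R code may sample differently (for example, keeping each line with a coin flip).

```python
import random

def sample_lines(lines, fraction=0.25, seed=2025):
    """Hypothetical helper: reproducible random sample of about `fraction` of the lines."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)         # fixed seed gives the same sample every run
    k = round(len(lines) * fraction)  # number of lines to keep
    return rng.sample(lines, k)       # sampled without replacement

lines = [f"line {i}" for i in range(100)]
sample = sample_lines(lines)
print(len(sample))  # 25
```

Training on the sample rather than the full corpus trades some coverage for speed and smaller repos.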

NLP Prediction Application for Next Word

The next word prediction application provides an easy-to-use user interface to the next word prediction model.

Top Features:

Text box for user input
Predicted next word displayed dynamically below the user input
Tabs with plots of the most frequent n-grams in the dataset
Side panel with user instructions

Overall Benefits:

Fast response
Method allows for large training sets, leading to improved next-word predictions and user experience

Demo Application:

NLP Shiny App Link