Next Word Prediction App

Introduction

This project demonstrates a next-word prediction app.
Users enter a phrase, and the app predicts the most likely next word.
Built as part of the Data Science Capstone Project using the SwiftKey dataset.
Goal: mimic the next-word suggestions of a real-world mobile keyboard.

Data and Preprocessing

Data sourced from blogs, news, and Twitter (en_US).
Steps taken:
- Lowercasing, punctuation & number removal.
- Profanity filtering.
- Tokenization and n-gram creation (unigram to trigram).
A sample of the corpus was used to reduce computational load.
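A minimal Python sketch of these steps, for illustration only (the project itself was built in R). `PROFANITY` is a hypothetical placeholder for a real word list. The default 10% sample fraction and the seed are arbitrary assumptions.

```python
import random
import re
from collections import Counter

# Hypothetical placeholder; a real filter would load a full profanity list.
PROFANITY = {"darn", "heck"}

def sample_lines(lines, fraction=0.1, seed=42):
    """Keep each line with probability `fraction` (reproducible via `seed`)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]

def clean(line):
    """Lowercase, turn digits/punctuation into spaces, tokenize, drop profanity."""
    # Keeps only ASCII letters and apostrophes (so "don't" survives).
    text = re.sub(r"[^a-z']+", " ", line.lower())
    tokens = (tok.strip("'") for tok in text.split())
    return [tok for tok in tokens if tok and tok not in PROFANITY]

def build_counts(lines, max_n=3):
    """Count unigrams up to max_n-grams; n-grams never span two lines."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = clean(line)
        for n in counts:
            counts[n].update(
                tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
            )
    return counts
```

For example, `build_counts(["I don't know.", "I don't care!"])` counts the bigram `("i", "don't")` twice and never forms an n-gram across the two lines.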

Algorithm Overview

Uses n-gram language modeling (mainly bigram & trigram).
Stupid backoff algorithm:
- Try trigram → back off to bigram → back off to unigram when no match is found.
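Written out, stupid backoff scores a candidate word $w_i$ after the context $w_{i-2} w_{i-1}$ as below, where $c(\cdot)$ is an n-gram count from the training sample, $N$ the total number of unigram tokens, and $\alpha$ a fixed backoff discount. The slides do not state $\alpha$; 0.4 is the value proposed with the original method (Brants et al., 2007) and is an assumption here. The scores are not normalized probabilities, which is what keeps the method cheap.

```latex
\begin{aligned}
S(w_i \mid w_{i-2}, w_{i-1}) &=
\begin{cases}
\dfrac{c(w_{i-2}\, w_{i-1}\, w_i)}{c(w_{i-2}\, w_{i-1})} & \text{if } c(w_{i-2}\, w_{i-1}\, w_i) > 0\\
\alpha \, S(w_i \mid w_{i-1}) & \text{otherwise}
\end{cases}\\[1ex]
S(w_i \mid w_{i-1}) &=
\begin{cases}
\dfrac{c(w_{i-1}\, w_i)}{c(w_{i-1})} & \text{if } c(w_{i-1}\, w_i) > 0\\
\alpha \, S(w_i) & \text{otherwise}
\end{cases}\\[1ex]
S(w_i) &= \dfrac{c(w_i)}{N}
\end{aligned}
```

With hypothetical counts where "i don't care" was never seen but "care" follows "don't" in 1 of its 2 occurrences, the score is $0.4 \times 1/2 = 0.2$.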
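Predictions ranked by how often each word follows the matched context.
Fast and lightweight (prediction is a table lookup) and interpretable (each prediction traces back to observed counts).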
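The backoff lookup can be sketched in Python as below. This is an illustration, not the app's R code: the `predict` name, the `counts` layout (a dict mapping n to a mapping of n-gram tuples to counts), `alpha=0.4`, and alphabetical tie-breaking are all assumptions. A word keeps the score from the highest order at which it was seen; each backoff step multiplies the score by `alpha`.

```python
def predict(context, counts, k=3, alpha=0.4):
    """Return up to k next-word candidates for `context` using stupid backoff.

    context: cleaned tokens typed so far, e.g. ["i", "don't"].
    counts:  {1: {...}, 2: {...}, 3: {...}} mapping n-gram tuples to counts.
    alpha:   backoff discount (0.4 is an assumed, commonly used value).
    """
    scores = {}
    discount = 1.0
    total_unigrams = sum(counts[1].values())
    # Start at the highest order the context supports, then back off one word at a time.
    top = min(max(counts), len(context) + 1)
    for n in range(top, 0, -1):
        prefix = tuple(context[len(context) - (n - 1):])  # last n-1 words; () when n == 1
        denom = counts[n - 1].get(prefix, 0) if n > 1 else total_unigrams
        if denom:
            for gram, c in counts[n].items():
                if gram[:-1] == prefix:
                    # setdefault keeps the score from the highest order where the word occurred.
                    scores.setdefault(gram[-1], discount * c / denom)
        discount *= alpha  # every backoff step is penalized by alpha
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]
```

Scanning every n-gram per query is fine for a sketch; a deployed app would index each table by its prefix (or prune rare n-grams) to keep lookups fast.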

Shiny App: How It Works

Built using R Shiny and hosted on shinyapps.io.
Enter a phrase into the text box.
The app returns the predicted next word.
Responds in real time or when Submit is clicked.
[https://j7f6j5-stephen0ayiah-fletcher.shinyapps.io/project_course/]
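Between the text box and the model, the phrase must be cleaned the same way as the training corpus, and a trigram model only needs the last two words. The server code is not shown in these slides, so the Python sketch below stands in for the R Shiny logic; `context_from_input` is a hypothetical helper, and it skips the profanity filter for brevity.

```python
import re

def context_from_input(phrase, order=3):
    """Clean raw text-box input like the training data and keep only
    the words an order-n model can condition on."""
    text = re.sub(r"[^a-z']+", " ", phrase.lower())  # digits/punctuation -> spaces
    tokens = [tok.strip("'") for tok in text.split()]
    tokens = [tok for tok in tokens if tok]
    # A trigram model looks at most order - 1 = 2 words back.
    return tokens[-(order - 1):] if order > 1 else []
```

Trimming the context this way also means a long phrase costs no more to predict than a two-word one.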

Final Notes

The app demonstrates a basic NLP pipeline.
Possible extensions:
- Deep learning (e.g., LSTMs or transformers)
- User personalization
The clean UI keeps it accessible to non-technical users.

Thank you!