Slide 1 — Project Overview

This project demonstrates a simple next-word prediction system built on a bigram language model. The application takes a phrase entered by the user and predicts the most likely next word from the last word of that phrase, using word-pair frequencies learned from a sample of large English text datasets: blogs, news articles, and Twitter data.

The goal of this project is to showcase basic natural language processing, model building, and deployment using R and Shiny.


Slide 2 — Data Description

The model was trained on three English text sources:
- Blogs contributed conversational, informal language
- News articles provided structured, formal language
- Twitter data added short, real-world text patterns

A random sample of 2,000 lines was selected from the combined dataset to keep the model lightweight and efficient for web deployment.
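A minimal sketch of the sampling step, written in Python for illustration only (the project itself uses R). The file names in `SOURCE_FILES`, the helper names `load_lines` and `sample_lines`, and the fixed seed are all hypothetical; the slides only state that 2,000 lines were drawn at random from the combined data.

```python
import random
from pathlib import Path

# Hypothetical file names: the slides do not say how the raw text is stored.
SOURCE_FILES = ["blogs.txt", "news.txt", "twitter.txt"]


def load_lines(paths):
    """Read every line from each source file into one combined list."""
    lines = []
    for path in paths:
        with Path(path).open(encoding="utf-8") as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines


def sample_lines(lines, n=2000, seed=42):
    """Draw n lines uniformly at random without replacement (all lines if fewer exist)."""
    rng = random.Random(seed)  # fixed seed so the same sample is drawn on every run
    return rng.sample(lines, min(n, len(lines)))
```

With the three files in place, `sample_lines(load_lines(SOURCE_FILES))` would return 2,000 lines; the fixed seed makes the sample reproducible between runs.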


Slide 3 — Model Approach

The predictor is a bigram language model built in four steps:
- Text is cleaned and converted to lowercase
- Each sentence is split into pairs of consecutive words (bigrams)
- The frequency of each bigram is counted
- When a user enters a phrase, the model returns the word that most often follows the last word entered
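The four steps above can be sketched as follows, again in Python for illustration only. The cleaning rule (keep lowercase letters and apostrophes, replace everything else with spaces) and the choice to treat each line as one sentence are assumptions, as are the names `clean` and `count_bigrams`.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text, replace anything but letters/apostrophes with spaces, and split into words."""
    text = text.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()


def count_bigrams(lines):
    """Count every pair of consecutive words, treating each line as one sentence."""
    counts = Counter()
    for line in lines:
        words = clean(line)
        # zip(words, words[1:]) yields (w1, w2) for each adjacent pair in the line
        counts.update(zip(words, words[1:]))
    return counts
```

For example, `count_bigrams(["I love R.", "I love Shiny!"])` counts the pair `("i", "love")` twice. Because pairs are built within each line, no bigram spans two lines.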

If the last word entered never appears as the first word of a training bigram, the model returns a fallback word instead.
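A sketch of the lookup and fallback, assuming bigram counts stored as a `Counter` keyed by `(first_word, next_word)` tuples. The names `build_index` and `predict_next` and the fallback word `"the"` are hypothetical; the slides do not say which fallback word the app returns. The sketch is in Python for illustration; the project is written in R.

```python
import re
from collections import Counter, defaultdict


def build_index(bigram_counts):
    """Group bigram counts by first word: {w1: Counter({w2: count})}."""
    index = defaultdict(Counter)
    for (w1, w2), n in bigram_counts.items():
        index[w1][w2] += n
    return index


def predict_next(phrase, index, fallback="the"):
    """Return the word seen most often after the phrase's last word, else the fallback."""
    words = re.findall(r"[a-z']+", phrase.lower())
    # Membership tests on a defaultdict do not create new keys
    if not words or words[-1] not in index:
        return fallback
    # most_common(1) returns [(word, count)]; equal counts keep first-encountered order
    return index[words[-1]].most_common(1)[0][0]


counts = Counter({("i", "love"): 2, ("love", "r"): 3, ("love", "shiny"): 1})
index = build_index(counts)
print(predict_next("I really LOVE", index))  # prints "r"
print(predict_next("hello world", index))    # prints "the" (fallback)
```

Grouping counts by first word once, up front, means each prediction is a single dictionary lookup rather than a scan over every bigram.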


Slide 4 — Shiny App Features

The web application provides:
- A text input box for entering a phrase
- A prediction button that generates the next word
- An output area that displays the predicted word as soon as it is generated

Because the model is built from a small sample, the app is lightweight and loads quickly; predictions are most reliable for common word pairs that appear in that sample.


Slide 5 — Results and Future Improvements

This project demonstrates a complete workflow:
- Data processing
- Model creation
- Web deployment

Future improvements may include:
- Using trigrams instead of bigrams
- Increasing the training data size
- Adding probability scores to predictions
- Improving text cleaning and language handling
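For the probability-score improvement, one standard option (an assumption here; the slides do not name a method) is the maximum-likelihood estimate P(w2 | w1) = count(w1, w2) / count(w1, any word), that is, how often `w2` follows `w1` divided by how often `w1` starts a bigram. The Python sketch below computes these scores for the top candidates; `next_word_probabilities` and its parameter `k` are hypothetical names.

```python
from collections import Counter


def next_word_probabilities(bigram_counts, w1, k=3):
    """Return up to k (word, probability) pairs for words seen after w1.

    Probabilities are maximum-likelihood estimates: count(w1, w2) divided by
    the total count of bigrams whose first word is w1.
    """
    followers = Counter({w2: n for (a, w2), n in bigram_counts.items() if a == w1})
    total = sum(followers.values())
    if total == 0:
        return []  # w1 never starts a bigram, so no scores can be given
    return [(w2, n / total) for w2, n in followers.most_common(k)]
```

With counts of 3 for `("love", "r")` and 1 for `("love", "shiny")`, the scores for `"love"` are 0.75 and 0.25. This version scans every bigram on each call, which is fine for a small sample; a deployed app would more likely precompute a table per first word.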