Next Word Prediction Application

Project Overview

Develop a text application that predicts the next word of a phrase using N-gram language models.

Dataset Sources

Blogs
News
Twitter

Objective

Provide fast and accurate next-word predictions through an interactive Shiny application.

Exploratory Data Analysis

Dataset Summary

Source	Lines
Blogs	899,288
News	1,010,206
Twitter	2,360,148

Key Findings

Word frequencies follow Zipf’s Law.
Frequent bigrams and trigrams improve prediction accuracy.
A small fraction of the vocabulary accounts for most of the text.
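
The coverage finding can be checked with a short calculation. The following is a minimal Python sketch, not the project's own analysis code: the function name words_needed_for_coverage and the toy corpus are hypothetical, and the real analysis would run on the Blogs, News, and Twitter text.

```python
from collections import Counter

def words_needed_for_coverage(tokens, target=0.5):
    """Return how many of the most frequent word types cover `target` of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    # Walk the vocabulary from most to least frequent, accumulating coverage.
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= target:
            return rank
    return len(counts)

# Toy corpus (hypothetical, not the project data): 13 tokens, 8 distinct words.
tokens = "the cat sat on the mat and the dog sat on the rug".split()

print(words_needed_for_coverage(tokens, 0.5))  # 3 of 8 word types cover half the tokens
print(words_needed_for_coverage(tokens, 0.9))  # 7 of 8 word types are needed for 90%
```

On a Zipf-distributed corpus the first number stays small even as the vocabulary grows, which is what makes a compact model feasible.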

Prediction Algorithm

Model

Unigram
Bigram
Trigram
Backoff Strategy
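
The three models can all be stored as count tables keyed by the preceding words. The sketch below is illustrative Python rather than the application's actual code: build_ngram_tables is a hypothetical helper, tokenization is a plain lowercase whitespace split, and the sentences are made up.

```python
from collections import Counter, defaultdict

def build_ngram_tables(sentences, max_n=3):
    """Map each context (tuple of n-1 words) to a Counter of the words that follow it."""
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])  # empty tuple for unigrams
                next_word = words[i + n - 1]
                tables[n][context][next_word] += 1
    return tables

# Hypothetical toy sentences, not the project data.
sentences = ["thank you for the help", "thank you for coming", "thank you so much"]
tables = build_ngram_tables(sentences)

print(tables[3][("thank", "you")].most_common(1))  # [('for', 2)]
print(tables[1][()]["thank"])                      # 3
```

Keeping only the most frequent continuations per context is one common way to shrink such tables, though the source does not say how the model size was reached.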

Flow

Input → Trigram → Bigram → Unigram → Prediction
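
The flow above can be sketched as a lookup that tries the longest context first and falls back to shorter ones. This is a minimal Python illustration under stated assumptions: predict_next and the toy count tables are hypothetical, and the backoff shown is the simplest form (pick the highest raw count, no discounting or smoothing), which may differ from what the application does.

```python
def predict_next(phrase, trigrams, bigrams, unigrams):
    """Return the most frequent continuation, backing off trigram -> bigram -> unigram."""
    words = phrase.lower().split()
    # 1. Trigram model: use the last two words as context.
    if len(words) >= 2:
        candidates = trigrams.get(tuple(words[-2:]))
        if candidates:
            return max(candidates, key=candidates.get)
    # 2. Bigram model: use only the last word.
    if len(words) >= 1:
        candidates = bigrams.get(words[-1])
        if candidates:
            return max(candidates, key=candidates.get)
    # 3. Unigram model: fall back to the most frequent word overall.
    return max(unigrams, key=unigrams.get)

# Hypothetical toy counts, not taken from the project data.
trigrams = {("one", "of"): {"the": 5, "my": 2}, ("going", "to"): {"be": 4, "the": 1}}
bigrams = {"you": {"for": 3, "are": 2}, "of": {"the": 9}}
unigrams = {"the": 20, "to": 12, "for": 7}

print(predict_next("one of", trigrams, bigrams, unigrams))       # the (trigram hit)
print(predict_next("thank you", trigrams, bigrams, unigrams))    # for (bigram backoff)
print(predict_next("hello world", trigrams, bigrams, unigrams))  # the (unigram backoff)
```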

Shiny Application

Features

The user enters a phrase.
The app predicts the next word.
The app responds in real time.

Example

one of → the
going to → be
thank you → for

Results and Conclusion

Performance

Accuracy: 100%
Runtime: < 0.01 sec
Model Size: 8.56 MB
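
The source does not say how accuracy and runtime were measured. One plausible way, shown here as a Python sketch with hypothetical names (evaluate, toy_predict) and made-up test pairs, is to score top-1 predictions on held-out phrase and next-word pairs while timing each call.

```python
import time

def evaluate(predict, test_pairs):
    """Return (top-1 accuracy, mean seconds per prediction) over (phrase, next word) pairs."""
    if not test_pairs:
        raise ValueError("test_pairs must not be empty")
    correct = 0
    start = time.perf_counter()
    for phrase, expected in test_pairs:
        if predict(phrase) == expected:
            correct += 1
    elapsed = time.perf_counter() - start
    return correct / len(test_pairs), elapsed / len(test_pairs)

# Hypothetical stand-in predictor: a fixed lookup with a default guess.
lookup = {"one of": "the", "going to": "be", "thank you": "for"}

def toy_predict(phrase):
    return lookup.get(phrase, "the")

# Made-up held-out pairs; real ones would come from text the model never saw.
test_pairs = [("one of", "the"), ("going to", "be"), ("thank you", "very"), ("lots of", "the")]
accuracy, mean_seconds = evaluate(toy_predict, test_pairs)

print(f"Accuracy: {accuracy:.0%}")
print(f"Mean runtime: {mean_seconds:.6f} sec")
```

Scoring on phrases drawn from the training text inflates accuracy, so the test pairs should come from held-out lines.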

Future Work

4-gram models
Better smoothing
Larger datasets