Problem Statement

Typing on mobile devices is slow and error-prone.
This project builds a next-word prediction model similar to SwiftKey, using real-world English text data from blogs, news, and Twitter.

The goal is to improve typing speed by predicting the most likely next word based on previous input.

Data Used

  • Blogs – long-form writing
  • News – formal language
  • Twitter – short, informal text

Data was cleaned by:

  • Converting text to lowercase
  • Removing punctuation and numbers
  • Removing extra whitespace
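
The cleaning steps above can be sketched in a few lines. This is a minimal illustration in Python, not the project's own code: the function name `clean_text` is hypothetical, and the sketch assumes plain ASCII English text, so any character outside a-z is dropped along with punctuation and digits.

```python
import re


def clean_text(text: str) -> str:
    """Apply the three cleaning steps: lowercase, strip punctuation/digits, trim whitespace."""
    # Step 1: convert text to lowercase.
    text = text.lower()
    # Step 2: delete every character that is not a lowercase letter or whitespace.
    # Assumption: ASCII English only, so accented letters are removed as well.
    text = re.sub(r"[^a-z\s]", "", text)
    # Step 3: collapse runs of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_text("  Hello, World!  It's 2024.  "))
```

Deleting punctuation outright turns contractions such as "it's" into "its"; whether that is acceptable depends on how the n-gram tables are later matched against user input, which must be cleaned the same way.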

Prediction Algorithm

  • Built n-gram language models (unigram, bigram, trigram)
  • Calculated word probabilities from n-gram frequency counts
  • Used a back-off strategy: when no higher-order n-gram matches the input, the model falls back to the next lower order

The model predicts the most probable next word for a given phrase.
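
The approach described above can be sketched as follows. This is a hedged Python illustration rather than the project's implementation: `build_ngram_tables` and `predict_next` are hypothetical names, the toy corpus is invented for the example, and the back-off shown is the simplest variant (use the longest context that has any match, with no discounting or smoothing).

```python
from collections import Counter, defaultdict


def build_ngram_tables(sentences, max_order=3):
    """Map each context (a tuple of 0 to max_order-1 words) to counts of the next word."""
    tables = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            # context_len 0 gives unigram counts, 1 gives bigrams, 2 gives trigrams.
            for context_len in range(max_order):
                if i - context_len < 0:
                    continue  # not enough preceding words for this order
                context = tuple(tokens[i - context_len:i])
                tables[context][word] += 1
    return tables


def predict_next(tables, phrase, max_order=3):
    """Return (word, relative frequency) using the longest context that has a match."""
    tokens = phrase.lower().split()
    longest = min(max_order - 1, len(tokens))
    # Try the longest context first, then back off one word at a time.
    for context_len in range(longest, -1, -1):
        context = tuple(tokens[len(tokens) - context_len:])
        candidates = tables.get(context)
        if candidates:
            word, count = candidates.most_common(1)[0]
            return word, count / sum(candidates.values())
    return None, 0.0  # only reached when the tables are empty


if __name__ == "__main__":
    # Toy corpus, assumed to be already cleaned.
    corpus = [
        "i love new york",
        "i love new ideas",
        "i love new york city",
        "we love pizza",
    ]
    tables = build_ngram_tables(corpus)
    print(predict_next(tables, "i love new"))   # trigram context matches
    print(predict_next(tables, "really love"))  # backs off to the bigram context
    print(predict_next(tables, "xyz"))          # backs off to unigram counts
```

Storing all orders in one table keyed by context keeps the lookup to a single dictionary access per back-off step. A production model would usually add smoothing so that probabilities from different orders are comparable.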

Shiny App Overview

  • User enters a phrase in a text box
  • Clicks Submit
  • App returns a predicted next word

The app is deployed on shinyapps.io and responds in real time.

Conclusion & Future Work

  • Model demonstrates practical text prediction
  • App provides a simple and intuitive interface

Future improvements:

  • Larger training data
  • Better smoothing techniques
  • Faster prediction response