09/01/2026

Slide 1: Problem & Motivation

Typing on mobile devices is slow and error-prone. Modern keyboards address this problem by predicting the next word a user intends to type.

The goal of this project is to build a next-word prediction algorithm and deploy it as a Shiny web application that can be used interactively.

This application demonstrates how statistical language models can improve user experience in everyday typing tasks.

Slide 2: Data & Preprocessing

The prediction algorithm is trained on a large English text corpus containing data from three sources:

  • Blogs
  • News articles
  • Twitter posts

Before modeling, the data were cleaned by:

  • Converting text to lowercase
  • Removing punctuation and numbers
  • Removing stop words
  • Tokenizing text into n-grams (1-grams, 2-grams, and 3-grams)

These preprocessing steps reduce noise and improve prediction accuracy.
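
The cleaning steps listed on this slide can be sketched roughly as follows. This is a minimal Python illustration, not the project's actual code, which is not shown here. The function names (clean, ngrams, count_ngrams) and the tiny stop-word list are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the real list is not given).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}


def clean(text):
    """Lowercase, strip punctuation and digits, drop stop words; return tokens."""
    text = text.lower()
    # Replace every character that is not a-z or whitespace with a space.
    # Note: this also splits contractions ("don't" becomes "don" and "t").
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(lines, max_n=3):
    """Count 1-grams up to max_n-grams over an iterable of raw text lines."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = clean(line)
        for n in counts:
            counts[n].update(ngrams(tokens, n))
    return counts
```

Counting n-grams per line, rather than over the whole corpus as one stream, keeps n-grams from spanning two unrelated blog posts, articles, or tweets.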

Slide 3: Prediction Algorithm

The application uses an n-gram language model to predict the next word.

How it works:

  • The user inputs a phrase (one or more words)
  • The model looks for the most frequent next word based on previously observed word sequences
  • If no match is found, the model backs off to lower-order n-grams

This approach balances accuracy and speed, allowing the app to return predictions quickly.
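
The lookup-and-back-off behavior described on this slide might look like the sketch below. It is a simple count-based backoff written in Python for illustration. The slide does not say which smoothing or backoff scheme the app uses, so the train and predict functions and the dictionary layout are assumptions.

```python
from collections import Counter, defaultdict


def train(sentences, max_n=3):
    """Map each context tuple (0 to max_n - 1 words) to a Counter of next words."""
    model = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for k in range(max_n):  # k is the context length
                if i - k >= 0:
                    model[tuple(tokens[i - k:i])][word] += 1
    return model


def predict(model, phrase, max_n=3):
    """Return the most frequent next word, backing off to shorter contexts."""
    tokens = phrase.lower().split()
    # Try the longest context first, then shorter ones, ending with the
    # empty context (plain unigram frequency).
    for k in range(max_n - 1, -1, -1):
        if k > len(tokens):
            continue
        context = tuple(tokens[len(tokens) - k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None  # only reached when the model is empty


sentences = [
    ["i", "love", "cats"],
    ["i", "love", "cats"],
    ["you", "love", "dogs"],
    ["you", "love", "dogs"],
    ["we", "love", "dogs"],
]
model = train(sentences)
```

With this toy data, predict(model, "i love") matches the two-word context directly. predict(model, "they love") has never seen "they love", so it falls back to the one-word context "love".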

Slide 4: Shiny Application Overview

The Shiny application provides a simple and intuitive interface:

  • A text box for entering a phrase
  • A submit button to generate a prediction
  • A displayed prediction of the next word

The app runs on shinyapps.io and is accessible through a public URL. Predictions are generated dynamically based on user input.

The design emphasizes usability and fast response time.

Slide 5: User Experience & Future Work

The application provides accurate predictions for common English phrases of the kind found in blog, news, and social media text.

User experience highlights:

  • Fast prediction response
  • Clean and minimal interface
  • Easy to use without technical knowledge

Future improvements could include:

  • Predicting multiple candidate words
  • Improving accuracy with larger datasets
  • Optimizing memory usage for scalability
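
As a rough idea of what predicting multiple candidate words could involve, the sketch below ranks the next words observed after a context and returns the top few with their relative frequencies. It is a hypothetical Python illustration. The top_candidates function and the example counts are assumptions, not part of the current app.

```python
from collections import Counter


def top_candidates(next_word_counts, k=3):
    """Return up to k (word, relative frequency) pairs, most frequent first."""
    total = sum(next_word_counts.values())
    if total == 0:
        return []
    return [(word, count / total)
            for word, count in next_word_counts.most_common(k)]


# Made-up counts of words seen after some context, for illustration only.
example_counts = Counter({"dogs": 3, "cats": 2, "birds": 1})
suggestions = top_candidates(example_counts, k=2)
```

Returning relative frequencies alongside the words would let the interface show how strongly the model prefers each suggestion.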

This project demonstrates a practical and deployable data product.