2025-09-16

Slide 1 - Title

Next-Word Predictor

A lightweight n-gram based next-word prediction demo app (Shiny)


Slide 2 - Problem

We want to provide a simple interface for predicting the next word given a short input phrase. This is useful for autocomplete, typing assistance, or as an educational demo for language models.

Slide 3 - Approach & Algorithm

We use a classic n-gram model with backoff:

- Build unigram, bigram, and trigram counts from a text corpus.
- To predict, first try to match the last two words of the input (trigram); if there is no match, back off to the last word (bigram); otherwise fall back to the most frequent unigram.
- The model is simple, fast, and interpretable. Replace the demo corpus with a larger dataset for better accuracy.

A minimal R sketch of this backoff lookup follows the frequency demo below.

This slide includes a small R computation:

# quick demo: top 5 unigrams from the demo corpus
library(stringr)
corpus <- readLines('corpus.txt', warn = FALSE)
toks <- unlist(str_split(tolower(corpus), '\\s+'))   # lowercase, split on whitespace
toks <- gsub('[^a-z0-9[:space:]]', '', toks)         # strip punctuation
toks <- toks[toks != '']                             # drop empty tokens
table(toks) |> sort(decreasing = TRUE) |> head(5)
## 
## the   a new for  to 
##  60  16   9   8   8
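
A minimal R sketch of the backoff lookup described above (not the app's exact code): the helper names ngram_counts() and predict_next() are illustrative, and toks is the token vector from the snippet above.

# build n-gram frequency tables from the token vector
ngram_counts <- function(toks, n) {
  grams <- vapply(seq_len(length(toks) - n + 1),
                  function(i) paste(toks[i:(i + n - 1)], collapse = ' '),
                  character(1))
  sort(table(grams), decreasing = TRUE)
}
uni <- ngram_counts(toks, 1); bi <- ngram_counts(toks, 2); tri <- ngram_counts(toks, 3)

predict_next <- function(phrase) {
  w <- tail(unlist(strsplit(tolower(trimws(phrase)), '\\s+')), 2)
  # 1) trigram: match the last two words of the phrase
  if (length(w) == 2) {
    hits <- tri[startsWith(names(tri), paste(w[1], w[2], ''))]
    if (length(hits) > 0) return(tail(strsplit(names(hits)[1], ' ')[[1]], 1))
  }
  # 2) back off to the bigram conditioned on the last word
  hits <- bi[startsWith(names(bi), paste0(tail(w, 1), ' '))]
  if (length(hits) > 0) return(tail(strsplit(names(hits)[1], ' ')[[1]], 1))
  # 3) final fallback: the most frequent unigram
  names(uni)[1]
}
# example (result depends on the corpus): predict_next('thanks for the')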

Slide 4 - App & Instructions

How to run the Shiny app:

1. Place ui.R, server.R, and corpus.txt in the same directory.
2. In R: setwd('path/to/app'); shiny::runApp()
3. To deploy to shinyapps.io, use rsconnect::deployApp('.')
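
The same steps as a copy-pasteable snippet ('path/to/app' is a placeholder for wherever the app files live):

# run the app locally
setwd('path/to/app')
shiny::runApp()
# optional: publish to shinyapps.io (requires a configured rsconnect account)
rsconnect::deployApp('.')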

App features:

- Text input box to enter a phrase
- Predict button (to control when the prediction runs)
- Documentation and example phrases embedded in the UI
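
A minimal single-file sketch of the interface described above, assuming the predict_next() helper from Slide 3; the widget IDs ('phrase', 'go') are illustrative, not the app's actual code.

library(shiny)

ui <- fluidPage(
  titlePanel('Next-Word Predictor'),
  textInput('phrase', 'Enter a phrase:'),
  actionButton('go', 'Predict'),        # prediction runs only on click
  h4('Suggested next word:'),
  textOutput('prediction')
)

server <- function(input, output, session) {
  result <- eventReactive(input$go, predict_next(input$phrase))
  output$prediction <- renderText(result())
}

shinyApp(ui, server)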

Slide 5 - Demo Results & Next Steps

The user experience is straightforward: type a phrase, press Predict, and see the next-word suggestion.

Next steps to improve:

- Train on a much larger and more diverse corpus (news, tweets).
- Implement smoothing (Kneser-Ney) and add probability scores.
- Return top-k suggestions with confidence scores (see the sketch below).
- Deploy and measure real-world performance with user data.
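
A rough sketch of the top-k idea (not in the current app), reusing the bigram table bi from the Slide 3 sketch and reporting relative frequencies as crude confidence scores:

# top-k next-word suggestions after a single word
suggest_top_k <- function(last_word, k = 3) {
  hits <- bi[startsWith(names(bi), paste0(tolower(last_word), ' '))]
  if (length(hits) == 0) return(data.frame(word = character(0), score = numeric(0)))
  top <- head(hits, k)
  data.frame(word  = vapply(strsplit(names(top), ' '), tail, character(1), n = 1),
             score = as.numeric(top) / sum(hits))
}
# example: suggest_top_k('the', k = 3)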