Next-Word Predictor
A lightweight n-gram based next-word prediction demo app (Shiny)
2025-09-16 20:50:33
2025-09-16
We want to provide a simple interface for predicting the next word given a short input phrase. This is useful for autocomplete, typing assistance, or as an educational demo for language models.
We use a classic n-gram model with backoff:
- Build unigram, bigram, and trigram counts from a text corpus.
- To predict, try matching the last two words against the trigram counts; if there is no match, back off to the last word and the bigram counts; otherwise return the most frequent unigram.
- The model is simple, fast, and interpretable. Replace the demo corpus with a larger dataset for better accuracy.
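The backoff lookup described above can be sketched in base R as follows. This is a minimal illustration, not the app's actual server code: the helper names (tokenize, count_ngrams, build_model, best_continuation, predict_next) are hypothetical, and the sketch assumes that n-grams may span line boundaries in the corpus and that ties are broken by whichever n-gram comes first in table order.

```r
# Hypothetical helpers for illustration; base R only.

# Lowercase, split on whitespace, and strip non-alphanumeric characters.
tokenize <- function(text) {
  toks <- unlist(strsplit(tolower(text), "\\s+"))
  toks <- gsub("[^a-z0-9]", "", toks)
  toks[toks != ""]
}

# Count n-grams as a named integer vector keyed by "w1 w2 ... wn".
count_ngrams <- function(toks, n) {
  m <- length(toks)
  if (m < n) return(integer(0))
  cols <- lapply(seq_len(n), function(j) toks[j:(m - n + j)])
  grams <- do.call(paste, cols)
  counts <- table(grams)
  setNames(as.integer(counts), names(counts))
}

# Assumption: the whole corpus is treated as one token stream.
build_model <- function(lines) {
  toks <- tokenize(lines)
  list(uni = count_ngrams(toks, 1),
       bi  = count_ngrams(toks, 2),
       tri = count_ngrams(toks, 3))
}

# Most frequent n-gram that starts with the given context; returns its last word.
best_continuation <- function(counts, prefix) {
  if (length(counts) == 0) return(NULL)
  hits <- counts[startsWith(names(counts), paste0(prefix, " "))]
  if (length(hits) == 0) return(NULL)
  sub(".* ", "", names(hits)[which.max(hits)])
}

# Trigram lookup, then bigram, then the most frequent unigram.
predict_next <- function(model, phrase) {
  words <- tokenize(phrase)
  k <- length(words)
  if (k >= 2) {
    hit <- best_continuation(model$tri, paste(words[k - 1], words[k]))
    if (!is.null(hit)) return(hit)
  }
  if (k >= 1) {
    hit <- best_continuation(model$bi, words[k])
    if (!is.null(hit)) return(hit)
  }
  if (length(model$uni) == 0) return(NA_character_)
  names(model$uni)[which.max(model$uni)]
}

model <- build_model(c("the cat sat on the mat",
                       "the cat sat on the sofa",
                       "the cat ran away"))
predict_next(model, "the cat")      # "sat" (trigram match)
predict_next(model, "a big cat")    # "sat" (backs off to bigram)
predict_next(model, "hello world")  # "the" (backs off to unigram)
```

Storing each n-gram as a single space-joined key keeps the lookup to a prefix match, which is easy to read but scans every key; a larger corpus would likely call for a hashed or indexed structure instead.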
This slide includes a small R computation:
# quick demo: top 5 unigrams from the demo corpus
library(stringr)
corpus <- readLines('corpus.txt', warn = FALSE)
# lowercase and split on whitespace
toks <- unlist(str_split(tolower(corpus), '\\s+'))
# strip punctuation, then drop empty tokens
toks <- gsub('[^a-z0-9[:space:]]', '', toks)
toks <- toks[toks != '']
# count tokens and show the five most frequent
table(sort(toks)) |> sort(decreasing = TRUE) |> head(5)
##
##  the    a  new  for   to
##   60   16    9    8    8
How to run the Shiny app:
1. Place ui.R, server.R, and corpus.txt in the same directory.
2. In R: setwd('path/to/app'); shiny::runApp()
3. To deploy to shinyapps.io, use rsconnect::deployApp('.')
App features:
- Text input box to enter phrase
- Predict button (to control when prediction runs)
- Documentation and example phrases embedded in the UI
User experience: straightforward. Type a phrase, press Predict, and see the next-word suggestion.
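A minimal single-file sketch of the interface described above, assuming the shiny package is installed. The real app is split across ui.R and server.R, so the layout, the input IDs ("phrase", "predict", "suggestion"), and the stand-in predict_next function here are all hypothetical. The sketch shows one way to make prediction run only on a button press, using eventReactive.

```r
library(shiny)

# Stand-in predictor (hypothetical): swap in the real n-gram backoff lookup.
predict_next <- function(phrase) {
  if (!nzchar(trimws(phrase))) "" else "the"
}

ui <- fluidPage(
  titlePanel("Next-Word Predictor"),
  textInput("phrase", "Enter a phrase:", value = ""),
  actionButton("predict", "Predict"),
  verbatimTextOutput("suggestion")
)

server <- function(input, output, session) {
  # eventReactive recomputes only when the Predict button is clicked,
  # not on every keystroke in the text box.
  suggestion <- eventReactive(input$predict, {
    predict_next(input$phrase)
  })
  output$suggestion <- renderText(suggestion())
}

app <- shinyApp(ui, server)
# In an interactive session, launch with: runApp(app)
```

Tying the computation to the button rather than to the text input keeps the app responsive if the model lookup becomes expensive on a larger corpus.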
Next steps to improve:
- Train on a much larger and more diverse corpus (news, tweets).
- Implement smoothing (Kneser-Ney) and add probability scores.
- Return top-k suggestions with confidence scores.
- Deploy and measure real-world performance with user data.
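As a starting point for the top-k item above, here is a rough sketch that ranks candidate next words after a single context word by unsmoothed relative frequency. It is deliberately not Kneser-Ney: the scores are plain maximum-likelihood estimates, count(context, word) / count(context, any word), and the function name top_k_next and its arguments are hypothetical.

```r
# Hypothetical helper: top-k followers of `context` in a token vector,
# scored by unsmoothed relative frequency (no Kneser-Ney, no backoff).
top_k_next <- function(toks, context, k = 3) {
  n <- length(toks)
  if (n < 2) return(numeric(0))
  # Words that immediately follow each occurrence of the context word.
  followers <- toks[-1][toks[-n] == context]
  if (length(followers) == 0) return(numeric(0))
  counts <- table(followers)
  probs <- sort(counts / sum(counts), decreasing = TRUE)
  head(setNames(as.numeric(probs), names(probs)), k)
}

toks <- c("the", "cat", "sat", "the", "cat", "ran", "the", "dog", "sat")
top_k_next(toks, "the", k = 2)  # cat = 2/3, dog = 1/3
top_k_next(toks, "zebra")       # empty: context never seen
```

Unseen contexts return an empty result here; a smoothed model would instead redistribute some probability mass to them, which is the gap the Kneser-Ney item is meant to close.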