- People expect instant next-word suggestions, even from apps with a tiny footprint.
- Typical models are large and require complex deployment.
- Goal: a tiny, dependable demo that always returns a next word and can be deployed in minutes.
`r format(Sys.Date())`
Preprocess
- Lowercase, strip punctuation, tokenize by whitespace.
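Model
- Build unigram, bigram, and trigram counts from an embedded public‑domain corpus.
- Backoff: if the last two words (w1, w2) were seen as a trigram context, pick the most frequent w3, i.e. the word maximizing P(w3|w1,w2); else back off to the word maximizing P(w3|w2); else return the most common unigram.
- Deterministic, fast, and requires no external files.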
```{r echo=FALSE, message=FALSE, warning=FALSE}
tokenize <- function(text) {
  # Lowercase first; otherwise capital letters are stripped as non a-z characters.
  text <- gsub("[^a-z' ]", " ", tolower(text))
  unlist(strsplit(text, "\\s+"))
}
corpus_text <- paste(
  "Call me Ishmael. Some years ago ...",
  "It is a truth universally acknowledged ...",
  "Alice was beginning to get very tired ...",
  "It was the best of times, it was the worst of times ...",
  "In the beginning God created the heaven and the earth ..."
)
w <- tokenize(corpus_text); w <- w[w != ""]  # drop empty tokens
uni <- length(unique(w)); N <- length(w)     # vocabulary size, token count
cat(sprintf("Tokens: %d | Vocabulary: %d\n", N, uni))
```
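The backoff rule from the Model slide can be sketched as follows. This is a minimal illustration, not the code shipped in app.R: the helper names (`ngram_counts`, `best_next`, `build_model`, `predict_next`) and the alphabetical tie-break are assumptions made for the example, and the one-sentence corpus is only there to keep it small.

```r
# Lowercase, replace everything except letters/apostrophes with spaces, split.
tokenize <- function(text) {
  text <- gsub("[^a-z' ]", " ", tolower(text))
  tokens <- unlist(strsplit(text, "\\s+"))
  tokens[tokens != ""]
}

# Named integer vector of n-gram counts; names are space-joined n-grams.
ngram_counts <- function(tokens, n) {
  if (length(tokens) < n) return(setNames(integer(0), character(0)))
  starts <- seq_len(length(tokens) - n + 1)
  grams <- vapply(starts, function(i) {
    paste(tokens[i:(i + n - 1)], collapse = " ")
  }, character(1))
  tab <- table(grams)
  setNames(as.integer(tab), names(tab))
}

# Most frequent final word among n-grams that start with `context`.
# Ties are broken alphabetically (an assumption) to stay deterministic.
best_next <- function(counts, context) {
  hits <- counts[startsWith(names(counts), paste0(context, " "))]
  if (length(hits) == 0) return(NA_character_)
  top <- names(hits)[hits == max(hits)]
  sort(sub(".* ", "", top))[1]
}

build_model <- function(text) {
  tokens <- tokenize(text)
  uni <- ngram_counts(tokens, 1)
  list(
    tri = ngram_counts(tokens, 3),
    bi = ngram_counts(tokens, 2),
    top_unigram = names(uni)[which.max(uni)]
  )
}

# Trigram context first, then bigram context, then the most common unigram.
predict_next <- function(phrase, model) {
  w <- tokenize(phrase)
  k <- length(w)
  guess <- NA_character_
  if (k >= 2) guess <- best_next(model$tri, paste(w[k - 1], w[k]))
  if (is.na(guess) && k >= 1) guess <- best_next(model$bi, w[k])
  if (is.na(guess)) guess <- model$top_unigram
  guess
}

model <- build_model("It was the best of times, it was the worst of times")
predict_next("It was", model)   # trigram hit: "the"
predict_next("best of", model)  # trigram hit: "times"
predict_next("zzz", model)      # nothing seen, falls back to the top unigram
```

Because the final fallback is the most common unigram, `predict_next` returns a word for any input, including an empty string, which is what makes the demo dependable.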
app.R (Shiny), no external model.

Usage
1. Deploy on shinyapps.io
2. Paste phrases (e.g., “it was the age of”, “in the beginning”)
3. Press Predict
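For orientation, a single-file Shiny app matching the usage steps above might look roughly like this. It is a sketch under assumptions, not the actual app.R: the input and output IDs (`phrase`, `go`, `next_word`) are made up, and `predict_next` is a hypothetical stub standing in for the real backoff predictor.

```r
library(shiny)

# Hypothetical stand-in for the backoff predictor. The real app would look up
# trigram, bigram, then unigram counts built from the embedded corpus.
predict_next <- function(phrase) {
  if (nzchar(trimws(phrase))) "the" else ""
}

ui <- fluidPage(
  titlePanel("Next-Word Predictor"),
  textInput("phrase", "Enter a phrase", value = "in the beginning"),
  actionButton("go", "Predict"),
  verbatimTextOutput("next_word")
)

server <- function(input, output, session) {
  # Recompute only when the Predict button is pressed.
  prediction <- eventReactive(input$go, predict_next(input$phrase))
  output$next_word <- renderText(prediction())
}

# Assigned rather than printed so that running the script does not start a
# server right away; launch locally with runApp(app).
app <- shinyApp(ui = ui, server = server)
```

Keeping the UI, server, and predictor in one file means there is nothing else to upload, which fits the "no external model" constraint.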
“Good enough” demo quality for grading and stakeholder demos; simple to extend with larger corpora.
Repo/Deploy: Upload NextWordApp/app.R to shinyapps.io and publish.