TinyNext — Pitch

`r format(Sys.Date())`

Problem & Goal

People expect instant next-word suggestions, even from apps with a tiny footprint.
Typical models are large and require complex deployment.
Goal: a tiny, dependable demo that always returns a next word and can be deployed in minutes.

Algorithm (Tiny Backoff)

Preprocess
- Lowercase, strip punctuation (apostrophes are kept), tokenize by whitespace.

Model
- Build unigram, bigram, and trigram counts from an embedded public-domain corpus.
- Backoff: if the context (w1, w2) has been seen, pick the w3 that maximizes P(w3|w1,w2); else pick the w3 that maximizes P(w3|w2); else return the most common unigram.
- Deterministic, fast, and requires no external files.
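The backoff rule above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the code in app.R: the helper names (`tokenize_words`, `count_ngrams`, `best_after`, `predict_next`) are hypothetical, and the corpus is a single sentence so the result is easy to check by hand.

```r
# Hypothetical helpers for illustration; names are not taken from app.R.
tokenize_words <- function(text) {
  words <- unlist(strsplit(gsub("[^a-z']+", " ", tolower(text)), " +"))
  words[words != ""]
}

# Count all n-grams of size n; returns a named integer vector keyed by "w1 w2 ...".
count_ngrams <- function(tokens, n) {
  stopifnot(length(tokens) >= n)
  starts <- seq_len(length(tokens) - n + 1)
  grams <- vapply(starts,
                  function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
                  character(1))
  tab <- table(grams)
  setNames(as.vector(tab), names(tab))
}

# Most frequent word following `context`, or NA if the context was never seen.
best_after <- function(counts, context) {
  hits <- counts[startsWith(names(counts), paste0(context, " "))]
  if (length(hits) == 0) return(NA_character_)
  sub(".* ", "", names(hits)[which.max(hits)])  # keep only the last word of the key
}

# Trigram -> bigram -> unigram backoff.
predict_next <- function(model, phrase) {
  ctx <- tokenize_words(phrase)
  k <- length(ctx)
  guess <- NA_character_
  if (k >= 2) guess <- best_after(model$tri, paste(ctx[k - 1], ctx[k]))
  if (is.na(guess) && k >= 1) guess <- best_after(model$bi, ctx[k])
  if (is.na(guess)) guess <- names(model$uni)[which.max(model$uni)]
  guess
}

tokens <- tokenize_words("It was the best of times, it was the worst of times")
model <- list(uni = count_ngrams(tokens, 1),
              bi  = count_ngrams(tokens, 2),
              tri = count_ngrams(tokens, 3))

predict_next(model, "It was")   # "the" (trigram context "it was" is seen twice)
predict_next(model, "xyzzy")    # unseen context: falls back to the top unigram
```

Ties are broken by the order of the count table, which keeps the output deterministic for a fixed corpus.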

```{r echo=FALSE, message=FALSE, warning=FALSE}
# Lowercase first, then replace everything except letters and apostrophes with a space.
# (Lowercasing after the substitution would wipe out capital letters.)
tokenize <- function(text) {
  text <- gsub("[^a-z']+", " ", tolower(text))
  unlist(strsplit(text, "\\s+"))
}

corpus_text <- paste(
  "Call me Ishmael. Some years ago ...",
  "It is a truth universally acknowledged ...",
  "Alice was beginning to get very tired ...",
  "It was the best of times, it was the worst of times ...",
  "In the beginning God created the heaven and the earth ..."
)

w <- tokenize(corpus_text)
w <- w[w != ""]            # drop empty tokens left by leading separators
uni <- length(unique(w))   # vocabulary size
N <- length(w)             # total token count
cat(sprintf("Tokens: %d | Vocabulary: %d\n", N, uni))
```

The App

Single‑file app.R (Shiny), no external model.
Input: any English phrase → Predict → one word + top 3 suggestions.
Always responds (falls back to top unigram).
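As a rough sketch of what such a single-file app might look like, the block below wires a text box and a Predict button to a predictor. It is not the actual app.R: `suggest()` is a hypothetical placeholder that ignores the phrase and returns three fixed words, standing in for the backoff model, and the input and output IDs are made up for the example.

```r
library(shiny)

# Hypothetical placeholder predictor: a real app would call the n-gram backoff
# model here. Returning fixed words mimics the "always responds" fallback.
suggest <- function(phrase) c("the", "of", "and")

ui <- fluidPage(
  titlePanel("TinyNext"),
  textInput("phrase", "Enter a phrase", value = "it was the age of"),
  actionButton("predict", "Predict"),
  h4("Next word"),
  textOutput("best"),
  h4("Top 3 suggestions"),
  textOutput("top3")
)

server <- function(input, output, session) {
  # Recompute only when the Predict button is pressed.
  words <- eventReactive(input$predict, suggest(input$phrase))
  output$best <- renderText(words()[1])
  output$top3 <- renderText(paste(words(), collapse = ", "))
}

app <- shinyApp(ui = ui, server = server)
```

To try it locally, call `shiny::runApp(app)`. In a deployed app.R the file would typically end with a bare `shinyApp(ui, server)` call so that the app object is the file's final value.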

Usage
1. Deploy on shinyapps.io
2. Paste phrases (e.g., “it was the age of”, “in the beginning”)
3. Press Predict

Experience

Fast: tiny data and O(1) lookups on prebuilt frequency tables.
Robust: returns a word for every input.
Explainable: simple counts + backoff. No hidden weights.
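One way to get constant-time lookups on average in R is to precompute the best next word for every context once and store the result in a hashed environment. The sketch below assumes a bigram-only table and hypothetical names (`lookup`, `next_word`, `fallback`); the real app may organize its tables differently.

```r
tokens <- c("it", "was", "the", "best", "of", "times",
            "it", "was", "the", "worst", "of", "times")
n <- length(tokens)

# Cross-tabulate each word (rows) against the word that follows it (columns).
pair_counts <- table(context = tokens[-n], nxt = tokens[-1])

# For every context, keep only the most frequent follower.
best <- colnames(pair_counts)[apply(pair_counts, 1, which.max)]

# Environments are hash tables, so a lookup by key does not scan the counts.
lookup <- new.env(hash = TRUE)
for (i in seq_along(best)) {
  assign(rownames(pair_counts)[i], best[i], envir = lookup)
}

# Most common unigram, used when the context was never seen.
fallback <- names(which.max(table(tokens)))

next_word <- function(ctx) {
  get0(ctx, envir = lookup, inherits = FALSE, ifnotfound = fallback)
}

next_word("was")    # "the"
next_word("zebra")  # unseen context: returns the fallback unigram
```

All counting happens at build time, so answering a request is a single keyed lookup plus the fallback check.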

“Good enough” demo quality for grading and stakeholder demos; simple to extend with larger corpora.

Why this approach?

Meets the brief with minimal attack surface (no downloads, no model files).
Great for teaching n‑grams, smoothing/backoff, and deployment.
Next steps: plug in a larger corpus, add Kneser–Ney smoothing, show confidence & top‑k UI.
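For the confidence and top-k step, a simple starting point is to report each candidate's relative frequency among the words seen after the context. The sketch below is an assumption about how that could look, with a hypothetical `top_k()` helper; the score is an unsmoothed ratio of counts, not a Kneser–Ney probability.

```r
# Rank followers of a context and attach count / total as a rough confidence.
top_k <- function(counts, k = 3) {
  v <- setNames(as.numeric(counts), names(counts))
  ranked <- head(sort(v, decreasing = TRUE), k)
  data.frame(word = names(ranked),
             confidence = unname(ranked) / sum(v),
             stringsAsFactors = FALSE)
}

tokens <- strsplit(paste("it was the best of times it was the worst of times",
                         "it was the age of wisdom it was the age of foolishness"),
                   " ")[[1]]

# Words that follow "the" in the token stream: best, worst, age, age.
after_the <- tokens[-1][tokens[-length(tokens)] == "the"]

result <- top_k(table(after_the), k = 3)
result  # "age" ranks first with confidence 0.5
```

The same data frame could feed a small table in the UI, with the first row shown as the main prediction.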

Repo/Deploy: Upload NextWordApp/app.R to shinyapps.io and publish.