Next Word Prediction App

Samuel Tandoh
April 27, 2026

Slide 1: Product Overview

Smart Next-Word Prediction App

Built using Natural Language Processing (NLP)
Trained on English blogs, news, and Twitter text
Uses a 5% sample of the corpus for efficient model development
Predicts the most likely next word from a user-entered phrase
Designed for fast, lightweight deployment in Shiny

Product value: helps users type faster by suggesting likely next words.
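
The 5% sampling step can be sketched as follows. This is a minimal illustration, not the app's actual code: the helper name sample_lines, the seed value, and the toy input are all assumptions made for the example.

```r
# Hypothetical helper: keep a random fraction of the lines in a character vector.
sample_lines <- function(lines, fraction = 0.05, seed = 1234) {
  if (length(lines) == 0) return(lines)
  set.seed(seed)  # fixed seed (assumed value) so the sample is reproducible
  n_keep <- max(1, round(length(lines) * fraction))
  # sort() keeps the sampled lines in their original order
  lines[sort(sample.int(length(lines), n_keep))]
}

# Toy stand-in for one corpus file; real input would come from readLines().
toy_lines <- sprintf("line %d of the corpus", 1:200)
sampled <- sample_lines(toy_lines)
length(sampled)  # 10 lines, i.e. 5% of 200
```

Sampling whole lines rather than individual words keeps each n-gram intact, which matters because the model counts consecutive word sequences.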

Slide 2: How the Model Works

The app uses an optimized n-gram backoff model.

Clean and tokenize the user's phrase
Try a trigram-context match using the last two words
If unavailable, back off to a bigram-context match
If still unavailable, use unigram frequency
Return the top suggested next words

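# Assumes trigram_dt and bigram_dt are data.tables keyed on their context
# column, with the rows for each context sorted by descending frequency.
predict_next_words <- function(input_text, top_n = 3) {
  cleaned <- clean_input(input_text)
  words <- unlist(strsplit(cleaned, " "))
  words <- words[words != ""]

  # 1. Trigram context: look up the last two words.
  if (length(words) >= 2) {
    context_tri <- paste(tail(words, 2), collapse = " ")
    # nomatch = NULL drops unmatched keys. The default returns a row of NAs,
    # so nrow() would be 1 even for an unseen context.
    tri_match <- trigram_dt[context_tri, nomatch = NULL]
    if (nrow(tri_match) > 0) return(head(tri_match$next_word, top_n))
  }

  # 2. Back off to the bigram context: look up the last word only.
  if (length(words) >= 1) {
    context_bi <- tail(words, 1)
    bi_match <- bigram_dt[context_bi, nomatch = NULL]
    if (nrow(bi_match) > 0) return(head(bi_match$next_word, top_n))
  }

  # 3. Final fallback: the most frequent words overall.
  head(unigram_dt$word, top_n)
}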
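
The function above calls clean_input(), whose body is not shown on the slide. One possible version is sketched below. The specific rules (lowercasing, keeping only letters and apostrophes, collapsing whitespace) are assumptions; whatever the real helper does, it needs to match the cleaning applied to the training text so that lookups hit the same tokens.

```r
# Hypothetical body for clean_input(); the cleaning rules are assumed.
clean_input <- function(text) {
  text <- tolower(text)
  # Replace anything that is not a letter, apostrophe, or space with a space.
  text <- gsub("[^a-z' ]+", " ", text)
  # Collapse runs of whitespace and trim both ends.
  text <- gsub("\\s+", " ", text)
  trimws(text)
}

clean_input("  I am GOING, ")  # "i am going"
```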

Slide 3: Quantitative Performance

The model was evaluated on 5,000 held-out test cases.

Metric	Result
Top-1 Accuracy	8.84%
Top-3 Accuracy	13.54%
Perplexity	514.06
Time for 1,000 predictions	1.50 seconds
Approx. seconds per prediction	0.0015
Total model size	28.41 MB

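Interpretation: at about 1.5 milliseconds per prediction and 28.41 MB in total, the model is light and fast enough for an interactive Shiny app. Its top suggestion is correct in roughly 1 of every 11 held-out cases, and the correct word appears among the top three in roughly 1 of every 7.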
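
For readers unfamiliar with the metrics in the table, here is a minimal sketch of how top-k accuracy and perplexity are commonly computed. The function names and toy data are assumptions for illustration; the slide does not show the evaluation code, so the app's own procedure may differ in detail (for example, in how probabilities are assigned to unseen words).

```r
# Share of test cases whose true next word is among the first k suggestions.
top_k_accuracy <- function(predictions, truth, k) {
  hits <- mapply(function(p, t) t %in% head(p, k), predictions, truth)
  mean(hits)
}

# Perplexity: exp of the mean negative log-probability given to the true words.
perplexity <- function(probs) {
  exp(-mean(log(probs)))
}

# Toy data (not the app's results): four test cases with three suggestions each.
toy_predictions <- list(
  c("to", "the", "a"),
  c("you", "it", "that"),
  c("of", "and", "in"),
  c("be", "is", "was")
)
toy_truth <- c("to", "that", "with", "is")

top_k_accuracy(toy_predictions, toy_truth, k = 1)  # 0.25 (1 of 4)
top_k_accuracy(toy_predictions, toy_truth, k = 3)  # 0.75 (3 of 4)

# A model that spreads probability evenly over 4 words has perplexity 4.
perplexity(rep(0.25, 4))
```

Read this way, a perplexity of 514.06 means the model is, on average, about as uncertain as if it were choosing uniformly among roughly 514 words.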

Slide 4: Shiny App Demonstration

How the user interacts with the app:

Type a phrase into the text box
Click the prediction button
View the predicted next word and top alternatives

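# Compute the predictions once and share them between both outputs.
# As written, this re-runs whenever input$user_text changes; to update only
# on a button click, wrap it in eventReactive() keyed on the button's input.
preds <- reactive({
  predict_next_words(input$user_text, top_n = 3)
})

# Best single prediction.
output$prediction <- renderText({
  preds()[1]
})

# Ranked table of the top suggestions.
output$top_predictions <- renderTable({
  data.frame(
    Rank = seq_along(preds()),
    Predicted_Word = preds()
  )
})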

Example:

Input: I am going
Prediction: to

Slide 5: Why This Product Is Awesome

This product demonstrates the full data science pipeline:

Real-world text data acquisition and cleaning
Exploratory analysis of word and phrase patterns
Efficient predictive modeling using n-grams and backoff
Quantitative evaluation using accuracy, perplexity, runtime, and size
Deployment-ready Shiny interface
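
The modeling step in the pipeline above can be sketched in base R as follows. This is an illustrative toy, not the app's build script: the function name build_ngram_table, the column names, and the example tokens are assumptions, and the sketch ignores sentence boundaries for brevity.

```r
# Hypothetical helper: count (context, next_word) pairs for n-grams of size n.
build_ngram_table <- function(tokens, n) {
  stopifnot(n >= 2, length(tokens) >= n)
  starts <- seq_len(length(tokens) - n + 1)
  # The context is the first n - 1 words of each n-gram.
  context <- vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 2)], collapse = " "),
    character(1)
  )
  next_word <- tokens[starts + n - 1]
  pairs <- data.frame(context, next_word, count = 1L, stringsAsFactors = FALSE)
  counts <- aggregate(count ~ context + next_word, data = pairs, FUN = sum)
  # Most frequent continuation first within each context.
  counts[order(counts$context, -counts$count), ]
}

# Toy token stream (assumed, already cleaned and lowercased).
toy_tokens <- c("i", "am", "going", "to", "the", "store",
                "i", "am", "going", "to", "work")
tri <- build_ngram_table(toy_tokens, n = 3)

# Look up the continuations seen after the context "am going".
head(tri$next_word[tri$context == "am going"], 3)  # "to"
```

Storing each context with its continuations already sorted by frequency is what lets the prediction function answer with a single lookup followed by head().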

Future improvements:

Better contraction and slang handling
Interpolated n-gram probabilities
Personalized prediction history
Larger training sample or domain-specific data
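
As a rough idea of what interpolated n-gram probabilities would look like, the sketch below mixes trigram, bigram, and unigram estimates with fixed weights instead of stopping at the first level that matches. The function name, the weights, and the toy probabilities are all assumptions; in practice the weights would be tuned on held-out data.

```r
# Hypothetical linear interpolation of three conditional distributions.
# Each argument is a named numeric vector: word -> probability.
interpolate_probs <- function(p_tri, p_bi, p_uni, lambdas = c(0.6, 0.3, 0.1)) {
  stopifnot(abs(sum(lambdas) - 1) < 1e-8)  # weights must sum to 1
  words <- names(p_uni)
  # Align a distribution to the vocabulary; unseen words get probability 0.
  align <- function(p) {
    out <- unname(p[words])
    out[is.na(out)] <- 0
    out
  }
  scores <- lambdas[1] * align(p_tri) +
    lambdas[2] * align(p_bi) +
    lambdas[3] * align(p_uni)
  sort(setNames(scores, words), decreasing = TRUE)
}

# Toy distributions for the context "am going" (assumed values).
p_uni <- c(the = 0.5, to = 0.3, work = 0.2)
p_bi  <- c(to = 0.7, the = 0.3)
p_tri <- c(to = 1.0)

ranked <- interpolate_probs(p_tri, p_bi, p_uni)
names(ranked)[1]  # "to"
```

Unlike pure backoff, every candidate word receives a nonzero score whenever it has a nonzero unigram probability, which would also give perplexity a well-defined probability for each test word.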