Next Word Prediction App

Samuel Tandoh
April 27, 2026

Slide 1: Product Overview

Smart Next-Word Prediction App

  • Built using Natural Language Processing (NLP)
  • Trained on English blogs, news, and Twitter text
  • Uses a 5% sample of the corpus for efficient model development
  • Predicts the most likely next word from a user-entered phrase
  • Designed for fast, lightweight deployment in Shiny

Product value: helps users type faster by suggesting likely next words.
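The 5% sample mentioned above can be drawn reproducibly in base R. This is a minimal sketch. It assumes each corpus file is read line by line, with one text entry per line. The helper name `sample_corpus_lines`, the seed, and the file path in the usage comment are hypothetical.

```r
# Hypothetical helper: keep a reproducible random fraction of lines.
sample_corpus_lines <- function(lines, frac = 0.05, seed = 2026) {
  set.seed(seed)                                  # reproducible sample
  n_keep <- round(length(lines) * frac)           # e.g. 5% of all lines
  lines[sort(sample(seq_along(lines), n_keep))]   # keep original line order
}

# Usage (illustrative path):
# blogs <- readLines("path/to/blogs.txt", encoding = "UTF-8", skipNul = TRUE)
# blogs_sample <- sample_corpus_lines(blogs, frac = 0.05)
```

Sampling whole lines keeps sentences intact, so n-grams are never formed across the boundary between two unrelated entries.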

Slide 2: How the Model Works

The app uses an n-gram backoff model built on precomputed trigram, bigram, and unigram frequency tables.

  1. Clean and tokenize the user's phrase
  2. Try a trigram-context match using the last two words
  3. If unavailable, back off to a bigram-context match
  4. If still unavailable, use unigram frequency
  5. Return the top suggested next words
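library(data.table)

# Assumes trigram_dt and bigram_dt are data.tables keyed on their context
# column, with rows inside each context sorted by descending frequency,
# and unigram_dt sorted by descending frequency.
predict_next_words <- function(input_text, top_n = 3) {
  # clean_input() is the app's text-normalization helper (defined elsewhere)
  cleaned <- clean_input(input_text)
  words <- unlist(strsplit(cleaned, "\\s+"))
  words <- words[words != ""]

  # 1. Trigram context: the last two words
  if (length(words) >= 2) {
    context_tri <- paste(tail(words, 2), collapse = " ")
    # nomatch = NULL drops unmatched rows; the default would return one NA row
    tri_match <- trigram_dt[.(context_tri), nomatch = NULL]
    if (nrow(tri_match) > 0) return(head(tri_match$next_word, top_n))
  }

  # 2. Back off to a bigram context: the last word only
  if (length(words) >= 1) {
    context_bi <- tail(words, 1)
    bi_match <- bigram_dt[.(context_bi), nomatch = NULL]
    if (nrow(bi_match) > 0) return(head(bi_match$next_word, top_n))
  }

  # 3. Fall back to the most frequent words overall
  head(unigram_dt$word, top_n)
}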
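The prediction function relies on `clean_input()` and three lookup tables that are built elsewhere in the app. The sketch below shows one way to produce them with `data.table`. The cleaning rules, the column names (`context`, `next_word`, `word`, `freq`), and the helper `build_ngram_tables()` are assumptions, not the app's actual code. Each n-gram table is ordered by context and then by descending frequency before it is keyed. As a result, `head(..., top_n)` on a lookup returns the most frequent continuations first.

```r
library(data.table)

# Hypothetical cleaning rules: lower-case, keep letters and apostrophes,
# collapse runs of whitespace.
clean_input <- function(text) {
  text <- tolower(text)
  text <- gsub("[^a-z' ]+", " ", text)
  trimws(gsub("\\s+", " ", text))
}

# Hypothetical builder: count n-grams in a character vector of sentences and
# return keyed lookup tables (context -> next_word), most frequent first.
build_ngram_tables <- function(sentences) {
  tokens <- strsplit(clean_input(sentences), " ", fixed = TRUE)
  tokens <- lapply(tokens, function(w) w[w != ""])

  # All n-grams of length n within one sentence
  ngrams_of <- function(w, n) {
    if (length(w) < n) return(character(0))
    vapply(seq_len(length(w) - n + 1),
           function(i) paste(w[i:(i + n - 1)], collapse = " "),
           character(1))
  }

  # Split "w1 w2 w3" into context "w1 w2" and next_word "w3", then count
  count_ngrams <- function(n) {
    grams <- unlist(lapply(tokens, ngrams_of, n = n))
    dt <- data.table(
      context   = sub(" [^ ]+$", "", grams),
      next_word = sub("^.* ", "", grams)
    )[, .(freq = .N), by = .(context, next_word)]
    setorder(dt, context, -freq)  # most frequent continuation first
    setkey(dt, context)           # enables dt[.(ctx)] / dt[ctx] lookups
    dt
  }

  unigram <- data.table(word = unlist(tokens))[, .(freq = .N), by = word]
  setorder(unigram, -freq)

  list(trigram_dt = count_ngrams(3),
       bigram_dt  = count_ngrams(2),
       unigram_dt = unigram)
}

# Usage: expose the table names that predict_next_words() expects.
# tables <- build_ngram_tables(sampled_lines)
# list2env(tables, envir = globalenv())
```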

Slide 3: Quantitative Performance

The model was evaluated on 5,000 held-out test cases.

Metric                          Result
Top-1 accuracy                  8.84%
Top-3 accuracy                  13.54%
Perplexity                      514.06
Time for 1,000 predictions      1.50 s
Time per prediction (approx.)   0.0015 s (1.5 ms)
Total model size                28.41 MB

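Interpretation: at roughly 1.5 ms per prediction and 28.41 MB in total, the model is small and fast enough for an interactive Shiny app. Its accuracy is modest but useful. The correct word is the top suggestion in 8.84% of test cases and appears among the top three in 13.54%.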
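The accuracy and perplexity figures above can be computed from per-case results along the lines of the following sketch. It makes several assumptions. Each test case is a phrase paired with its true next word. `preds` holds the ranked predictions for each case. `probs` holds the model's probability for each true word, and those probabilities must be greater than zero for perplexity to be finite. The slides do not say how unseen words were handled. The function names are hypothetical.

```r
# Share of test cases whose true word appears in the first k predictions.
top_k_accuracy <- function(preds, truth, k) {
  hits <- mapply(function(p, t) t %in% head(p, k), preds, truth)
  mean(hits)
}

# Perplexity: exp of the mean negative log-probability of the true words,
# PP = exp(-(1/N) * sum(log p_i)). Lower is better.
perplexity <- function(probs) {
  stopifnot(all(probs > 0))  # log(0) would make perplexity infinite
  exp(-mean(log(probs)))
}

# Usage:
# top_k_accuracy(preds, truth, k = 1)   # Top-1 accuracy
# top_k_accuracy(preds, truth, k = 3)   # Top-3 accuracy
# perplexity(probs)
```

A perplexity near 514 means the model is, on average, about as uncertain as a uniform choice among roughly 514 words.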

Slide 4: Shiny App Demonstration

How the user interacts with the app:

  1. Type a phrase into the text box
  2. Click the prediction button
  3. View the predicted next word and top alternatives
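# Server-side outputs (inside server <- function(input, output, session)).
# Predictions are computed once per click of the prediction button;
# "predict" is an assumed inputId for that actionButton.
preds <- eventReactive(input$predict, {
  req(input$user_text)  # wait until the user has typed something
  predict_next_words(input$user_text, top_n = 3)
})

# Single best guess
output$prediction <- renderText({
  preds()[1]
})

# Ranked list of suggestions
output$top_predictions <- renderTable({
  p <- preds()
  data.frame(
    Rank = seq_along(p),
    Predicted_Word = p
  )
})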

Example:

Input: I am going
Prediction: to
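A matching user interface could be laid out as in the sketch below. The output IDs `prediction` and `top_predictions` and the input ID `user_text` come from the server code. The button's inputId `predict`, the labels, and the layout are hypothetical choices for the prediction button described in step 2.

```r
library(shiny)

# Hypothetical UI: a text box and button on the left, predictions on the right.
ui <- fluidPage(
  titlePanel("Smart Next-Word Prediction App"),
  sidebarLayout(
    sidebarPanel(
      textInput("user_text", "Type a phrase:", value = ""),
      actionButton("predict", "Predict next word")   # assumed inputId
    ),
    mainPanel(
      h4("Predicted next word"),
      textOutput("prediction"),
      h4("Top suggestions"),
      tableOutput("top_predictions")
    )
  )
)

# To run: shinyApp(ui = ui, server = server), where server defines
# output$prediction and output$top_predictions.
```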

Slide 5: Why This Product is Awesome

This product demonstrates the full data science pipeline:

  • Real-world text data acquisition and cleaning
  • Exploratory analysis of word and phrase patterns
  • Efficient predictive modeling using n-grams and backoff
  • Quantitative evaluation using accuracy, perplexity, runtime, and size
  • Deployment-ready Shiny interface

Future improvements:

  • Better contraction and slang handling
  • Interpolated n-gram probabilities
  • Personalized prediction history
  • Larger training sample or domain-specific data
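For the interpolation item, simple linear interpolation mixes the trigram, bigram, and unigram estimates with fixed weights: P(w | context) = λ3·P_tri + λ2·P_bi + λ1·P_uni, where the weights sum to 1. The current backoff uses only the highest-order match. In this sketch, the weights are placeholder values that would normally be tuned on held-out data. The function name and input format are hypothetical.

```r
# Linear interpolation of trigram, bigram and unigram probabilities.
# Each argument is a named numeric vector of P(word | context) for one
# context; a word missing from a vector gets probability 0 at that order.
interpolate_probs <- function(p_tri, p_bi, p_uni,
                              lambdas = c(tri = 0.6, bi = 0.3, uni = 0.1)) {
  stopifnot(abs(sum(lambdas) - 1) < 1e-8)  # weights must sum to 1
  vocab <- union(union(names(p_tri), names(p_bi)), names(p_uni))
  get_p <- function(p) {
    out <- p[vocab]            # NA for words absent at this order
    out[is.na(out)] <- 0
    unname(out)
  }
  scores <- lambdas[["tri"]] * get_p(p_tri) +
            lambdas[["bi"]]  * get_p(p_bi) +
            lambdas[["uni"]] * get_p(p_uni)
  sort(setNames(scores, vocab), decreasing = TRUE)  # best candidates first
}
```

Unlike pure backoff, a word seen only in the bigram or unigram table can still outrank a rare trigram continuation.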