Overview

Our next-word prediction app, built for the Data Science Capstone, is a user-friendly tool that predicts the next word of a phrase. Key features:
- Input: the user types a phrase into a text box.
- Output: the app shows the predicted next word plus a table of top alternatives.
- Purpose: supports text entry in applications such as mobile keyboards and chatbots.
- Value: fast suggestions save keystrokes and speed up typing.

The app is hosted on shinyapps.io, so anyone with a web browser can use it.

Algorithm Description

The app uses a backoff n-gram model trained on en_US.blogs.txt from the HC Corpora dataset:
- N-grams: unigrams through 4-grams.
- Process:
  - Clean the input (lowercase, remove punctuation and stopwords).
  - Match the input against the n-gram tables, trying 4-grams first and backing off to 3-, 2-, and 1-grams.
  - Score candidates by frequency, multiplying the score by a backoff penalty of 0.4 at each step down.
- Why? The model is fast and memory-efficient, which suits the resource limits of shinyapps.io.
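The cleaning and counting steps can be sketched in base R as follows. This is a minimal sketch, not the app's code: `clean_text()`, `count_ngrams()` and the tiny `stopword_list` are hypothetical names. Stopword removal is off by default here because dropping words such as "the" would also strip them from phrases like "the sun is".

```r
# Hypothetical stopword list, for illustration only.
stopword_list <- c("a", "an", "and", "of", "to")

# Clean a single string: lowercase, replace punctuation/digits with spaces,
# split on whitespace, and optionally drop stopwords.
clean_text <- function(text, remove_stopwords = FALSE) {
  text <- gsub("[^a-z ]", " ", tolower(text))
  words <- strsplit(trimws(text), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (remove_stopwords) words <- words[!words %in% stopword_list]
  words
}

# Count every n-gram of length n; returns a named integer vector of counts.
count_ngrams <- function(words, n) {
  if (length(words) < n) return(integer(0))
  grams <- vapply(seq_len(length(words) - n + 1),
                  function(i) paste(words[i:(i + n - 1)], collapse = " "),
                  character(1))
  counts <- table(grams)
  setNames(as.integer(counts), names(counts))
}

tokens <- clean_text("The quick brown fox. The quick brown dog!")
ngram_tables <- lapply(1:4, function(n) count_ngrams(tokens, n))
ngram_tables[[3]]["the quick brown"]
```

For simplicity the sketch counts n-grams across sentence boundaries (it produces "fox the quick", for example); a production pipeline would usually split sentences first.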

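# Example: score the candidate "fox" as the next word after "the quick brown"
# (toy counts for illustration; the real tables come from the corpus)
fourgram_freq <- c("the quick brown fox" = 3)
trigram_freq  <- c("the quick brown" = 4)
input  <- "the quick brown"
last_3 <- paste(tail(strsplit(tolower(input), "\\s+")[[1]], 3), collapse = " ")
# 4-gram match: count(last_3 + "fox") / count(last_3)
score <- fourgram_freq[[paste(last_3, "fox")]] / trigram_freq[[last_3]]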
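The full backoff lookup can be sketched as below. The scoring described above (relative frequency, times 0.4 per backoff step) follows the pattern of Stupid Backoff: each candidate keeps the score from the longest n-gram that contains it. This is a minimal sketch under assumptions: `predict_next()` and the toy `ngram_tables` are hypothetical, and the app's real tables are built from en_US.blogs.txt.

```r
# Toy n-gram counts (made up for illustration); element n holds the n-grams.
ngram_tables <- list(
  c(the = 5, quick = 2, brown = 2, fox = 1, dog = 1, sun = 1, is = 1),
  c("the quick" = 2, "quick brown" = 2, "brown fox" = 1, "brown dog" = 1),
  c("the quick brown" = 2, "quick brown fox" = 1, "quick brown dog" = 1),
  c("the quick brown fox" = 1)
)

# Hypothetical Stupid Backoff scorer: returns the k best next words with scores.
predict_next <- function(input, tables, k = 3, alpha = 0.4) {
  words <- strsplit(tolower(trimws(input)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  scores <- numeric(0)
  penalty <- 1
  # Longest context first (3 words -> 4-grams), then back off one word at a time.
  for (ctx_len in rev(seq_len(min(3, length(words))))) {
    ctx <- paste(tail(words, ctx_len), collapse = " ")
    grams <- tables[[ctx_len + 1]]
    hits <- grams[startsWith(names(grams), paste0(ctx, " "))]
    ctx_count <- unname(tables[[ctx_len]][ctx])
    if (length(hits) > 0 && !is.na(ctx_count)) {
      cand <- sub(".* ", "", names(hits))   # last word of each matching n-gram
      s <- penalty * unname(hits) / ctx_count
      fresh <- !cand %in% names(scores)     # keep the higher-order score
      scores <- c(scores, setNames(s[fresh], cand[fresh]))
    }
    penalty <- penalty * alpha
  }
  # Unigram fallback: relative frequency times the accumulated penalty.
  uni <- tables[[1]]
  fresh <- !names(uni) %in% names(scores)
  scores <- c(scores, penalty * uni[fresh] / sum(uni))
  head(sort(scores, decreasing = TRUE), k)
}

predict_next("the quick brown", ngram_tables)
```

With these toy counts, "fox" scores 0.5 (a direct 4-gram match, 1/2) and "dog" scores 0.2 (found only at the 3-gram level, 0.4 × 1/2). Because the penalty only shrinks as the context gets shorter, keeping the first score found for each word gives the same ranking as the recursive definition.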

App Functionality

See it live: [shinyapps.io link placeholder].

How to Use

  1. Visit the app on shinyapps.io.
  2. Enter a phrase in the text box (e.g., “the sun is”).
  3. Click “Predict” to see the next word and top alternatives.
  4. Use predictions to complete sentences or explore suggestions.

Example Prediction

Input         Predicted   Alternatives
the sun is    shining     bright, warm, rising
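The output row pairs the top-ranked word with the runners-up. The hypothetical helper `format_prediction()` shows one way to build that row from a named vector of scores; the scores below are invented for illustration and are not output from the app's model.

```r
# Hypothetical helper: turn a named score vector into the app's output row.
format_prediction <- function(input, scores) {
  ranked <- names(sort(scores, decreasing = TRUE))
  data.frame(Input = input,
             Predicted = ranked[1],
             Alternatives = paste(ranked[-1], collapse = ", "))
}

# Made-up scores for illustration only.
example_scores <- c(bright = 0.12, shining = 0.31, warm = 0.08, rising = 0.05)
format_prediction("the sun is", example_scores)
```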

Why Invest?

Hire us to build smarter, faster text prediction tools for your startup!