Our next-word prediction app, built for the Data Science Capstone, is a simple tool that predicts the next word of a phrase. Key features:

- Input: users type a phrase into a text box.
- Output: the app shows the predicted next word plus a table of top alternatives.
- Purpose: faster text entry in apps such as keyboards or chatbots.
- Value: fast predictions reduce typing effort.
The app is hosted on shinyapps.io, so anyone can use it from a web browser.
The app uses a backoff n-gram model trained on `en_US.blogs.txt` from the HC Corpora dataset:

- N-grams: 1- to 4-grams (unigrams to fourgrams) model word sequences.
- Process:
    - Clean the input (lowercase, remove punctuation and stopwords).
    - Match the input against the n-gram tables, trying 4-grams first and backing off to 3-, 2-, then 1-grams.
    - Score candidates by n-gram frequency, multiplying by a backoff penalty of 0.4 for each step down.
- Why? It is fast and memory-efficient, which suits the resource limits of shinyapps.io.
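The cleaning step can be sketched in base R as below. This is a minimal sketch, not the app's code: `clean_input` and the short `stop_words` list are hypothetical, since the deck does not say which stopword list the app uses.

```r
# Hypothetical stopword list for illustration only; the app's real list is not specified.
stop_words <- c("the", "a", "an", "and", "of", "to", "in", "is")

# Normalize a user phrase into the tokens used for n-gram lookup.
clean_input <- function(text, stopwords = stop_words) {
  text  <- tolower(text)                          # lowercase
  text  <- gsub("[[:punct:]]", "", text)          # remove punctuation
  words <- strsplit(trimws(text), "\\s+")[[1]]    # split on runs of whitespace
  words <- words[nchar(words) > 0]                # drop empty tokens
  words[!words %in% stopwords]                    # remove stopwords
}

clean_input("The Sun is, SHINING!")  # returns c("sun", "shining")
```

Because stopwords are stripped from the input, the n-gram tables have to be built with the same cleaning; otherwise a cleaned context such as "sun" would never line up with stored n-grams that still contain "the" or "is".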
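```r
# Example: score the 4-gram candidate "fox" after "the quick brown"
fourgram_freq <- c("the quick brown fox" = 12, "the quick brown dog" = 3)  # toy counts (hypothetical)
trigram_freq  <- c("the quick brown" = 20)                                 # toy count (hypothetical)
input  <- "the quick brown"
last_3 <- paste(tail(strsplit(input, " ")[[1]], 3), collapse = " ")       # last 3 words: "the quick brown"
# 4-gram match: count of "the quick brown fox" relative to its 3-word context
score  <- fourgram_freq[paste(last_3, "fox")] / trigram_freq[last_3]     # 12 / 20 = 0.6
```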
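The full 4-to-1-gram backoff with a 0.4 penalty per step can be sketched as a base-R ranking function. It assumes, hypothetically, that the n-gram tables are stored as a list of named count vectors (one per order, keyed by the space-separated n-gram); `predict_next`, `ngram_counts`, and the toy counts are illustrative, not the app's code. The scoring follows the Stupid Backoff scheme, whose usual penalty is 0.4; that the app implements exactly this variant is an assumption.

```r
# Toy n-gram count tables (hypothetical; the real ones are built from en_US.blogs.txt).
# Element n holds the n-gram counts as a named vector keyed by the space-separated n-gram.
ngram_counts <- list(
  c(quick = 3, brown = 4, fox = 6, dog = 5, cat = 2),        # unigrams
  c("quick brown" = 3, "brown fox" = 3, "brown dog" = 1),    # bigrams
  c("quick brown fox" = 2),                                  # trigrams
  c("very quick brown fox" = 1)                              # fourgrams
)

# Rank next-word candidates with a backoff over n-gram orders (Stupid Backoff style):
# a word scored at a higher order keeps that score; each step down multiplies by alpha.
predict_next <- function(words, ngram_counts, alpha = 0.4, k = 4) {
  max_n  <- min(length(ngram_counts), length(words) + 1)  # highest usable order
  scores <- numeric(0)                                     # named: candidate -> score
  for (n in seq(max_n, 1)) {
    penalty <- alpha^(max_n - n)                           # 0.4 per backoff step
    if (n == 1) {
      counts <- ngram_counts[[1]]
      ratio  <- counts / sum(counts)                       # unigram relative frequency
    } else {
      context   <- paste(tail(words, n - 1), collapse = " ")
      ctx_count <- unname(ngram_counts[[n - 1]][context])
      if (is.na(ctx_count)) next                           # unseen context: back off
      table_n <- ngram_counts[[n]]
      hits    <- table_n[startsWith(names(table_n), paste0(context, " "))]
      if (length(hits) == 0) next                          # no continuation: back off
      ratio <- hits / ctx_count                            # count(context w) / count(context)
      names(ratio) <- sub(".* ", "", names(hits))          # keep only the predicted word
    }
    new    <- setdiff(names(ratio), names(scores))         # skip words already scored
    scores <- c(scores, penalty * ratio[new])
  }
  head(sort(scores, decreasing = TRUE), k)                 # top prediction first
}

predict_next(c("quick", "brown"), ngram_counts)
```

With these toy counts, "fox" comes first (trigram match, score 2/3), followed by "dog" (one backoff step, 0.4 × 1/4 = 0.1) and then "brown" and "quick" (two steps, unigram level). The scores are not normalized probabilities; Stupid Backoff only ranks candidates, which is all the prediction and alternatives table needs.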
See it live: [shinyapps.io link placeholder].
| Input phrase | Predicted word | Top alternatives |
|---|---|---|
| the sun is | shining | bright, warm, rising |
Hire us to build smarter, faster text prediction tools for your startup!