Pitch: Next-Word Prediction App

Overview

Our next-word prediction app, built for the Data Science Capstone, offers a user-friendly tool to predict the next word in a phrase. Key features:
- Input: Users type a phrase in a text box.
- Output: Predicts the next word with a table of top alternatives.
- Purpose: Enhances text input for apps like keyboards or chatbots.
- Value: Fast, accurate predictions improve user efficiency.

Hosted on shinyapps.io, it’s accessible to all.

Algorithm Description

The app uses a backoff n-gram model trained on en_US.blogs.txt from the HC Corpora dataset:
- N-grams: 1- to 4-grams (unigrams to fourgrams) for word sequences.
- Process:
  - Clean input (lowercase, remove punctuation, stopwords).
  - Match input to n-grams (4-gram first, backoff to 3, 2, 1-gram).
  - Use frequency-based scoring with backoff penalties (0.4 per step).
- Why?: Fast, memory-efficient for shinyapps.io constraints.
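The cleaning and n-gram counting steps above could look roughly like the following base R sketch. The function names `clean_text` and `count_ngrams` and the short `stop_words` vector are hypothetical, chosen for illustration; the app's actual stopword list and tokenizer are not specified here.

```r
# Hypothetical stopword list for illustration; the app's real list is not specified.
stop_words <- c("a", "an", "the", "of", "and")

# Lowercase, replace punctuation with spaces, split on whitespace, drop stopwords.
clean_text <- function(text, stopwords = stop_words) {
  text <- tolower(text)
  text <- gsub("[[:punct:]]+", " ", text)
  tokens <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  tokens[nzchar(tokens) & !(tokens %in% stopwords)]
}

# Count every n-gram of order n in a token vector; returns a named table of counts.
count_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(table(character(0)))
  starts <- seq_len(length(tokens) - n + 1)
  grams <- vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
    character(1)
  )
  table(grams)
}

tokens  <- clean_text("The quick, brown fox jumps over the lazy dog!")
bigrams <- count_ngrams(tokens, 2)
```

Running `count_ngrams` for n = 1 to 4 over the cleaned training text would give the four frequency tables the model looks up at prediction time. Note that replacing punctuation with spaces splits contractions such as "don't" into two tokens, which a production tokenizer might handle differently.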

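# Example: score a candidate next word from a 4-gram match
input <- "the quick brown"                 # the last three words are the context
fourgram_freq <- c("the quick brown fox" = 2, "the quick brown dog" = 1)  # toy counts, for illustration only
candidate <- paste(input, "fox")           # candidate 4-gram: "the quick brown fox"
score <- fourgram_freq[[candidate]] / sum(fourgram_freq)  # relative frequency of the match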
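To show how the 4-gram-first lookup and the 0.4 penalty might fit together, here is a minimal sketch of one plausible reading of the scoring step, in the style of Stupid Backoff. The function `predict_next`, the `ngram_freq` list layout, and all counts are assumptions made up for this example, not the app's actual code or data.

```r
# Toy n-gram counts, invented for illustration only (not from en_US.blogs.txt).
# List position = n-gram order (1 = unigrams, ..., 4 = fourgrams).
ngram_freq <- list(
  c("the" = 5, "fox" = 3, "dog" = 2),
  c("brown fox" = 2, "brown dog" = 1),
  c("quick brown fox" = 2),
  c("the quick brown fox" = 1)
)

# Score candidates from the highest order down. A word keeps the score from the
# highest order at which it was seen; each backoff step multiplies by `penalty`.
predict_next <- function(tokens, freq = ngram_freq, penalty = 0.4, top = 3) {
  max_order <- length(freq)
  scores <- numeric(0)
  for (order in max_order:1) {
    k <- order - 1                       # number of context words at this order
    if (length(tokens) < k) next
    counts <- freq[[order]]
    if (k > 0) {
      prefix <- paste0(paste(tail(tokens, k), collapse = " "), " ")
      counts <- counts[startsWith(names(counts), prefix)]
      names(counts) <- substring(names(counts), nchar(prefix) + 1)
    }
    if (length(counts) == 0) next
    level <- penalty^(max_order - order) * counts / sum(counts)
    scores <- c(scores, level[!(names(level) %in% names(scores))])
  }
  scores <- sort(scores, decreasing = TRUE)
  head(data.frame(word = as.character(names(scores)), score = unname(scores)), top)
}

result <- predict_next(c("the", "quick", "brown"))
```

With these toy counts, "fox" is scored from the 4-gram match with no penalty, while "dog" is only reachable through the bigram table and so carries two penalty steps (0.4 squared). Keeping lower-order candidates is what fills the table of alternatives when the 4-gram table yields a single match.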

App Functionality

Interface: Simple Shiny app with a text input box and “Predict” button.
Output: Displays the top predicted word and a table of alternatives.
Example:
- Input: “I love to”
- Output: Predicted word (“eat”), table (e.g., “eat”, “go”, “see”).
Deployment: Hosted on shinyapps.io for public access.
Speed: Lightweight model ensures quick predictions.
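The interface described above could be wired up along these lines. This is a minimal sketch, not the deployed app: the input and output IDs (`phrase`, `predict`, `top_word`, `alternatives`) are assumed names, and `predict_next` here is a hypothetical placeholder that returns the fixed words from the example rather than calling the real model.

```r
library(shiny)

# Hypothetical placeholder predictor: stands in for the app's backoff model
# and always returns the words from the example above.
predict_next <- function(phrase) {
  data.frame(word = c("eat", "go", "see"))
}

ui <- fluidPage(
  titlePanel("Next-Word Prediction"),
  textInput("phrase", "Enter a phrase:", value = "I love to"),
  actionButton("predict", "Predict"),
  h4("Predicted word"),
  textOutput("top_word"),
  h4("Alternatives"),
  tableOutput("alternatives")
)

server <- function(input, output, session) {
  # Recompute predictions only when the Predict button is clicked.
  predictions <- eventReactive(input$predict, {
    predict_next(input$phrase)
  })
  output$top_word <- renderText(predictions()$word[1])
  output$alternatives <- renderTable(predictions())
}

app <- shinyApp(ui, server)
# To launch locally: runApp(app)
```

Using `eventReactive` tied to the button means typing alone does not trigger a lookup, which matches the click-to-predict flow described for the app.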

See it live: [shinyapps.io link placeholder].

How to Use

Visit the app on shinyapps.io.
Enter a phrase in the text box (e.g., “the sun is”).
Click “Predict” to see the next word and top alternatives.
Use predictions to complete sentences or explore suggestions.

Example Prediction

Input        Predicted   Alternatives
the sun is   shining     bright, warm, rising

Why Invest?

Novelty: Simple yet effective n-gram model, optimized for shinyapps.io’s limits.
Impact: Improves text input for mobile apps, chatbots, or accessibility tools (e.g., SwiftKey-like technology).
Scalability: Can extend to other datasets (Twitter, news) or languages.
Team: Combines NLP expertise with user-focused design.

Hire us to build smarter, faster text prediction tools for your startup!