2025-09-04

1. Problem & Goal

  • Typing on mobile devices benefits from predictive text (autocomplete).
  • Goal: Build a model that predicts the next word in a phrase.
  • Data: SwiftKey English corpus (blogs, news, Twitter).
  • Deliverable: A Shiny app for real-time prediction.

2. Data & Algorithm

  • Preprocessing: sampled the corpus, then cleaned, tokenized, and profanity-filtered the text.
  • Model: n-gram language model (bigrams, trigrams, and 4-grams).
  • Backoff: Katz backoff (4-gram → 3-gram → 2-gram → unigram fallback).
  • Efficiency: pruned rare n-grams; keyed lookups with data.table (see the sketch after this list).
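
  A minimal sketch of this pipeline under stated assumptions: toy data stands in for the SwiftKey corpus, and the backoff step simply takes the most frequent continuation at the longest matching order ("stupid backoff") rather than full Katz discounting. Function and column names are illustrative, not the app's actual code.

      library(data.table)

      # Toy corpus standing in for the sampled SwiftKey data
      corpus <- c("I love new york", "I love new cars", "we love new york")

      tokenize <- function(lines) {
        lines <- tolower(lines)
        lines <- gsub("[^a-z' ]", " ", lines)        # keep letters and apostrophes
        strsplit(trimws(lines), "\\s+")
      }

      # Build a keyed frequency table for one n-gram order
      build_ngrams <- function(tokens, n) {
        grams <- unlist(lapply(tokens, function(w) {
          if (length(w) < n) return(character(0))
          sapply(seq_len(length(w) - n + 1),
                 function(i) paste(w[i:(i + n - 1)], collapse = " "))
        }))
        dt <- data.table(gram = grams)[, .N, by = gram]
        dt[, prefix := sub(" [^ ]+$", "", gram)]     # all words but the last
        dt[, word   := sub("^.* ", "", gram)]        # the last word
        # the deployed model would also prune rare n-grams here, e.g. dt <- dt[N > 1]
        setkey(dt, prefix)                           # keyed lookups for fast backoff
        dt
      }

      tokens   <- tokenize(corpus)
      tables   <- lapply(2:4, function(n) build_ngrams(tokens, n))   # 2-, 3-, 4-grams
      unigrams <- data.table(word = unlist(tokens))[, .N, by = word][order(-N)]

      # Back off from the longest matching prefix down to the unigram fallback
      predict_next <- function(phrase, k = 3) {
        w <- tokenize(phrase)[[1]]
        for (n in 4:2) {
          if (length(w) >= n - 1) {
            key  <- paste(tail(w, n - 1), collapse = " ")
            hits <- tables[[n - 1]][.(key), nomatch = 0L]
            if (nrow(hits) > 0) return(head(hits[order(-N)]$word, k))
          }
        }
        head(unigrams$word, k)                       # unigram fallback: never blank
      }

      predict_next("we love new")    # most frequent continuation, here "york"

  Keying each table on its prefix is what makes lookups fast enough for real-time use; Katz backoff would additionally discount the counts and redistribute the leftover probability mass to the lower-order tables, which the sketch skips.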

3. Evaluation

  • Tested with Twitter- and news-style phrases (see the spot-check below).
  • The app always returns a prediction; the unigram fallback means no blank results.
  • Predictions update in real time as the user types (smooth UX).
  • The pruned model is small enough for the shinyapps.io free tier.
  • The interface is simple and intuitive: a top-1 prediction plus buttons for alternative suggestions.
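
  A hypothetical spot-check in the spirit of this evaluation, reusing the illustrative predict_next() from section 2: it confirms that a non-empty suggestion comes back quickly even for out-of-vocabulary input. The test phrases are made up for the example.

      phrases <- c("thanks for the", "looking forward to", "zzz qqq xxx")
      for (p in phrases) {
        elapsed <- system.time(out <- predict_next(p, k = 3))["elapsed"]
        stopifnot(length(out) > 0)          # unigram fallback: never blank
        cat(sprintf("%-20s -> %-10s (%.3f s)\n", p, out[1], elapsed))
      }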

4. The App (How to Use)

  • Enter a phrase in the text box; the top predicted next word updates as you type.
  • Buttons below the top prediction show alternative suggestions.
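
  The pitch does not spell out the UI code, so here is a minimal Shiny sketch of the interface implied above: a text box, a top-1 prediction that updates as the user types, and buttons for alternative suggestions. It assumes the illustrative predict_next() helper from section 2; all identifiers are assumptions, not the app's actual code.

      library(shiny)

      # Assumes predict_next() from the section 2 sketch is already defined
      ui <- fluidPage(
        titlePanel("Next-Word Prediction"),
        textInput("phrase", "Type a phrase:"),
        h4("Top prediction"),
        textOutput("top1"),
        uiOutput("suggestions")
      )

      server <- function(input, output, session) {
        preds <- reactive({
          if (nchar(trimws(input$phrase)) == 0) return(character(0))
          predict_next(input$phrase, k = 3)
        })
        output$top1 <- renderText({
          p <- preds()
          if (length(p) > 0) p[1] else ""
        })
        output$suggestions <- renderUI({
          p <- preds()
          if (length(p) < 2) return(NULL)
          lapply(p[-1], function(w) actionButton(paste0("btn_", w), w))  # one button per alternative
        })
      }

      shinyApp(ui, server)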

5. Why it Matters

  • Value: practical, lightweight NLP demo; real-time and deployable.
  • Uses: typing assistants, chat/email composition, mobile UX.