PredType - DS Capstone Final Project

2026-01-03

Synopsis

This project was created for the Developing Data Products course, part of the Data Science Specialization offered by Johns Hopkins University through Coursera.

The source code files for this project can be found on GitHub:

GitHub - AakashaAananda

Course Project

The course project is a two-part, peer-graded assignment:

Create a Shiny application and deploy it on RStudio’s servers.
Use Slidify or RStudio Presenter to prepare a reproducible pitch presentation about your application.

The Shiny application developed for this project is named **PredType App** and is hosted on RStudio’s shinyapps.io service:

PredType - Shiny App

PredType App

Writing on mobile devices and web interfaces often suffers from latency and human error. Our goal was to build a tool that:

Reduces Keystrokes: Anticipates user intent before they finish typing.

Understands Context: Adapts suggestions based on the topic (Sports, Social, etc.).

Preserves Flow: Handles capitalization and grammar naturally.

User Friendly: Shows the top three predicted words, which the user can click to insert while typing.

The Solution: An optimized NLP engine that balances massive linguistic data with real-time performance.

Algorithm

PredType uses a Large-Scale N-Gram Back-off Model trained on over 600 MB of diverse text data.

N-Grams: We utilize Trigrams (3-word sequences) and Bigrams (2-word sequences) to find patterns.

Smart Back-off: If a 3-word pattern isn’t found, the app “backs off” to 2-word patterns, and finally to the most frequent 1-word fallbacks.

Contextual Boosting: A unique “Topic Engine” scans your recent words and multiplies the scores of relevant vocabulary (e.g., typing “wife” boosts relationship-related words).

Dictionary Coverage: The model uses a curated vocabulary of 7,000 unigrams, which covers approximately 90% of the word occurrences in the training corpus.
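The back-off and boosting steps described above can be sketched in a few lines. This is a minimal illustration in Python, not the app's actual code: the toy corpus, the `TOPIC_WORDS` lexicon, the `BOOST` multiplier of 2.0, the five-word context window, and the function names `build_tables` and `predict` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the real training data (illustrative only).
CORPUS = [
    "i love my wife and my family",
    "i love my team and the game",
    "my wife and i love the game",
    "the team won the game",
]

# Hypothetical topic lexicon: trigger word -> vocabulary to boost.
TOPIC_WORDS = {
    "wife": {"family", "love", "husband"},
    "team": {"game", "won", "season"},
}
BOOST = 2.0  # assumed multiplier, not the app's real value


def build_tables(sentences):
    """Count unigrams and next-word frequencies for 1- and 2-word contexts."""
    uni = Counter()
    bi = defaultdict(Counter)
    tri = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        uni.update(words)
        for a, b in zip(words, words[1:]):
            bi[a][b] += 1
        for a, b, c in zip(words, words[1:], words[2:]):
            tri[(a, b)][c] += 1
    return uni, bi, tri


def predict(text, tables, k=3):
    """Return up to k next-word suggestions using back-off plus topic boosting."""
    uni, bi, tri = tables
    words = text.lower().split()

    # Back-off: trigram context first, then bigram, then plain unigram counts.
    if len(words) >= 2 and tuple(words[-2:]) in tri:
        candidates = tri[tuple(words[-2:])]
    elif words and words[-1] in bi:
        candidates = bi[words[-1]]
    else:
        candidates = uni

    # Contextual boosting: multiply scores of words tied to recent triggers.
    boosted = set()
    for recent in words[-5:]:
        boosted |= TOPIC_WORDS.get(recent, set())
    scores = {w: n * (BOOST if w in boosted else 1.0) for w, n in candidates.items()}

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


TABLES = build_tables(CORPUS)
print(predict("i love", TABLES))             # trigram hit: ['my', 'the']
print(predict("she adores my", TABLES))      # bigram back-off: ['wife', 'family', 'team']
print(predict("my wife adores my", TABLES))  # "wife" boosts "family": ['family', 'wife', 'team']
```

In the last call the trigram context is unseen, so the sketch backs off to the words that follow "my"; the earlier mention of "wife" then doubles the score of "family" and moves it to the top.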

Performance

Small Memory Footprint: About 120 MB in total for the application with the n-gram lookup tables loaded.

Low Latency: Average latency under 150 ms (after roughly 1 second to initially load the app and its tables).
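The write-up does not say how the latency figure was measured. One plausible way to obtain an average of this kind is to time repeated prediction calls after the tables are loaded. The Python sketch below assumes a hypothetical helper `mean_latency_ms` and a stand-in `dummy_predict` function in place of the real predictor.

```python
import time


def mean_latency_ms(predict_fn, inputs, repeats=5):
    """Average wall-clock time per call to predict_fn, in milliseconds."""
    timings = []
    for _ in range(repeats):
        for text in inputs:
            start = time.perf_counter()
            predict_fn(text)
            timings.append((time.perf_counter() - start) * 1000.0)
    return sum(timings) / len(timings)


def dummy_predict(text):
    """Stand-in predictor (assumption): echoes the last three words typed."""
    return text.split()[-3:]


sample_inputs = ["i love my", "the team won"]
average = mean_latency_ms(dummy_predict, sample_inputs)
print(f"average latency: {average:.3f} ms")
```

Timing only the prediction call keeps the one-time loading cost out of the average, which matches the way the figures above separate the two.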

Conclusion

Try it out: PredType