2025-08-15

Objective

The purpose of this project is to implement a predictive text mining application.

It takes text input and predicts the most likely next word.

The project also aims to minimize both the application's size and its runtime in order to provide a reasonable user experience.

Model

This application uses an n-gram model to estimate the probability of a word given the previous n-1 words. The model was built from data scraped from news sources, blogs, and Twitter.
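As a rough illustration of the idea, the conditional probability of a word given its context can be estimated from raw n-gram counts. The sketch below is in Python and is not the project's own code; the function names and the toy corpus are assumptions made for the example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous n-token tuple in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def conditional_probability(tokens, context, word):
    """Maximum-likelihood estimate of P(word | context) from raw counts."""
    context = tuple(context)
    counts = Counter(ngrams(tokens, len(context) + 1))
    # Total count of n-grams that begin with the given context.
    context_total = sum(c for gram, c in counts.items() if gram[:-1] == context)
    if context_total == 0:
        return 0.0  # context never observed; a backoff scheme would handle this
    return counts[context + (word,)] / context_total

# Toy corpus (hypothetical); the real model was built from news, blogs, and Twitter.
corpus = "the cat sat on the mat and the cat ran".split()
p_cat_given_the = conditional_probability(corpus, ["the"], "cat")
```

In this toy corpus "the" is followed by "cat" twice and by "mat" once, so the estimate for "cat" after "the" is two thirds.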

Katz’s backoff model is used to estimate the probability of unobserved n-grams, so that the application can handle input that is not stored in the n-gram model.
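The backoff idea can be sketched for the bigram case as follows. This is a simplified illustration, not the project's implementation: it uses a fixed absolute discount (the `discount` parameter, an assumption) in place of the Good-Turing discounting that Katz's original formulation uses, and it backs off only from bigrams to unigrams.

```python
from collections import Counter, defaultdict

def train(tokens):
    """Count unigrams and, for each word, the words that follow it."""
    unigrams = Counter(tokens)
    followers = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        followers[prev][word] += 1
    return unigrams, followers

def backoff_probability(word, prev, unigrams, followers, discount=0.5):
    """Backoff estimate of P(word | prev) with a fixed discount (assumed value)."""
    total = sum(unigrams.values())
    seen = followers.get(prev)
    if not seen:
        # Context never observed: fall back to the unigram estimate.
        return unigrams[word] / total
    context_count = sum(seen.values())
    if word in seen:
        # Observed bigram: discounted relative frequency.
        return (seen[word] - discount) / context_count
    # Probability mass freed by discounting the observed bigrams.
    alpha = discount * len(seen) / context_count
    # Unigram mass of the words never seen after `prev`, used to renormalize.
    unseen_mass = sum(c for w, c in unigrams.items() if w not in seen) / total
    if unseen_mass == 0:
        return 0.0
    return alpha * (unigrams[word] / total) / unseen_mass

tokens = "the cat sat on the mat and the cat ran".split()
unigrams, followers = train(tokens)
p_unseen = backoff_probability("sat", "the", unigrams, followers)
```

Here "the sat" never occurs in the toy corpus, yet it receives a small nonzero probability, and the probabilities over the whole vocabulary for a given context still sum to one.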

Data Processing

Data filtering was performed in order to remove profane language.
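One possible way to do this kind of filtering is sketched below. The blocklist contents, the function name, and the choice to drop whole lines (rather than delete individual words, which would join words that were never adjacent) are all assumptions for illustration; the source does not say which approach was taken.

```python
import re

def remove_profanity(lines, blocklist):
    """Keep only the lines that contain no blocklisted word (case-insensitive)."""
    blocked = {w.lower() for w in blocklist}
    kept = []
    for line in lines:
        # Split into lowercase word tokens, ignoring punctuation.
        tokens = re.findall(r"[a-z']+", line.lower())
        if not any(token in blocked for token in tokens):
            kept.append(line)
    return kept

# Placeholder blocklist; a real one would come from a curated word list.
sample = ["A perfectly clean line.", "This one says DARN, loudly."]
cleaned = remove_profanity(sample, ["darn"])
```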

N-gram models of up to order 7 (7-grams) were built and used as the basis for the predictive model.

The prediction table used for this application retains only unique queries and filters out bigrams with a frequency of less than 5.
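A minimal sketch of this pruning step follows. It assumes that "unique queries" means keeping a single, most frequent next word for each context, and that the frequency threshold applies only to bigrams, as the text states; the function name, parameter name, and sample counts are hypothetical.

```python
def build_prediction_table(ngram_counts, min_bigram_count=5):
    """Map each query (all words but the last) to its most frequent next word."""
    best = {}
    for ngram, count in ngram_counts.items():
        # Drop rare bigrams; longer n-grams are kept regardless of count.
        if len(ngram) == 2 and count < min_bigram_count:
            continue
        query, word = ngram[:-1], ngram[-1]
        # Keep one row per query: the continuation with the highest count.
        if query not in best or count > best[query][1]:
            best[query] = (word, count)
    return {query: word for query, (word, _) in best.items()}

# Made-up counts for illustration only.
counts = {
    ("the", "cat"): 7,
    ("the", "dog"): 5,
    ("a", "bird"): 2,
    ("on", "the", "mat"): 1,
}
table = build_prediction_table(counts)
```

Storing one row per query keeps the table small and turns prediction into a single dictionary lookup, which fits the stated goal of limiting size and runtime.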

Product Instructions

  1. Type in a word or phrase in the text input box.

  2. Press the “Submit” button below the input box.

  3. The output will be displayed below the submission.
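What happens after a submission might look roughly like the sketch below: the longest usable tail of the input is looked up first, and shorter tails are tried when no match is found. The table contents, the function name, and the simple whitespace tokenization are assumptions; the application's actual lookup logic is not described in this document.

```python
def predict_next_word(text, table, max_order=7):
    """Return the stored next word for the longest matching tail of the input."""
    tokens = text.lower().split()
    # A model of order max_order uses at most max_order - 1 context words.
    longest = min(max_order - 1, len(tokens))
    for size in range(longest, 0, -1):
        query = tuple(tokens[-size:])
        if query in table:
            return table[query]
    return None  # nothing matched, even for the last word alone

# Hypothetical prediction table keyed by context tuples.
example_table = {("the",): "cat", ("on", "the"): "mat"}
suggestion = predict_next_word("It sat on the", example_table)
```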