8/6/2021

Method to the Algorithm

The text prediction algorithm uses a backoff method to recommend the next word. To keep the app small, only 4-, 3-, 2-, and 1-grams were used.

The backoff method starts with the highest-order N-gram that matches the end of the input and recommends the most frequent continuation found there.

If no match is observed at a given order, the next lower order is tried, and so on down the chain. If no N-gram matches at all, a unigram is selected at random from the 20 most frequent unigrams.

The output is a table of the top 5 most probable predictions.
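
The backoff lookup described above can be sketched in a few lines of Python. This is only an illustration, not the app's actual code: the function names (build_tables, predict), the whitespace tokenization, and the use of relative frequency within the matched N-gram order as the reported probability are all assumptions.

```python
import random
from collections import Counter, defaultdict

MAX_ORDER = 4  # the app uses 4-, 3-, 2-, and 1-grams


def build_tables(sentences, max_order=MAX_ORDER):
    """Count, for each N-gram order, which word follows each context.

    tables[n] maps a context tuple of n-1 words to a Counter of next words.
    (Hypothetical helper; tokenization is a plain lowercase whitespace split.)
    """
    tables = {n: defaultdict(Counter) for n in range(1, max_order + 1)}
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(1, max_order + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                tables[n][context][words[i + n - 1]] += 1
    return tables


def predict(text, tables, top_k=5, fallback_pool=20, rng=random):
    """Return up to top_k (word, relative frequency) pairs using simple backoff."""
    words = text.lower().split()
    # Try the highest order first, then back off to shorter contexts.
    for n in range(max(tables), 1, -1):
        if len(words) < n - 1:
            continue  # not enough input words for this order
        context = tuple(words[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            total = sum(followers.values())
            return [(w, c / total) for w, c in followers.most_common(top_k)]
    # No N-gram matched: pick one of the most frequent unigrams at random.
    unigrams = tables[1][()]
    pool = unigrams.most_common(fallback_pool)
    if not pool:
        return []
    word, count = rng.choice(pool)
    return [(word, count / sum(unigrams.values()))]
```

For example, after tables = build_tables(corpus_sentences), calling predict("the cat", tables) returns the most frequent words seen after "the cat" in the trigram table, or backs off to the bigram table if that pair was never observed.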

Computational Time and Accuracy

A test set was held out from the overall corpus to measure the accuracy of the algorithm. 500 sentences were sampled at random from the test set, and a randomly chosen word in each sentence was used as the word to predict. The results are shown below:

  • Average Computational Time: 0.03s
  • Accuracy of the Top Prediction: 40.23%
  • Accuracy within the Top 5 Predictions: 50.2%
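
A test of this kind could be run with a loop like the following sketch. It is not the script that produced the numbers above: the evaluate function, its parameters, and the assumption that predict_fn takes the preceding words as a string and returns a ranked list of candidate words are all hypothetical.

```python
import random
import time


def evaluate(predict_fn, test_sentences, n_samples=500, top_k=5, seed=0):
    """Estimate top-1 accuracy, top-k accuracy, and mean prediction time.

    predict_fn is assumed to accept the preceding text and return a
    ranked list of candidate next words (best guess first).
    """
    rng = random.Random(seed)
    # Only sentences with at least two words have a word to predict from context.
    usable = [s.split() for s in test_sentences if len(s.split()) >= 2]
    if not usable:
        raise ValueError("no sentence has at least two words")
    sample = rng.sample(usable, min(n_samples, len(usable)))

    top1_hits = 0
    topk_hits = 0
    elapsed = 0.0
    for words in sample:
        # Pick a random position (never the first word) as the target.
        target_idx = rng.randrange(1, len(words))
        context = " ".join(words[:target_idx])
        target = words[target_idx]

        start = time.perf_counter()
        guesses = list(predict_fn(context))[:top_k]
        elapsed += time.perf_counter() - start

        if guesses and guesses[0] == target:
            top1_hits += 1
        if target in guesses:
            topk_hits += 1

    n = len(sample)
    return {
        "n": n,
        "top1_accuracy": top1_hits / n,
        "topk_accuracy": topk_hits / n,
        "mean_seconds": elapsed / n,
    }
```

Fixing the seed keeps the sampled sentences and target positions the same between runs, so two versions of the predictor can be compared on identical test cases.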

How the App Works

The user types or pastes a string into the Enter Text field.

When the user selects predict, the app generates the predicted word, a plot of the candidate words found in the N-grams, and a table of the probability of selecting each word.

Expansion Plans

I plan to incorporate a more robust prediction that will weigh the most likely 2- or 3-gram candidates rather than just selecting from the highest-order N-gram that matches. This would follow Katz's back-off method, where a discount factor is applied to each prediction.
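
For reference, Katz's back-off estimate is usually written in the following general form. The notation is the standard textbook one and is not taken from the app: C is an N-gram count, k is a count threshold (often 0), d is the discount factor, and α is the back-off weight that redistributes the discounted probability mass to the lower order.

```latex
P_{bo}(w_i \mid w_{i-n+1}^{i-1}) =
\begin{cases}
  d_{w_{i-n+1}^{i}} \, \dfrac{C(w_{i-n+1}^{i})}{C(w_{i-n+1}^{i-1})}
    & \text{if } C(w_{i-n+1}^{i}) > k \\[2ex]
  \alpha_{w_{i-n+1}^{i-1}} \, P_{bo}(w_i \mid w_{i-n+2}^{i-1})
    & \text{otherwise}
\end{cases}
```

Unlike the current approach, which stops at the first order that matches, this lets a well-supported 2- or 3-gram candidate compete with a rarely seen 4-gram candidate.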