2/14/2022

Motivation

  • Predicting the next word or sequence of words that a user intends to type is a great way to maximize efficiency and provide the best possible experience for the user.

Solution

  • Countless words and phrases have been written by people all around the world in all sorts of formats (blogs, tweets, etc.). By analyzing a large body of written sentences to learn how frequently a given word or sequence of words occurs, one can predict the next word from the resulting probabilities.

  • The Word Prediction application implements a rudimentary version of this concept.
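
The frequency analysis described above can be sketched in a few lines of Python. This is only an illustration of the general n-gram idea, not the app's actual code: the function names `build_ngram_counts` and `next_word_probabilities`, the whitespace tokenization, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_ngram_counts(sentences, n):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive tokenizer (assumption)
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            next_word = tokens[i + n - 1]
            counts[context][next_word] += 1
    return counts

def next_word_probabilities(counts, context):
    """Turn the raw counts for one context into relative frequencies."""
    followers = counts.get(tuple(context), Counter())
    total = sum(followers.values())
    return {word: c / total for word, c in followers.items()} if total else {}

# Hypothetical toy corpus, standing in for the blog and tweet text.
toy_corpus = [
    "i love new york",
    "i love new shoes",
    "we love new york",
]
bigram_counts = build_ngram_counts(toy_corpus, 2)
probs = next_word_probabilities(bigram_counts, ["new"])
# "york" follows "new" in 2 of 3 cases, "shoes" in 1 of 3.
```

With a real corpus the same counts would be built for 2-, 3-, and 4-word sequences, so that the last 1, 2, or 3 words typed can each serve as a context.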

Utility

  • For such an app to be useful, it must be highly accurate. Several models were compared while building this app, varying two hyperparameters: how much of the corpora was sampled to train the model, and what proportion of the most frequent n-gram terms generated from the training data was kept. The values settled at sampling about 7.5% of the corpora and keeping roughly the most frequent 30% of terms.

  • While the Word Prediction app does well at predicting the next word for very frequent terms (up to 90% accuracy for the 100 most frequent terms), accuracy falls below 10% for infrequent terms.
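
The two hyperparameters above (the share of the corpora sampled and the share of most frequent n-grams kept) could be applied roughly as follows. This is a minimal sketch under assumptions: the helper names `sample_lines` and `keep_most_frequent`, the line-level random sampling, and the toy counts are hypothetical, and the app itself may sample and prune differently.

```python
import random
from collections import Counter

def sample_lines(lines, fraction, seed=0):
    """Keep each line independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]

def keep_most_frequent(ngram_counts, proportion):
    """Keep only the top `proportion` of distinct n-grams, ranked by count."""
    keep = max(1, int(len(ngram_counts) * proportion))
    return dict(ngram_counts.most_common(keep))

# Hypothetical n-gram counts, standing in for counts from the sampled text.
toy_counts = Counter({
    ("of", "the"): 50, ("in", "the"): 40, ("to", "be"): 30, ("on", "a"): 8,
    ("a", "cat"): 3, ("cat", "sat"): 2, ("sat", "quietly"): 1,
})
pruned = keep_most_frequent(toy_counts, 0.30)  # 7 distinct n-grams, keeps 2

# On the real corpora the sampling step would look like:
#   training_lines = sample_lines(all_lines, 0.075)
```

Pruning rare n-grams keeps the model small, which is also consistent with the accuracy pattern reported above: contexts that were dropped or rarely seen are the ones the model predicts poorly.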

Specifics

  • This application asks the user for a word or sequence of words. The user must then press a “Predict” button for the model to make a prediction.

  • The model uses the last 1, 2, or 3 words entered by the user. If the user enters 3 or more words, the model first tries the last 3 words, then falls back to 2 and then 1 if the longer context doesn’t yield a prediction.

  • The app outputs up to 5 predicted words to increase the chance that the user’s intended next word is among them.
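
The fallback from 3 words to 2 to 1, together with the limit of 5 suggestions, can be sketched as below. The function `predict_next_words`, the dictionary-of-counters model layout, and the toy entries are assumptions for illustration only; the app's real data structures are not described here.

```python
from collections import Counter

def predict_next_words(text, model, max_context=3, top_k=5):
    """Try the longest available context first, then back off to shorter ones."""
    tokens = text.lower().split()
    for size in range(min(max_context, len(tokens)), 0, -1):
        context = tuple(tokens[-size:])      # last `size` words typed
        followers = model.get(context)
        if followers:                        # stop at the first context with data
            return [word for word, _ in followers.most_common(top_k)]
    return []                                # no context matched at any length

# Hypothetical model: context tuple -> counts of the words seen after it.
toy_model = {
    ("of",): Counter({"the": 9, "a": 4, "course": 2}),
    ("one", "of"): Counter({"the": 5, "my": 2}),
}

long_input = predict_next_words("this is one of", toy_model)
short_input = predict_next_words("because of", toy_model)
# long_input uses the 2-word context ("one", "of") after the 3-word one fails;
# short_input falls back to the 1-word context ("of",).
```

Returning several ranked candidates rather than one means a prediction counts as useful whenever the intended word appears anywhere in the short list.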