Felix G Lopez
September 20th, 2024
This presentation introduces the Next Word Predict app, covering its user interface and the text prediction algorithm behind it.
The Next Word Predict app is located at:
An n-gram refers to a sequence of n consecutive words within a text.
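As a minimal illustration of the definition above, the sketch below lists the 2-grams and 3-grams of a short phrase. The helper name `ngrams` and the sample phrase are hypothetical and chosen only for this example; they are not taken from the app itself.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # range() is empty when there are fewer than n tokens, so the result is [].
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical sample phrase, split on whitespace.
tokens = "thank you for your time".split()

bigrams = ngrams(tokens, 2)   # pairs of consecutive words
trigrams = ngrams(tokens, 3)  # triples of consecutive words

print(bigrams)
print(trigrams)
```

A phrase of five words yields four 2-grams and three 3-grams, since each n-gram starts one word later than the previous one.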
The predictive model was developed using a substantial dataset comprising blogs, news articles, and tweets. From this dataset, n-grams were extracted and used to train the predictive model.
Various approaches were investigated to enhance both the performance and accuracy of the model, utilizing techniques from Natural Language Processing (NLP) and text mining.
During preprocessing, the data was converted to lowercase, and non-ASCII characters, URLs, email addresses, Twitter handles, hashtags, ordinal numbers, punctuation, and extra whitespace were removed. The cleaned data was then tokenized into n-grams.
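The cleaning steps listed above could be sketched roughly as follows. This is an assumption-laden illustration, not the app's actual code: the function name `clean_text`, the regular expressions, and the order of the steps are all hypothetical, and the real patterns used by the app are not given in this presentation.

```python
import re

def clean_text(text):
    """Apply the cleaning steps in one possible order (patterns are assumptions)."""
    text = text.lower()
    # URLs and emails go first, while their punctuation is still intact.
    text = re.sub(r"(https?://|www\.)\S+", " ", text)      # URLs
    text = re.sub(r"\S+@\S+", " ", text)                   # email addresses
    text = re.sub(r"[@#]\w+", " ", text)                   # Twitter handles, hashtags
    text = re.sub(r"\b\d+(st|nd|rd|th)\b", " ", text)      # ordinal numbers (1st, 2nd, ...)
    text = text.encode("ascii", "ignore").decode("ascii")  # drop non-ASCII characters
    text = text.replace("'", "")                           # assumption: "don't" becomes "dont"
    text = re.sub(r"[^a-z0-9\s]", " ", text)               # remaining punctuation
    return re.sub(r"\s+", " ", text).strip()               # collapse extra whitespace

print(clean_text("Check https://example.com NOW! Email me@site.org, @user #fun on the 3rd day… ok"))
```

The ordering matters in this sketch: removing punctuation before URLs or email addresses would leave fragments such as `https` or `site org` behind as ordinary words.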
When a user enters text, the algorithm searches from the longest n-gram (4-gram) down to the shortest (2-gram) until it finds a match.
The predicted next word is the most frequent continuation of the longest matching n-gram. The model uses a straightforward back-off mechanism: shorter n-grams are consulted only when the longer ones produce no match.
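The back-off lookup described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the app's implementation: the names `build_tables` and `predict`, the toy corpus, and the use of plain frequency counts with no smoothing or discounting are all hypothetical.

```python
from collections import Counter, defaultdict

def build_tables(sentences, max_n=4):
    """For n = 2..max_n, map each (n-1)-word context to counts of the next word."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for sentence in sentences:
        tokens = sentence.split()
        for n in tables:
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables

def predict(tables, text, k=3):
    """Return up to k next words from the longest n-gram whose context matches."""
    tokens = text.lower().split()
    for n in sorted(tables, reverse=True):       # 4-gram, then 3-gram, then 2-gram
        if len(tokens) < n - 1:
            continue                             # not enough words for this context
        followers = tables[n].get(tuple(tokens[-(n - 1):]))
        if followers:
            return [word for word, _ in followers.most_common(k)]
    return []                                    # no n-gram matched at any length

# Hypothetical toy corpus, assumed to be already cleaned and lowercased.
corpus = [
    "thank you for your time",
    "thank you for your help",
    "thank you for your time today",
    "see you soon",
]
tables = build_tables(corpus)

print(predict(tables, "Thank you for your"))  # matched at the 4-gram level
print(predict(tables, "hello you"))           # backs off to the 2-gram level
```

The `k` parameter plays the role of the number of predictions requested, and the returned list is ordered from most to least frequent.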
Please allow a few seconds for the predictions to be generated. The slider lets users choose how many predictions to display, up to three; the top choice appears first, followed by the second and third most likely next words.
Thank you for taking the time to evaluate this app.