2024-08-22

Description

The following is a language model trained on a vast dataset. Its function is to predict the next word of a given phrase or sentence.

Structure

The model uses n-gram frequency tables, from unigrams up to five-grams. Each table stores an ‘add-one smoothing’ probability for every n-gram, and these probabilities are used to rank candidates for the next word.
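
As a rough illustration of how an add-one smoothed probability can be computed from n-gram counts, here is a minimal Python sketch. It is not the model's actual code: the function names, the toy sentence, and the choice of bigrams are all assumptions made for the example. The estimate used is (count of the n-gram + 1) / (count of its context + vocabulary size).

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens, keyed by tuple."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def add_one_probability(ngram, ngram_table, context_table, vocab_size):
    """Add-one estimate of P(last word | preceding words)."""
    context = ngram[:-1]
    # Counter returns 0 for unseen keys, so unseen n-grams still get a
    # small non-zero probability.
    return (ngram_table[ngram] + 1) / (context_table[context] + vocab_size)


# Toy corpus (hypothetical, for illustration only).
tokens = "the cat sat on the mat the cat ran".split()
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
vocab_size = len(unigrams)  # distinct words in the toy corpus

p_seen = add_one_probability(("the", "cat"), bigrams, unigrams, vocab_size)
p_unseen = add_one_probability(("the", "ran"), bigrams, unigrams, vocab_size)
print(p_seen, p_unseen)
```

In this toy corpus the seen bigram "the cat" scores higher than the unseen "the ran", but the unseen one is not assigned zero, which is the point of the smoothing.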

The datasets were preprocessed by converting all text to lowercase, filtering out profanity, and removing digits and punctuation.
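
The preprocessing steps above could look something like the following Python sketch. The profanity list here is a hypothetical placeholder, since the word list actually used is not stated, and the function name is likewise an assumption.

```python
import string

# Hypothetical placeholder; the real profanity list is not given.
PROFANITY = {"darn", "heck"}

# Translation table that deletes ASCII digits and ASCII punctuation.
# Typographic (curly) quotes are not covered by string.punctuation.
_STRIP = str.maketrans("", "", string.digits + string.punctuation)


def preprocess(text, banned=PROFANITY):
    """Lowercase, delete digits and punctuation, then drop banned words."""
    cleaned = text.lower().translate(_STRIP)
    return [word for word in cleaned.split() if word not in banned]


words = preprocess("Well, heck: it's 2024 already!")
print(words)
```

Note that deleting punctuation outright turns "it's" into "its"; whether the original pipeline handled contractions this way is not known.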

Performance

The model retrieves the last four words of the entered text and then searches for the most probable match. It uses a backoff model: it first searches for a match in fivegrams and then backs off to progressively lower-order n-grams if no match is found.
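
The backoff lookup described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the table layout (n-gram order, then context tuple, then a word-to-probability mapping), the function name, and the sample probabilities are all hypothetical and not taken from the model itself.

```python
def predict_next(text, tables, max_order=5):
    """Return the most probable next word, backing off from the highest order.

    tables[order][context] is assumed to map each candidate word to its
    probability, where context is a tuple of (order - 1) preceding words.
    """
    words = text.lower().split()
    for order in range(max_order, 1, -1):
        context = tuple(words[-(order - 1):])
        if len(context) < order - 1:
            continue  # not enough words typed for this order
        candidates = tables.get(order, {}).get(context)
        if candidates:
            return max(candidates, key=candidates.get)
    # Final fallback: the most probable single word, if unigrams exist.
    unigram = tables.get(1, {}).get(())
    return max(unigram, key=unigram.get) if unigram else None


# Made-up tables for illustration only.
tables = {
    1: {(): {"the": 0.5, "of": 0.3}},
    2: {("thank",): {"you": 0.8, "god": 0.1}},
    3: {("a", "lot"): {"of": 0.9, "more": 0.05}},
}

from_trigram = predict_next("thanks a lot", tables)
from_bigram = predict_next("i want to thank", tables)
from_unigram = predict_next("hello", tables)
print(from_trigram, from_bigram, from_unigram)
```

With four or more words entered, the search starts at the five-gram table using the last four words; with fewer words, the higher orders are skipped and the search begins at the highest order the input can support.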

Try it out!

The model is ready for testing. Please allow a few seconds for the word prediction to be generated.