Charles McGuinness, December 2014
I designed the application with several goals in mind:
The application keeps score in a “hangman”-like game. To use it, you enter a phrase into the input box and then press the “Predict Next Word” button. After a brief calculation, the program displays the predicted word. You then compare the predicted word with the actual next word from your test case. If they match, press the “Yes” button; if not, press the “No” button. A “Let's Start over” button is available to reset the counts if needed.
As the tests progress, the program keeps score and updates a drawing of a “hangman” character. After five tests (the definition of a round), the program either celebrates its success or laments its failure.
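The scorekeeping described above can be sketched roughly as follows. This is an illustration only, not the application's actual code: the `Scoreboard` class and its method names are hypothetical, and the rule that a round succeeds when hits outnumber misses is an assumption, since the text does not state the win condition. Only the round length of five tests comes from the description.

```python
from dataclasses import dataclass

ROUND_LENGTH = 5  # the text defines a round as five tests


@dataclass
class Scoreboard:
    """Hypothetical scorekeeper for the Yes/No buttons."""
    hits: int = 0
    misses: int = 0

    def record(self, matched: bool):
        """Record one test; return the outcome once the round is complete."""
        if matched:
            self.hits += 1      # user pressed "Yes"
        else:
            self.misses += 1    # user pressed "No"
        if self.hits + self.misses < ROUND_LENGTH:
            return None         # round still in progress
        # Assumed win rule: more correct predictions than incorrect ones.
        return "success" if self.hits > self.misses else "failure"

    def reset(self):
        """Equivalent of the "Let's Start over" button."""
        self.hits = 0
        self.misses = 0


board = Scoreboard()
outcomes = [board.record(m) for m in (True, False, True, True, False)]
```

Here `outcomes` holds `None` for the first four tests and the round result for the fifth; calling `board.reset()` clears the counts for a new round.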
The prediction algorithm runs on a pre-computed set of n-grams. The final model's n-grams are derived from all n-grams of order 5 and lower produced by parsing the full set of corpora.
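As a rough illustration of what collecting n-grams up to order 5 involves, here is a minimal sketch. It is not the author's implementation: the `tokenize` and `count_ngrams` helpers, the whitespace tokenizer, and the two sample sentences are all assumptions made for the example.

```python
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    # A real pipeline would likely also handle punctuation.
    return text.lower().split()


def count_ngrams(sentences, max_order=5):
    """Count every n-gram of order 1..max_order in the given sentences."""
    counts = Counter()
    for sentence in sentences:
        words = tokenize(sentence)
        for n in range(1, max_order + 1):
            # Slide a window of n words across the sentence.
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


sample = ["the cat sat on the mat", "the cat ate"]
counts = count_ngrams(sample)
```

Keying the counter on word tuples keeps n-grams of every order in one table, so a later pruning pass can filter by count or by tuple length.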
The initial, very large list of n-grams is pruned in three steps:
At run time, a phrase is entered into the user interface and fed to the prediction algorithm, which breaks the phrase into individual words and follows these steps: