The Statistics Behind the Prediction
An n-gram table stores sequences of n consecutive words. To create a 4-gram table, for example, the source text is broken into overlapping sequences of 4 words. The sentence “The weather channel website is down.” is inserted into a table of four columns (Word1 to Word4), where each row keeps one sequence of 4 words, such as “The weather channel website” or “weather channel website is”. A higher frequency x (see below) means a higher chance that Word4 will appear right after the sequence Word1 to Word3.
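As a rough illustration of the idea, the table can be sketched in a few lines of Python. This is a minimal sketch, not the model's actual code: the function names `tokenize`, `build_ngram_table`, and `predict_next`, the lowercase tokenization, and the two extra sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenization: lowercase words, punctuation dropped.
    return re.findall(r"[a-z']+", text.lower())

def build_ngram_table(rows, n=4):
    """Count every sequence of n consecutive words in each text row."""
    table = Counter()
    for row in rows:
        words = tokenize(row)
        for i in range(len(words) - n + 1):
            table[tuple(words[i:i + n])] += 1
    return table

def predict_next(table, context):
    """Return the most frequent last word for the given n-1 word context."""
    context = tuple(word.lower() for word in context)
    candidates = {gram[-1]: freq for gram, freq in table.items()
                  if gram[:-1] == context}
    if not candidates:
        return None  # context never seen in the source text
    return max(candidates, key=candidates.get)

# Hypothetical sample text; only the first sentence comes from the article.
rows = [
    "The weather channel website is down.",
    "The weather channel website is slow.",
    "The weather channel app is down.",
]
table = build_ngram_table(rows)
print(predict_next(table, ["the", "weather", "channel"]))  # website
```

Here “website” follows “the weather channel” twice and “app” only once, so the higher frequency wins.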

To check the accuracy of the prediction, two tests were applied to the model, both with a testing dataset of 20,000 text rows. The first checked the accuracy during typing, comparing every predicted next word to the real next word in the data. The second checked the accuracy of predicting only the last word of each row. For the first test the accuracy is ~20%, and for the second it is lower, ~14%, since the model also suggests conjunctions and stop words, which cannot fit as last words. Improving the model would probably require adding context.
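The two tests can be sketched as follows. This is only an illustration under stated assumptions: `typing_accuracy`, `last_word_accuracy`, the whitespace tokenization, the 3-word context, and the toy predictor are hypothetical and are not taken from the original evaluation code.

```python
def typing_accuracy(rows, predict, context_size=3):
    """Test 1: compare every predicted next word with the real next word."""
    hits = total = 0
    for row in rows:
        words = row.lower().split()
        for i in range(context_size, len(words)):
            total += 1
            hits += predict(tuple(words[i - context_size:i])) == words[i]
    return hits / total if total else 0.0

def last_word_accuracy(rows, predict, context_size=3):
    """Test 2: predict only the last word of each row."""
    hits = total = 0
    for row in rows:
        words = row.lower().split()
        if len(words) > context_size:
            total += 1
            hits += predict(tuple(words[-context_size - 1:-1])) == words[-1]
    return hits / total if total else 0.0

# Toy predictor standing in for the real model (hypothetical values).
toy_model = {
    ("the", "weather", "channel"): "website",
    ("weather", "channel", "website"): "is",
    ("channel", "website", "is"): "and",  # a conjunction: never a valid last word
}
predict = toy_model.get

test_rows = [
    "the weather channel website is down",
    "the weather channel app is down",
]
print(typing_accuracy(test_rows, predict))
print(last_word_accuracy(test_rows, predict))
```

In this toy setup the predictor proposes “and” where the row actually ends, which mirrors the reason given above for the lower last-word score.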