Alan Whitelock-Jones
22 Jan 2016
The prediction algorithm predicts the most likely next words and presents them as a list of buttons to choose from.
The aim is to maximise the usefulness of the predictions within the limits of memory and speed.
The application has a vocabulary of the 5000 most common words in the training set (excluding profanities) and predicts the next word from up to the 8 previous words typed in the phrase (the longest stored n-gram is a 9-gram).
As well as predicting the next word, the algorithm shows matching words from the vocabulary as soon as you start typing a word.
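The two behaviours described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the toy tables, the function names `predict_next` and `complete_word`, and the longest-context-first fallback are all assumptions made for the example, since the lookup strategy is not spelled out here.

```r
# Hypothetical toy data. vocab is ordered most frequent first.
vocab <- c("the", "to", "that", "this", "they", "and", "of")

# Each n-gram table maps a context (the first n-1 words) to candidate
# next words, already sorted by frequency. Names are the n-gram order.
ngrams <- list(
  `2` = list("want" = c("to", "the"), "to" = c("the", "be")),
  `3` = list("i want" = c("to", "that"))
)

# Assumed strategy: try the longest context first, then shorter ones,
# and finally pad the list with the most common vocabulary words.
predict_next <- function(phrase, ngrams, vocab, size.predicted = 10) {
  words <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  out <- character(0)
  orders <- sort(as.integer(names(ngrams)), decreasing = TRUE)
  for (n in orders) {
    k <- n - 1                              # context length of an n-gram
    if (length(words) < k) next             # not enough words typed yet
    context <- paste(tail(words, k), collapse = " ")
    hits <- ngrams[[as.character(n)]][[context]]  # NULL if context unseen
    out <- unique(c(out, hits))
  }
  head(unique(c(out, vocab)), size.predicted)
}

# Show vocabulary words that start with the partially typed word.
complete_word <- function(prefix, vocab, size.predicted = 10) {
  head(vocab[startsWith(vocab, tolower(prefix))], size.predicted)
}

predict_next("I want", ngrams, vocab, size.predicted = 3)
complete_word("th", vocab, size.predicted = 3)
```

With the toy data, the first call ranks candidates from the 3-gram table ahead of those from the 2-gram table, and the second call returns the most frequent vocabulary words beginning with "th".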
The parameter values that gave the best results were:
#Parameters
size.vocab <- 5000
size.2.gram <- 10000
size.3.gram <- 5000
size.4.gram <- 4000
size.5.gram <- 3000
size.6.gram <- 2000
size.7.gram <- 1000
size.8.gram <- 1000
size.9.gram <- 1000
size.predicted <- 10
Further efficiency was gained by storing no more than 10 n-grams that share the same first (n-1) words, since lower-ranked ones could never appear in a list of 10 predictions.
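The pruning step could look something like the sketch below. It assumes the n-gram counts are held in a data frame with hypothetical columns `context`, `word` and `count`; the function name `prune_ngrams` and the sample data are invented for illustration only.

```r
# Hypothetical 3-gram counts: `context` is the first (n-1) words,
# `word` is the final word, `count` is the training-set frequency.
trigrams <- data.frame(
  context = c("i want", "i want", "i want", "want to", "want to"),
  word    = c("to", "a", "the", "be", "go"),
  count   = c(50, 20, 5, 30, 10),
  stringsAsFactors = FALSE
)

# Keep only the `keep` most frequent continuations of each context.
prune_ngrams <- function(tbl, keep = 10) {
  # Sort by context, then by descending count within each context
  tbl <- tbl[order(tbl$context, -tbl$count), ]
  # Rank rows within each context (1 = most frequent)
  rank_in_context <- ave(seq_len(nrow(tbl)), tbl$context, FUN = seq_along)
  pruned <- tbl[rank_in_context <= keep, ]
  rownames(pruned) <- NULL
  pruned
}

# keep = 2 here only so the effect is visible on the tiny sample;
# the text above corresponds to keep = 10.
pruned <- prune_ngrams(trigrams, keep = 2)
print(pruned)
```

In this sample the third-ranked continuation of "i want" is dropped, while "want to" keeps both of its rows because it has only two.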