The algorithm I used is the Katz back-off model. It relies on n-gram frequency tables and, through the chain rule of probability for dependent events together with the Markov assumption, predicts the next word from the previous one or two words. By default, the algorithm starts with the highest-order n-gram and picks the candidate with the highest probability given the n-1 preceding words. If there is no match, it backs off to a lower-order n-gram, using a correspondingly shorter context, until it finds one. One problem with this kind of method is estimating the probability of unobserved n-grams. To handle it, the algorithm discounts some probability mass from the observed n-grams and redistributes it among the unobserved ones, which avoids assigning them zero probabilities that would be far from reality.
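To make the back-off and discounting steps concrete, here is a minimal sketch in Python. It is not the implementation used for this project: the class name `KatzBackoff`, its methods, and the toy corpus are hypothetical, and it assumes a fixed absolute discount of 0.5 in place of the Good-Turing discounting that Katz's original formulation uses.

```python
from collections import Counter, defaultdict


class KatzBackoff:
    """Illustrative back-off model (hypothetical names, fixed absolute discount)."""

    def __init__(self, tokens, max_n=3, discount=0.5):
        self.max_n = max_n
        self.discount = discount  # assumption: constant discount, not Good-Turing
        self.vocab = set(tokens)
        # followers[context][word] = number of times `word` followed `context`
        self.followers = defaultdict(Counter)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                self.followers[gram[:-1]][gram[-1]] += 1

    def _trim(self, context):
        # Markov assumption: only the last max_n - 1 words matter.
        context = tuple(context)
        return context[-(self.max_n - 1):] if self.max_n > 1 else ()

    def prob(self, word, context):
        return self._prob(word, self._trim(context))

    def _prob(self, word, context):
        seen = self.followers.get(context)
        if not context:
            # Unigram base case: plain relative frequency.
            return seen[word] / sum(seen.values())
        if not seen:
            # Context never observed: back off to the shorter context.
            return self._prob(word, context[1:])
        total = sum(seen.values())
        if word in seen:
            # Observed n-gram: discounted relative frequency.
            return (seen[word] - self.discount) / total
        # Probability mass removed from the observed n-grams.
        reserved = self.discount * len(seen) / total
        # Lower-order mass of the words NOT observed after this context.
        unseen_mass = 1.0 - sum(self._prob(w, context[1:]) for w in seen)
        if unseen_mass <= 0.0:
            return 0.0
        # Share the reserved mass in proportion to the lower-order estimate.
        return reserved * self._prob(word, context[1:]) / unseen_mass

    def predict(self, context):
        # Most probable next word; ties are broken alphabetically.
        return max(sorted(self.vocab), key=lambda w: self.prob(w, context))


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ate the fish".split()
    model = KatzBackoff(corpus)
    print(model.predict(["on", "the"]))
    print(model.prob("fish", ["the", "cat"]))
```

In this sketch, a word that was never observed after a given context still receives a non-zero probability, taken from the mass discounted from the observed n-grams, and the probabilities over the vocabulary still sum to one for each context.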