I used the Stupid Backoff algorithm with a lambda (backoff factor) value of 0.4.
I will explain the algorithm with an example. Suppose we have the phrase “how are you”. The algorithm first looks at the 4-grams to see if any start with “how are you”. Let's suppose some do, and that one of them, “how are you doing”, appears 5 times. The algorithm then counts how many times the 3-gram “how are you” appears; suppose it appears 10 times. The prediction “doing” then gets a score of 5/10 = 0.5.
However, if there were no 4-grams that start with “how are you”, the algorithm would look at the 3-grams that start with “are you” and repeat the process above. This is unlikely to be nearly as good an estimate of the word following “how are you” as a 4-gram match would be, so the scores are multiplied by 0.4.
This would then continue with 2-grams if necessary, with the scores multiplied by 0.4 again for each additional step back.
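The steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not necessarily the implementation used for this project: the function names `build_counts` and `score_next_words` are hypothetical, n-gram counts are assumed to live in a single `Counter` keyed by word tuples, and the search stops at 2-grams as described above.

```python
from collections import Counter

LAMBDA = 0.4  # backoff factor from the text


def build_counts(tokens, max_n=4):
    """Count every n-gram of length 1..max_n in a list of tokens (hypothetical helper)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score_next_words(context, counts, max_n=4, lam=LAMBDA):
    """Return {word: score} from the longest context that has any continuation."""
    # Keep at most max_n - 1 words of context (3 words when using 4-grams).
    context = tuple(context)[-(max_n - 1):]
    penalty = 1.0
    while context:
        total = counts.get(context, 0)
        # Collect every n-gram that extends the current context by one word.
        candidates = {
            ngram[-1]: penalty * count / total
            for ngram, count in counts.items()
            if len(ngram) == len(context) + 1 and ngram[:-1] == context
        }
        if candidates:
            return candidates
        # Nothing found: drop the first word and apply the penalty once more.
        context = context[1:]
        penalty *= lam
    return {}


# The example from the text: "how are you doing" seen 5 times,
# "how are you" seen 10 times.
example_counts = Counter({
    ("how", "are", "you", "doing"): 5,
    ("how", "are", "you"): 10,
})
print(score_next_words(["how", "are", "you"], example_counts))  # {'doing': 0.5}
```

Scanning every stored n-gram on each lookup keeps the sketch short; a real model would more likely index the n-grams by their prefix so that lookups stay fast.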