N-gram modeling based on the Markov assumption: the next word depends only on the previous \(n-1\) words. [1]
Uses maximum likelihood estimation (MLE), i.e. relative n-gram counts, at its core.
e.g. 5-gram: \(P\left( w_{5} | w_{1}w_{2}w_{3}w_{4}\right)\); 2-gram: \(P\left( w_{2} | w_{1}\right)\)
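A minimal Python sketch of the MLE step, assuming a whitespace-tokenized corpus; the function names `ngram_counts` and `mle_prob` are hypothetical and not taken from the original system. It computes \(P\left( w_{n} | w_{1} \ldots w_{n-1}\right) = C(w_{1} \ldots w_{n}) / \sum_{w} C(w_{1} \ldots w_{n-1} w)\): the count of the full n-gram divided by how often its context is followed by any word.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple of words) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def mle_prob(tokens, context, word):
    """MLE estimate of P(word | context) from raw counts.

    The order is len(context) + 1, so a 4-word context gives a 5-gram
    estimate and a 1-word context gives a 2-gram estimate.
    """
    context = tuple(context)
    grams = ngram_counts(tokens, len(context) + 1)
    # Denominator: how often the context is followed by any word at all.
    total = sum(c for g, c in grams.items() if g[:-1] == context)
    # An unseen context has no estimate; return 0.0 rather than divide by zero.
    return grams[context + (word,)] / total if total else 0.0

tokens = "the cat sat on the mat the cat ran".split()
print(mle_prob(tokens, ("the",), "cat"))  # 2 of the 3 "the ..." bigrams: 0.666...
print(mle_prob(tokens, ("cat",), "sat"))  # 0.5
```

Counting the denominator over n-grams that start with the context (rather than over every occurrence of the context) keeps each conditional distribution summing to 1, even when the context appears at the very end of the corpus.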
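Counts become very sparse at 3-grams and higher: many longer word sequences never appear in the training text, so their MLE probability is zero.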
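To make the sparsity concrete, a small Python sketch on a made-up pair of sentences (the sentences and the helper `unseen_rate` are illustrative assumptions). Changing a single word leaves only one unseen unigram but breaks every n-gram that spans it, so the share of test n-grams with a zero training count, and hence zero MLE probability, climbs with n.

```python
def ngrams(tokens, n):
    """List the n-grams (tuples of words) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def unseen_rate(train, test, n):
    """Fraction of the test text's n-grams that never occur in the training text."""
    seen = set(ngrams(train, n))
    test_grams = ngrams(test, n)
    return sum(g not in seen for g in test_grams) / len(test_grams) if test_grams else 0.0

train = "the cat sat on the mat".split()
test = "the dog sat on the mat".split()  # one word changed
for n in range(1, 6):
    print(n, round(unseen_rate(train, test, n), 2))
# 1 0.17
# 2 0.4
# 3 0.5
# 4 0.67
# 5 1.0
```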
Uses linear interpolation with lower-order models to smooth predictions.
e.g. 5-gram: \(\hat{P}\left( w_{5} | w_{1}w_{2}w_{3}w_{4}\right) = \lambda_{1}P\left( w_{5} | w_{1}w_{2}w_{3}w_{4}\right) + \lambda_{2}P\left( w_{5} | w_{2}w_{3}w_{4}\right) + \lambda_{3}P\left( w_{5} | w_{3}w_{4}\right) + \lambda_{4}P\left( w_{5} | w_{4}\right)\), where \(\sum_{i}\lambda_{i} = 1\)
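A Python sketch of the interpolated 5-gram above, built on plain MLE estimates. The weights `(0.4, 0.3, 0.2, 0.1)` are hypothetical placeholders: the notes do not say how the \(\lambda\) values are chosen (they are commonly tuned on held-out data). The helper names are likewise illustrative. In this sketch an unseen context simply contributes 0 to the sum; it is not renormalized away.

```python
from collections import Counter

def mle_prob(tokens, context, word):
    """MLE estimate of P(word | context); 0.0 if the context was never seen."""
    context = tuple(context)
    n = len(context) + 1
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(c for g, c in grams.items() if g[:-1] == context)
    return grams[context + (word,)] / total if total else 0.0

def interpolated_prob(tokens, context, word, lambdas):
    """Linear interpolation of MLE estimates over shrinking contexts.

    lambdas[0] weights the full context (w1..w4 for a 5-gram) and
    lambdas[-1] weights the last word alone (the 2-gram term).
    """
    context = tuple(context)
    if len(lambdas) != len(context):
        raise ValueError("expected one weight per context length")
    if abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    # context[i:] drops the i oldest words, so the order falls from len(context)+1 to 2.
    return sum(lam * mle_prob(tokens, context[i:], word)
               for i, lam in enumerate(lambdas))

tokens = "the cat sat on the mat and the dog sat on the rug".split()
lambdas = (0.4, 0.3, 0.2, 0.1)  # hypothetical weights, not from the source
context = ("dog", "sat", "on", "the")
print(mle_prob(tokens, context, "mat"))  # 0.0: this 5-gram was never seen
print(round(interpolated_prob(tokens, context, "mat", lambdas), 3))  # 0.275
```

The pure 5-gram estimate for "dog sat on the mat" is zero, but the 4-, 3- and 2-gram terms (0.5, 0.5 and 0.25) still give it probability mass: \(0.3 \cdot 0.5 + 0.2 \cdot 0.5 + 0.1 \cdot 0.25 = 0.275\).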
The model order grows with the input: for 2 to 5 words, the n-gram order tracks the number of words entered; once 5 or more words have been entered, 5-gram modeling is applied to the most recent 5 words.
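One way to pick the context and order from what the user has typed, as a Python sketch. It assumes an order-n model conditions on the previous \(n-1\) words (as in the formula for \(w_{5}\) given \(w_{1}w_{2}w_{3}w_{4}\)), so the sliding window keeps at most the last four words and the predicted word completes the 5-gram; the original system may align its window differently. `MAX_ORDER` and `prediction_context` are hypothetical names.

```python
MAX_ORDER = 5  # highest n-gram order described above

def prediction_context(words, max_order=MAX_ORDER):
    """Return (context, order) for predicting the word after `words`.

    Assumption: an order-n model conditions on the previous n - 1 words, so
    the context is the last max_order - 1 words entered (fewer if fewer exist).
    """
    n_ctx = max(0, max_order - 1)
    context = tuple(words[max(0, len(words) - n_ctx):]) if n_ctx else ()
    return context, len(context) + 1

for text in ["how", "how are you", "how are you doing today my"]:
    print(prediction_context(text.split()))
# (('how',), 2)
# (('how', 'are', 'you'), 4)
# (('you', 'doing', 'today', 'my'), 5)
```

The order grows with the input until the cap, after which older words fall out of the window; the returned context length also tells an interpolating model how many \(\lambda\) weights it needs.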