Yevgeny V. Yorkhov
2016/04/17
The example:
“The guy in front of me just bought a pound of bacon, a bouquet, and a case of beer.” After text cleaning we get:
“guy” “front” “just” “bought” “pound” “bacon” “bouquet” “case” “beer”
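The cleaning step above can be sketched in a few lines of Python. This is only an illustration, not the report's actual pipeline: the function name `clean_text` and the tiny `STOPWORDS` set are assumptions, chosen so that the example sentence yields the tokens listed above.

```python
import re

# Hypothetical stop-word list: only the words dropped in the example above.
STOPWORDS = {"the", "in", "of", "me", "a", "and"}

def clean_text(sentence):
    """Lowercase, keep alphabetic runs only, and drop stop words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]

sentence = ("The guy in front of me just bought a pound of bacon, "
            "a bouquet, and a case of beer")
tokens = clean_text(sentence)
print(tokens)
```

A real stop-word list would be much longer and would probably remove other words as well, so the output depends on the list chosen.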
From these tokens we could form the following phrases, in which the words “case” and “beer” are connected to each other and can be combined into a bigram:
Besides bigrams, we could also generate tri-, quad-, and penta-grams, but doing so may exceed the time and memory constraints.
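As a rough sketch of n-gram generation, the snippet below slides a window of size n over the cleaned tokens. The helper name `ngrams` is an assumption made for illustration. On a single sentence the number of n-grams is small, but on a full corpus each additional order adds another large table of distinct n-grams, which is where the time and memory cost comes from.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = ["guy", "front", "just", "bought", "pound",
          "bacon", "bouquet", "case", "beer"]

bigrams = ngrams(tokens, 2)
print(bigrams[-1])  # the bigram that joins "case" and "beer"

# Number of n-grams produced from this one sentence, for each order.
for n in range(2, 6):
    print(n, len(ngrams(tokens, n)))
```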
Phase #1
Phase #2
\( P(W_{i-4} \ldots W_i) = \lambda_1 C(W_{i-1},W_i) + \lambda_2 C(W_{i-2},W_i) + \lambda_3 C(W_{i-3},W_i) + \lambda_4 C(W_{i-4},W_i) + \lambda_5 P_{smoothing} \)
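The weighted sum above could be computed along the following lines. This is a minimal sketch under several assumptions: C(W_{i-k}, W_i) is read as a raw count of how often the two words occur k positions apart, the lambda weights and the smoothing probability are made-up placeholder values, and the function names `skip_pair_counts` and `interpolated_score` are hypothetical.

```python
from collections import Counter

def skip_pair_counts(tokens, max_distance=4):
    """Count word pairs (W_{i-k}, W_i) for each distance k = 1..max_distance."""
    counts = {k: Counter() for k in range(1, max_distance + 1)}
    for i, word in enumerate(tokens):
        for k in range(1, max_distance + 1):
            if i - k >= 0:
                counts[k][(tokens[i - k], word)] += 1
    return counts

def interpolated_score(context, candidate, counts, lambdas, p_smoothing):
    """Weighted sum of the distance-k pair counts plus a smoothing term."""
    score = lambdas[-1] * p_smoothing
    for k in range(1, len(lambdas)):
        if k <= len(context):
            score += lambdas[k - 1] * counts[k][(context[-k], candidate)]
    return score

tokens = ["guy", "front", "just", "bought", "pound",
          "bacon", "bouquet", "case", "beer"]
counts = skip_pair_counts(tokens)

# Placeholder weights (lambda_1..lambda_5) and smoothing probability.
lambdas = [0.4, 0.3, 0.15, 0.1, 0.05]
score = interpolated_score(tokens[:-1], "beer", counts, lambdas, p_smoothing=0.01)
print(score)
```

In practice the counts would come from the whole training corpus rather than one sentence, and the weights would be tuned on held-out data.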