Mahmoud Elsheikh
19-June-2019
a. The algorithm calculates Katz's back-off probabilities for combinations of prewords (the preceding words) and current words (the words to be predicted), and stores them as a probability matrix, as per the equation below:
\[ P_{bo}(w_{i} \mid w_{i-n+1} \ldots w_{i-1}) = \begin{cases} d_{w_{i-n+1} \ldots w_{i}} \frac{C(w_{i-n+1} \ldots w_{i-1} w_{i})}{C(w_{i-n+1} \ldots w_{i-1})} & \text{if } C(w_{i-n+1} \ldots w_{i}) > k, \;\; k = 1 \\ \alpha_{w_{i-n+1} \ldots w_{i-1}} P_{bo}(w_{i} \mid w_{i-n+2} \ldots w_{i-1}) & \text{otherwise} \end{cases} \]
b. Multiple samples were taken from the data and multiple probability matrices were produced.
c. The probability matrices are incorporated into a word-prediction function, which looks up the preword of the given sentence in the cascade of matrices and returns the most probable next word as the prediction.
d. If the algorithm finds no match for the preword, it returns the most common word in the English language, “the”.
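The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the application's actual code: the names `build_tables`, `predict`, `DISCOUNT` and `K` are hypothetical, the toy corpus is made up, and a fixed discount of 0.5 is assumed in place of the Good-Turing estimate normally used for \(d\). The back-off weight \(\alpha\) is not computed, because the sketch only ranks words inside the first table that matches the preword.

```python
from collections import Counter, defaultdict

DISCOUNT = 0.5  # assumed fixed discount, standing in for the Good-Turing d
K = 1           # count threshold k from the equation


def build_tables(tokens, max_order=3):
    """Return {order: {preword tuple: {word: discounted probability}}}."""
    tables = {}
    for order in range(2, max_order + 1):
        # Count every n-gram of this order in the token stream.
        ngrams = Counter(tuple(tokens[i:i + order])
                         for i in range(len(tokens) - order + 1))
        # Count each preword as the sum of the n-grams that start with it.
        context_counts = Counter()
        for gram, count in ngrams.items():
            context_counts[gram[:-1]] += count
        # Keep only n-grams seen more than K times, with discounted counts.
        table = defaultdict(dict)
        for gram, count in ngrams.items():
            if count > K:
                preword, word = gram[:-1], gram[-1]
                table[preword][word] = (count - DISCOUNT) / context_counts[preword]
        tables[order] = dict(table)
    return tables


def predict(sentence, tables, default="the"):
    """Search the tables from the longest preword to the shortest."""
    words = sentence.lower().split()
    for order in sorted(tables, reverse=True):
        if len(words) < order - 1:
            continue  # sentence too short for a preword of this length
        preword = tuple(words[-(order - 1):])
        candidates = tables[order].get(preword)
        if candidates:
            return max(candidates, key=candidates.get)
    return default  # no preword matched in any table


# Made-up toy corpus, for illustration only.
tokens = "the cat sat on the mat the cat sat on the sofa the cat ran".split()
tables = build_tables(tokens)

print(predict("the cat", tables))      # sat (matched in the trigram table)
print(predict("a dog sat", tables))    # on  (backed off to the bigram table)
print(predict("hello world", tables))  # the (no match, default word)
```

Searching the highest-order table first mirrors the cascade in step c, and the `default` argument plays the role of the fallback in step d.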
For more details on how the algorithm was built please visit:
To test the application please visit Single word application