Innuganti
Nov 29, 2018
The objective of this capstone project is to build an application that predicts the next word a user is likely to type and that can be deployed as a Shiny app. The app's predictions are built from a transformed corpus of text files.
This was the final project of the Data Science Specialization offered by Johns Hopkins University on Coursera, run as an industry partnership with SwiftKey.
The data can be obtained from here.
The project involved the following steps:
Katz Backoff Model:
This is a non-linear method for estimating the conditional probability of a word given its history, that is, the words that precede it. It relies on Good-Turing discounting, which discounts the counts of observed higher-order N-grams and redistributes the freed probability mass to lower-order N-grams.
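For reference, the textbook form of the Katz backoff estimate can be written as below. This is a sketch of the standard formulation, not necessarily the exact variant used in the app. The symbols are assumptions introduced here: C(.) is an N-gram count, N_c is the number of distinct N-grams seen exactly c times, d_c is the Good-Turing discount ratio, k is a count threshold (often 0), h is the current history and h' is the same history with its oldest word dropped.

```latex
% Good-Turing adjusted count and the resulting discount ratio
\[
c^{*} = (c + 1)\,\frac{N_{c+1}}{N_{c}}, \qquad d_{c} = \frac{c^{*}}{c}
\]

% Katz backoff estimate: use the discounted higher-order estimate when the
% N-gram has been seen often enough, otherwise back off to the shorter history
\[
P_{\mathrm{bo}}(w_i \mid h) =
\begin{cases}
  d_{C(h\,w_i)}\,\dfrac{C(h\,w_i)}{C(h)} & \text{if } C(h\,w_i) > k,\\[1.5ex]
  \alpha(h)\,P_{\mathrm{bo}}(w_i \mid h') & \text{otherwise}
\end{cases}
\]

% Backoff weight: the probability mass left over after discounting,
% normalised over the words that are handled by the lower-order model
\[
\alpha(h) =
\frac{1 - \sum_{w:\,C(h\,w) > k} d_{C(h\,w)}\,\dfrac{C(h\,w)}{C(h)}}
     {\sum_{w:\,C(h\,w) \le k} P_{\mathrm{bo}}(w \mid h')}
\]
```

The weight alpha(h) is what keeps the probabilities for a given history summing to one after mass has been moved to the lower-order model.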
The algorithm uses the 4-gram (quadgram) estimate if there is sufficient evidence for it; otherwise it backs off to the trigram, then to the bigram, and finally to the unigram. It keeps backing off until it reaches a history that has some counts.
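The backoff order described above can be sketched as follows. This is a minimal illustration written in Python, not the app's own code. The function names `build_tables` and `predict_next` and the toy corpus are hypothetical. It deliberately omits the discounting and backoff weights, so it ranks candidates by raw counts rather than by full Katz probabilities.

```python
from collections import Counter, defaultdict


def build_tables(tokens, max_order=4):
    """Map each history tuple (0 to max_order - 1 words) to a Counter of next words."""
    tables = defaultdict(Counter)
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            *history, word = tokens[i:i + n]
            tables[tuple(history)][word] += 1
    return tables


def predict_next(tables, context, max_order=4):
    """Return (word, order used), backing off until some history has counts.

    Simplification (assumption): no Good-Turing discounting is applied, so the
    most frequent continuation of the longest matching history wins.
    """
    # Start from the longest history the highest-order model can use.
    history = tuple(context[-(max_order - 1):]) if max_order > 1 else ()
    while True:
        counts = tables.get(history)
        if counts:
            return counts.most_common(1)[0][0], len(history) + 1
        if not history:
            return None, 0  # empty model: nothing to predict
        history = history[1:]  # drop the oldest word and back off


tokens = "the cat sat on the mat and the cat ate the fish".split()
tables = build_tables(tokens)
print(predict_next(tables, ["on", "the"]))   # ('mat', 3): trigram evidence exists
print(predict_next(tables, ["dog", "the"]))  # ('cat', 2): backs off to the bigram
print(predict_next(tables, ["zebra"]))       # ('the', 1): backs off to the unigram
```

Storing the unigram counts under the empty history lets a single loop handle every order, and returning the order alongside the word makes it easy to see how far the lookup had to back off.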