author : Brad Allen
This report summarizes the SwiftKey / JHU Coursera Capstone project. The goal of the project is to develop a working predictive text application hosted on RStudio's ShinyApps servers.
Having never approached a Natural Language Processing (NLP) problem before, I was very curious about the process of modeling language.
I used the Heliohost HC Corpora as my background text, and I referred frequently to a Stanford NLP smoothing tutorial.
The following few slides walk through my approach to the problem and my solution.
You can find an interactive companion at my shinyapps page (bradaallen).
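This model works by first matching the last few words of the provided text against a database of 'n-grams' of different lengths, and then assigning strengths to the candidate next words based on how the match takes place (for instance, whether it came from a longer or a shorter n-gram).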
For example, take the phrase “The quick brown fox jumps over the lazy dog.” If I have typed “The quick brown fox jumps…”, my backoff model first looks up “brown fox jumps” (a 3-gram), then “fox jumps” (a 2-gram), then “jumps” (a 1-gram). As soon as one of those three lookups finds a match, that answer is provided.
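The lookup order in the fox example can be sketched in a few lines. This is a minimal illustration, not the code behind the Shiny application: it is written in Python, and the names `build_table` and `predict_next`, the whitespace tokenizer, and the one-sentence training text are all assumptions made for the example. The longest context is tried first, and the length of the context that matched is returned as a rough indicator of how strong the match is.

```python
from collections import Counter, defaultdict

def build_table(tokens, max_context=3):
    """Map each context (the 1..max_context preceding words) to a Counter of following words."""
    table = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for k in range(1, max_context + 1):
            if i - k < 0:
                break  # not enough preceding words for a context of length k
            table[tuple(tokens[i - k:i])][word] += 1
    return table

def predict_next(table, text, max_context=3):
    """Back off from the longest context to the shortest; return (word, matched_length)."""
    words = text.lower().split()
    for k in range(min(max_context, len(words)), 0, -1):
        candidates = table.get(tuple(words[-k:]))
        if candidates:
            # Most frequent continuation for this context
            word, _count = candidates.most_common(1)[0]
            return word, k
    return None, 0  # no context matched at any length

# Hypothetical one-sentence "corpus", just for illustration
corpus = "the quick brown fox jumps over the lazy dog".split()
table = build_table(corpus)

full_match = predict_next(table, "The quick brown fox jumps")  # matches "brown fox jumps"
backed_off = predict_next(table, "a sleepy fox jumps")         # falls back to "fox jumps"
```

Here `full_match` is found on the three-word context, while `backed_off` misses on “sleepy fox jumps” and is answered by the two-word context instead. A real model would be trained on a much larger corpus and would typically weight or smooth the candidate scores rather than take the first hit.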
When visiting the site, enter text into the provided box to see which next word the model predicts.
More detail on these thoughts can be found in the application itself. Thank you!