Enrique Reveron
2016-07-17
Predictor! is a Natural Language Processing (NLP) app that predicts the next word the user will type, based on a US English dataset of blogs, Twitter and news text (the Corpora).
This application was built as part of the Capstone SwiftKey Project, the final stage of the Coursera Data Science Specialization.
The app is available at: https://ereveron.shinyapps.io/Predictor/
The Source Code for the App and all the related files are available on the GitHub repo: https://github.com/EReveron/Coursera—Data-Science—Capstone-Project
The most important aspects of the Predictor! app are:
We use the following Kneser-Ney implementation:
Lowest Order (Unigrams) Equation:
\[ {P_{KN}^{1}(w_{i})} = \frac{N_{1+}({\bullet} w_{i})} { N_{1+}({\bullet} {\bullet})} \]
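The unigram equation above can be illustrated with a small sketch. It assumes a toy, whitespace-tokenized corpus; the function name `unigram_continuation_probs` and the sample sentence are hypothetical and are not taken from the app's source code. The point it shows is that the lowest-order probability counts the distinct words that precede a word, not how often the word occurs.

```python
def unigram_continuation_probs(tokens):
    """Continuation probability N1+(. w) / N1+(. .) for each word w.

    N1+(. w) is the number of distinct words seen immediately before w;
    N1+(. .) is the number of distinct bigram types in the corpus.
    """
    bigram_types = set(zip(tokens, tokens[1:]))
    left_context_counts = {}
    for _, word in bigram_types:
        left_context_counts[word] = left_context_counts.get(word, 0) + 1
    total_types = len(bigram_types)
    return {w: c / total_types for w, c in left_context_counts.items()}


# Toy corpus (an assumption for illustration only).
tokens = "the cat sat on the mat the cat ran".split()
probs = unigram_continuation_probs(tokens)

# "cat" occurs twice but always after "the", so it has one left context;
# "the" follows both "on" and "mat", so it has two.
print(probs["cat"], probs["the"])
```

On this toy corpus there are 7 distinct bigram types, so "the" gets 2/7 and "cat" gets 1/7, even though "cat" appears twice in the text.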
Second-Highest Order down to Bigrams Equation:
\[ P_{KN}^{n}(w_{i}{\mid}w_{i-n+1}^{i-1}) = \frac{\max\left\{ N_{1+}(\bullet w_{i-n+1}^{i} )-{\delta}_{n},0\right\} } { N_{1+}(\bullet w_{i-n+1}^{i-1}\bullet)} + \frac{{\delta}_{n}} {N_{1+}(\bullet w_{i-n+1}^{i-1}\bullet)} N_{1+}(w_{i-n+1}^{i-1}\bullet)P_{KN}^{n-1}(w_{i}{\mid}w_{i-n+2}^{i-1}) \]
Highest Order Equation:
\[ P_{KN}^{n}(w_{i}{\mid}w_{i-n+1}^{i-1}) = \frac{\max\left\{c(w_{i-n+1}^{i}) -{\delta}_{n},0\right\} } { \sum_{w'_{i}} c(w_{i-n+1}^{i-1},{w'_{i}})} + \frac{{\delta}_{n}} {\sum_{w'_{i}} c(w_{i-n+1}^{i-1},{w'_{i}})} N_{1+}(w_{i-n+1}^{i-1}\bullet )P_{KN}^{n-1}(w_{i}{\mid}w_{i-n+2}^{i-1}) \]
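As a minimal sketch of how the highest-order equation combines with the unigram equation, the following treats bigrams as the highest order (n = 2), so the lower-order term is the unigram continuation probability. The toy corpus, the function name `make_kn_bigram` and the discount value 0.75 are assumptions for illustration; the app's actual discounts and n-gram orders are not stated here.

```python
from collections import Counter, defaultdict


def make_kn_bigram(tokens, delta=0.75):
    """Return prob(word, context) for an interpolated Kneser-Ney bigram model.

    delta is the discount; 0.75 is an assumed, commonly used value.
    """
    bigram_counts = Counter(zip(tokens, tokens[1:]))   # c(v, w)
    context_totals = Counter()                         # sum over w' of c(v, w')
    followers = defaultdict(set)                       # distinct w after v: N1+(v .)
    predecessors = defaultdict(set)                    # distinct v before w: N1+(. w)
    for (v, w), count in bigram_counts.items():
        context_totals[v] += count
        followers[v].add(w)
        predecessors[w].add(v)
    n_bigram_types = len(bigram_counts)                # N1+(. .)

    def prob(word, context):
        # Lowest-order (unigram) continuation probability.
        p_cont = len(predecessors.get(word, ())) / n_bigram_types
        total = context_totals[context]
        if total == 0:
            # Unseen context: fall back to the lower-order estimate.
            return p_cont
        discounted = max(bigram_counts[(context, word)] - delta, 0) / total
        backoff_weight = delta * len(followers[context]) / total
        return discounted + backoff_weight * p_cont

    return prob


# Toy corpus (an assumption for illustration only).
tokens = "the cat sat on the mat the cat ran".split()
prob = make_kn_bigram(tokens)
vocab = set(tokens)

# Rank candidate next words after "the".
ranking = sorted(vocab, key=lambda w: prob(w, "the"), reverse=True)
print(ranking[:3])
```

The mass removed by the discount is exactly what the second term redistributes through the lower-order model, so the probabilities over the vocabulary for a seen context sum to 1.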
The app offers the user several parameters to choose from:
It also provides: