Gaurav Garg (gaurav_garg@yahoo.com)
Oct 2016
Prediction Algorithm
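We apply Kneser-Ney smoothing to all the probabilities in the lookup table, rather than plain absolute discounting, to account for phrases that do not appear in the training set. Each observed n-gram count is reduced by a fixed discount δ (usually between 0 and 1), and the freed probability mass is redistributed through the lower-order distribution. Here c(·) is the number of times an n-gram occurs in the training data.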
\[ p_{KN}(w_i \mid w_{i-n+1}^{i-1}) = \frac{\max\left(c(w_{i-n+1}^{i-1}, w_i) - \delta, 0\right)}{\sum\limits_{w'} c(w_{i-n+1}^{i-1}, w')} + \delta \, \frac{\left|\{ w' : 0 < c(w_{i-n+1}^{i-1}, w') \}\right|}{\sum\limits_{w_i} c(w_{i-n+1}^{i})} \, p_{KN}(w_i \mid w_{i-n+2}^{i-1}) \]
1 https://en.wikipedia.org/wiki/Kneser%E2%80%93Ney_smoothing
2 http://www.foldl.me/2014/kneser-ney-smoothing/
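The recursion above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the MVP's code: the `KneserNeyModel` class, its `prob` and `predict` methods, the toy corpus and the discount δ = 0.75 are all assumptions. It follows the formula above with raw counts at every order and bottoms out at the continuation-count unigram probability described in reference 1, which is a simplification of full (or modified) Kneser-Ney. Probabilities are computed on demand here; a lookup table like the one mentioned above could be built by precomputing them.

```python
from collections import Counter


class KneserNeyModel:
    """Hypothetical interpolated Kneser-Ney n-gram model (illustrative sketch)."""

    def __init__(self, sentences, order=3, delta=0.75):
        if order < 2:
            raise ValueError("order must be at least 2")
        self.order = order
        self.delta = delta  # assumed discount, 0 <= delta <= 1
        # counts[k] maps each k-gram (a tuple of tokens) to its raw count c(.)
        self.counts = {k: Counter() for k in range(1, order + 1)}
        for tokens in sentences:
            for k in range(1, order + 1):
                for i in range(len(tokens) - k + 1):
                    self.counts[k][tuple(tokens[i:i + k])] += 1
        # For every history h: sum over w' of c(h, w'), and |{w' : c(h, w') > 0}|
        self.history_total = Counter()
        self.history_types = Counter()
        for k in range(2, order + 1):
            for gram, c in self.counts[k].items():
                self.history_total[gram[:-1]] += c
                self.history_types[gram[:-1]] += 1
        # Lowest order: number of distinct words seen to the left of each word
        self.continuation = Counter(gram[1] for gram in self.counts[2])
        self.num_bigram_types = len(self.counts[2])
        self.vocab = {gram[0] for gram in self.counts[1]}

    def prob(self, word, history=()):
        """p_KN(word | history), using at most the last order-1 history words."""
        history = tuple(history)[-(self.order - 1):]
        if not history:
            # Base case: continuation probability of the word
            if self.num_bigram_types == 0:
                return 0.0
            return self.continuation[word] / self.num_bigram_types
        lower = self.prob(word, history[1:])  # p_KN(w_i | w_{i-n+2}^{i-1})
        total = self.history_total[history]
        if total == 0:
            # History never seen in training: back off entirely to the shorter one
            return lower
        count = self.counts[len(history) + 1][history + (word,)]
        discounted = max(count - self.delta, 0) / total
        backoff_weight = self.delta * self.history_types[history] / total
        return discounted + backoff_weight * lower

    def predict(self, history, k=3):
        """Return the k most probable next words as (word, probability) pairs."""
        scored = ((w, self.prob(w, history)) for w in self.vocab)
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    corpus = [
        "i want to eat".split(),
        "i want to sleep".split(),
        "you want to eat".split(),
    ]
    model = KneserNeyModel(corpus, order=3, delta=0.75)
    # "eat" follows "want to" twice and "sleep" once, so "eat" ranks first
    print(model.predict(["want", "to"], k=2))
```

With δ between 0 and 1, the probabilities for a seen history sum to 1 over the vocabulary, because the mass removed by discounting equals the weight given to the lower-order term. A word that never appears to the right of any other word still gets probability zero in this sketch.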
The MVP demonstrates that we can find patterns in natural, unstructured text such as news, Twitter and blogs using open-source, free-tier software with limited computing resources.
With a free account, we could use less than 1% of the corpus for training. Increasing the volume of training data should improve the accuracy of our predictions.
In the healthcare industry, clinical notes are a treasure trove of information. To improve our performance, we need:
Request for budget: $30,000