agrou
June 7th, 2017
Aim: Build an app that predicts the next word based on user input
hello how are ...
how are ...
are ...
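The shrinking inputs above can be generated mechanically. The following is a minimal sketch, assuming whitespace tokenization and lowercasing; the function name `backoff_contexts` and the `max_words` parameter are hypothetical and not taken from the app.

```python
def backoff_contexts(text, max_words=5):
    """Return the contexts to try, longest first (hypothetical helper)."""
    # Keep only the last `max_words` words typed by the user.
    words = text.lower().split()[-max_words:]
    # Drop the leftmost word one at a time to get shorter contexts.
    return [" ".join(words[i:]) for i in range(len(words))]


print(backoff_contexts("hello how are"))
# ['hello how are', 'how are', 'are']
```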
Do frequent words have a higher probability of being the next word?
How many parameters should we consider?
How do we adjust for the context in which the word appears?
Probability-based algorithm based on the n-gram model
Uses from 5 down to 1 preceding words to predict the next word.
Score based on a Stupid Backoff index, or weight of prediction [1]
| n-gram | Score |
|---|---|
| hello how are you | 1 |
| how are you | 0.4 |
| are you | 0.16 |
| you | 0.02 |
Returns a list ordered by score: the first word has the highest score
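The ranking step can be sketched as follows. This is a minimal illustration of Stupid Backoff scoring under stated assumptions, not the app's implementation: it uses whitespace tokenization, a backoff factor of 0.4 (the value suggested in [1]), and approximates the count of a context by the number of words observed after it. The names `train`, `predict`, `ALPHA` and the toy sentences are all hypothetical.

```python
from collections import Counter, defaultdict

ALPHA = 0.4  # backoff factor suggested in Brants et al. (2007); an assumption here


def train(sentences, max_context=5):
    """Map each context (a tuple of 0 to max_context words) to a Counter of next words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for k in range(max_context + 1):
                if i - k < 0:
                    break
                # k = 0 gives the empty context, i.e. plain unigram counts.
                model[tuple(words[i - k:i])][word] += 1
    return model


def predict(model, text, max_context=5, top=3, alpha=ALPHA):
    """Return up to `top` (word, score) pairs, highest score first."""
    words = tuple(text.lower().split()[-max_context:])
    scores = {}
    # Try the longest context first, then drop the leftmost word each step.
    for dropped in range(len(words) + 1):
        followers = model.get(words[dropped:])
        if not followers:
            continue
        # Context count approximated by the number of observed continuations.
        total = sum(followers.values())
        for word, count in followers.items():
            # setdefault keeps the score from the longest matching context.
            scores.setdefault(word, alpha ** dropped * count / total)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top]


toy_corpus = [
    "hello how are you",
    "how are you doing",
    "how are things",
    "you are here",
]
model = train(toy_corpus)
print(predict(model, "hello how are"))  # ranked: you, things, here
```

On this toy corpus, "you" is found with the full context and keeps a weight of 1, while "things" and "here" are only reached after one and two backoff steps, so their relative frequencies are multiplied by 0.4 and 0.16 respectively.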
[1]: Brants, T., Popat, A. C., Xu, P., Och, F. J., and Dean, J. (2007). Large language models in machine translation. In EMNLP/CoNLL 2007.
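For reference, the score defined in [1] can be written as below, where $f(\cdot)$ is an n-gram count in the training corpus, $N$ is the corpus size in words, and $\alpha$ is a fixed backoff factor (0.4 in the paper). The scores are relative frequencies rather than normalized probabilities. Whether the app uses exactly this form is an assumption; the first rows of the score table above are consistent with $\alpha = 0.4$ applied once per backoff step.

$$
S(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
\dfrac{f(w_{i-k+1}^{i})}{f(w_{i-k+1}^{i-1})} & \text{if } f(w_{i-k+1}^{i}) > 0 \\[1ex]
\alpha \, S(w_i \mid w_{i-k+2}^{i-1}) & \text{otherwise}
\end{cases}
\qquad
S(w_i) = \frac{f(w_i)}{N}
$$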
| Corpus sample size | App Responsiveness in seconds | Training Accuracy | Testing Accuracy |
|---|---|---|---|
| 10% | 0.02 | 40% | 30% |
Accuracy: measured on 99% (training) and 1% (testing) of the sample corpus
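The deck does not say how accuracy and responsiveness were computed. One plausible way, sketched here under assumptions, is to hold out 1% of the sentences, count how often the top prediction equals the actual next word, and average the time per prediction. The names `split_corpus`, `top1_accuracy` and `mean_latency` are hypothetical, and `predict_next` stands for any function that maps the typed text to a single predicted word.

```python
import random
import time


def split_corpus(sentences, test_fraction=0.01, seed=42):
    """Shuffle and split into (training, testing) lists of sentences."""
    shuffled = sentences[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def top1_accuracy(predict_next, sentences):
    """Fraction of positions where the top prediction is the actual next word."""
    hits = total = 0
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words)):
            total += 1
            if predict_next(" ".join(words[:i])) == words[i]:
                hits += 1
    return hits / total if total else 0.0


def mean_latency(predict_next, prompts):
    """Average wall-clock seconds per prediction."""
    start = time.perf_counter()
    for prompt in prompts:
        predict_next(prompt)
    return (time.perf_counter() - start) / len(prompts)


# Toy baseline that always answers "the", only to show the calls.
baseline = lambda text: "the"
print(top1_accuracy(baseline, ["the cat saw the dog"]))  # 0.25
```

Running `top1_accuracy` on both halves returned by `split_corpus` would give training and testing figures comparable in kind to those in the table.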
Accuracy is low compared to the latest SwiftKey dashboard developments.
Future developments: The ideal algorithm uses a bigger sample size and learns from the user input. This could require more memory than a Shiny server can handle.
Check it out here
Features
User experience:
Server answer:
Coursera's Data Science Specialization community: