Brynjólfur Gauti Jónsson
2018-02-10
The app is based on Markov models. Given any state (the input text), it tries to predict the next state (the text sequence that follows).
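As a rough illustration of the idea, the sketch below estimates the probability of the next word given the previous words by counting how often each continuation follows each state. The function name `transition_probabilities`, the `order` parameter, and the toy corpus are assumptions made for this example and are not taken from the app itself.

```python
from collections import Counter, defaultdict

def transition_probabilities(tokens, order=1):
    """Estimate P(next word | previous `order` words) from a list of tokens."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])      # the current state
        counts[state][tokens[i + order]] += 1   # the word that followed it
    # Normalise the raw counts of each state into a probability distribution.
    return {
        state: {word: c / sum(nexts.values()) for word, c in nexts.items()}
        for state, nexts in counts.items()
    }

# Toy corpus, invented purely for illustration.
tokens = ("at the end of the day at the end of the month "
          "at the end of the day").split()
probs = transition_probabilities(tokens, order=2)
print(probs[("of", "the")])
```

In this toy corpus the state `("of", "the")` is followed by "day" twice and "month" once, so "day" gets probability 2/3.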
The model was trained on a large dataset made available by SwiftKey, using the map-reduce philosophy: the dataset was split into smaller subsets, each subset was preprocessed on its own, and the results were then combined into one whole. This works around the limited RAM of any single computer.
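The chunked approach described above might look roughly like the following sketch: each chunk of lines is counted separately (the "map" step) and the per-chunk counts are then summed (the "reduce" step). All function names, the lowercase-and-split tokenisation, and the chunk size are assumptions for illustration. The original preprocessing is not shown in this document, and a real pipeline would probably write intermediate counts to disk rather than keep the running total in memory.

```python
from collections import Counter
from itertools import islice

def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))

def read_chunks(lines, chunk_size):
    """Yield successive lists of at most `chunk_size` lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def count_chunk(chunk, n):
    """Map step: count the n-grams inside a single chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(ngrams(line.lower().split(), n))
    return counts

def count_corpus(lines, n, chunk_size=2):
    """Reduce step: add the per-chunk counts into one total."""
    total = Counter()
    for chunk in read_chunks(lines, chunk_size):
        total.update(count_chunk(chunk, n))
    return total

# Tiny stand-in corpus; the real data is far larger.
lines = [
    "At the end of the day",
    "By the end of the year",
    "At the end of the month",
]
counts = count_corpus(lines, n=3)
print(counts.most_common(2))
```

Because n-grams are counted within each line, the total does not depend on where the chunk boundaries fall, which is what makes the split-then-combine approach safe.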
The result is a simple but effective text prediction model. Because preprocessing is done in small chunks, the work can easily be distributed across any system, even small laptops.
The most frequent six-word sequences, with their counts (n):

| word1 | word2 | word3 | word4 | word5 | word6 | n |
|---|---|---|---|---|---|---|
| at | the | end | of | the | day | 1044 |
| on | the | other | side | of | the | 516 |
| in | the | middle | of | the | night | 480 |
| all | you | have | to | do | is | 362 |
| this | is | going | to | be | a | 329 |
| thank | you | so | much | for | the | 303 |
| could | not | be | reached | for | comment | 302 |
| let | me | know | what | you | think | 299 |
| by | the | end | of | the | year | 254 |
| vested | interests | vested | interests | vested | interests | 250 |
| interests | vested | interests | vested | interests | vested | 249 |
| happy | mother’s | day | to | all | the | 240 |
| for | the | first | time | in | a | 237 |
| for | the | rest | of | my | life | 236 |
| rock | and | roll | hall | of | fame | 233 |
| for | the | rest | of | the | day | 228 |
| there | is | no | such | thing | as | 228 |
| happy | mothers | day | to | all | the | 225 |
| at | the | end | of | the | month | 219 |
| and | is | subject | to | change | or | 214 |
| as | is | and | is | subject | to | 214 |
| certain | content | that | appears | on | this | 214 |
| change | or | removal | at | any | time | 214 |
| content | is | provided | as | is | and | 214 |
| is | and | is | subject | to | change | 214 |
| is | provided | as | is | and | is | 214 |
| is | subject | to | change | or | removal | 214 |
| provided | as | is | and | is | subject | 214 |
| subject | to | change | or | removal | at | 214 |
| this | content | is | provided | as | is | 214 |
| to | change | or | removal | at | any | 214 |
| a | means | for | sites | to | earn | 213 |
| a | participant | in | the | amazon | services | 213 |
| advertising | and | linking | to | amazon.com | amazon.ca | 213 |
| advertising | fees | by | advertising | and | linking | 213 |
| amazon | eu | associates | programmes | designed | to | 213 |
| amazon | eu | this | content | is | provided | 213 |
| amazon | services | llc | and | amazon | eu | 213 |
| amazon | services | llc | and | or | amazon | 213 |
| amazon.ca | amazon.co.uk | amazon.de | amazon.fr | amazon.it | and | 213 |
| amazon.co.uk | amazon.de | amazon.fr | amazon.it | and | amazon.es | 213 |
| amazon.com | amazon.ca | amazon.co.uk | amazon.de | amazon.fr | amazon.it | 213 |
| amazon.de | amazon.fr | amazon.it | and | amazon.es | certain | 213 |
| amazon.es | certain | content | that | appears | on | 213 |
| amazon.fr | amazon.it | and | amazon.es | certain | content | 213 |
| amazon.it | and | amazon.es | certain | content | that | 213 |
| and | amazon | eu | associates | programmes | designed | 213 |
| and | amazon.es | certain | content | that | appears | 213 |
| and | linking | to | amazon.com | amazon.ca | amazon.co.uk | 213 |
| and | or | amazon | eu | this | content | 213 |

The most frequent five-word sequences, with their counts (n):

| word1 | word2 | word3 | word4 | word5 | n |
|---|---|---|---|---|---|
| at | the | end | of | the | 3624 |
| in | the | middle | of | the | 1809 |
| for | the | first | time | in | 1608 |
| the | end | of | the | day | 1298 |
| by | the | end | of | the | 1238 |
| for | the | rest | of | the | 1139 |
| thank | you | so | much | for | 1038 |
| is | going | to | be | a | 1020 |
| there | are | a | lot | of | 967 |
| it’s | going | to | be | a | 861 |
| thanks | for | the | shout | out | 858 |
| to | be | a | part | of | 833 |
| is | one | of | the | most | 811 |
| let | me | know | if | you | 793 |
| the | other | side | of | the | 786 |
| for | the | first | time | since | 769 |
| can’t | wait | to | see | you | 726 |
| the | end | of | the | year | 718 |
| at | the | top | of | the | 702 |
| this | is | going | to | be | 698 |
| i | can’t | wait | to | see | 692 |
| on | the | other | side | of | 684 |
| thanks | so | much | for | the | 631 |
| thank | you | for | the | follow | 619 |
| i | love | you | so | much | 610 |
| and | the | rest | of | the | 599 |
| for | those | of | you | who | 590 |
| has | nothing | to | do | with | 577 |
| in | the | middle | of | a | 571 |
| keep | up | the | good | work | 566 |
| but | at | the | same | time | 543 |
| i | thought | it | would | be | 541 |
| at | the | time | of | the | 534 |
| hope | you | have | a | great | 524 |
| the | middle | of | the | night | 517 |
| is | one | of | the | best | 513 |
| the | rest | of | the | world | 508 |
| at | the | beginning | of | the | 506 |
| at | the | bottom | of | the | 505 |
| to | be | one | of | the | 504 |
| this | is | one | of | the | 500 |
| the | rest | of | the | day | 496 |
| for | a | chance | to | win | 491 |
| to | figure | out | how | to | 487 |
| in | the | bottom | of | the | 483 |
| the | end | of | the | month | 482 |
| i | have | no | idea | what | 481 |
| if | you | would | like | to | 477 |
| thanks | for | the | follow | i | 461 |
| happy | mother’s | day | to | all | 445 |
If an arrow points from one word to the next, those words are likely to appear in sequence. The darker the arrow, the higher the likelihood.
For any input text, the app follows the darkest arrow it can find and returns the word that arrow points to.
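Choosing the darkest arrow amounts to picking the continuation with the highest count. The sketch below shows one way to do that with a few rows copied from the six-word table above. The names `build_lookup` and `predict` are hypothetical, and falling back to shorter contexts when the full context is unknown is an assumption of this example; the document does not say how the app handles unseen input.

```python
from collections import Counter, defaultdict

# A few rows from the six-word table: (five-word context, next word, count).
ROWS = [
    ("at the end of the", "day", 1044),
    ("at the end of the", "month", 219),
    ("by the end of the", "year", 254),
    ("in the middle of the", "night", 480),
]

def build_lookup(rows):
    """Map each context, and each shorter suffix of it, to next-word counts."""
    lookup = defaultdict(Counter)
    for context, next_word, n in rows:
        words = tuple(context.split())
        for k in range(1, len(words) + 1):
            lookup[words[-k:]][next_word] += n
    return lookup

def predict(text, lookup, max_context=5):
    """Return the most frequent next word, trying the longest context first."""
    words = tuple(text.lower().split())
    for k in range(min(max_context, len(words)), 0, -1):
        candidates = lookup.get(words[-k:])
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # nothing in the table matches the input

lookup = build_lookup(ROWS)
print(predict("At the end of the", lookup))
print(predict("We will finish by the end of the", lookup))
```

With these rows, "at the end of the" is continued with "day" (1044 beats 219), while the last five words of the second input match the "by the end of the" row and give "year".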