Capstone project: text prediction with SwiftKey

Derek Corcoran
January 24, 2016


Quanteda

To analyze the data, the quanteda package was used. The main reasons for choosing this package were:

  • More economical than tm
  • It does not rely on other programs
  • It is easy to combine corpora by adding them

Data used for modelling

  • 100,000 lines of each type of data were used
  • The corpus was tokenized, profanity was filtered out, and punctuation was removed
  • It was tokenized into unigrams, bigrams, trigrams, fourgrams, and fivegrams (a preprocessing sketch follows this list)
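
A minimal sketch of this preprocessing with quanteda is shown below. The object names sample_lines and profanity are hypothetical placeholders (a character vector of text lines and a list of words to remove), not the names used in the actual project, and the code assumes a recent quanteda release.

```r
library(quanteda)

# Minimal preprocessing sketch (assumes quanteda >= 3.x).
# `sample_lines` (a character vector of text lines) and `profanity`
# (a character vector of words to drop) are hypothetical placeholders.
corp <- corpus(sample_lines)
toks <- tokens(corp, remove_punct = TRUE, remove_numbers = TRUE)
toks <- tokens_remove(toks, pattern = profanity)

# Build unigrams through fivegrams and count the most frequent ones
ngram_freqs <- lapply(1:5, function(n) {
  ng <- tokens_ngrams(toks, n = n, concatenator = " ")
  topfeatures(dfm(ng), 10)
})
```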

Below is an example of the frequency of the most common fivegrams, fourgrams, and trigrams:

fivegram                  n     fourgram              n     trigram         n
at the end of the         286   the end of the        717   one of the      2693
in the middle of the      158   at the end of         550   a lot of        2509
for the first time in     116   the rest of the       543   to be a         1471
the end of the day        111   for the first time    484   the end of      1394

Prediction

  • The model counts the number of words (n) in the sentence written so far
  • It looks up the most frequent (n+1)-grams that begin with exactly those n words and offers their final words as probable next words
  • If no such (n+1)-gram exists, it drops the first word of the sentence (e.g. from "i won a" to "won a") and applies the same algorithm again
  • If not even one word matches the first word of a bigram (last chance), it recommends the 6 most used words in English (see the sketch after this list)
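
The backoff lookup described above could be sketched as follows. Here ngram_tables is an assumed structure (a list of data frames, one per n-gram order, with columns prefix, word, and n), and the fallback word list is an illustrative guess; neither is taken from the project's actual code.

```r
# Sketch of the backoff prediction described above. `ngram_tables` is assumed
# to be a list of data frames, one per n-gram order, each with columns:
#   prefix (the first n words), word (the next word), n (its count).
# This format and the fallback word list are hypothetical.
predict_next <- function(sentence, ngram_tables,
                         top_words = c("the", "to", "and", "a", "of", "i")) {
  words <- tolower(unlist(strsplit(trimws(sentence), "\\s+")))
  # Try the longest usable prefix first, then back off one word at a time
  for (len in seq(min(length(words), length(ngram_tables)), 1)) {
    prefix <- paste(tail(words, len), collapse = " ")
    tab <- ngram_tables[[len]]                    # table of (len + 1)-grams
    hits <- tab[tab$prefix == prefix, ]
    if (nrow(hits) > 0) {
      return(head(hits$word[order(-hits$n)], 3))  # top candidate next words
    }
  }
  top_words  # last chance: the most used English words
}
```

For example, predict_next("for the first time", ngram_tables) would first look for fivegrams starting with those four words and back off to shorter prefixes only if none are found.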

App

  • Use this great app to test word predictions