Data Science Capstone

Juan Jose Suarez Estrada
July 13th, 2020

Problem

Predict the next word when someone is typing on a mobile device.

Examples:

“I love” -> “you”

“It is assumed” -> “that”

Solution

Use the most frequent combinations of two, three and four words (n-grams).

Here we can see the most common word pairs:

    feature   w1  w2 frequency rank docfreq group
1    of_the   of the      7065    1    5653   all
2    in_the   in the      6490    2    5448   all
3    to_the   to the      3347    3    3026   all
4    on_the   on the      2909    4    2661   all
5   for_the  for the      2722    5    2533   all
6     to_be   to  be      2444    6    2236   all
7    at_the   at the      2100    7    1962   all
8   and_the  and the      2005    8    1851   all
9      in_a   in   a      1936    9    1805   all
10 with_the with the      1709   10    1602   all
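
As a rough illustration of how frequency tables like this one can drive a prediction, here is a minimal Python sketch. It is not the implementation behind this product. The function names, the toy corpus and the "longest context first, then shorter" fallback rule are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also handle punctuation.
    return text.lower().split()

def build_ngram_tables(sentences, max_n=4):
    """For n = 2..max_n, map each context (n-1 words) to a Counter of next words."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables

def predict_next(tables, text, max_n=4):
    """Try the longest available context first, then fall back to shorter ones."""
    tokens = tokenize(text)
    for n in range(max_n, 1, -1):
        if len(tokens) < n - 1:
            continue  # not enough typed words for this n-gram order
        context = tuple(tokens[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # no known n-gram matches the typed text

# Toy corpus, invented for this example only.
corpus = [
    "i love you",
    "i love you so much",
    "it is assumed that the model works",
    "we love pizza",
]
tables = build_ngram_tables(corpus)
print(predict_next(tables, "I love"))         # you
print(predict_next(tables, "It is assumed"))  # that
```

With a real corpus, the tables would hold counts like the ones shown above, and the most frequent follower of the typed context would be offered as the suggestion.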

Most common words

Around 8,000 different words account for 90% of the words we type. These are generally articles, pronouns and prepositions that do not add much meaning to a sentence.
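
A coverage figure like this can be computed by sorting words by frequency and counting how many are needed to reach the target share of all typed words. The Python sketch below shows the idea on a toy sentence. The function name and the sample text are assumptions for illustration and do not reproduce the corpus used here.

```python
from collections import Counter

def words_needed_for_coverage(tokens, target=0.90):
    """Return how many of the most frequent distinct words cover `target` of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= target:
            return rank
    return len(counts)

# Toy text, invented for this example: 13 tokens, 8 distinct words.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
print(words_needed_for_coverage(tokens, 0.50))  # 3
print(words_needed_for_coverage(tokens, 0.90))  # 7
```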


How can this help you?

This product correctly predicts about 3 out of every 20 words you type.

That means you don't have to type 15% of your words manually, which saves time and lets you focus on the specifics of your text rather than on grammatical details.
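
One common way to arrive at a rate such as 3 out of 20 is to replay held-out text word by word and count how often the top suggestion matches the word actually typed. The Python sketch below assumes that approach. The `top1_accuracy` helper, the stand-in predictor and the sample sentences are hypothetical and are not the evaluation used for this product.

```python
def top1_accuracy(predict, sentences):
    """Share of words (after the first in each sentence) that `predict` guesses exactly."""
    hits = attempts = 0
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(1, len(tokens)):
            attempts += 1
            if predict(tokens[:i]) == tokens[i]:
                hits += 1
    return hits / attempts if attempts else 0.0

# Hypothetical stand-in predictor that always suggests "the".
def always_the(previous_words):
    return "the"

# Held-out sentences, invented for this example.
held_out = ["we went to the park", "she sat on the bench"]
accuracy = top1_accuracy(always_the, held_out)
print(f"{accuracy:.0%} of words predicted")  # 25% of words predicted
```

Any predictor that takes the list of previous words and returns one suggestion can be passed in place of `always_the`.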

You can try a live demo on Shiny Apps: enter a sentence and get a prediction.