The "Next Word" App

Shaddyjr
10/11/16

Description

This lightweight app accepts a string typed by the user and predicts the 'next word' most likely to follow it.

How it works

Data were collected from blogs, Twitter, and news reports
- English, German, Finnish, and Russian
Bigrams from the corpora were used to form a simple predictive model
- The most frequent 'next words' are used as predictions
Only a fraction of the data was practical to use for building the model
- As a result, accuracy is low (3.725%)

      BigramTerms         Count GoodTuringCounts
 [1,] "color being"       "308" "0"             
 [2,] "color codes"       "208" "0"             
 [3,] "colorful resident" "124" "1"             
 [4,] "come a"            "234" "0"             
 [5,] "come to"           "757" "0"             
 [6,] "comes from"        "115" "0"             
 [7,] "comes off"         "123" "0"             
 [8,] "comes on"          "374" "0"             
 [9,] "coming out"        "234" "0"             
[10,] "coming up"         "330" "0"             
[11,] "commercial hehe"   "69"  "0"             
[12,] "commercial or"     "210" "1"
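
The bigram approach described above can be sketched in a few lines. This is a minimal illustration in Python and not the app's actual code: the function names `build_bigram_model` and `predict_next` and the toy corpus are assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_bigram_model(sentences):
    """Map each word to a Counter of the words observed right after it."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with its successor to form bigrams.
        for first, second in zip(words, words[1:]):
            followers[first][second] += 1
    return followers

def predict_next(model, text):
    """Return the most frequent follower of the last word in text, or None."""
    words = text.lower().split()
    if not words or words[-1] not in model:
        return None
    return model[words[-1]].most_common(1)[0][0]

# Toy corpus, invented for illustration only.
corpus = ["come to the party", "come to my house", "come a little closer"]
model = build_bigram_model(corpus)
print(predict_next(model, "Please come"))  # prints "to"
```

Only the last word of the input is consulted, which keeps lookups fast but is also one likely reason a plain bigram model has limited accuracy.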

Practical Applications

Most useful for mobile apps

Users want quick predictions
App's main data file is only ~190 KB

Functionality could also include:

Filling in missing content (like a “Rosetta Stone”)
Creating random, but proper sentences
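
As a rough sketch of the sentence-generation idea, a bigram table can be walked at random, choosing each next word in proportion to how often it followed the previous one. The names `build_followers` and `generate_sentence`, the `max_words` cap, and the toy corpus are hypothetical. The output is only locally plausible (each adjacent word pair was seen in the data) and is not guaranteed to be grammatical.

```python
import random
from collections import defaultdict

def build_followers(sentences):
    """Map each word to the list of words seen after it (repeats kept)."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = sentence.lower().split()
        for first, second in zip(words, words[1:]):
            followers[first].append(second)
    return followers

def generate_sentence(followers, start, max_words=8, rng=None):
    """Walk the bigram table from start, sampling followers by frequency."""
    rng = rng or random.Random()
    words = [start]
    # Stop at the length cap or when the current word has no known follower.
    while len(words) < max_words and words[-1] in followers:
        words.append(rng.choice(followers[words[-1]]))
    return " ".join(words)

# Toy corpus, invented for illustration only.
corpus = ["come to the party", "come to my house", "the party comes on strong"]
followers = build_followers(corpus)
sentence = generate_sentence(followers, "come", rng=random.Random(0))
print(sentence)
```

Keeping repeated followers in the list means `rng.choice` samples frequent continuations more often without any separate probability table.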

Future Plans

Using trigrams would yield a more accurate prediction
Using a Good-Turing estimation would also increase model accuracy
Including all languages, made possible with UTF-8 conversion
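
For reference, the basic Good-Turing estimate replaces a raw count c with c* = (c + 1) * N(c+1) / N(c), where N(c) is the number of distinct bigrams seen exactly c times. The sketch below illustrates that formula in its simplest form. The function name `good_turing_counts` and the toy counts are assumptions, and falling back to the raw count when N(c+1) is zero is a simplification chosen for this example; practical implementations usually smooth the N(c) values first.

```python
from collections import Counter

def good_turing_counts(bigram_counts):
    """Return Good-Turing adjusted counts: c* = (c + 1) * N(c+1) / N(c)."""
    # freq_of_freq[c] = how many distinct bigrams occur exactly c times.
    freq_of_freq = Counter(bigram_counts.values())
    adjusted = {}
    for bigram, c in bigram_counts.items():
        n_c = freq_of_freq[c]
        n_c1 = freq_of_freq.get(c + 1, 0)
        # Assumption: keep the raw count when no bigram occurs c + 1 times.
        adjusted[bigram] = (c + 1) * n_c1 / n_c if n_c1 else float(c)
    return adjusted

# Toy counts, invented for illustration only.
counts = {"green tea": 3, "green a": 1, "tea on": 1, "tea up": 2}
adjusted = good_turing_counts(counts)
print(adjusted["green a"])  # (1 + 1) * N(2) / N(1) = 2 * 1 / 2 = 1.0

# Probability mass reserved for unseen bigrams: N(1) / total observations.
unseen_mass = Counter(counts.values())[1] / sum(counts.values())
```

The reserved mass for unseen bigrams is what lets the model assign a nonzero probability to word pairs that never appeared in the training data.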