We are going to produce R code that mimics the word-prediction algorithms used in mobile text messaging. A joint venture between Johns Hopkins University and SwiftKey provided training data composed of Twitter posts, blogs, and news feeds. The files come in four languages: American English, Finnish, German, and Russian. The algorithm presented below emphasizes speed, yet should still yield results similar to those of current, memory-intensive methods; that is, it should still predict what the user wants to type next.

| | English blogs | English news | English Twitter | Finnish blogs | Finnish news | Finnish Twitter |
|---|---|---|---|---|---|---|
| Lines | 898384 | 77258 | 2302307 | 439715 | 485758 | 278943 |
| Size (MB) | 205 | 200 | 163 | 105 | 92 | 24 |

| | German blogs | German news | German Twitter | Russian blogs | Russian news | Russian Twitter |
|---|---|---|---|---|---|---|
| Lines | 181909 | 244739 | 929660 | 337075 | 196360 | 875002 |
| Size (MB) | 83 | 93 | 73 | 114 | 116 | 102 |
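
Line counts and file sizes like those tabulated above could be gathered with a small helper along the following lines. This is a minimal sketch, not the code used to build the table: `summarise_file` is a hypothetical name, the temporary file merely stands in for one of the corpus files, and a megabyte is assumed to be 1024^2 bytes.

```r
# Hypothetical helper: count the lines in one text file and report its size on disk.
summarise_file <- function(path) {
  # Open in binary mode so stray control characters do not cut the read short.
  con <- file(path, open = "rb")
  on.exit(close(con))
  n_lines <- length(readLines(con, encoding = "UTF-8", skipNul = TRUE))
  data.frame(
    file    = basename(path),
    lines   = n_lines,
    size_mb = round(file.size(path) / 1024^2, 1)  # assumption: 1 MB = 1024^2 bytes
  )
}

# Demonstration on a small temporary file standing in for a corpus file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), tmp)
result <- summarise_file(tmp)
print(result)
```

To cover a whole corpus, the helper could be applied to each file path in turn and the resulting one-row data frames combined with `do.call(rbind, ...)`.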