- Data is from a corpus called HC Corpora
- Consists of text files collected from publicly available sources by a web crawler
- English-language files gathered from Twitter, various blogs, and news sites
- Should provide a reasonably broad mix of present-day English, from informal tweets to edited news text
- When predicting from the previous two words and offering five suggestions, the app includes the correct next word 76% of the time
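The following minimal Python sketch illustrates predicting from the previous two words and returning several suggestions. It counts word trigrams, bigrams, and unigrams, then returns the k most frequent continuations. The function names `train` and `suggest`, the toy corpus, and the fallback from a two-word to a one-word context and then to overall word frequency are assumptions made for illustration. The source does not describe how the app actually builds or smooths its model.

```python
from collections import Counter, defaultdict


def train(sentences):
    """Count unigram, bigram and trigram continuations from tokenized sentences.

    Hypothetical helper for illustration; not the app's actual training code.
    """
    uni = Counter()
    bi = defaultdict(Counter)   # previous word -> counts of the word that followed
    tri = defaultdict(Counter)  # (two words back, previous word) -> next-word counts
    for tokens in sentences:
        for i, word in enumerate(tokens):
            uni[word] += 1
            if i >= 1:
                bi[tokens[i - 1]][word] += 1
            if i >= 2:
                tri[(tokens[i - 2], tokens[i - 1])][word] += 1
    return uni, bi, tri


def suggest(model, context, k=5):
    """Return up to k next-word suggestions, preferring the longest known context.

    Assumed fallback order: two-word context, then one-word context, then
    overall word frequency. Duplicates are skipped so each word appears once.
    """
    uni, bi, tri = model
    sources = []
    if len(context) >= 2:
        sources.append(tri.get((context[-2], context[-1]), Counter()))
    if len(context) >= 1:
        sources.append(bi.get(context[-1], Counter()))
    sources.append(uni)

    suggestions = []
    for counts in sources:
        for word, _ in counts.most_common():
            if word not in suggestions:
                suggestions.append(word)
            if len(suggestions) == k:
                return suggestions
    return suggestions


# Toy usage with a hypothetical three-sentence corpus.
corpus = [
    "i want to go home".split(),
    "i want to eat".split(),
    "i want to go out".split(),
]
model = train(corpus)
print(suggest(model, ["want", "to"], k=3))  # ['go', 'eat', 'i']
```

Backing off to shorter contexts keeps the app able to offer a full list of suggestions even when the exact two-word context never appeared in the training text.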
| Model | 5 suggestions | 3 suggestions | 1 suggestion |
|---|---|---|---|
| 2-gram | 0.76 | 0.73 | 0.65 |
| 1-gram | 0.72 | 0.68 | 0.58 |
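These figures appear to be top-k accuracies: the share of test cases in which the correct next word is among the first 1, 3, or 5 suggestions. The following minimal Python sketch computes that metric. The function name `top_k_accuracy` and the toy predictions are hypothetical and do not reproduce the app's evaluation data.

```python
def top_k_accuracy(predictions, targets, k):
    """Fraction of cases where the true next word is among the first k suggestions.

    predictions: list of ranked suggestion lists (best first)
    targets: list of the words that actually came next
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one test case")
    hits = sum(1 for preds, target in zip(predictions, targets) if target in preds[:k])
    return hits / len(targets)


# Hypothetical ranked suggestions and the words that actually followed.
predictions = [
    ["the", "a", "to", "of", "and"],
    ["you", "we", "they", "i", "he"],
    ["day", "time", "week", "year", "night"],
    ["go", "be", "do", "see", "get"],
]
targets = ["the", "they", "night", "eat"]

for k in (5, 3, 1):
    print(k, top_k_accuracy(predictions, targets, k))
# 5 0.75
# 3 0.5
# 1 0.25
```

Because a top-k hit at a small k is also a hit at any larger k, accuracy can only stay the same or rise as the number of suggestions grows. The table follows that pattern for both models.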