PredictWord - an application and algorithm to predict the next word

LIM Chee Kong
10 August 2015

Purpose of PredictWord

The purpose of this application is to make it easier for people to write texts and/or messages in English. When a user types a phrase or an incomplete sentence, the app predicts the next word.

Why PredictWord?

Image taken from: http://blog.84444.com/activities-mobile/

* We spend a lot of time texting

* PredictWord saves time!

How does PredictWord work? (Tokenization)

1. Two files, containing US news and US Twitter text, are read

2. A sample of the files is drawn, cleaned and stemmed into a corpus file

3. The corpus file is tokenized to create 3 files: unigram, bigram and trigram

  • For example, the phrase “to be or not to be” produces the following entries in the tokenized files
  • unigram: to, be, or, not, to, be
  • bigram: to be, be or, or not, not to, to be
  • trigram: to be or, be or not, or not to, not to be
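
The tokenization in step 3 can be illustrated with a short sketch. This is written in Python for illustration only and is not the app's own code; the helper name `ngrams` and the simple lowercase-and-split-on-whitespace tokenizer are assumptions, and the sampling, cleaning and stemming of step 2 are omitted.

```python
def ngrams(text, n):
    """Return every run of n consecutive words in text, in order.

    Assumption: words are separated by whitespace and case is ignored.
    """
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


phrase = "to be or not to be"
unigrams = ngrams(phrase, 1)
bigrams = ngrams(phrase, 2)
trigrams = ngrams(phrase, 3)

print("unigram:", ", ".join(unigrams))
print("bigram:", ", ".join(bigrams))
print("trigram:", ", ".join(trigrams))
```

For the example phrase, the three printed lists match the unigram, bigram and trigram entries listed above.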

How does PredictWord work? (Creating N gram databases)

4. The frequencies of the one-word, two-word and three-word sequences are counted and sorted for the 3 tokenized files respectively

5. The 3 tokenized files are further processed to create 3 N gram databases

6. A predict.word function is created to predict the next word from the above databases

7. The app is available here: https://cheekong004.shinyapps.io/PredictText
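
Steps 4 to 6 can be sketched roughly as follows. This is a Python illustration, not the app's actual predict.word function. The slides do not say how the three databases are combined, so the strategy here (try the trigram database first, then fall back to the bigram and unigram databases) is an assumption, as are the names `build_database` and `predict_word` and the tiny one-phrase corpus.

```python
from collections import Counter, defaultdict


def tokenize(text, n):
    """Split text into n-word tuples (assumes whitespace-separated words)."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_database(text, n):
    """Map each (n-1)-word context to a frequency count of the words that follow it."""
    database = defaultdict(Counter)
    for gram in tokenize(text, n):
        database[gram[:-1]][gram[-1]] += 1
    return database


def predict_word(phrase, databases):
    """Return the most frequent next word, or None if nothing matches.

    Assumed strategy: use the longest available context first,
    then fall back to shorter contexts.
    """
    words = phrase.lower().split()
    for n in sorted(databases, reverse=True):
        context = tuple(words[len(words) - (n - 1):])
        if len(context) < n - 1:
            continue  # phrase is too short for this database
        followers = databases[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None


# Toy corpus for illustration only; the real app samples news and Twitter text.
corpus = "to be or not to be"
databases = {n: build_database(corpus, n) for n in (1, 2, 3)}

print(predict_word("or not", databases))    # matched in the trigram database
print(predict_word("maybe to", databases))  # falls back to the bigram database
```

Storing the counts per context keeps each lookup to a single dictionary access, at the cost of holding all three databases in memory.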