Text Prediction App

Calvin Seto
April 24, 2016

Problem Description

  • By 2017, over a third of the world's population, almost 2.6 billion people, is projected to own a smartphone.
  • Many will use their phones to send messages to friends and family, but the text prediction feature doesn't work as expected, as shown in this example: Apartment Hunt

  • This app tries to create a better algorithm so that billions of users can communicate clearly with their smartphones.

Prediction Algorithm

The corpus used for the prediction model needed to provide adequate coverage of common English words, integrate efficiently into the application, and support predicting the next word of a sentence as accurately as possible.

I used a 25% random sample of the news, blogs, and Twitter data sets provided by SwiftKey. The sample contained 15,276,685 words in 640,451 lines of text, which were used to create 311,875 unigrams, 965,584 bigrams, and 1,112,810 trigrams with their associated probabilities.
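As an illustration of how n-gram tables and their probabilities can be derived from lines of text, here is a minimal Python sketch. It is not the app's actual code: the whitespace tokenizer, the function names (ngrams, build_tables, probability), and the use of plain maximum-likelihood estimates are all assumptions made for the example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-word tuples found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_tables(lines):
    """Count unigrams, bigrams, and trigrams line by line."""
    counts = {1: Counter(), 2: Counter(), 3: Counter()}
    for line in lines:
        tokens = line.lower().split()  # naive whitespace tokenizer (assumption)
        for n in counts:
            counts[n].update(ngrams(tokens, n))
    return counts

def probability(counts, gram):
    """Maximum-likelihood estimate: count(gram) / count(its prefix)."""
    n = len(gram)
    if n == 1:
        total = sum(counts[1].values())
        return counts[1][gram] / total if total else 0.0
    prefix_count = counts[n - 1][gram[:-1]]
    return counts[n][gram] / prefix_count if prefix_count else 0.0

# Tiny made-up sample, only to show the mechanics.
sample = ["i love new york", "i love new shoes", "new york is big"]
tables = build_tables(sample)
p_bigram = probability(tables, ("new", "york"))        # "new york" seen 2 times, "new" 3 times
p_trigram = probability(tables, ("i", "love", "new"))  # "i love" is always followed by "new"
```

On a real corpus the tables would usually be pruned (for example, by dropping n-grams seen only once) to keep memory use manageable; whether the app does this is not stated here.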

The model takes the last one or two words of the input and searches for all bigrams or trigrams that begin with them. If any trigrams or bigrams are found, the three with the highest probabilities are shown as predictions. If nothing is found, three of the highest-probability unigrams are picked at random and shown as predictions.
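The lookup described above can be sketched in Python as a simple back-off: try the trigram table, then the bigram table, then fall back to frequent unigrams. This is a hedged sketch, not the app's implementation. The function name predict, the table layouts, and the size of the unigram pool used for the random fallback (10) are assumptions, and the toy probabilities are invented.

```python
import random

def predict(text, trigrams, bigrams, unigrams, k=3, rng=random):
    """Suggest up to k next words, backing off trigram -> bigram -> unigram.

    trigrams maps (w1, w2) -> {next_word: probability}
    bigrams  maps (w1,)    -> {next_word: probability}
    unigrams maps word     -> probability
    """
    words = text.lower().split()
    # Try the longest context first: two previous words, then one.
    for table, size in ((trigrams, 2), (bigrams, 1)):
        if len(words) >= size:
            candidates = table.get(tuple(words[-size:]))
            if candidates:
                ranked = sorted(candidates, key=candidates.get, reverse=True)
                return ranked[:k]
    # Nothing matched: pick k words at random from the most probable
    # unigrams. The pool size of 10 is an assumption for this sketch.
    pool = sorted(unigrams, key=unigrams.get, reverse=True)[:10]
    return rng.sample(pool, min(k, len(pool)))

# Toy tables with invented probabilities.
trigrams = {("new", "york"): {"city": 0.5, "times": 0.3, "is": 0.15, "state": 0.05}}
bigrams = {("happy",): {"birthday": 0.7, "new": 0.2, "to": 0.1}}
unigrams = {"the": 0.05, "to": 0.03, "and": 0.02, "a": 0.02}

from_trigram = predict("I love New York", trigrams, bigrams, unigrams)
from_bigram = predict("so happy", trigrams, bigrams, unigrams)
from_unigram = predict("zzz qqq", trigrams, bigrams, unigrams)
```

The first call is answered from the trigram table, the second backs off to bigrams, and the third matches nothing and falls through to the random unigram pick.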

How to Use the App

  1. Click My Text Prediction App to open the app, then wait 10-20 seconds for the model to load.

  2. In the box labeled Your Input, enter your text and hit return or click Submit.

  3. Look for your prediction in the box labeled Prediction.

  4. Repeat steps 2 and 3 to make more predictions.

  5. Click Documentation to read about the app.

Conclusion

I evaluated my prediction model with a next-word prediction benchmark. Here are the results.

  • Overall top-3 score: 13.39 %
  • Overall top-1 precision: 9.99 %
  • Overall top-3 precision: 16.32 %
  • Average runtime: 63.36 msec
  • Number of predictions: 28,464
  • Total memory used: 220.78 MB

Test set “blogs” (599 lines, 14,587 words): Score 13.44 %, Top-1 precision 10.11 %, Top-3 precision 16.32 %

Test set “tweets” (793 lines, 14,071 words): Score 13.34 %, Top-1 precision 9.87 %, Top-3 precision 16.32 %
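
For readers unfamiliar with the precision figures, top-1 and top-3 precision are commonly read as the fraction of test positions where the true next word appears among the first one or three predictions. The Python sketch below shows that reading on invented data. The benchmark's exact scoring, in particular how its overall "score" is computed, is not described here, so the function top_k_precision is a hypothetical illustration rather than the benchmark's code.

```python
def top_k_precision(predictions, targets, k):
    """Fraction of test positions whose true next word is in the first k guesses."""
    if not targets:
        return 0.0
    hits = sum(1 for guesses, truth in zip(predictions, targets)
               if truth in guesses[:k])
    return hits / len(targets)

# Invented example: three test positions, three ranked guesses each.
predictions = [["city", "times", "is"], ["the", "a", "to"], ["birthday", "new", "to"]]
targets = ["times", "dog", "birthday"]

top1 = top_k_precision(predictions, targets, 1)  # only "birthday" is ranked first
top3 = top_k_precision(predictions, targets, 3)  # "times" and "birthday" are in the top three
```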