2023-02-18

Application Purpose

  • People are spending more time using their mobile devices
  • Typing on mobile devices is not always easy
  • Tech companies create applications that help people type more easily
  • Predictive text modeling lets an application suggest the next words to the user
  • For example, if we have a sentence:

Mary had a little …

  • Google's search text prediction suggests that the most likely next word is “lamb”
  • The purpose of this project is to create an application that predicts the next words in a similar way
  • The data came from three US English text files provided by SwiftKey, containing Twitter messages, blogs and internet news

Model Explanation

  • The next word is predicted using n-grams: contiguous sequences of n words, here the last n words of the given phrase
  • The assumption that the next word depends only on a small number of preceding words comes from the concept of Markov chains
  • For a given n-gram, the most frequent next words are selected
  • If there are no such words, we “back off” to lower values of n
  • Katz back-off is used to calculate next-word probabilities
  • It is combined with absolute discounting to reserve some probability mass for unseen words and n-grams
  • Absolute discounting means subtracting a fixed discount from each observed count
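
The back-off and discounting steps can be illustrated with a minimal Python sketch. It is not the application's actual code: it uses only bigrams and unigrams, the discount of 0.75 is an assumed value (the source does not state one), and the function names train, next_word_probs and predict are hypothetical.

```python
from collections import Counter, defaultdict

DISCOUNT = 0.75  # assumed value; the source does not say which discount was used


def train(sentences):
    """Count unigrams and bigrams in a list of tokenized sentences."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)  # previous word -> counts of following words
    for tokens in sentences:
        unigrams.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            bigrams[prev][nxt] += 1
    return unigrams, bigrams


def next_word_probs(prev, unigrams, bigrams, discount=DISCOUNT):
    """Return {word: probability} for the word that follows `prev`."""
    followers = bigrams.get(prev)
    if not followers:
        # Context never seen: back off fully to unigram frequencies.
        total = sum(unigrams.values())
        return {w: c / total for w, c in unigrams.items()}

    context_count = sum(followers.values())
    # Seen bigrams: subtract the fixed discount from each count.
    probs = {w: (c - discount) / context_count for w, c in followers.items()}
    # Mass freed by discounting, reserved for words not seen after `prev`.
    leftover = discount * len(followers) / context_count
    unseen_total = sum(c for w, c in unigrams.items() if w not in followers)
    # Spread the reserved mass over unseen words by unigram frequency.
    # (If every known word was already seen after `prev`, the mass is dropped.)
    for w, c in unigrams.items():
        if w not in followers and unseen_total > 0:
            probs[w] = leftover * c / unseen_total
    return probs


def predict(prev, unigrams, bigrams, k=1):
    """Return the k most probable next words after `prev`."""
    probs = next_word_probs(prev, unigrams, bigrams)
    return sorted(probs, key=probs.get, reverse=True)[:k]


corpus = [
    ["mary", "had", "a", "little", "lamb"],
    ["mary", "had", "a", "little", "dog"],
    ["mary", "had", "a", "little", "lamb"],
]
unigrams, bigrams = train(corpus)
print(predict("little", unigrams, bigrams, k=1))  # → ['lamb']
```

In this toy corpus, “little” is followed by “lamb” twice and “dog” once, so discounting leaves half of the probability mass for words never seen after “little”. A full model would apply the same idea recursively from the longest available n-gram downwards.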

Performance Summary

  • For each data source, accuracy rates were calculated on both the train and the test data
  • We checked whether the actual next word was among the top n (n = 1, 3, 5, 10) words predicted by the algorithm
  • For example, the value 11.3 for Twitter, top 3, train means that in 11.3% of the tested training cases the actual next word was among the top 3 predictions for the given phrase
  • Accuracy rises with n, because a longer list of predictions is more likely to contain the actual word

Accuracy type (%)   Twitter   News   Blogs
Top 1 Train             7.3    7.4     4.7
Top 1 Test              6.5    6.9     4.8
Top 3 Train            11.3   11.2     8.0
Top 3 Test             10.3   10.5     8.0
Top 5 Train            13.3   13.0    10.0
Top 5 Test             12.4   12.5    10.0
Top 10 Train           17.1   16.2    13.2
Top 10 Test            15.9   15.7    13.0
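
The top-n accuracy measure behind these figures can be sketched as follows. This is an illustration only, not the evaluation code used for the project: the function top_n_accuracy, the toy predictor and the test cases are all hypothetical.

```python
def top_n_accuracy(cases, predict_fn, n_values=(1, 3, 5, 10)):
    """Percentage of cases whose actual next word is among the top n predictions.

    cases: list of (phrase, actual_next_word) pairs
    predict_fn: function (phrase, n) -> list of up to n words, best first
    """
    if not cases:
        raise ValueError("at least one test case is required")
    max_n = max(n_values)
    hits = {n: 0 for n in n_values}
    for phrase, actual in cases:
        ranked = predict_fn(phrase, max_n)  # predict once, reuse for every n
        for n in n_values:
            if actual in ranked[:n]:
                hits[n] += 1
    return {n: 100.0 * hits[n] / len(cases) for n in n_values}


# Toy predictor with fixed rankings, standing in for the real model.
RANKINGS = {
    "mary had a little": ["lamb", "dog", "cat"],
    "once upon a": ["day", "time", "hill"],
}


def toy_predict(phrase, n):
    return RANKINGS.get(phrase, [])[:n]


cases = [
    ("mary had a little", "lamb"),
    ("once upon a", "time"),
    ("once upon a", "dream"),
    ("mary had a little", "cat"),
]
print(top_n_accuracy(cases, toy_predict, n_values=(1, 3)))  # → {1: 25.0, 3: 75.0}
```

Because the top 3 list always contains the top 1 word, accuracy can only stay the same or increase as n grows, which matches the pattern in the table.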

Usage Explained

  • Enter the phrase for which you want to predict the next word
  • Select the number of words to predict (default is 1)
  • Choose whether to display prediction probabilities
  • Click the “Show Results” button