2023-02-18
Application Purpose
- People are spending more time using their mobile devices
- Typing with mobile devices is not always easy
- Tech companies create applications that help people type more easily
- Predictive text modeling lets such applications suggest the next words to the user
- For example, if we have a sentence:
Mary had a little …
- Google's search text prediction suggests that the most likely next word is “lamb”
- The purpose of this project is to create an application that predicts the next words in a similar way
- Three US English text files provided by SwiftKey were used: Twitter messages, blogs and internet news
Model Explanation
- The next word is predicted using n-grams, contiguous sequences of n words; the last words of the given phrase serve as the context
- The assumption that the next word depends only on a small number of preceding words comes from the concept of Markov chains
- For a given n-gram, the most frequent next words are selected
- If there are no such words, we “back off” to lower values of n
- Katz back-off is used to calculate next word probabilities
- It is combined with absolute discounting, which reserves some probability mass for unseen words and n-grams
- Absolute discounting means subtracting a fixed discount from each count
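The back-off and discounting steps above can be sketched as follows. This is a minimal illustration, not the application's actual code: the function names (`build_counts`, `katz_prob`, `predict_next`), the discount of 0.5, the maximum n of 3, and the toy corpus are all assumptions made for the example, and the unigram level is left undiscounted for simplicity.

```python
from collections import Counter, defaultdict


def build_counts(tokens, max_n=3):
    """Map each context tuple (0 to max_n - 1 words) to a Counter of next words."""
    counts = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            *context, word = tokens[i:i + n]
            counts[tuple(context)][word] += 1
    return counts


def katz_prob(word, context, counts, discount=0.5):
    """Katz back-off probability of `word` after `context`, with absolute discounting."""
    context = tuple(context)
    if not context:
        # Base case: plain relative frequency of the word (no discount here).
        total = sum(counts[()].values())
        return counts[()][word] / total
    followers = counts.get(context)
    if not followers:
        # Context never seen: back off to the shorter context.
        return katz_prob(word, context[1:], counts, discount)
    total = sum(followers.values())
    if word in followers:
        # Seen n-gram: subtract the fixed discount from its count.
        return (followers[word] - discount) / total
    # Unseen n-gram: share the saved probability mass using the lower order.
    left_over = discount * len(followers) / total
    seen_mass = sum(katz_prob(w, context[1:], counts, discount) for w in followers)
    if seen_mass >= 1.0:
        return 0.0
    lower = katz_prob(word, context[1:], counts, discount)
    return left_over * lower / (1.0 - seen_mass)


def predict_next(phrase, counts, top=3, max_n=3, discount=0.5):
    """Return the `top` most probable next words as (word, probability) pairs."""
    words = phrase.lower().split()
    context = tuple(words[-(max_n - 1):]) if max_n > 1 else ()
    scored = {w: katz_prob(w, context, counts, discount) for w in counts[()]}
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Toy corpus for illustration only (not the SwiftKey data).
corpus = ("mary had a little lamb little lamb little lamb "
          "mary had a little lamb its fleece was white as snow").split()
counts = build_counts(corpus)
print(predict_next("Mary had a little", counts, top=3))
```

The sketch counts n-grams across the whole token stream without sentence boundaries, which a real model would likely handle separately.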
Performance Summary
- For each data source, accuracy rates were calculated for both train and test data
- Each case checked whether the actual next word was among the top n (1, 3, 5, 10) words predicted by the algorithm
- For example, 11.3 for Twitter top 3 train predictions means that in 11.3% of the tested train cases, the actual next word was among the top 3 predictions for the given phrase
- The more predictions are considered (the higher n), the higher the accuracy rate
| Accuracy type | Twitter (%) | News (%) | Blogs (%) |
|---|---|---|---|
| Top 1 Train | 7.3 | 7.4 | 4.7 |
| Top 1 Test | 6.5 | 6.9 | 4.8 |
| Top 3 Train | 11.3 | 11.2 | 8.0 |
| Top 3 Test | 10.3 | 10.5 | 8.0 |
| Top 5 Train | 13.3 | 13.0 | 10.0 |
| Top 5 Test | 12.4 | 12.5 | 10.0 |
| Top 10 Train | 17.1 | 16.2 | 13.2 |
| Top 10 Test | 15.9 | 15.7 | 13.0 |
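A top-n accuracy check of the kind described in this section could be computed roughly as below. This is a sketch under assumptions: the helper name `top_n_accuracy`, the `predict(phrase, k)` callback signature, and the toy predictor and test cases are hypothetical and are not taken from the project.

```python
def top_n_accuracy(cases, predict, n_values=(1, 3, 5, 10)):
    """Percentage of cases whose actual next word is among the top n predictions.

    cases: iterable of (phrase, actual_next_word) pairs
    predict: callable (phrase, k) -> list of up to k words, best first
    """
    cases = list(cases)
    if not cases:
        raise ValueError("at least one test case is required")
    largest = max(n_values)
    hits = {n: 0 for n in n_values}
    for phrase, actual in cases:
        ranked = predict(phrase, largest)
        for n in n_values:
            if actual in ranked[:n]:
                hits[n] += 1
    return {n: 100.0 * hits[n] / len(cases) for n in n_values}


# Hypothetical predictor that ignores the phrase and returns a fixed ranking.
def toy_predict(phrase, k):
    return ["the", "lamb", "dog"][:k]


toy_cases = [
    ("mary had a little", "lamb"),
    ("walk the", "dog"),
    ("i see a", "cat"),
    ("look at", "the"),
]
print(top_n_accuracy(toy_cases, toy_predict, n_values=(1, 3)))
```

Requesting the longest ranking once per case and slicing it for each n avoids running the predictor several times for the same phrase.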
Usage Explained
- Enter the phrase for which you want to predict the next word
- Select the number of predicted words (default is 1)
- Choose whether you want to see prediction probabilities
- Push the “Show Results” button
