Predictive Text Analysis: SwiftKey

Edwin Tam (https://skybe077.shinyapps.io/capstone_v1/)

The app looks like...

https://skybe077.shinyapps.io/capstone_v1/

Instructions

  1. Type your words into the textbox.
  2. The app returns an ordered list of likely next words with their probabilities.

PS: Give it a little while to load.

How it works

The app uses a 3-gram language model with stupid backoff (alpha = 0.4).
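For reference, this is how stupid backoff is commonly written. Here f(.) is an n-gram frequency count and N is the total number of words in the training text; both symbols are notation assumed for this sketch, not taken from the app's code.

```latex
S(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
  \dfrac{f(w_{i-k+1}^{i})}{f(w_{i-k+1}^{i-1})} & \text{if } f(w_{i-k+1}^{i}) > 0 \\[1.5ex]
  \alpha \, S(w_i \mid w_{i-k+2}^{i-1})        & \text{otherwise}
\end{cases}
\qquad
S(w_i) = \frac{f(w_i)}{N}, \quad \alpha = 0.4
```

Note that S is a relative score rather than a normalised probability: the scores for all candidate words need not sum to 1.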

Method

  1. Take the text input and look at the last 2 words. If any 3-grams match, return the matching next words with their calculated probabilities.
    • E.g. “a sunny day” –> look for matches on “sunny day” only
  2. If none are found, the app “backs off” to the 2-gram language model and looks at the last word only.
    • E.g. no matches on “sunny day” –> look for matches on “day” only
  3. This continues until there are no more words left. In that case, it returns a 1-gram list of words (as good as not entering anything).
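The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the app's actual code: the function names (`build_counts`, `predict`), the toy sentences, and the whitespace tokenisation are all made up for the example. Only the back-off order and alpha = 0.4 come from the slides.

```python
from collections import Counter

ALPHA = 0.4  # back-off penalty quoted on the "How it works" slide


def build_counts(sentences, max_n=3):
    """Count every 1-gram up to max_n-gram in a list of tokenised sentences."""
    counts = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def predict(text, counts, max_n=3, top_k=3):
    """Return up to top_k (word, score) pairs for the word that follows text."""
    words = text.lower().split()
    # Keep only the last (max_n - 1) words as context, e.g. 2 words for 3-grams.
    context = tuple(words[-(max_n - 1):]) if max_n > 1 else ()
    penalty = 1.0
    while True:
        size = len(context) + 1
        # Linear scan for n-grams that extend the context (fine for a toy corpus).
        matches = {
            gram[-1]: c for gram, c in counts.items()
            if len(gram) == size and gram[:-1] == context
        }
        if matches or not context:
            # Denominator: count of the context, or total word count for 1-grams.
            denom = counts[context] if context else sum(matches.values())
            ranked = sorted(matches.items(), key=lambda kv: (-kv[1], kv[0]))
            return [(w, penalty * c / denom) for w, c in ranked[:top_k]]
        # No match: drop the oldest context word and apply the penalty.
        context = context[1:]
        penalty *= ALPHA


# Toy corpus, hypothetical and only for illustration.
sentences = [
    "it was a sunny day today".split(),
    "a sunny day in june".split(),
    "what a rainy day today".split(),
]
counts = build_counts(sentences)
print(predict("a sunny day", counts))   # matched on the last 2 words
print(predict("a cloudy day", counts))  # backs off to the last word only
```

The linear scan over all n-grams keeps the sketch short. A real app would index n-grams by their context so that each lookup does not touch the whole table.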

Phrases & Performance

I've tested this model on 5 phrases from Twitter & news.

  1. out as a bit of (fun)
    • Predicted: a, the, an. “fun” came in at #11 on the list.
  2. that have begun to be (left)
    • Predicted: a, the, in
  3. you can tell the difference (between)
    • Predicted: between, in, is
  4. possible reasons the crash (occurred)
    • Predicted: of, and, site
  5. food poisoning was fairly (mild)
    • Predicted: simple, well, easy
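To put a number on these five tests, here is a small Python sketch that computes the share of phrases whose expected word appears in the top k predictions. The data is copied from the list above; the helper name `hit_rate` is made up for this example.

```python
# (expected word, top-three predictions) copied from the five test phrases
results = [
    ("fun", ["a", "the", "an"]),
    ("left", ["a", "the", "in"]),
    ("between", ["between", "in", "is"]),
    ("occurred", ["of", "and", "site"]),
    ("mild", ["simple", "well", "easy"]),
]


def hit_rate(results, k):
    """Fraction of phrases whose expected word is among the first k predictions."""
    hits = sum(1 for expected, predicted in results if expected in predicted[:k])
    return hits / len(results)


print(f"top-1: {hit_rate(results, 1):.0%}, top-3: {hit_rate(results, 3):.0%}")
```

With only five phrases this is a rough indicator at best. A larger held-out sample would be needed for a reliable figure.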

Results

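The app isn't very good at predicting the next word: the expected word was among the top three predictions for only 1 of the 5 test phrases.

This is likely due to using only a 3-gram language model. In each phrase, the contextual information occurs near the beginning, outside the last 2 words that the model looks at. This suggests that I'll need to investigate a 5-gram language model.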

Future Versions

This app is quite crude. It doesn't learn from mistakes, nor can users teach the app. It is slow when loading & creating n-grams.

Versions

  1. Basic text & prediction shiny app (what you're seeing now)
  2. Optimise loading datasets, corpus & n-gram creation with freq counts. Currently it takes too much memory and time.
    • E.g. 1-gram >> 13.6 MB
    • 2-gram >> 1.7 GB
    • 3-gram >> 5 GB
    • 4-gram >> out of memory
  3. Create a way for users to train the model. Implement a 5-gram language model.
  4. Add UI functions
    • History of text inputs
    • Y/N buttons for user training
  5. Speed up Shiny app startup.
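One possible direction for item 2, sketched in Python under assumptions: count n-grams while streaming over the text line by line, then drop n-grams seen fewer than `min_count` times. The helper names, the threshold, and the sample lines are hypothetical, and pruning is only one option, not something the current app does.

```python
from collections import Counter


def count_ngrams(lines, n):
    """Count n-grams line by line, without holding the tokenised corpus in memory."""
    counts = Counter()
    for line in lines:
        tokens = line.lower().split()
        # zip over n shifted copies of the token list yields the n-gram tuples
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def prune(counts, min_count=2):
    """Keep only n-grams that occur at least min_count times."""
    return Counter({gram: c for gram, c in counts.items() if c >= min_count})


lines = ["a sunny day", "a sunny morning", "a rainy day"]
bigrams = count_ngrams(lines, 2)
print(len(bigrams), "bigrams before pruning,", len(prune(bigrams)), "after")
```

The trade-off is that pruned n-grams can no longer be matched, so rare phrases fall back to lower-order models more often.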

Made by...