Word Predictor App Presentation

akselix
2016-04-24

Word Predictor - What does it do?

  • Predicts the next word given the previous words
  • Proof-of-concept Shiny web application
  • Prediction model could be further developed and used in many different business cases

Instructions

  • https://arttu.shinyapps.io/word_predictor
  • Type an input sentence and the app will suggest a list of likely next words
  • Slider controls the number of suggested words
  • The word cloud on the right-hand side visualises the frequencies
  • The table on the left-hand side shows the predictions and contains a search box for finding specific words

Data and Accuracy

  • Data is from a corpus called HC Corpora
  • Consists of text files collected from publicly available sources by a web crawler
  • English language files that were gathered from Twitter and different blogs and news sources
  • Should give a rather good mix of general language used today
  • When predicting from the previous two words and giving five suggestions, the app includes the correct word 76% of the time
         5 suggestions  3 suggestions  1 suggestion
2-gram            0.76           0.73          0.65
1-gram            0.72           0.68          0.58
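
To make the figures above concrete: a prediction counts as correct when the actual next word appears among the k suggestions. The slides do not describe the exact evaluation procedure, so the following is only a minimal Python sketch of that kind of measurement. The names `top_k_accuracy` and `toy_predict`, and the test pairs, are hypothetical and not taken from the app.

```python
def top_k_accuracy(test_cases, predict, k):
    """Share of test cases whose actual next word is among the k suggestions."""
    hits = sum(1 for context, actual in test_cases if actual in predict(context, k))
    return hits / len(test_cases)


# Hypothetical stand-in for a real model: always suggests the same ranked words.
def toy_predict(context, k):
    return ["the", "to", "and", "a", "of"][:k]


# Hypothetical held-out pairs of (previous words, actual next word).
test_cases = [
    ("going back", "to"),
    ("one of", "the"),
    ("salt", "and"),
    ("thanks for", "everything"),
]

for k in (5, 3, 1):
    print(k, top_k_accuracy(test_cases, toy_predict, k))
```

Accuracy can only stay the same or drop as k shrinks, which matches the pattern across the columns of the table.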

Model

  • Based on the Stupid Backoff algorithm from the Markov family of probabilistic models:
    1. Apply the same text transformations to the input as were applied to the training data, and keep the last two words.
    2. Search for those two words among the first two words of the 3-grams in the training data. If there is a match, predict the third word. If not, go to the next step.
    3. Search for the last input word among the first words of the 2-grams in the training data. If there is a match, predict the second word. If not, go to the next step.
    4. Predict the most common words in the 1-gram data.
  • Despite its name, it performs quite well when trained on a very large amount of data.
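
The backoff steps can be sketched roughly as follows. This is a minimal Python illustration, not the app's actual code. The function names `normalize`, `build_ngrams` and `predict`, the lowercase-and-letters-only text transformation, and the toy corpus are all assumptions. For simplicity it ranks candidates by raw counts at the first n-gram level that matches, and leaves out the discount factor that Stupid Backoff applies when scores from different n-gram orders are combined.

```python
import re
from collections import Counter, defaultdict


def normalize(text):
    """Assumed transformation: lowercase, keep only letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngrams(sentences):
    """Count 1-grams, plus next-word counts keyed by the previous one or two words."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)   # (w1,)    -> counts of the following word
    trigrams = defaultdict(Counter)  # (w1, w2) -> counts of the following word
    for sentence in sentences:
        words = normalize(sentence)
        unigrams.update(words)
        for w1, w2 in zip(words, words[1:]):
            bigrams[(w1,)][w2] += 1
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            trigrams[(w1, w2)][w3] += 1
    return unigrams, bigrams, trigrams


def predict(text, model, n_suggestions=5):
    """Back off from 3-grams to 2-grams to 1-grams until something matches."""
    unigrams, bigrams, trigrams = model
    words = normalize(text)[-2:]                      # step 1: last two words
    if len(words) == 2 and tuple(words) in trigrams:  # step 2: 3-gram match
        counts = trigrams[tuple(words)]
    elif words and (words[-1],) in bigrams:           # step 3: 2-gram match
        counts = bigrams[(words[-1],)]
    else:                                             # step 4: most common words
        counts = unigrams
    return [word for word, _ in counts.most_common(n_suggestions)]


# Hypothetical toy corpus standing in for the real training data.
corpus = [
    "I love green tea",
    "I love green apples",
    "I love green tea in the morning",
    "You love black tea",
]
model = build_ngrams(corpus)

print(predict("I really love green", model, 2))  # matched at the 3-gram level
print(predict("some black", model, 1))           # backs off to the 2-gram level
print(predict("hello world", model, 1))          # backs off to the 1-gram level
```

The `n_suggestions` argument plays the role of the slider in the app: it only changes how many of the ranked candidates are returned, not how they are found.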

Future Development

  • Could use different data sets for domain-specific applications or other languages
  • Try out more advanced models like Markov chains
  • UI could be improved