TextPredictor

What is TextPredictor?

TextPredictor is a text prediction application. This presentation describes:

  • How TextPredictor works
  • How the App works
  • Assessing Performance

How TextPredictor Works

  1. Data frames of trigrams and bigrams, with their associated conditional probabilities, have been built from large corpora of Twitter, blog and news text
  2. Given two words, the trigram data frame is searched for a match and the candidate next word with the largest probability is returned
  3. If there is no match in the trigram data frame, the first word is ignored and the bigram data frame is searched using the second word alone
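An excerpt of the trigram data frame (wordAB: the two preceding words; wordC: the candidate next word; pCgivenAB: the conditional probability P(wordC | wordAB)):
           wordAB    wordC pCgivenAB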
67575      a back   injury 0.5000000
67576 a bachelors   degree 1.0000000
67577      a baby     bird 0.3333333
67578      a baby     ctfu 0.3333333
67579      a baby    youre 0.3333333
67580         a a somewhat 1.0000000
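The lookup in steps 2 and 3 can be sketched in Python as below. This is a minimal illustration, not the app's actual code: it assumes lowercase whitespace tokenization, maximum-likelihood estimates (count ratios) for the conditional probabilities, and nested dictionaries in place of data frames. The names `build_tables` and `predict` are hypothetical, and returning `None` when neither table matches is an assumption.

```python
from collections import Counter, defaultdict


def _normalise(table):
    """Turn {history: Counter(next_word)} into {history: {next_word: probability}}."""
    probs = {}
    for history, counts in table.items():
        total = sum(counts.values())
        probs[history] = {word: n / total for word, n in counts.items()}
    return probs


def build_tables(sentences):
    """Hypothetical helper: estimate P(C | A B) and P(C | B) as count ratios (MLE)."""
    trigram = defaultdict(Counter)  # (A, B) -> counts of the following word C
    bigram = defaultdict(Counter)   # B -> counts of the following word C
    for sentence in sentences:
        words = sentence.lower().split()  # assumed tokenization
        for b, c in zip(words, words[1:]):
            bigram[b][c] += 1
        for a, b, c in zip(words, words[1:], words[2:]):
            trigram[(a, b)][c] += 1
    return _normalise(trigram), _normalise(bigram)


def predict(word_a, word_b, trigram_probs, bigram_probs):
    """Return (next_word, model, probability), or None if no table matches (assumed)."""
    a, b = word_a.lower(), word_b.lower()
    if (a, b) in trigram_probs:
        candidates, model = trigram_probs[(a, b)], "trigram"
    elif b in bigram_probs:
        # Back off: ignore the first word and use the second word alone.
        candidates, model = bigram_probs[b], "bigram"
    else:
        return None
    # max() keeps the first candidate encountered when probabilities tie.
    word = max(candidates, key=candidates.get)
    return word, model, candidates[word]


corpus = [
    "i saw a baby bird",
    "she has a bachelors degree",
    "he has a back injury",
]
trigram_probs, bigram_probs = build_tables(corpus)
print(predict("a", "baby", trigram_probs, bigram_probs))
print(predict("the", "a", trigram_probs, bigram_probs))
```

In the toy corpus, "a baby" is always followed by "bird", so the trigram table answers directly; "the a" never occurs, so the sketch backs off to the words seen after "a", each with probability 1/3.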

How the App works

The input section of the app is on the left-hand side and the output section is on the right. When text is entered into the input section, the following outputs are displayed:

  • A prediction of the next word
  • The type of model used to make the prediction
  • The associated probability
  • Any warnings (such as entering a word that is not in the app's corpus)
  • A section of the underlying data frame showing alternative predictions
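One way to gather these outputs in a single record is sketched below. It is a hypothetical illustration, not the app's code: `describe_prediction`, the dictionary keys, the warning wording and `top_k` are all assumptions, the trigram rows are taken from the excerpt shown earlier, and the bigram values are made-up toy numbers.

```python
def describe_prediction(text, trigram_probs, bigram_probs, top_k=3):
    """Hypothetical helper: gather the fields the app displays for `text`."""
    words = text.lower().split()

    # Assumption: any word appearing anywhere in the tables counts as "in the corpus".
    vocabulary = set()
    for history, candidates in trigram_probs.items():
        vocabulary.update(history)
        vocabulary.update(candidates)
    for history, candidates in bigram_probs.items():
        vocabulary.add(history)
        vocabulary.update(candidates)

    warnings = [f"'{w}' is not in the corpus" for w in words if w not in vocabulary]

    # Back-off order: trigram on the last two words, then bigram on the last word.
    if len(words) >= 2 and tuple(words[-2:]) in trigram_probs:
        model, candidates = "trigram", trigram_probs[tuple(words[-2:])]
    elif words and words[-1] in bigram_probs:
        model, candidates = "bigram", bigram_probs[words[-1]]
    else:
        return {"prediction": None, "model": None, "probability": None,
                "warnings": warnings + ["no match found"], "alternatives": []}

    # sorted() is stable, so tied candidates keep their table order.
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    best_word, best_p = ranked[0]
    return {"prediction": best_word, "model": model, "probability": best_p,
            "warnings": warnings, "alternatives": ranked[1:top_k + 1]}


trigram_probs = {
    ("a", "baby"): {"bird": 1 / 3, "ctfu": 1 / 3, "youre": 1 / 3},
    ("a", "bachelors"): {"degree": 1.0},
}
bigram_probs = {"baby": {"shower": 0.6, "bird": 0.4}}  # hypothetical toy values

print(describe_prediction("zzz baby", trigram_probs, bigram_probs))
```

For "zzz baby" the trigram lookup fails, so the sketch reports the bigram model, flags "zzz" as unknown, and lists the remaining bigram candidates as alternatives.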

Assessing Performance

Perplexity has been used to assess the performance of the model.

  1. A training set and a test set were formed
  2. Conditional probability models were estimated from the training set
  3. The perplexity was then calculated as below, using the sentences \( s_1, \ldots, s_m \) from the test data

\( \text{Perplexity} = 2^{-l} \)

\( l = \frac{1}{M} \sum\limits_{i=1}^{m} \log_2 p(s_i) \) (M = total number of words in the test set)
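A minimal sketch of this evaluation for a bigram model follows. The formula does not say how unseen bigrams are handled, so this sketch assumes add-one (Laplace) smoothing, `<s>`/`</s>` sentence markers, and an `<unk>` token for test words not seen in training. Here M counts every predicted token, including `</s>`. The function names are hypothetical.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"


def tokens_of(sentence, vocab=None):
    """Lowercase, split on whitespace, map unseen words to <unk>, add markers."""
    words = sentence.lower().split()
    if vocab is not None:
        words = [w if w in vocab else UNK for w in words]
    return [BOS] + words + [EOS]


def train_bigram(sentences):
    """Count bigrams on the training set; vocab = every token that can be predicted."""
    pair_counts, history_counts = Counter(), Counter()
    vocab = {EOS, UNK}
    for s in sentences:
        toks = tokens_of(s)
        vocab.update(toks[1:])  # everything except <s> can follow a history
        for prev, word in zip(toks, toks[1:]):
            pair_counts[(prev, word)] += 1
            history_counts[prev] += 1
    return pair_counts, history_counts, vocab


def perplexity(test_sentences, model):
    """Perplexity = 2^(-l), with l = (1/M) * sum of log2 p(s_i) over test sentences."""
    pair_counts, history_counts, vocab = model
    V = len(vocab)
    log_prob_sum, M = 0.0, 0
    for s in test_sentences:
        toks = tokens_of(s, vocab)
        for prev, word in zip(toks, toks[1:]):
            # Add-one smoothing (an assumption) keeps unseen bigrams above zero.
            p = (pair_counts[(prev, word)] + 1) / (history_counts[prev] + V)
            log_prob_sum += math.log2(p)
            M += 1
    l = log_prob_sum / M
    return 2 ** (-l)


model = train_bigram(["a b"])
print(perplexity(["a b"], model))
```

With the training sentence "a b" the vocabulary is {a, b, `</s>`, `<unk>`}, so each of the three predicted tokens gets probability (1 + 1) / (1 + 4) = 0.4 and the perplexity is 1 / 0.4 = 2.5, up to floating-point rounding.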

A corpus of 5,863 words gave a bigram perplexity of 130.2.
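As a worked reading of this figure, using only the definitions above with a base-2 logarithm:

\[ l = -\log_2(130.2) \approx -7.02 \]

\[ 2^{l} = \frac{1}{130.2} \approx 0.0077 \]

On average (as a geometric mean), the model therefore assigns each word a probability of roughly 0.0077. That is about as uncertain as a uniform choice among 130 words.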