Star Text Predictor

Luis Luévano
August 23rd, 2015

Capstone project for the Data Science Specialization offered by Johns Hopkins University on Coursera.org.

About this app

App users today rely on a wide range of applications, such as instant messaging, social networks and news feeds, mainly on their mobile devices. Staying informed, keeping in touch with other users and being able to respond quickly are well-established behaviours, especially among younger generations. Text prediction aims to make text input in these apps easier and faster.

Text prediction has been under development for decades, but the arrival of smaller and more powerful devices has made it practical for everyday use.

The Star Text Predictor is a Shiny app that predicts the next word for a given text. It suggests possible words that an algorithm has learned from an extensive English text data set.

How it works

The algorithm is based on a Markov chain model, which describes the probabilities of events that are serially dependent. In text prediction, this is the probability of seeing a new word given a series of preceding words. For example, the probability of seeing “to” given “I have” would be high.
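
In symbols, the Markov assumption can be written as follows. The notation is ours rather than the app's: w_i is the i-th word of the text and n is the n-gram order, so only the preceding n-1 words are kept as context.

```latex
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1})
```

With n = 3, the example above corresponds to P(“to” | “I”, “have”).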

To obtain these probabilities, extensive text data sets were used to build a language model based on n-grams. This language model uses n-grams of up to order 5, meaning sequences of up to 5 words. For example, in “me know what you think”, we get the probability of “think” given “me know what you”.
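
As an illustration of how such probabilities can be estimated, here is a minimal Python sketch that counts n-grams of up to order 5 in a toy token list and computes a maximum-likelihood estimate. It is not the app's actual code: the function names, the toy corpus and the use of plain relative frequencies are assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_ngram_counts(tokens, max_order=5):
    """Count next-word occurrences for every context of 0..max_order-1 words."""
    counts = defaultdict(Counter)  # context tuple -> Counter of next words
    for i, word in enumerate(tokens):
        for k in range(max_order):
            if i - k < 0:
                break  # not enough preceding words for a context this long
            context = tuple(tokens[i - k:i])
            counts[context][word] += 1
    return counts


def conditional_probability(counts, context, word):
    """Maximum-likelihood estimate of P(word | context)."""
    followers = counts.get(tuple(context))
    if not followers:
        return 0.0  # context never seen in the training tokens
    return followers[word] / sum(followers.values())


# Toy corpus (hypothetical); the real model was trained on far more text.
tokens = "let me know what you think and let me know what you want".split()
counts = build_ngram_counts(tokens)

# "me know what you" is followed once by "think" and once by "want".
p = conditional_probability(counts, ["me", "know", "what", "you"], "think")
print(p)  # 0.5
```

On real data, raw relative frequencies like these assign zero probability to unseen sequences, which is why smoothing is needed.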

The language model is then improved by smoothing the probabilities through interpolation with lower-order n-grams. The app's algorithm first looks for a prediction from the highest-order n-gram and, if it finds none, falls back to the next lower order. This is a back-off method.
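
The two ideas in this paragraph can be sketched in Python as shown below. This is a simplified illustration, not the app's implementation: the fixed interpolation weight `lam`, the function names and the toy corpus are all assumptions, since the exact smoothing scheme is not specified here.

```python
from collections import Counter, defaultdict


def build_counts(tokens, max_order=5):
    """Map each context of 0..max_order-1 words to a Counter of next words."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for k in range(min(max_order, i + 1)):
            counts[tuple(tokens[i - k:i])][word] += 1
    return counts


def interpolated_prob(counts, context, word, lam=0.7):
    """Mix the estimate for this context with the one for the shorter context.

    lam is a hypothetical fixed weight chosen for the example.
    """
    context = tuple(context)
    followers = counts.get(context)
    mle = followers[word] / sum(followers.values()) if followers else 0.0
    if not context:
        return mle  # unigram level: nothing lower to interpolate with
    lower = interpolated_prob(counts, context[1:], word, lam)
    return lam * mle + (1 - lam) * lower


def predict_next(counts, text, n_suggestions=3, max_order=5):
    """Back off from the longest usable context to shorter ones."""
    words = text.lower().split()
    for k in range(min(max_order - 1, len(words)), -1, -1):
        context = tuple(words[len(words) - k:])
        followers = counts.get(context)
        if followers:  # this context was seen: rank its next words
            ranked = sorted(
                followers,
                key=lambda w: (-interpolated_prob(counts, context, w), w),
            )
            return ranked[:n_suggestions]
    return []  # only reached when the model is empty


# Toy corpus (hypothetical).
tokens = "i have to go . i have to stay . i have a plan".split()
counts = build_counts(tokens)

print(predict_next(counts, "I have"))          # ['to', 'a']
print(predict_next(counts, "we really have"))  # ['to', 'a'], after backing off
```

In the second call the contexts “we really have” and “really have” were never seen, so the lookup backs off to the single-word context “have”.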

Strengths and To Do's

Strengths

  • The app is fast enough to give feedback while the user is typing.
  • It suggests completions for unfinished words.
  • It handles contractions like “I'm” or “you're”.
  • It has an extensive vocabulary of 29,017 words.
  • It takes unknown words into account.

To Do's

  • Capitalize the first letter after end-of-sentence punctuation.
  • Improve accuracy by adding intelligence beyond the purely probabilistic method.

Directions

The Star Text Predictor is located at: https://luisluevano.shinyapps.io/StarPredictiveText

  1. Select how many predicted words to show, from one to five. Three are shown by default.

  2. Start writing your text and the suggested words will appear below. The app also takes unfinished words into account: if none of the suggestions is the word you want, start typing it and the app will immediately narrow its suggestions from the very first letter. It keeps matching against the partial word until a space is entered.

  3. If the word you want is among the suggestions, select it and your text will be completed with that word automatically. A space is also added so the app can suggest the next words. If you decide to end the sentence with a punctuation mark, the app will delete that trailing space.
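
The behaviour described in steps 2 and 3 can be approximated with the following Python sketch. It is an illustration only: the function names are hypothetical, and the ranked `candidates` list stands in for whatever the language model would return.

```python
def split_input(text):
    """Return (completed_words, partial_word).

    The partial word is empty when the text is empty or ends with a space.
    """
    words = text.split()
    if not text or text[-1].isspace():
        return words, ""
    return words[:-1], words[-1]


def suggest(candidates, text, n_suggestions=3):
    """Keep the best-ranked candidates that start with the partial word."""
    _, partial = split_input(text)
    matches = [w for w in candidates if w.startswith(partial.lower())]
    return matches[:n_suggestions]


def accept(text, chosen):
    """Replace the partial word with the chosen suggestion plus a space.

    Note: this simplified version also collapses repeated spaces.
    """
    words, _ = split_input(text)
    return " ".join(words + [chosen]) + " "


def add_punctuation(text, mark):
    """Drop the trailing space before appending a punctuation mark."""
    return text.rstrip(" ") + mark


# Hypothetical ranked output of the model, best first.
candidates = ["to", "the", "that", "a", "think"]

print(suggest(candidates, "I want t"))   # ['to', 'the', 'that']
print(suggest(candidates, "I want th"))  # ['the', 'that', 'think']
text = accept("I want th", "think")      # 'I want think '
print(add_punctuation(text, "."))        # 'I want think.'
```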