NextWordApp

Victor Garcia
January 15th, 2016

This presentation is the summary of the Coursera Data Science Specialization Capstone Project. The purpose of the project is to build a shiny app that predicts the next word in a sentence.

1. Predictive model structure

  • The predictive model consists of five data structures. Four of the data structures are hashtables built from n-grams and the remaining structure is a word-frequency table.
  • N-gram structures:
    • The keys are (n-1)-grams (last word of each n-gram removed). Ex: key hash(“going_to_the”) for 4-gram:“going_to_the_cinema”
    • The value for each key is a list of probabilities with the words that could finish the n-gram (words removed in the key). Ex: Key=“going_to_the”. Value=[<“cinema”,0.3>, <“market”,0.25>,<“bed”,0.25>,<“room”,0.2>].
  • Word-frequency table: Ordered table with the frequency and the probability of appearence for the most used words.

2. Predictive model methodology

  • Based on a backoff schema.
  • First, we look for the (n-1)-gram in the 5-gram structure. If the key exists, we return the most likely word.
  • If the key does not exist, we perform the same query in the 4-gram structure.
  • The last two steps are repeated until quering the 2-gram structure.
  • If no key was found in any n-gram, we return the most frequent word found in the word-frequency table.

3. Predictive model performance

  • For each time that the model has to query one of the n-gram structures, we need to compute:
    • O(1) to find a value in a hashtable.
    • O(1) to find the most probable word in the list (Sorted list, so the first element is returned).
  • In the worst case, this needs to be done for each n-gram structure: 4 x O(1)=O(1).
  • In the worst case, we need to look for the most frequent word in the word-frequency table: O(1).
  • Hence, the cost of the predictive algorithm is O(1).

3. Shiny app functionality

  • The shiny app can be accessed in: https://vbarna.shinyapps.io/NextWordApp
  • The user has to enter a sentence in the input text form and press the Submit button.
  • One word prediction is shown as a main functionality.
  • Bellow, the app includes an extra functionality, where it shows a table with other possible predictions and their probability.