This presentation summarizes the Coursera Data Science Specialization capstone project. The purpose of the project is to build a Shiny app that predicts the next word in a sentence.
1. Predictive model structure
The predictive model consists of five data structures: four hashtables built from n-grams (one each for 2-grams through 5-grams) and a word-frequency table.
N-gram structures:
The keys are (n-1)-grams (each n-gram with its last word removed). Ex: the key hash(“going_to_the”) for the 4-gram “going_to_the_cinema”.
The value for each key is a list, sorted by probability, of the words that could complete the n-gram (the words removed when building the keys). Ex: Key=“going_to_the”, Value=[<“cinema”,0.3>, <“market”,0.25>, <“bed”,0.25>, <“room”,0.2>].
Word-frequency table: ordered table with the frequency and the probability of appearance of the most used words (both kinds of structure are sketched below).
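As an illustration, here is a minimal sketch of these two structures in R, assuming an R environment serves as the hashtable; the data and object names (four_gram, word_freq) are hypothetical, not taken from the project:

```r
# One n-gram hashtable: keys are underscore-joined (n-1)-grams, values are
# candidate completions sorted by probability (hypothetical example data).
four_gram <- new.env(hash = TRUE)
assign("going_to_the",
       data.frame(word = c("cinema", "market", "bed", "room"),
                  prob = c(0.30, 0.25, 0.25, 0.20)),
       envir = four_gram)

# Word-frequency table: most used words first (illustrative probabilities).
word_freq <- data.frame(word = c("the", "to", "and"),
                        prob = c(0.060, 0.035, 0.030))

# Lookup: O(1) hash access, then the first row is the most probable word.
get("going_to_the", envir = four_gram)$word[1]  # "cinema"
```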
2. Predictive model methodology
The model is based on a backoff scheme.
First, we look up the key formed by the last four words of the input (the (n-1)-gram for n = 5) in the 5-gram structure. If the key exists, we return the most likely word.
If the key does not exist, we perform the same query, with the last three words, in the 4-gram structure.
This step is repeated, dropping one word from the key each time, until the 2-gram structure has been queried.
If no key was found in any n-gram structure, we return the most frequent word in the word-frequency table. The whole loop is sketched below.
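A minimal sketch of this backoff loop, assuming the n-gram hashtables are collected in a named list keyed by n (all names here, such as predict_next_word and ngram_tables, are illustrative, not the project's):

```r
# Backoff prediction: try the 5-gram table first, fall back to shorter
# n-grams, and finally to the overall most frequent word.
predict_next_word <- function(phrase, ngram_tables, word_freq) {
  words <- strsplit(tolower(phrase), "\\s+")[[1]]
  for (n in 5:2) {
    k <- n - 1                                   # key length: last (n-1) words
    if (length(words) < k) next
    key <- paste(tail(words, k), collapse = "_")
    tbl <- ngram_tables[[as.character(n)]]
    if (exists(key, envir = tbl, inherits = FALSE)) {
      return(get(key, envir = tbl)$word[1])      # list is sorted: first = most likely
    }
  }
  word_freq$word[1]                              # no key matched anywhere
}
```

For example, predict_next_word("I am going to the", ngram_tables, word_freq) would first probe the 5-gram table with the key “am_going_to_the” before backing off to shorter keys.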
3. Predictive model performance
Each time the model queries one of the n-gram structures, the cost is:
O(1) to find a value in a hashtable.
O(1) to find the most probable word in the value list (the list is sorted by probability, so the first element is returned).
In the worst case, this is done once per n-gram structure: 4 × O(1) = O(1).
In the worst case, we also fall back to the word-frequency table: O(1), since the table is ordered and its first entry is the most frequent word.
Hence, the cost of the predictive algorithm is O(1).
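As a quick, informal sanity check of the O(1) lookup claim, a hash lookup in an R environment should take about the same time whether the table holds one key or a hundred thousand (the key counts below are arbitrary):

```r
small <- new.env(hash = TRUE)
large <- new.env(hash = TRUE)
assign("going_to_the", "cinema", envir = small)
for (i in seq_len(100000)) assign(paste0("key_", i), "x", envir = large)
assign("going_to_the", "cinema", envir = large)

# Timings should be comparable despite the large difference in table size.
system.time(for (i in 1:10000) get("going_to_the", envir = small))
system.time(for (i in 1:10000) get("going_to_the", envir = large))
```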