Stephanie Stallworth
May 16, 2017
InstaChat is a fast and easy app that suggests words based on user input.
Features include:
Predictive Algorithm
This app uses an N-gram backoff model trained with English text from three sources - news, blogs, and tweets.
The zip file containing the training data can be downloaded from: https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip
After cleaning the data, exploratory analysis and Natural Language Processing techniques were applied to build n-gram frequency tables for 1-word, 2-word, and 3-word combinations (known as unigrams, bigrams, and trigrams).
The algorithm then uses the resulting data frames to predict the next word based on the text the user enters and the frequencies in the underlying n-gram tables.
InstaChat App
When the user inputs text, the algorithm will perform the following procedure:
1. Search the trigram model to predict the next word.
2. If no matching 3-word combination exists, the algorithm backs off to the bigram model.
3. If there is still no match, the algorithm outputs the most frequent unigram.
This process is repeated for each source type (Twitter, blogs, and news) to output the most probable word from each.
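The three-step procedure above, repeated per source, can be sketched as follows. This is a Python illustration under stated assumptions, not the app's actual implementation: the helper names `best_continuation` and `predict_next`, the layout of `tables` (a dict from n to a counter keyed by word tuples), and all counts in `sources` are hypothetical.

```python
from collections import Counter

def best_continuation(table, context):
    """Return the most frequent final word among n-grams that start with context, or None."""
    candidates = Counter()
    for ngram, count in table.items():
        if ngram[:-1] == context:
            candidates[ngram[-1]] += count
    return candidates.most_common(1)[0][0] if candidates else None

def predict_next(text, tables):
    """Try the trigram table, then the bigram table, then fall back to the top unigram."""
    words = tuple(text.lower().split())
    for n in (3, 2, 1):
        if len(words) < n - 1:
            continue  # not enough input words to form this context
        context = words[len(words) - (n - 1):]  # last n-1 words; empty tuple when n == 1
        word = best_continuation(tables[n], context)
        if word is not None:
            return word
    return None  # only reached if the unigram table is empty

# Hypothetical toy tables, one set per source (all counts are made up)
sources = {
    "twitter": {
        1: Counter({("the",): 5, ("follow",): 3}),
        2: Counter({("the", "follow"): 3}),
        3: Counter({("for", "the", "follow"): 3, ("for", "the", "love"): 1}),
    },
    "news": {
        1: Counter({("the",): 9, ("first",): 4}),
        2: Counter({("the", "first"): 4, ("the", "game"): 2}),
        3: Counter({("in", "the", "first"): 4}),
    },
    "blogs": {
        1: Counter({("and",): 7, ("i",): 6}),
        2: Counter({("i", "think"): 2}),
        3: Counter({("i", "think", "that"): 2}),
    },
}

predictions = {name: predict_next("thanks for the", t) for name, t in sources.items()}
print(predictions)  # {'twitter': 'follow', 'news': 'first', 'blogs': 'and'}
```

With these toy counts, each source exercises a different step: Twitter matches at the trigram level, news backs off to the bigram table, and blogs falls through to the most frequent unigram.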
Complete code and related documentation can be viewed via: https://github.com/StephanieStallworth/InstaChat_Word_Prediction_App
1. Run App: https://stephaniestallworth.shinyapps.io/instachat_word_prediction_app/
2. Enter a word or phrase in the text box
Parameters:
3. Click 'Predict'
Predictive Performance
Future Development
InstaChat performs well in both speed and accuracy, but that is just the beginning!
Possible features to implement in future releases: