Data Science Capstone Project

Text Prediction Application Deck

Ravishankar Doejode

Text Prediction Application Goals

  1. Predict the next word for a 1-3 word phrase
  2. Use Twitter and News data to build the model
  3. Host the site on Shinyapps.io
  4. Have the application return the predicted word in a reasonable amount of time

How I Built the Application

  1. Used a 30% sample of the Twitter and News data
  2. Built unigram, bigram, trigram and 4-grams
  3. Retained features with a minimum occurrence of 5 times in 5 docs for unigrams and bigrams
  4. Retained features with a minimum occurrence of 2 times in 2 docs for trigrams and 4-grams
  5. Used Maximum Likelihood Estimation (MLE) to show the most likely next word
  6. Used MLE to show the four next most likely words
  7. Kept the application and data simple and nimble for performance
  8. Used data tables to store the n-grams for speed
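
As a rough illustration of steps 2 to 6, here is a minimal Python sketch. It is not the application's code (the app runs on Shiny with data tables). The function names `build_ngram_table` and `mle_ranking`, the toy documents, and the choice to normalize each estimate over the retained followers of a context are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def build_ngram_table(docs, n, min_count, min_docs):
    """Count n-grams over tokenized docs and keep those that occur at
    least min_count times in at least min_docs documents."""
    term_freq = Counter()  # total occurrences of each n-gram
    doc_freq = Counter()   # number of documents containing each n-gram
    for tokens in docs:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        term_freq.update(grams)
        doc_freq.update(set(grams))
    # Index by context (all words but the last) for fast lookup.
    table = defaultdict(dict)
    for gram, count in term_freq.items():
        if count >= min_count and doc_freq[gram] >= min_docs:
            table[gram[:-1]][gram[-1]] = count
    return table

def mle_ranking(table, context):
    """Rank next words by count(context, word) / sum of retained counts
    for that context (an assumed normalization for this sketch)."""
    followers = table.get(tuple(context), {})
    total = sum(followers.values())
    ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, count / total) for word, count in ranked]

# Toy data, purely illustrative.
docs = [
    ["thanks", "for", "the", "follow"],
    ["thanks", "for", "the", "follow"],
    ["thanks", "for", "the", "help"],
]
bigrams = build_ngram_table(docs, n=2, min_count=2, min_docs=2)
ranking = mle_ranking(bigrams, ["the"])  # "the help" is dropped by the thresholds
```

The first entry of the ranking plays the role of the single best prediction, and the following entries correspond to the additional suggestions.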

Text Prediction Application Details

  1. Application url is https://rdoejode.shinyapps.io/shinyapp/
  2. Data loads in a few seconds; a note asks the user to wait
  3. Text input box for a 1-3 word phrase
  4. Submit button to submit the input phrase
  5. One text window shows the next word with the highest MLE
  6. Second text window shows the next four words with the highest MLE
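
The lookup behind the two text windows might look something like the Python sketch below. The `predict` function, the layout of `tables`, the toy counts, and especially the fallback to a shorter context when the full phrase is not found are assumptions for illustration; the deck does not say how unseen phrases are handled.

```python
def predict(tables, phrase, extra=4):
    """Return (best_word, next_words) for a 1-3 word phrase.

    tables maps a context length (1, 2 or 3) to a dict of
    {context_tuple: {next_word: count}}.
    """
    words = phrase.lower().split()[-3:]  # use at most the last three words
    while words:
        followers = tables.get(len(words), {}).get(tuple(words), {})
        if followers:
            ranked = sorted(followers, key=lambda w: (-followers[w], w))
            return ranked[0], ranked[1:1 + extra]
        words = words[1:]  # assumed fallback: drop the oldest word and retry
    return None, []

# Toy counts, purely illustrative.
tables = {
    1: {("the",): {"follow": 5, "day": 3}},
    2: {("for", "the"): {"follow": 4, "first": 3, "day": 2,
                         "love": 2, "help": 1, "win": 1}},
}
best, others = predict(tables, "Thanks for the")
```

Here `best` would feed the first text window and `others` the second.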

Highlights and Wrap-up

  1. Kept the application simple and nimble for performance
  2. Nothing too fancy by design

###Just glad to be done with the Data Science Track###

Thank you