December 5, 2017

Background

  • BS Accounting
  • Accountant
  • MAcc
  • Marketing Analyst
  • Product Entry Lead
  • Financial Analyst
  • Data Scientist

JHU Data Science Specialization

Capstone Project

Objective: Create a word prediction app similar to SwiftKey on mobile phones

Data Sources: Twitter, news stories, blogs

Data Processing: Tokenize; remove stopwords, punctuation, numbers, and symbols; stem words. Separate into n-grams (n = 1, 2, 3, 4) and sort each set by frequency. Unigrams: keep the top 5k. Bigrams, trigrams, and quadrigrams: keep the top 5 million.
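
A minimal Python sketch of this processing step, offered as an illustration rather than the app's actual code. The stopword list, the `naive_stem` helper (a crude stand-in for a real stemmer such as Porter), and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; the list actually used is not given).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "it"}

def naive_stem(word):
    """Crude suffix stripper standing in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, keep alphabetic runs only (drops punctuation, numbers, symbols),
    remove stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

def top_ngrams(documents, n, limit):
    """Count n-grams over all documents and return the `limit` most frequent."""
    counts = Counter()
    for doc in documents:
        tokens = preprocess(doc)
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts.most_common(limit)

docs = ["big dog barks", "big dog runs", "big cat"]
print(top_ngrams(docs, 2, 1))
```

Calling `top_ngrams(corpus, 1, 5000)` would correspond to the unigram cut-off described above, with a larger limit for the higher-order n-grams.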

Data Modelling: Process the input (word or phrase) the same way as the dataset. Take the last 3 words of the phrase and find the most frequent quadrigram that starts with those 3 words; its final word is the prediction. If there is no match, back off to trigrams (last 2 words), then bigrams (last word). If there is still no match, take a word from the unigram list.
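
The back-off lookup can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the app's implementation: `build_lookup`, `predict_next`, and the toy counts are hypothetical, the input tokens are assumed to be already processed like the dataset, and the unigram fallback is assumed to be a single default word.

```python
def build_lookup(ngram_counts):
    """Map each context (all words but the last) to its most frequent next word."""
    best = {}
    for ngram, count in ngram_counts.items():
        context, nxt = ngram[:-1], ngram[-1]
        if context not in best or count > best[context][1]:
            best[context] = (nxt, count)
    return {ctx: word for ctx, (word, _) in best.items()}

def predict_next(tokens, lookups, default_word):
    """Try the longest context first (3 words), then back off to 2, then 1.

    `lookups` maps n-gram order (4, 3, 2) to a context -> next-word table.
    `default_word` is the unigram fallback (assumption: one fixed word).
    """
    for k in (3, 2, 1):
        if len(tokens) >= k:
            word = lookups.get(k + 1, {}).get(tuple(tokens[-k:]))
            if word is not None:
                return word
    return default_word

# Toy frequency tables, invented for the example.
counts = {
    4: {("thank", "you", "for", "the"): 3, ("thank", "you", "for", "your"): 5},
    3: {("you", "for", "a"): 2},
    2: {("hello", "world"): 1},
}
lookups = {n: build_lookup(c) for n, c in counts.items()}

print(predict_next(["thank", "you", "for"], lookups, "the"))  # quadrigram match
print(predict_next(["see", "you", "for"], lookups, "the"))    # backs off to trigram
print(predict_next(["unknown"], lookups, "the"))              # unigram fallback
```

Precomputing one best next word per context keeps each prediction to a handful of dictionary lookups, which suits an interactive app.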

https://debmartin06.shinyapps.io/Capstone_Word_Predictor/