- BS Accounting
- Accountant
- MAcc
- Marketing Analyst
- Product Entry Lead
- Financial Analyst
- Data Scientist
December 5, 2017
https://www.coursera.org/specializations/jhu-data-science
Objective: Create a word-prediction app similar to SwiftKey on mobile phones
Data Sources: Twitter, news stories, blogs
Data Processing: Tokenize the text; remove stopwords, punctuation, numbers, and symbols; and stem the remaining words. Split into n-grams (n = 1 to 4) and sort each set by frequency. Unigrams: keep the top 5,000. Bigrams, trigrams, and quadrigrams: keep the top 5 million.
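The processing steps above can be sketched in Python as follows. This is a minimal illustration, not the app's actual code: the stopword list is a tiny made-up sample, the names `preprocess`, `count_ngrams`, and `top_ngrams` are hypothetical, and stemming is left as a pluggable `stem` function (identity by default) where a real stemmer such as Porter's would go.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the source does not name one).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it"}

def preprocess(text, stem=lambda w: w):
    """Lowercase, keep runs of letters only, drop stopwords, then stem.

    Keeping only [a-z] runs discards punctuation, numbers, and symbols in
    one step. `stem` is a placeholder hook; the default leaves words as-is.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]

def count_ngrams(docs, n):
    """Count n-grams (tuples of n tokens) across an iterable of documents."""
    counts = Counter()
    for doc in docs:
        tokens = preprocess(doc)
        # zip over n shifted copies of the token list yields each n-gram once
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

def top_ngrams(docs, n, k):
    """Return the k most frequent n-grams with counts, most frequent first."""
    return count_ngrams(docs, n).most_common(k)
```

Under these assumptions, the cutoffs described above would correspond to calls such as `top_ngrams(docs, 1, 5000)` for unigrams and `top_ngrams(docs, n, 5_000_000)` for n = 2, 3, 4.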
Data Modelling: Process the input (a word or phrase) the same way as the dataset. Take the last 3 words of the phrase and find the most frequent quadrigram that starts with those 3 words; its final word is the predicted next word. If there is no match, back off to a trigram (last 2 words), then to a bigram (last word). If there is still no match, take a word from the unigram list.
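The back-off lookup described above might look roughly like this in Python. It is a sketch under stated assumptions: the input is already tokenized and processed like the dataset, `build_tables` and `predict_next` are hypothetical names, the tables are built from raw counts rather than the pruned top-k lists, and the unigram fallback returns the single most frequent word (the source only says a word is taken from the unigrams).

```python
from collections import Counter

def build_tables(token_lists, max_n=4):
    """For n = 1..max_n, map each (n-1)-word context to a Counter of next words."""
    tables = {n: {} for n in range(1, max_n + 1)}
    for tokens in token_lists:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])  # empty tuple for unigrams
                next_word = tokens[i + n - 1]
                tables[n].setdefault(context, Counter())[next_word] += 1
    return tables

def predict_next(tokens, tables, max_n=4):
    """Try the quadrigram table first, then back off to shorter contexts."""
    for n in range(max_n, 0, -1):
        context = tuple(tokens[-(n - 1):]) if n > 1 else ()
        if len(context) < n - 1:
            continue  # input is too short for an n-gram of this order
        followers = tables[n].get(context)
        if followers:
            # Assumption: pick the most frequent continuation at every level,
            # including the unigram fallback.
            return followers.most_common(1)[0][0]
    return None  # only reached when the tables are empty
```

Storing contexts as dictionary keys keeps each lookup to a single hash probe per n-gram order, which suits an interactive app where a prediction is needed on every keystroke.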