Ch. 1 - Quick introduction to the workflow
Why learn topic modeling
Topics as word contexts
Topic prevalence
Probabilities of words belonging to topics
Counting words
Removal of punctuation marks
Word frequencies
Our first LDA model
Displaying frequencies with ggplot
Simple LDA model
Ch. 2 - Wordclouds, stopwords, and control arguments
Random nature of LDA algorithm
Probabilities of words in topics
Effect of argument alpha
Manipulating the vocabulary
Making a dtm - refresher
Removing stopwords
Keeping the needed words
Word clouds
Wordcloud of term frequency
History of the Byzantine Empire
LDA model fitting - first iteration
Capturing the actions - dtm with verbs
Making a chart
Use wordclouds
Ch. 3 - Named entity recognition as unsupervised classification
Using topic models as classifiers
Same k, different alpha
Probabilities of words in topics
From word windows to dtm
Regex patterns for entity matching
Making a corpus
From dtm to topic model
Corpus alignment and classification
Train a topic model
Align corpus
Classify test data
Explore the results
Ch. 4 - How many topics is enough?
Finding the best number of topics
Preparing the dtm
Filtering by word frequency
Fitting one model
Using perplexity to find the best k
Topic models fitted to novels
Generating chunk numbers
Inner join and cast dtm
Finding the best value for k
Locking topics by using seed words
Topics without seed words
Topics with seed words
Final words (and more things to learn)
About Michael Mallari
Michael is a hybrid thinker and doer, a byproduct of being a StrengthsFinder “Learner” over time. With nearly 20 years of engineering, design, and product experience, he helps organizations identify market needs, mobilize internal and external resources, and deliver delightful digital customer experiences that align with business goals. He has been entrusted with problem-solving for brands ranging from Fortune 500 companies to early-stage startups to not-for-profit organizations.
Michael earned his BS in Computer Science from New York Institute of Technology and his MBA from the University of Maryland, College Park. He is also a candidate for an MS in Applied Analytics from Columbia University.
michaelmallari.com