Ch. 1 - Quick introduction to the workflow

Why learn topic modeling

Topics as word contexts

Topic prevalence

Probabilities of words belonging to topics

Counting words

Removal of punctuation marks

Word frequencies

Our first LDA model

Displaying frequencies with ggplot

Simple LDA model


Ch. 2 - Wordclouds, stopwords, and control arguments

Random nature of LDA algorithm

Probabilities of words in topics

Effect of argument alpha

Manipulating the vocabulary

Making a dtm - refresher

Removing stopwords

Keeping the needed words

Word clouds

Wordcloud of term frequency

History of the Byzantine Empire

LDA model fitting - first iteration

Capturing the actions - dtm with verbs

Making a chart

Use wordclouds


Ch. 3 - Named entity recognition as unsupervised classification

Using topic models as classifiers

Same k, different alpha

Probabilities of words in topics

From word windows to dtm

Regex patterns for entity matching

Making a corpus

From dtm to topic model

Corpus alignment and classification

Train a topic model

Align corpus

Classify test data

Explore the results


Ch. 4 - How many topics is enough?

Finding the best number of topics

Preparing the dtm

Filtering by word frequency

Fitting one model

Using perplexity to find the best k

Topic models fitted to novels

Generating chunk numbers

Inner join and cast dtm

Finding the best value for k

Locking topics by using seed words

Topics without seedwords

Topics with seedwords

Final words (and more things to learn)


About Michael Mallari

Michael is a hybrid thinker and doer—a byproduct of being a StrengthsFinder “Learner” over time. With nearly 20 years of engineering, design, and product experience, he helps organizations identify market needs, mobilize internal and external resources, and deliver delightful digital customer experiences that align with business goals. He has been entrusted with problem-solving for brands—ranging from Fortune 500 companies to early-stage startups to not-for-profit organizations.

Michael earned his BS in Computer Science from New York Institute of Technology and his MBA from the University of Maryland, College Park. He is also a candidate to receive his MS in Applied Analytics from Columbia University.
