Ch. 1 - Light My Fire: Starting To Use Spark With dplyr Syntax
Getting Started
Made for each other
Here be dragons
The connect-work-disconnect pattern
Copying data into Spark
Big data, tiny tibble
Exploring the structure of tibbles
Selecting columns
Filtering rows
Arranging rows
Mutating columns
Summarizing columns
Ch. 3 - Going Native: Use The Native Interface to Manipulate Spark DataFrames
Two new interfaces
Popcorn double feature
Transforming continuous variables to logical
Transforming continuous variables into categorical (1)
Transforming continuous variables into categorical (2)
More than words: tokenization (1)
More than words: tokenization (2)
More than words: tokenization (3)
Sorting vs. arranging
Exploring Spark data types
Shrinking the data by sampling
Training/testing partitions
Ch. 4 - Case Study: Learning to be a Machine: Running Machine Learning Models on Spark
Machine Learning on Spark
Machine learning functions
(Hey you) What’s that sound?
Working with parquet files
Come together
Partitioning data with a group effect
Gradient boosted trees: modeling
Gradient boosted trees: prediction
Gradient boosted trees: visualization
Random Forest: modeling
Random Forest: prediction
Random Forest: visualization
Comparing model performance
An interview with Javier Luraschi and Kevin Ushey
About Michael Mallari
Michael is a hybrid thinker and doer—a byproduct of being a StrengthsFinder “Learner” over time. With 20+ years of engineering, design, and product experience, he helps organizations identify market needs, mobilize internal and external resources, and deliver delightful digital customer experiences that align with business goals. He has been entrusted with problem-solving for brands—ranging from Fortune 500 companies to early-stage startups to not-for-profit organizations.
Michael earned his BS in Computer Science from New York Institute of Technology and his MBA from the University of Maryland, College Park. He is also a candidate for an MS in Applied Analytics from Columbia University.
michaelmallari.com