Ch. 4 - Preprocessing your data
Median imputation
Median imputation vs. omitting rows
Apply median imputation
KNN imputation
Comparing KNN imputation to median imputation
Use KNN imputation
Compare KNN and median imputation
Multiple preprocessing methods
Order of operations
Combining preprocessing methods
Handling low-information predictors
Why remove near zero variance predictors?
Remove near zero variance predictors
preProcess() and nearZeroVar()
Fit model on reduced blood-brain data
Principal components analysis (PCA)
Using PCA as an alternative to nearZeroVar()
Ch. 5 - Selecting models: a case study in churn prediction
Reusing a trainControl
Why reuse a trainControl?
Make custom train/test indices
Reintroducing glmnet
glmnet as a baseline model
Fit the baseline model
Reintroducing random forest
Random forest drawback
Random forest with custom trainControl
Comparing models
Matching train/test indices
Create a resamples object
More on resamples
Create a box-and-whisker plot
Create a scatterplot
Ensembling models
Summary
About Michael Mallari
Michael is a hybrid thinker and doer, a byproduct of being a StrengthsFinder “Learner” over time. With 20+ years of engineering, design, and product experience, he helps organizations identify market needs, mobilize internal and external resources, and deliver delightful digital customer experiences that align with business goals. He has been entrusted with problem-solving for brands ranging from Fortune 500 companies to early-stage startups to not-for-profit organizations.
Michael earned his BS in Computer Science from New York Institute of Technology and his MBA from the University of Maryland, College Park. He is also a candidate for an MS in Applied Analytics from Columbia University.
LinkedIn | Twitter | www.michaelmallari.com/data | www.columbia.edu/~mm5470