Ch. 1 - Introduction and exploring raw data
Introduction to Cleaning Data in R
The data cleaning process
Here’s what messy data look like
Here’s what clean data look like
Exploring raw data
Getting a feel for your data
Viewing the structure of your data
Exploring raw data (part 2)
Looking at your data
Visualizing your data
Ch. 2 - Tidying data
Introduction to tidy data
Principles of tidy data
Common symptoms of messy data
Introduction to tidyr
What kind of messy are the BMI data?
Gathering columns into key-value pairs
Spreading key-value pairs into columns
Introduction to tidyr (part 2)
Functions in tidyr
Separating columns
Uniting columns
Column headers are values, not variable names
Variables are stored in both rows and columns
Multiple values are stored in one column
Ch. 3 - Preparing data for analysis
Type conversions
Types of variables in R
Common type conversions
Working with dates
String manipulation
Trimming and padding strings
Upper and lower case
Finding and replacing strings
Missing and special values
Types of missing and special values in R
Finding missing values
Dealing with missing values
Outliers and obvious errors
Identifying outliers and obvious errors
Dealing with outliers and obvious errors
Another look at strange values
Ch. 4 - Putting it all together
Time to put it all together!
Get a feel for the data
Summarize the data
Take a closer look
Let’s tidy the data
Column names are values
Values are variable names
Prepare the data for analysis
Clean up dates
A closer look at column types
Column type conversions
Missing, extreme, and unexpected values
Find missing values
An obvious error
Another obvious error
Check other extreme values
Finishing touches
Your data are clean!
About Michael Mallari
Michael is a hybrid thinker and doer, a byproduct of being a StrengthsFinder “Learner” over time. With nearly 20 years of engineering, design, and product experience, he helps organizations identify market needs, mobilize internal and external resources, and deliver delightful digital customer experiences that align with business goals. He has solved problems for brands ranging from Fortune 500 companies to early-stage startups and not-for-profit organizations.
Michael earned his BS in Computer Science from New York Institute of Technology and his MBA from the University of Maryland, College Park. He is also a candidate for an MS in Applied Analytics at Columbia University.
LinkedIn | Twitter | michaelmallari.com