Ch. 1 - Introduction and exploring raw data

Introduction to Cleaning Data in R

The data cleaning process

Here’s what messy data look like

Here’s what clean data look like

Exploring raw data

Getting a feel for your data

Viewing the structure of your data

Exploring raw data (part 2)

Looking at your data

Visualizing your data


Ch. 2 - Tidying data

Introduction to tidy data

Principles of tidy data

Common symptoms of messy data

Introduction to tidyr

What kind of messy are the BMI data?

Gathering columns into key-value pairs

Spreading key-value pairs into columns

Introduction to tidyr (part 2)

Functions in tidyr

Separating columns

Uniting columns

Column headers are values, not variable names

Variables are stored in both rows and columns

Multiple values are stored in one column


Ch. 3 - Preparing data for analysis

Type conversions

Types of variables in R

Common type conversions

Working with dates

String manipulation

Trimming and padding strings

Upper and lower case

Finding and replacing strings

Missing and special values

Types of missing and special values in R

Finding missing values

Dealing with missing values

Outliers and obvious errors

Identifying outliers and obvious errors

Dealing with outliers and obvious errors

Another look at strange values


Ch. 4 - Putting it all together

Time to put it all together!

Get a feel for the data

Summarize the data

Take a closer look

Let’s tidy the data

Column names are values

Values are variable names

Prepare the data for analysis

Clean up dates

A closer look at column types

Column type conversions

Missing, extreme, and unexpected values

Find missing values

An obvious error

Another obvious error

Check other extreme values

Finishing touches

Your data are clean!


About Michael Mallari

Michael is a hybrid thinker and doer, a byproduct of being a StrengthsFinder “Learner” over time. With nearly 20 years of engineering, design, and product experience, he helps organizations identify market needs, mobilize internal and external resources, and deliver delightful digital customer experiences that align with business goals. He has been entrusted with solving problems for brands ranging from Fortune 500 companies to early-stage startups and not-for-profit organizations.

Michael earned his BS in Computer Science from New York Institute of Technology and his MBA from the University of Maryland, College Park. He is also a candidate for an MS in Applied Analytics at Columbia University.

LinkedIn | Twitter | michaelmallari.com