[1] NA
Data manipulation isn’t just about cleaning; it’s about re-coding information so that R (and humans) can understand it better. This kind of preparation is often described as part of a “Tidy Data” workflow.
Phase 1: Dealing with the “Unknown” (Missing Data)
Before doing any math, you must decide what to do with missing values (NA). If you ignore them, many functions such as mean() and sum() will not stop with an error; they will quietly return NA instead of a number.
One option is to use na.rm = TRUE inside a function such as mean(). This is a “quick fix” for a single calculation; it does not change the data itself.
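Here is a minimal sketch of the two options. The vector `heights` and the data frame `toy` are made-up values for illustration only; they are not part of the lesson's data.

```r
# Hypothetical values, invented for this sketch
heights <- c(172, 167, NA, 202)

mean_with_na <- mean(heights)                 # NA, because one value is missing
mean_no_na   <- mean(heights, na.rm = TRUE)   # drops the NA, then averages the rest

# Count the missing values before deciding how to handle them
n_missing <- sum(is.na(heights))

# For a whole table, na.omit() drops every row that contains an NA
toy <- data.frame(name = c("A", "B", "C"), mass = c(77, NA, 136))
toy_clean <- na.omit(toy)
```

The quick fix suits a one-off summary. Dropping rows with na.omit() is the heavier option, since every later step then works on fewer rows.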
Phase 2: Structural Tidying (Select & Rename)
A pro-learner always simplifies. Don’t carry 30 columns if you only need 4.
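As a sketch, assuming the starwars data that ships with dplyr, selecting and renaming might look like this. The object name `sw_small` is just a placeholder chosen for this example.

```r
library(dplyr)

# Keep only the four columns this analysis needs,
# then give `mass` a friendlier name
sw_small <- starwars %>%
  select(name, height, mass, sex) %>%
  rename(weight = mass)

names(sw_small)
```

select() can also rename while it selects, for example select(name, height, weight = mass, sex), which saves one step.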
Phase 3: Normalization and Filtering
Data often comes in scales that aren’t useful (like cm instead of meters) or contains categories you don’t need for a specific study.
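A small sketch of both ideas, again assuming the starwars data from dplyr (where height is recorded in centimetres). The name `sw_norm` is a placeholder for this example.

```r
library(dplyr)

sw_norm <- starwars %>%
  select(name, height, mass, sex) %>%
  # Keep only the categories under study; rows where sex is NA
  # are dropped too, because NA %in% c(...) is FALSE
  filter(sex %in% c("male", "female")) %>%
  # Convert centimetres to metres
  mutate(height = height / 100)

head(sw_norm)
```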
Phase 4: Advanced Recoding (Value Transformation)
Sometimes values are too long or inconsistently named. We use recode() to map old values to new ones.
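One way to build a table like the one shown here is sketched below. It assumes the starwars data from dplyr and that only the male and female categories are kept; the name `sw_coded` is a placeholder.

```r
library(dplyr)

sw_coded <- starwars %>%
  select(name, sex) %>%
  filter(sex %in% c("male", "female")) %>%
  # recode() takes old = new pairs: "male" becomes "m", "female" becomes "f"
  mutate(gsex = recode(sex, "male" = "m", "female" = "f"))

head(sw_coded)
```

Note that recent dplyr releases mark recode() as superseded. It still works, but the package documentation is worth checking for the currently recommended replacement.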
# A tibble: 6 × 3
name sex gsex
<chr> <chr> <chr>
1 Luke Skywalker male m
2 Darth Vader male m
3 Leia Organa female f
4 Owen Lars male m
5 Beru Whitesun Lars female f
6 Biggs Darklighter male m
Phase 5: Feature Engineering (Creating New Variables)
This is where the “Pro” level begins. We create new variables based on logical conditions.
Creating a TRUE/FALSE variable based on physical attributes.
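A sketch of this step, assuming the starwars data and the same thresholds used in the master pipeline of this lesson (taller than 1 m and heavier than 75 kg). The name `sw_logic` is a placeholder.

```r
library(dplyr)

sw_logic <- starwars %>%
  select(name, height, mass, sex) %>%
  rename(weight = mass) %>%
  na.omit() %>%                                  # drop rows with any missing value
  filter(sex %in% c("male", "female")) %>%
  mutate(height = height / 100,                  # cm to m
         # TRUE only when BOTH conditions hold
         size_logic = height > 1 & weight > 75) %>%
  select(name, size_logic)

head(sw_logic)
```

Because the comparison is strict, a character weighing exactly 75 kg gets FALSE.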
# A tibble: 6 × 2
name size_logic
<chr> <lgl>
1 Luke Skywalker TRUE
2 Darth Vader TRUE
3 Leia Organa FALSE
4 Owen Lars TRUE
5 Beru Whitesun Lars FALSE
6 Biggs Darklighter TRUE
Turning logic into human-readable labels using if_else().
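A sketch of the labelling step, assuming the starwars data and the same size rule as the master pipeline in this lesson. The name `sw_size` is a placeholder.

```r
library(dplyr)

sw_size <- starwars %>%
  select(name, height, mass, sex) %>%
  rename(weight = mass) %>%
  na.omit() %>%
  filter(sex %in% c("male", "female")) %>%
  mutate(height = height / 100,
         # if_else(condition, value if TRUE, value if FALSE)
         size = if_else(height > 1 & weight > 75, "Big", "Small")) %>%
  select(name, height, weight, size)

head(sw_size)
```

if_else() is stricter than base ifelse(): both outcomes must be of compatible types, which catches mistakes such as mixing a number with a text label.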
# A tibble: 6 × 4
name height weight size
<chr> <dbl> <dbl> <chr>
1 Luke Skywalker 1.72 77 Big
2 Darth Vader 2.02 136 Big
3 Leia Organa 1.5 49 Small
4 Owen Lars 1.78 120 Big
5 Beru Whitesun Lars 1.65 75 Small
6 Biggs Darklighter 1.83 84 Big
The Master Pipeline: The “Pro” Way
In professional R coding, we write this entire process as one continuous, logical “story.”
Code
library(dplyr)  # provides starwars, %>%, and the verbs used below

final_sw <- starwars %>%
  # 1. Selection & Renaming
  select(name, height, mass, sex) %>%
  rename(weight = mass) %>%
  # 2. Cleaning: drop rows with any NA, keep only male/female
  na.omit() %>%
  filter(sex %in% c("male", "female")) %>%
  # 3. Recoding & Unit Conversion (height: cm to m)
  mutate(height = height / 100,
         gsex = recode(sex, "male" = "m", "female" = "f")) %>%
  # 4. Feature Engineering
  mutate(is_big = height > 1 & weight > 75,
         size = if_else(is_big, "Big", "Small"))

# view(final_sw) # To see the final masterpiece

🎓 Summary for Learners
| Technique | Function | Purpose |
| --- | --- | --- |
| Handling NAs | na.omit() | Cleans the data “skeleton” before analysis. |
| Normalizing | mutate() | Ensures units (m, kg) are standard. |
| Recoding | recode() | Standardizes labels for better grouping. |
| Logic | if_else() | Creates new categorical insights from numbers. |
Pro Tip: Always check your data types with str() or glimpse() after recoding to ensure your new columns are factors or characters as intended!
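A quick sketch of that check, assuming the starwars data from dplyr. The names `sw_check` and `sw_factor` are placeholders for this example.

```r
library(dplyr)

sw_check <- starwars %>%
  select(name, sex) %>%
  mutate(gsex = recode(sex, "male" = "m", "female" = "f"))

glimpse(sw_check)        # prints one line per column, with its type
class(sw_check$gsex)     # recoding a character column gives a character column

# If a factor is what you intended, convert explicitly
sw_factor <- sw_check %>%
  mutate(gsex = factor(gsex))

class(sw_factor$gsex)
```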
Courses with short, easy-to-digest video content are available at premieranalytics.com.bd. Each lesson uses data that is built into R or comes with installed packages, so you can replicate the work at home. premieranalytics.com.bd also includes teaching on statistics and research methods.