# Must be a Primate AND weigh more than 20 kg
heavy_primates <- msleep %>%
  select(name, order, bodywt, sleep_total) %>%
  filter(order == "Primates", bodywt > 20)
Rows must meet at least one condition.
# Is either a Primate OR weighs more than 20 kg
primates_or_heavy <- msleep %>%
  select(name, order, bodywt, sleep_total) %>%
  filter(order == "Primates" | bodywt > 20)
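As a quick check on the AND and OR filters above, the sketch below compares the comma form with an explicit &, and confirms that the OR filter never returns fewer rows than the AND filter. It assumes that dplyr and ggplot2 are installed (msleep ships with ggplot2); the object names with_comma, with_and, and with_or are illustrative only.

```r
library(dplyr)

# msleep is a dataset bundled with ggplot2 (assumed to be installed)
msleep <- ggplot2::msleep

# Inside filter(), a comma between conditions behaves like &
with_comma <- msleep %>% filter(order == "Primates", bodywt > 20)
with_and   <- msleep %>% filter(order == "Primates" & bodywt > 20)
same_rows  <- isTRUE(all.equal(with_comma, with_and))

# Every row that passes the AND filter also passes the OR filter,
# so the OR result can never be smaller
with_or <- msleep %>% filter(order == "Primates" | bodywt > 20)
or_is_superset <- nrow(with_and) <= nrow(with_or)

print(same_rows)
print(or_is_superset)
```

The comma form tends to read better when there are many conditions, while & is needed when AND and OR are mixed inside one expression.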
Phase 3: Filtering Categorical Lists
When searching for multiple names or categories, typing | repeatedly is inefficient. Instead, use the In-operator (%in%).
# Searching for specific names one by one
manual_search <- msleep %>%
  select(name, sleep_total) %>%
  filter(name == "Rabbit" | name == "Tiger" | name == "Horse")
Use a vector built with c() (“concatenate”) to search for multiple values in a single condition.
# Search using a vector of names
pro_search <- msleep %>%
  select(name, sleep_total) %>%
  filter(name %in% c("Rabbit", "Tiger", "Horse"))
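To illustrate that the two searches above are interchangeable, this sketch runs both and compares the results, then negates %in% with ! to keep everything except the listed names. It assumes dplyr and ggplot2 are installed; targets, by_or, by_in, and everything_else are hypothetical names chosen for the example.

```r
library(dplyr)

# msleep is a dataset bundled with ggplot2 (assumed to be installed)
msleep <- ggplot2::msleep

# Store the names once so the list is easy to extend later
targets <- c("Rabbit", "Tiger", "Horse")

by_or <- msleep %>%
  filter(name == "Rabbit" | name == "Tiger" | name == "Horse")
by_in <- msleep %>%
  filter(name %in% targets)

# Both approaches should select the same rows
same_result <- isTRUE(all.equal(by_or, by_in))

# Put ! in front to keep every row whose name is NOT in the list
everything_else <- msleep %>% filter(!name %in% targets)

# The two groups together account for every row in msleep
rows_add_up <- nrow(by_in) + nrow(everything_else) == nrow(msleep)

print(same_result)
print(rows_add_up)
```

Keeping the values in a named vector also means the same list can be reused in several filters without retyping it.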
Phase 4: Range and Proximity Filtering
Sometimes you don’t need an exact number, but rather a “neighborhood” of values.
# Sleep total between 10 and 16 hours (inclusive)
mid_sleepers <- msleep %>%
  select(name, sleep_total) %>%
  filter(between(sleep_total, 10, 16))
Finds values close to a target within a specified tolerance.
# Target 17 hours with a 0.5-hour tolerance (keeps values strictly between 16.5 and 17.5)
approx_sleepers <- msleep %>%
  select(name, sleep_total) %>%
  filter(near(sleep_total, 17, tol = 0.5))
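The boundary behaviour of the two helpers above is easy to miss, so here is a small sketch on hand-made vectors rather than msleep. It relies on between() being inclusive at both ends and on near() testing abs(x - y) < tol, which excludes values sitting exactly on the tolerance edge. The vectors x and y are made-up illustration data.

```r
library(dplyr)

# between() includes both endpoints: same as x >= 10 & x <= 16
x <- c(9.9, 10, 13, 16, 16.1)
in_range <- between(x, 10, 16)
manual   <- x >= 10 & x <= 16

# near() is a strict comparison: abs(y - 17) < 0.5,
# so 16.5 and 17.5 themselves are NOT kept
y <- c(16.5, 16.6, 17, 17.4, 17.5)
close_to_17 <- near(y, 17, tol = 0.5)

print(in_range)     # FALSE  TRUE  TRUE  TRUE FALSE
print(close_to_17)  # FALSE  TRUE  TRUE  TRUE FALSE
```

If the edges of the window should be included, between(sleep_total, 16.5, 17.5) is one way to express that instead of near().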
Phase 5: Handling Missing Data (NA)
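Missing values are a unique “state” in R. Comparing with == NA does not work, because any comparison with NA returns NA rather than TRUE or FALSE, and filter() keeps only rows where the condition is TRUE. Use the is.na() function instead.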
# Show mammals with missing conservation status
missing_info <- msleep %>%
  select(name, conservation, sleep_total) %>%
  filter(is.na(conservation))
Find rows where information is complete.
# Keep only rows where conservation status is known
complete_info <- msleep %>%
  select(name, conservation, sleep_total) %>%
  filter(!is.na(conservation))
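To see why == NA cannot be used for the missing-data filters above, the following sketch shows what the comparison returns and how filter() treats it. It assumes dplyr and ggplot2 are installed; the object names na_compare, wrong, with_missing, and without_missing are illustrative.

```r
library(dplyr)

# msleep is a dataset bundled with ggplot2 (assumed to be installed)
msleep <- ggplot2::msleep

# Comparing anything with NA gives NA, never TRUE
na_compare <- NA == NA
print(na_compare)   # NA

# filter() drops rows whose condition is NA, so this keeps nothing
wrong <- msleep %>% filter(conservation == NA)
print(nrow(wrong))  # 0

# is.na() returns a proper TRUE/FALSE for every row
with_missing    <- msleep %>% filter(is.na(conservation))
without_missing <- msleep %>% filter(!is.na(conservation))

# The two groups split the dataset with no rows lost
print(nrow(with_missing) + nrow(without_missing) == nrow(msleep))
```

The same reasoning applies to != NA: it also returns NA for every row, so !is.na() is the form to use for keeping complete rows.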
🎓 Systemic Summary for Learners
| Operator/Function | Systemic Role | Meaning |
|---|---|---|
| == | Equality | Must match exactly. |
| != | Inequality | Must NOT match. |
| , or & | Intersection | Both conditions must be TRUE. |
| \| | Union | Either condition can be TRUE. |
| %in% | Set Membership | Value must be in the provided list. |
| is.na() | Missing-Value Check | Identifies missing values. |
| between() | Range Check | Between \(x\) and \(y\) (inclusive). |
Pro-Tip: Filtering is the first step in Data Cleaning. Always check the number of rows remaining with nrow() after filtering to make sure the filter has not accidentally removed every row!
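One way to apply the row-count check from the tip is sketched below. It compares nrow() before and after a filter and shows how a small typo in a value silently matches nothing. It assumes dplyr and ggplot2 are installed; rows_before, rows_after, primates, and typo are hypothetical names used only for this example.

```r
library(dplyr)

# msleep is a dataset bundled with ggplot2 (assumed to be installed)
msleep <- ggplot2::msleep

rows_before <- nrow(msleep)
primates    <- msleep %>% filter(order == "Primates")
rows_after  <- nrow(primates)

message(rows_after, " of ", rows_before, " rows kept")

# String matching is case-sensitive: a lowercase "primates"
# matches no rows and produces no error
typo <- msleep %>% filter(order == "primates")
if (nrow(typo) == 0) {
  message("Filter returned no rows; check the spelling of the condition")
}

# filter() returns a new data frame; the original is untouched
print(nrow(msleep) == rows_before)
```

Because filter() returns a new data frame, an empty result can always be fixed by correcting the condition and running the filter again on the original data.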
Courses with short, easy-to-digest video content are available at premieranalytics.com.bd. Each lesson uses data that is built into R or comes with installed packages, so you can replicate the work at home. premieranalytics.com.bd also includes teaching on statistics and research methods.