Hunger on Campus: Filtering

Demonstration
Solution 1
Solution 2

Be careful when you redefine the hunger data by filtering out rows (survey respondents). Once you overwrite hunger with the filtered result, the dropped respondents are gone from the data set until you import it again. This is usually not what you want: a respondent who skipped one question (an NA) may have answered others, and you still need those answers when you analyze the other questions.
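The effect is easy to see on a small made-up example. The sketch below assumes dplyr is installed; the toy data frame and its values are hypothetical and are not taken from the hunger survey.

```r
library(dplyr)

# Hypothetical toy survey: three respondents, two questions
toy <- data.frame(
  Q8  = c("Too expensive", NA, NA),
  Q28 = c("Yes", "No", "Yes")
)

# Number of valid Q28 answers before filtering
q28_before <- sum(!is.na(toy$Q28))

# Overwrite toy, keeping only respondents who answered Q8
toy <- toy %>%
  filter(!is.na(Q8))

# Number of valid Q28 answers after filtering
q28_after <- sum(!is.na(toy$Q28))

c(before = q28_before, after = q28_after)
```

Two respondents who answered Q28 disappear from toy simply because they skipped Q8.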

Demonstration

Suppose that you are investigating Q22 (Have you experienced hunger last year?). Visualizing Q22 shows that about one in five people experienced hunger last year. To dig further, you want to see whether Q8 (If you do not have a meal plan please indicate why) plays a role. But Q8 has many NAs, so you decide to filter them out.

This decision has unintended consequences for the charts that follow. As an example, compare the charts of Q28 before and after the NAs in Q8 were filtered out of the hunger data set. They are different even though they are produced by the same code. This is not what you want!

# Load packages (dplyr for %>% and filter(), ggplot2 for charts)
library(dplyr)
library(ggplot2)

# Import data
hunger <- read.csv("hunger.csv")

ggplot(hunger, aes(x = Q28)) +
  geom_bar()


ggplot(hunger, aes(x = Q22)) +
  geom_bar()


# Plot Q22, with bars filled by Q8
ggplot(hunger, aes(x = Q22, fill = Q8)) +
  geom_bar()


# Remove rows where Q8 is NA (this overwrites hunger)
hunger <- hunger %>%
  filter(!is.na(Q8)) %>%
  droplevels()

# Plot Q22, with bars filled by Q8
ggplot(hunger, aes(x = Q22, fill = Q8)) +
  geom_bar()


ggplot(hunger, aes(x = Q28)) +
  geom_bar()

See the difference in the chart of Q28 before and after the NAs in Q8 were filtered out of the hunger data set. The second chart of Q28 is incorrect: it is missing many valid responses to Q28 because those respondents were filtered out earlier for not answering Q8.

Solution 1

Save the filtered data under a new name so that the original hunger data set stays intact.

# Clean up
rm(list = ls(all = TRUE))

# Import data
hunger <- read.csv("hunger.csv")

ggplot(hunger, aes(x = Q28)) +
  geom_bar()


ggplot(hunger, aes(x = Q22)) +
  geom_bar()


# Plot Q22, with bars filled by Q8
ggplot(hunger, aes(x = Q22, fill = Q8)) +
  geom_bar()


# Remove rows where Q8 is NA and save the result in hunger2 to keep the original hunger data
hunger2 <- hunger %>%
  filter(!is.na(Q8)) %>%
  droplevels()

# Plot Q22, with bars filled by Q8, using the filtered data
ggplot(hunger2, aes(x = Q22, fill = Q8)) +
  geom_bar()


ggplot(hunger, aes(x = Q28)) +
  geom_bar()

Note that there is no difference in the chart of Q28 before and after the NAs in Q8 were filtered out, because the filtered rows are stored in hunger2 and hunger itself is unchanged.
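A quick way to confirm that the original data survived is to compare row counts. This is a minimal sketch on a hypothetical toy data frame; the names toy and toy2 and all values are made up for illustration.

```r
library(dplyr)

# Hypothetical toy survey: three respondents, two questions
toy <- data.frame(
  Q8  = c("Too expensive", NA, NA),
  Q28 = c("Yes", "No", "Yes")
)

# Store the filtered rows under a new name; toy is not modified
toy2 <- toy %>%
  filter(!is.na(Q8))

nrow(toy)   # all respondents are still here
nrow(toy2)  # only respondents who answered Q8
```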

Solution 2

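Use pipe operators to pass the filtered data straight into ggplot() without assigning it to anything, so hunger is never overwritten.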

# Clean up
rm(list = ls(all = TRUE))

# Import data
hunger <- read.csv("hunger.csv")

ggplot(hunger, aes(x = Q28)) +
  geom_bar()


ggplot(hunger, aes(x = Q22)) +
  geom_bar()


# Plot Q22, with bars filled by Q8
ggplot(hunger, aes(x = Q22, fill = Q8)) +
  geom_bar()


# Use pipe operators to keep the original hunger data
hunger %>%
  filter(!is.na(Q8)) %>%
  ggplot(aes(x = Q22, fill = Q8)) + 
  geom_bar()


ggplot(hunger, aes(x = Q28)) +
  geom_bar()

Note that there is no difference in the chart of Q28, because the filtered data was never assigned back to hunger.
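The same behavior can be checked on a small example. This sketch assumes dplyr and ggplot2 are installed; the toy data frame, its values, and the name toy_original are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy survey: three respondents, two questions
toy <- data.frame(
  Q8  = c("Too expensive", NA, NA),
  Q28 = c("Yes", "No", "Yes")
)
toy_original <- toy  # copy kept only to check that toy is unchanged

# Filter and plot in one pipeline; nothing is assigned back to toy
toy %>%
  filter(!is.na(Q8)) %>%
  ggplot(aes(x = Q28, fill = Q8)) +
  geom_bar()

identical(toy, toy_original)  # the original data frame is untouched
```

The trade-off is that the filter() step has to be repeated in every chart that needs it, whereas Solution 1 filters once and reuses hunger2.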


Daniel Lee
