Directions: Watch the “Data wrangling with R in 27 minutes” video from Lake Forest College professor Andrew Gard’s Equitable Equations YouTube channel. The video’s URL is:

https://youtu.be/oXImkptBpqc?si=v9VksA4uwy58gqOq

You’ll have to copy and paste the URL into a browser if you want to open the video from here. YouTube won’t let RPubs connect directly.

Use what you learn from the video to determine what each of the code blocks shown below would do if run on the DataWrangling.csv data set. Use the companion D2L quiz to indicate your response to each code block’s question.

Each block runs some code on a data frame called “mydata” that contains the following variables. The data frame has seven columns (one for each of the following variables) and 95 rows (one for each Tennessee county). The data were extracted from the U.S. Census Bureau’s 2021 American Community Survey five-year dataset. If you want a look at the data, see:

https://docs.google.com/spreadsheets/d/1XeAuHIBz6AAZxU6M1Bz6RvJEvRRv7OLrokEKMzE9MHs/edit?usp=sharing

From left to right in the data frame, the variables are:

County: The name of the county. “Anderson,” “Bedford,” “Benton,” etc. In all, there are 95 counties in the data frame.

Region: The Tennessee region in which the county is located. There are three regions: “West,” “Middle,” and “East.”

Med_HH_Income: Each county’s median household income.

Households: The number of households in each county.

Pct_BB: The percentage of households in each county that have broadband internet access.

Pct_College: The percentage of residents in each county who have a four-year college degree or higher (like a master’s degree, a law degree, a Ph.D., or a medical degree).

Land_area: Each county’s land area, in square miles. Land area is the part of the county that is not covered by a river, lake, or other body of water.

It is not necessary to run each block of code to learn what it does. All the information you need is contained in the video. However, if you want to run the code, just to be sure, you can do so after running the code in this block. The block will read the data from a file on my website, save the file on your computer, install the tidyverse package if it is not already installed, load the tidyverse, and load the data file into memory as a data frame called “mydata.”

# Read the data from the web
FetchedData <- read.csv("https://drkblake.com/wp-content/uploads/2023/11/DataWrangling.csv")
# Save the data on your computer
write.csv(FetchedData, "DataWrangling.csv", row.names = FALSE)
# Remove the data from the environment
rm(FetchedData)

# Installing required packages
if (!require("tidyverse"))
  install.packages("tidyverse")
library(tidyverse)

# Read the data
mydata <- read.csv("DataWrangling.csv")
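
If you do run the setup block, a quick check can confirm that the data frame matches the description above before you try any of the quiz code. The sketch below is an optional illustration, not part of the assignment. The helper name check_mydata is hypothetical, and the check assumes the CSV column headers match the seven variable names listed above exactly. The demonstration uses an empty placeholder data frame (no real values) so the block runs on its own.

```r
# Column names as listed in the assignment text, in left-to-right order.
# Assumption: the CSV headers are spelled exactly like this.
expected_cols <- c("County", "Region", "Med_HH_Income", "Households",
                   "Pct_BB", "Pct_College", "Land_area")

# Hypothetical helper (not part of the assignment): returns TRUE only if
# the data frame has the expected columns, in order, and n_rows rows.
check_mydata <- function(df, n_rows = 95) {
  identical(names(df), expected_cols) && nrow(df) == n_rows
}

# Demonstration on an empty placeholder with the same column names.
# It contains no rows, so no data values are made up here.
empty <- data.frame(matrix(nrow = 0, ncol = length(expected_cols)))
names(empty) <- expected_cols

check_mydata(empty, n_rows = 0)  # columns match and there are 0 rows
check_mydata(empty)              # fails the 95-row check
```

After running the setup block, calling check_mydata(mydata) should return TRUE if the file loaded as described (seven columns, 95 rows). Base R functions such as names(mydata), dim(mydata), and head(mydata) give the same information piece by piece.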

Q1: What would the code shown below do? (Answer using the choices for Q1 on the D2L quiz)

NewData <- filter(mydata, Region == "Middle")

Q2: What would the code shown below do? (Answer using the choices for Q2 on the D2L quiz)

NewData <- filter(mydata, Pct_BB > 75.0)

Q3: What would the code shown below do? (Answer using the choices for Q3 on the D2L quiz)

NewData <- filter(mydata,
                  Pct_BB > 75.0,
                  Region == "Middle")

Q4: What would the code shown below do? (Answer using the choices for Q4 on the D2L quiz)

NewData <- filter(mydata,
                  Pct_BB > 75.0 |
                  Region == "Middle")

Q5: What would the code shown below do? (Answer using the choices for Q5 on the D2L quiz)

NewData <- select(mydata, County, Region, Pct_BB)

Q6: What would the code shown below do? (Answer using the choices for Q6 on the D2L quiz)

NewData <- select(mydata, Region, County, Pct_BB)

Q7: What would the code shown below do? (Answer using the choices for Q7 on the D2L quiz)

NewData <- select(mydata, contains("Pct_"))

Q8: What would the code shown below do? (Answer using the choices for Q8 on the D2L quiz)

?select

Q9: What would the code shown below do? (Answer using the choices for Q9 on the D2L quiz)

NewData <- select(mydata, -Region)

Q10: What would the code shown below do? (Answer using the choices for Q10 on the D2L quiz)

NewData <- mydata %>% 
  select(-Region)

Q11: What would the code shown below do? (Answer using the choices for Q11 on the D2L quiz)

NewData <- mydata %>% 
  arrange(Region)

Q12: What would the code shown below do? (Answer using the choices for Q12 on the D2L quiz)

NewData <- mydata %>% 
  arrange(Pct_BB)

Q13: What would the code shown below do? (Answer using the choices for Q13 on the D2L quiz)

NewData <- mydata %>% 
  arrange(Region,
          Pct_BB)

Q14: What would the code shown below do? (Answer using the choices for Q14 on the D2L quiz)

NewData <- mydata %>% 
  arrange(desc(Pct_BB))

Q15: What would the code shown below do? (Answer using the choices for Q15 on the D2L quiz)

NewData <- mydata %>% 
  mutate(Density = Households / Land_area)

Q16: What would the code shown below do? (Answer using the choices for Q16 on the D2L quiz)

NewData <- mydata %>% 
  mutate(Density = Households / Land_area,
         High_Access = Pct_BB > 75.0)

Q17: What would the code shown below do? (Answer using the choices for Q17 on the D2L quiz)

mydata %>% 
  group_by(Region) %>% 
  summarize(mean(Pct_BB))

Q18: What would the code shown below do? (Answer using the choices for Q18 on the D2L quiz)

mydata %>% 
  group_by(Region) %>% 
  summarize(Avg_Access = mean(Pct_BB))

Q19: What would the code shown below do? (Answer using the choices for Q19 on the D2L quiz)

mydata %>% 
  group_by(Region) %>% 
  summarize(Avg_Access = mean(Pct_BB),
            SD = sd(Pct_BB))

Q20: What would the code shown below do? (Answer using the choices for Q20 on the D2L quiz)

mydata %>% 
  group_by(Region,
           High_Access = Pct_BB > 75.0) %>% 
  summarize(Avg_Access = mean(Pct_BB),
            SD = sd(Pct_BB),
            count = n())