dplyr basics
During ANLY 512 we will be studying the theory and practice of
data visualization. We will be using R and the
packages within R to assemble data and construct many
different types of visualizations. Before we begin studying data
visualizations we need to develop some data wrangling skills. We will
use these skills to wrangle our data into a form that we can use for
visualizations.
The objective of this assignment is to introduce you to RStudio,
R Markdown, the tidyverse, and more specifically the dplyr
package.
Each question is worth 5 points.
To submit this homework, knit the document to PDF in RStudio (the Knit button uses the knitr package) and then submit the PDF to Canvas.
Question #1
Use the nycflights13 package and the flights data frame to answer the following questions:
a. What month had the highest proportion of cancelled flights?
b. What month had the lowest?
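library(nycflights13)
library(dplyr)  # provides %>%, group_by(), summarize(), arrange()
# A flight with a missing dep_time is treated as cancelled
flight_cancel <- flights %>%
  group_by(month) %>%
  summarize(cancelled = sum(is.na(dep_time)),
            cancelled_proportion = cancelled / n() * 100) %>%  # expressed in percent
  arrange(cancelled_proportion)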
flight_cancel
## # A tibble: 12 × 3
## month cancelled cancelled_proportion
## <int> <int> <dbl>
## 1 10 236 0.817
## 2 11 233 0.854
## 3 9 452 1.64
## 4 8 486 1.66
## 5 1 521 1.93
## 6 5 563 1.96
## 7 4 668 2.36
## 8 3 861 2.99
## 9 7 940 3.19
## 10 6 1009 3.57
## 11 12 1025 3.64
## 12 2 1261 5.05
knitr::kable(flight_cancel, caption = "Cancellation Proportions by Month")
| month | cancelled | cancelled_proportion |
|---|---|---|
| 10 | 236 | 0.8169199 |
| 11 | 233 | 0.8544814 |
| 9 | 452 | 1.6392254 |
| 8 | 486 | 1.6571760 |
| 1 | 521 | 1.9293438 |
| 5 | 563 | 1.9551327 |
| 4 | 668 | 2.3579245 |
| 3 | 861 | 2.9860581 |
| 7 | 940 | 3.1945624 |
| 6 | 1009 | 3.5725667 |
| 12 | 1025 | 3.6431491 |
| 2 | 1261 | 5.0539057 |
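Reading the table, February (month 2) has the highest share of cancelled flights, about 5.05%, and October (month 10) has the lowest, about 0.82%. Note that cancelled_proportion is multiplied by 100, so it is a percentage. The sketch below pulls out those two rows directly instead of reading them off the sorted table. It assumes dplyr 1.0 or later for slice_max() and slice_min(); the names monthly_cancel, cancelled_pct, highest_month, and lowest_month are introduced here for illustration only.

```r
library(dplyr)
library(nycflights13)

# Percentage of flights per month with a missing departure time,
# using the same "missing dep_time means cancelled" definition as above.
monthly_cancel <- flights %>%
  group_by(month) %>%
  summarize(
    cancelled = sum(is.na(dep_time)),
    cancelled_pct = mean(is.na(dep_time)) * 100
  )

# Row for the month with the highest cancellation percentage
highest_month <- monthly_cancel %>% slice_max(cancelled_pct, n = 1)

# Row for the month with the lowest cancellation percentage
lowest_month <- monthly_cancel %>% slice_min(cancelled_pct, n = 1)

highest_month
lowest_month
```

Using mean(is.na(dep_time)) gives the same value as cancelled / n(), since the mean of a logical vector is the share of TRUE values.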
Question #2
Consider the following pipeline:
library(tidyverse)
mtcars %>%
  group_by(cyl) %>%
  filter(am == 1) %>%
  summarize(avg_mpg = mean(mpg))
## # A tibble: 3 × 2
## cyl avg_mpg
## <dbl> <dbl>
## 1 4 28.1
## 2 6 20.6
## 3 8 15.4
What is the problem with this pipeline?
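No answer is recorded here, so the following is only one plausible reading. As written, the pipeline runs, but the row-level filter on am sits between group_by() and summarize(), which hides the fact that the averages cover manual-transmission cars only. A related and more serious variant is placing filter(am == 1) after summarize(), which fails because am is no longer a column at that point. The sketch below illustrates both cases. The object names cars, filtered_first, and filter_last are assumptions made for this example.

```r
library(dplyr)

# Work on the untouched built-in dataset
cars <- datasets::mtcars

# Filtering before grouping: same numbers as the pipeline above, and the
# order makes it clear that only am == 1 rows are averaged.
filtered_first <- cars %>%
  filter(am == 1) %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg))

# Filtering after summarizing fails: summarize() keeps only cyl and avg_mpg,
# so am does not exist when filter() runs. tryCatch() captures the error.
filter_last <- tryCatch(
  cars %>%
    group_by(cyl) %>%
    summarize(avg_mpg = mean(mpg)) %>%
    filter(am == 1),
  error = function(e) "error: am is not available after summarize()"
)

filtered_first
filter_last
```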
Question #3
Define two new variables in the mtcars data frame.
a. A variable that indicates whether mpg is above the average of all
vehicles in the dataset. You might want to use ifelse logic to
do this.
b. A variable that indicates whether hp is in the top quartile of all vehicles in the dataset.
mtcars <- mtcars %>%
  mutate(
    above_avg_mpg = ifelse(mpg > mean(mpg), "Above Average", "At or Below Average"),
    top_quartile_hp = ifelse(hp >= quantile(hp, 0.75), TRUE, FALSE)
  )
# a. Vehicles sorted by mpg, with the above-average flag
mtcars %>%
  rownames_to_column("vehicle") %>%
  select(vehicle, mpg, above_avg_mpg) %>%
  arrange(desc(mpg))
## vehicle mpg above_avg_mpg
## 1 Toyota Corolla 33.9 Above Average
## 2 Fiat 128 32.4 Above Average
## 3 Honda Civic 30.4 Above Average
## 4 Lotus Europa 30.4 Above Average
## 5 Fiat X1-9 27.3 Above Average
## 6 Porsche 914-2 26.0 Above Average
## 7 Merc 240D 24.4 Above Average
## 8 Datsun 710 22.8 Above Average
## 9 Merc 230 22.8 Above Average
## 10 Toyota Corona 21.5 Above Average
## 11 Hornet 4 Drive 21.4 Above Average
## 12 Volvo 142E 21.4 Above Average
## 13 Mazda RX4 21.0 Above Average
## 14 Mazda RX4 Wag 21.0 Above Average
## 15 Ferrari Dino 19.7 At or Below Average
## 16 Merc 280 19.2 At or Below Average
## 17 Pontiac Firebird 19.2 At or Below Average
## 18 Hornet Sportabout 18.7 At or Below Average
## 19 Valiant 18.1 At or Below Average
## 20 Merc 280C 17.8 At or Below Average
## 21 Merc 450SL 17.3 At or Below Average
## 22 Merc 450SE 16.4 At or Below Average
## 23 Ford Pantera L 15.8 At or Below Average
## 24 Dodge Challenger 15.5 At or Below Average
## 25 Merc 450SLC 15.2 At or Below Average
## 26 AMC Javelin 15.2 At or Below Average
## 27 Maserati Bora 15.0 At or Below Average
## 28 Chrysler Imperial 14.7 At or Below Average
## 29 Duster 360 14.3 At or Below Average
## 30 Camaro Z28 13.3 At or Below Average
## 31 Cadillac Fleetwood 10.4 At or Below Average
## 32 Lincoln Continental 10.4 At or Below Average
# b. Vehicles sorted by hp, with the top-quartile flag
mtcars %>%
  rownames_to_column("vehicle") %>%
  select(vehicle, hp, top_quartile_hp) %>%
  arrange(desc(hp))
## vehicle hp top_quartile_hp
## 1 Maserati Bora 335 TRUE
## 2 Ford Pantera L 264 TRUE
## 3 Duster 360 245 TRUE
## 4 Camaro Z28 245 TRUE
## 5 Chrysler Imperial 230 TRUE
## 6 Lincoln Continental 215 TRUE
## 7 Cadillac Fleetwood 205 TRUE
## 8 Merc 450SE 180 TRUE
## 9 Merc 450SL 180 TRUE
## 10 Merc 450SLC 180 TRUE
## 11 Hornet Sportabout 175 FALSE
## 12 Pontiac Firebird 175 FALSE
## 13 Ferrari Dino 175 FALSE
## 14 Dodge Challenger 150 FALSE
## 15 AMC Javelin 150 FALSE
## 16 Merc 280 123 FALSE
## 17 Merc 280C 123 FALSE
## 18 Lotus Europa 113 FALSE
## 19 Mazda RX4 110 FALSE
## 20 Mazda RX4 Wag 110 FALSE
## 21 Hornet 4 Drive 110 FALSE
## 22 Volvo 142E 109 FALSE
## 23 Valiant 105 FALSE
## 24 Toyota Corona 97 FALSE
## 25 Merc 230 95 FALSE
## 26 Datsun 710 93 FALSE
## 27 Porsche 914-2 91 FALSE
## 28 Fiat 128 66 FALSE
## 29 Fiat X1-9 66 FALSE
## 30 Toyota Corolla 65 FALSE
## 31 Merc 240D 62 FALSE
## 32 Honda Civic 52 FALSE
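To make the two cutoffs explicit, the sketch below computes them once and counts how many vehicles each flag marks. It is a minimal illustration that assumes the default quantile() method; the names cars, mpg_mean, hp_q3, flagged, and flag_counts are introduced here and are not part of the assignment.

```r
library(dplyr)

# Work on the untouched built-in dataset
cars <- datasets::mtcars

# Thresholds behind the two flags
mpg_mean <- mean(cars$mpg)                  # average mpg of all vehicles
hp_q3 <- unname(quantile(cars$hp, 0.75))    # 75th percentile of hp

flagged <- cars %>%
  mutate(
    above_avg_mpg = mpg > mpg_mean,    # a comparison already yields TRUE/FALSE
    top_quartile_hp = hp >= hp_q3
  )

# Number of vehicles marked by each flag
flag_counts <- flagged %>%
  summarize(
    n_above_avg_mpg = sum(above_avg_mpg),
    n_top_quartile_hp = sum(top_quartile_hp),
    n_strictly_above_q3 = sum(hp > hp_q3)
  )

mpg_mean
hp_q3
flag_counts
```

Because three vehicles sit exactly at the 75th percentile (180 hp), the choice between >= and > changes how many vehicles count as top quartile, which is worth stating when reporting the result.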
Question #4
a. Name the make and model of every vehicle with above-average mpg that has more than 4 gears.
b. Name the vehicles with the top five times in the quarter mile (qsec).
# a. Above-average mpg and more than 4 gears
mtcars %>%
  filter(mpg > mean(mpg), gear > 4) %>%
  select(mpg, gear) %>%
  rownames_to_column("vehicle")
## vehicle mpg gear
## 1 Porsche 914-2 26.0 5
## 2 Lotus Europa 30.4 5
# b. Five lowest (fastest) quarter-mile times
mtcars %>%
  arrange(qsec) %>%
  head(5) %>%
  select(qsec) %>%
  rownames_to_column("vehicle")
## vehicle qsec
## 1 Ford Pantera L 14.50
## 2 Maserati Bora 14.60
## 3 Camaro Z28 15.41
## 4 Ferrari Dino 15.50
## 5 Duster 360 15.84
# a. Porsche 914-2 and Lotus Europa are the only vehicles with above-average mpg and more than 4 gears (both have 5).
# b. The five fastest quarter-mile cars are:
# Ford Pantera L: 14.50 s
# Maserati Bora: 14.60 s
# Camaro Z28: 15.41 s
# Ferrari Dino: 15.50 s
# Duster 360: 15.84 s
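The quarter-mile ranking can also be written so that the vehicle name becomes a regular column before any other verb runs, which avoids relying on each step preserving row names. This is a sketch under the assumption that dplyr 1.0 or later is installed (for slice_min()); fastest_five is a name chosen for this example.

```r
library(dplyr)
library(tibble)

fastest_five <- datasets::mtcars %>%
  rownames_to_column("vehicle") %>%  # turn row names into a column first
  slice_min(qsec, n = 5) %>%         # keep the five smallest quarter-mile times
  select(vehicle, qsec)

fastest_five
```

Note that slice_min() keeps ties by default, so it can return more than five rows when several vehicles share the fifth-best time; that does not happen in mtcars.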