In this workshop we will do some of the exercises from Chapter 5 of R4DS.
Use a separate code block for each exercise.
for example: 1. Find all flights that had an arrival delay of two or more hours.
flights %>%
filter(arr_delay >= 120) # note delays are in minutes
## # A tibble: 10,200 x 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 811 630 101 1047 830
## 2 2013 1 1 848 1835 853 1001 1950
## 3 2013 1 1 957 733 144 1056 853
## 4 2013 1 1 1114 900 134 1447 1222
## 5 2013 1 1 1505 1310 115 1638 1431
## 6 2013 1 1 1525 1340 105 1831 1626
## 7 2013 1 1 1549 1445 64 1912 1656
## 8 2013 1 1 1558 1359 119 1718 1515
## 9 2013 1 1 1732 1630 62 2028 1825
## 10 2013 1 1 1803 1620 103 2008 1750
## # … with 10,190 more rows, and 11 more variables: arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
Find all United Airline flights to “IAH”. (note this is not exactly of the book exercises.)
Find all flights that arrived more than two hours late, but didn’t leave late
Find all flights the departed between midnight and 6am (inclusive).
Why is NA ^ 0 not missing? Why is NA | TRUE not missing? Why is FALSE & NA not missing? Can you figure out the general rule? (NA * 0 is a tricky counterexample!) How does this differ with normal practice in Mathematics?
Find the 5 most delayed flights. (not exactly the same as book)
Using a helper function, extract all columns concerning time.
Currently dep_time and sched_dep_time are convenient to look at, but hard to compute with because they’re not really continuous numbers. Convert them to a more convenient representation of number of minutes since midnight. (hint: you may need to make a function and also use maodular arithmetic as I mentioned last week.)
9.Compare air_time with arr_time - dep_time. What do you expect to see? What do you see? What do you need to do to fix it?
Explain the use of the pipe function %>% How does the design of the tidyverse facillitate the use of pipes?
Brainstorm at least 5 different ways to assess the typical delay characteristics of a group of flights. Consider the following scenarios: