HW 2

Section 5.2.4

1a.

Arrival delay of two or more hours:

filter(flights, arr_delay >= 120)

## # A tibble: 10,200 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     1     1      811        630     101    1047     830     137 MQ     
##  2  2013     1     1      848       1835     853    1001    1950     851 MQ     
##  3  2013     1     1      957        733     144    1056     853     123 UA     
##  4  2013     1     1     1114        900     134    1447    1222     145 UA     
##  5  2013     1     1     1505       1310     115    1638    1431     127 EV     
##  6  2013     1     1     1525       1340     105    1831    1626     125 B6     
##  7  2013     1     1     1549       1445      64    1912    1656     136 EV     
##  8  2013     1     1     1558       1359     119    1718    1515     123 EV     
##  9  2013     1     1     1732       1630      62    2028    1825     123 EV     
## 10  2013     1     1     1803       1620     103    2008    1750     138 MQ     
## # … with 10,190 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

1b.

Flew to Houston:

filter(flights, dest == "IAH" | dest == "HOU")

## # A tibble: 9,313 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     1     1      517        515       2     830     819      11 UA     
##  2  2013     1     1      533        529       4     850     830      20 UA     
##  3  2013     1     1      623        627      -4     933     932       1 UA     
##  4  2013     1     1      728        732      -4    1041    1038       3 UA     
##  5  2013     1     1      739        739       0    1104    1038      26 UA     
##  6  2013     1     1      908        908       0    1228    1219       9 UA     
##  7  2013     1     1     1028       1026       2    1350    1339      11 UA     
##  8  2013     1     1     1044       1045      -1    1352    1351       1 UA     
##  9  2013     1     1     1114        900     134    1447    1222     145 UA     
## 10  2013     1     1     1205       1200       5    1503    1505      -2 UA     
## # … with 9,303 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

1c.

Flights operated by United (UA), American (AA), or Delta (DL):

filter(flights, carrier %in% c("AA", "DL", "UA"))

## # A tibble: 139,504 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     1     1      517        515       2     830     819      11 UA     
##  2  2013     1     1      533        529       4     850     830      20 UA     
##  3  2013     1     1      542        540       2     923     850      33 AA     
##  4  2013     1     1      554        600      -6     812     837     -25 DL     
##  5  2013     1     1      554        558      -4     740     728      12 UA     
##  6  2013     1     1      558        600      -2     753     745       8 AA     
##  7  2013     1     1      558        600      -2     924     917       7 UA     
##  8  2013     1     1      558        600      -2     923     937     -14 UA     
##  9  2013     1     1      559        600      -1     941     910      31 AA     
## 10  2013     1     1      559        600      -1     854     902      -8 UA     
## # … with 139,494 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

1d.

Departed in July, August, or September:

filter(flights, month >= 7, month <= 9)

## # A tibble: 86,326 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     7     1        1       2029     212     236    2359     157 B6     
##  2  2013     7     1        2       2359       3     344     344       0 B6     
##  3  2013     7     1       29       2245     104     151       1     110 B6     
##  4  2013     7     1       43       2130     193     322      14     188 B6     
##  5  2013     7     1       44       2150     174     300     100     120 AA     
##  6  2013     7     1       46       2051     235     304    2358     186 B6     
##  7  2013     7     1       48       2001     287     308    2305     243 VX     
##  8  2013     7     1       58       2155     183     335      43     172 B6     
##  9  2013     7     1      100       2146     194     327      30     177 B6     
## 10  2013     7     1      100       2245     135     337     135     122 B6     
## # … with 86,316 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

1e.

Arrived more than two hours late but left on time:

filter(flights, arr_delay > 120, dep_delay <= 0)

## # A tibble: 29 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     1    27     1419       1420      -1    1754    1550     124 MQ     
##  2  2013    10     7     1350       1350       0    1736    1526     130 EV     
##  3  2013    10     7     1357       1359      -2    1858    1654     124 AA     
##  4  2013    10    16      657        700      -3    1258    1056     122 B6     
##  5  2013    11     1      658        700      -2    1329    1015     194 VX     
##  6  2013     3    18     1844       1847      -3      39    2219     140 UA     
##  7  2013     4    17     1635       1640      -5    2049    1845     124 MQ     
##  8  2013     4    18      558        600      -2    1149     850     179 AA     
##  9  2013     4    18      655        700      -5    1213     950     143 AA     
## 10  2013     5    22     1827       1830      -3    2217    2010     127 MQ     
## # … with 19 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

1f.

Delayed by at least an hour, but made up over 30 minutes in flight:

filter(flights, dep_delay - arr_delay > 30, dep_delay >= 60)

## # A tibble: 1,844 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     1     1     2205       1720     285      46    2040     246 AA     
##  2  2013     1     1     2326       2130     116     131      18      73 B6     
##  3  2013     1     3     1503       1221     162    1803    1555     128 UA     
##  4  2013     1     3     1839       1700      99    2056    1950      66 AA     
##  5  2013     1     3     1850       1745      65    2148    2120      28 AA     
##  6  2013     1     3     1941       1759     102    2246    2139      67 UA     
##  7  2013     1     3     1950       1845      65    2228    2227       1 B6     
##  8  2013     1     3     2015       1915      60    2135    2111      24 9E     
##  9  2013     1     3     2257       2000     177      45    2224     141 9E     
## 10  2013     1     4     1917       1700     137    2135    1950     105 AA     
## # … with 1,834 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

1g.

Departed between midnight and 6 am:

filter(flights, dep_time <= 600 | dep_time == 2400)

## # A tibble: 9,373 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     1     1      517        515       2     830     819      11 UA     
##  2  2013     1     1      533        529       4     850     830      20 UA     
##  3  2013     1     1      542        540       2     923     850      33 AA     
##  4  2013     1     1      544        545      -1    1004    1022     -18 B6     
##  5  2013     1     1      554        600      -6     812     837     -25 DL     
##  6  2013     1     1      554        558      -4     740     728      12 UA     
##  7  2013     1     1      555        600      -5     913     854      19 B6     
##  8  2013     1     1      557        600      -3     709     723     -14 EV     
##  9  2013     1     1      557        600      -3     838     846      -8 B6     
## 10  2013     1     1      558        600      -2     753     745       8 AA     
## # … with 9,363 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

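The between() function simplifies conditions that combine >= and <=, such as the month range in 1d above. between(month, 7, 9) keeps the same rows as month >= 7 & month <= 9, because both bounds are inclusive.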

filter(flights, between(month, 7, 9))

## # A tibble: 86,326 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     7     1        1       2029     212     236    2359     157 B6     
##  2  2013     7     1        2       2359       3     344     344       0 B6     
##  3  2013     7     1       29       2245     104     151       1     110 B6     
##  4  2013     7     1       43       2130     193     322      14     188 B6     
##  5  2013     7     1       44       2150     174     300     100     120 AA     
##  6  2013     7     1       46       2051     235     304    2358     186 B6     
##  7  2013     7     1       48       2001     287     308    2305     243 VX     
##  8  2013     7     1       58       2155     183     335      43     172 B6     
##  9  2013     7     1      100       2146     194     327      30     177 B6     
## 10  2013     7     1      100       2245     135     337     135     122 B6     
## # … with 86,316 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
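As a quick sanity check, the sketch below compares between() with the explicit comparisons and with %in%. It assumes the dplyr and nycflights13 packages are installed. The object names explicit, with_between, and with_in are only illustrative.

```r
library(dplyr)
library(nycflights13)

# Explicit comparisons: both bounds must hold for a row to be kept
explicit <- filter(flights, month >= 7, month <= 9)

# between() is inclusive on both ends
with_between <- filter(flights, between(month, 7, 9))

# %in% is another option when the values form a short list of integers
with_in <- filter(flights, month %in% 7:9)

# All three approaches should keep the same number of rows
c(nrow(explicit), nrow(with_between), nrow(with_in))
```

If the three counts agree, the shorter between() form can safely replace the pair of comparisons.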

Number of flights with a missing dep_time:

filter(flights, is.na(dep_time))

## # A tibble: 8,255 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     1     1       NA       1630      NA      NA    1815      NA EV     
##  2  2013     1     1       NA       1935      NA      NA    2240      NA AA     
##  3  2013     1     1       NA       1500      NA      NA    1825      NA AA     
##  4  2013     1     1       NA        600      NA      NA     901      NA B6     
##  5  2013     1     2       NA       1540      NA      NA    1747      NA EV     
##  6  2013     1     2       NA       1620      NA      NA    1746      NA EV     
##  7  2013     1     2       NA       1355      NA      NA    1459      NA EV     
##  8  2013     1     2       NA       1420      NA      NA    1644      NA EV     
##  9  2013     1     2       NA       1321      NA      NA    1536      NA EV     
## 10  2013     1     2       NA       1545      NA      NA    1910      NA AA     
## # … with 8,245 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

There are 8,255 flights with a missing dep_time. In the rows shown, dep_delay, arr_time, and arr_delay are missing as well, which suggests that these flights were cancelled.
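One way to get the count as a single number, and to see which other columns are missing in the same rows, is sketched below. It assumes dplyr and nycflights13 are installed. The names n_missing, missing_dep, and na_counts are made up for this example.

```r
library(dplyr)
library(nycflights13)

# Count the missing departure times directly
n_missing <- sum(is.na(flights$dep_time))
n_missing

# Keep only the rows with no departure time
missing_dep <- filter(flights, is.na(dep_time))

# Count the NA values in every column of those rows
na_counts <- colSums(is.na(missing_dep))

# Show only the columns that have at least one NA
na_counts[na_counts > 0]
```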

Section 5.3.1

Most delayed flight:

arrange(flights, desc(dep_delay))

## # A tibble: 336,776 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     1     9      641        900    1301    1242    1530    1272 HA     
##  2  2013     6    15     1432       1935    1137    1607    2120    1127 MQ     
##  3  2013     1    10     1121       1635    1126    1239    1810    1109 MQ     
##  4  2013     9    20     1139       1845    1014    1457    2210    1007 AA     
##  5  2013     7    22      845       1600    1005    1044    1815     989 MQ     
##  6  2013     4    10     1100       1900     960    1342    2211     931 DL     
##  7  2013     3    17     2321        810     911     135    1020     915 DL     
##  8  2013     6    27      959       1900     899    1236    2226     850 DL     
##  9  2013     7    22     2257        759     898     121    1026     895 DL     
## 10  2013    12     5      756       1700     896    1058    2020     878 AA     
## # … with 336,766 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Flights that left earliest relative to schedule:

arrange(flights, dep_delay)

## # A tibble: 336,776 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013    12     7     2040       2123     -43      40    2352      48 B6     
##  2  2013     2     3     2022       2055     -33    2240    2338     -58 DL     
##  3  2013    11    10     1408       1440     -32    1549    1559     -10 EV     
##  4  2013     1    11     1900       1930     -30    2233    2243     -10 DL     
##  5  2013     1    29     1703       1730     -27    1947    1957     -10 F9     
##  6  2013     8     9      729        755     -26    1002     955       7 MQ     
##  7  2013    10    23     1907       1932     -25    2143    2143       0 EV     
##  8  2013     3    30     2030       2055     -25    2213    2250     -37 MQ     
##  9  2013     3     2     1431       1455     -24    1601    1631     -30 9E     
## 10  2013     5     5      934        958     -24    1225    1309     -44 B6     
## # … with 336,766 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Fastest flights, interpreted here as the shortest time in the air (air_time). (I was not sure whether you meant fastest in terms of speed or time.)

head(arrange(flights, air_time))

## # A tibble: 6 × 19
##    year month   day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##   <int> <int> <int>    <int>       <int>   <dbl>   <int>   <int>   <dbl> <chr>  
## 1  2013     1    16     1355        1315      40    1442    1411      31 EV     
## 2  2013     4    13      537         527      10     622     628      -6 EV     
## 3  2013    12     6      922         851      31    1021     954      27 EV     
## 4  2013     2     3     2153        2129      24    2247    2224      23 EV     
## 5  2013     2     5     1303        1315     -12    1342    1411     -29 EV     
## 6  2013     2    12     2123        2130      -7    2211    2225     -14 EV     
## # … with 9 more variables: flight <int>, tailnum <chr>, origin <chr>,
## #   dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
## #   time_hour <dttm>, and abbreviated variable names ¹sched_dep_time,
## #   ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `colnames()` to see all variable names
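If "fastest" instead means highest average speed, a possible approach is to compute speed from distance and air_time and sort on that. This is only a sketch of the alternative reading, assuming dplyr and nycflights13 are installed. The column name speed_mph is invented for this example, and the units rest on the nycflights13 documentation (distance in miles, air_time in minutes).

```r
library(dplyr)
library(nycflights13)

# distance is in miles and air_time is in minutes,
# so multiplying by 60 gives miles per hour
with_speed <- mutate(flights, speed_mph = distance / air_time * 60)

# Sort from fastest to slowest; arrange() places NA values last
fastest <- arrange(with_speed, desc(speed_mph))

# Show only the columns needed to judge the result
head(select(fastest, carrier, flight, origin, dest, distance, air_time, speed_mph))
```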

Traveled the longest:

arrange(flights, desc(distance))

## # A tibble: 336,776 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     1     1      857        900      -3    1516    1530     -14 HA     
##  2  2013     1     2      909        900       9    1525    1530      -5 HA     
##  3  2013     1     3      914        900      14    1504    1530     -26 HA     
##  4  2013     1     4      900        900       0    1516    1530     -14 HA     
##  5  2013     1     5      858        900      -2    1519    1530     -11 HA     
##  6  2013     1     6     1019        900      79    1558    1530      28 HA     
##  7  2013     1     7     1042        900     102    1620    1530      50 HA     
##  8  2013     1     8      901        900       1    1504    1530     -26 HA     
##  9  2013     1     9      641        900    1301    1242    1530    1272 HA     
## 10  2013     1    10      859        900      -1    1449    1530     -41 HA     
## # … with 336,766 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Traveled the shortest:

arrange(flights, distance)

## # A tibble: 336,776 × 19
##     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
##    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
##  1  2013     7    27       NA        106      NA      NA     245      NA US     
##  2  2013     1     3     2127       2129      -2    2222    2224      -2 EV     
##  3  2013     1     4     1240       1200      40    1333    1306      27 EV     
##  4  2013     1     4     1829       1615     134    1937    1721     136 EV     
##  5  2013     1     4     2128       2129      -1    2218    2224      -6 EV     
##  6  2013     1     5     1155       1200      -5    1241    1306     -25 EV     
##  7  2013     1     6     2125       2129      -4    2224    2224       0 EV     
##  8  2013     1     7     2124       2129      -5    2212    2224     -12 EV     
##  9  2013     1     8     2127       2130      -3    2304    2225      39 EV     
## 10  2013     1     9     2126       2129      -3    2217    2224      -7 EV     
## # … with 336,766 more rows, 9 more variables: flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
## #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Section 5.4.1

Brainstorm:

dplyr::select(flights, dep_time, dep_delay, arr_time, arr_delay)

## # A tibble: 336,776 × 4
##    dep_time dep_delay arr_time arr_delay
##       <int>     <dbl>    <int>     <dbl>
##  1      517         2      830        11
##  2      533         4      850        20
##  3      542         2      923        33
##  4      544        -1     1004       -18
##  5      554        -6      812       -25
##  6      554        -4      740        12
##  7      555        -5      913        19
##  8      557        -3      709       -14
##  9      557        -3      838        -8
## 10      558        -2      753         8
## # … with 336,766 more rows
## # ℹ Use `print(n = ...)` to see more rows

dplyr::select(flights, starts_with("dep_"), starts_with("arr_"))

## # A tibble: 336,776 × 4
##    dep_time dep_delay arr_time arr_delay
##       <int>     <dbl>    <int>     <dbl>
##  1      517         2      830        11
##  2      533         4      850        20
##  3      542         2      923        33
##  4      544        -1     1004       -18
##  5      554        -6      812       -25
##  6      554        -4      740        12
##  7      555        -5      913        19
##  8      557        -3      709       -14
##  9      557        -3      838        -8
## 10      558        -2      753         8
## # … with 336,766 more rows
## # ℹ Use `print(n = ...)` to see more rows

brainstorm <- c("dep_time", "dep_delay", "arr_time", "arr_delay")
dplyr::select(flights, all_of(brainstorm))

## # A tibble: 336,776 × 4
##    dep_time dep_delay arr_time arr_delay
##       <int>     <dbl>    <int>     <dbl>
##  1      517         2      830        11
##  2      533         4      850        20
##  3      542         2      923        33
##  4      544        -1     1004       -18
##  5      554        -6      812       -25
##  6      554        -4      740        12
##  7      555        -5      913        19
##  8      557        -3      709       -14
##  9      557        -3      838        -8
## 10      558        -2      753         8
## # … with 336,766 more rows
## # ℹ Use `print(n = ...)` to see more rows
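When the column names are stored in a character vector, all_of() and any_of() make the intent explicit. The sketch below illustrates the difference, assuming dplyr and nycflights13 are installed. The vector wanted and its entry "not_a_column" are made up for the illustration.

```r
library(dplyr)
library(nycflights13)

# A vector of names that deliberately includes one column that does not exist
wanted <- c("dep_time", "dep_delay", "arr_time", "arr_delay", "not_a_column")

# any_of() silently skips names that are not in the data
lenient <- select(flights, any_of(wanted))
names(lenient)

# all_of() is strict: a missing name raises an error, caught here
strict_result <- tryCatch(
  select(flights, all_of(wanted)),
  error = function(e) "error"
)
strict_result
```

all_of() is the safer default when every listed column is expected to exist, since a typo then fails loudly instead of dropping a column without notice.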

Given code:

dplyr::select(flights, contains("TIME"))

## # A tibble: 336,776 × 6
##    dep_time sched_dep_time arr_time sched_arr_time air_time time_hour          
##       <int>          <int>    <int>          <int>    <dbl> <dttm>             
##  1      517            515      830            819      227 2013-01-01 05:00:00
##  2      533            529      850            830      227 2013-01-01 05:00:00
##  3      542            540      923            850      160 2013-01-01 05:00:00
##  4      544            545     1004           1022      183 2013-01-01 05:00:00
##  5      554            600      812            837      116 2013-01-01 06:00:00
##  6      554            558      740            728      150 2013-01-01 05:00:00
##  7      555            600      913            854      158 2013-01-01 06:00:00
##  8      557            600      709            723       53 2013-01-01 06:00:00
##  9      557            600      838            846      140 2013-01-01 06:00:00
## 10      558            600      753            745      138 2013-01-01 06:00:00
## # … with 336,766 more rows
## # ℹ Use `print(n = ...)` to see more rows

The output shows that contains() ignores case by default: "TIME" matches every column whose name contains "time". To make the match case-sensitive, set ignore.case = FALSE.

Example:

dplyr::select(flights, contains("TIME", ignore.case = FALSE))

## # A tibble: 336,776 × 0
## # ℹ Use `print(n = ...)` to see more rows
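The case-sensitive call returns zero columns because no column name contains uppercase "TIME". For contrast, the sketch below runs the same case-sensitive match with a lowercase pattern, assuming dplyr and nycflights13 are installed. The object names are only illustrative.

```r
library(dplyr)
library(nycflights13)

# Case-sensitive match on an uppercase pattern: no column names qualify
upper_match <- select(flights, contains("TIME", ignore.case = FALSE))
ncol(upper_match)

# Case-sensitive match on a lowercase pattern: the time columns come back
lower_match <- select(flights, contains("time", ignore.case = FALSE))
names(lower_match)
```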

Bilton Fieldsend

2022-09-01
