Lab 2: Introduction to Data
Sam Reeves

library(tidyverse)
library(openintro)
data(nycflights)

Exercise 1

Look carefully at these three histograms. How do they compare? Are features revealed in one that are obscured in another?

ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram(binwidth = 15)

ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram(binwidth = 150)

These plots cut the same data into bins of different widths. The wider bins are perhaps easier to read, but the narrower bins show more clearly that most flights left with a delay of 15 minutes or less. With a binwidth of 150, it is no longer visible that many flights leave a bit early.
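
To make the effect of bin width concrete, here is a minimal sketch that counts a handful of made-up delays (not taken from nycflights) into bins of width 15 and 150. The helper `bin_counts()` and the choice to centre bins on multiples of the width are assumptions made for illustration only.

```r
# Hypothetical departure delays in minutes (invented values, not real data)
delays <- c(-8, -5, -3, -2, 0, 1, 4, 12, 18, 40, 95, 160)

# Count values per bin for a given width.
# Assumption: bins are half-open [a, b) and centred on multiples of `width`.
bin_counts <- function(x, width) {
  lo <- (floor(min(x) / width + 0.5) - 0.5) * width
  hi <- (floor(max(x) / width + 0.5) + 0.5) * width
  breaks <- seq(lo, hi, by = width)
  table(cut(x, breaks = breaks, right = FALSE))
}

narrow <- bin_counts(delays, 15)   # 13 bins between -22.5 and 172.5
wide   <- bin_counts(delays, 150)  # 2 bins: [-75, 75) and [75, 225)

print(narrow)
print(wide)
```

With the narrow bins the values spread over several bars. With the wide bins, ten of the twelve values collapse into one bar that mixes early departures with delays of more than half an hour.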

Exercise 2

Create a new data frame that includes flights headed to SFO in February, and save this data frame as sfo_feb_flights. How many flights meet these criteria?

sfo_feb_flights <- nycflights %>%
  filter(month == 2,
         dest == 'SFO')

68 flights meet the criteria.
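
The count itself is not computed in the code above. One way to get it is `nrow()` on the filtered data frame. The sketch below shows the idea in base R on a tiny invented table; the data frame `flights` and its values are hypothetical.

```r
# Hypothetical miniature flights table (invented values)
flights <- data.frame(
  month = c(1, 2, 2, 2, 3),
  dest  = c("SFO", "SFO", "LAX", "SFO", "SFO")
)

# Keep only February flights headed to SFO, then count the rows
sfo_feb <- flights[flights$month == 2 & flights$dest == "SFO", ]
n_sfo_feb <- nrow(sfo_feb)
print(n_sfo_feb)
```

On the real data, the same count should come from `nrow(sfo_feb_flights)`.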

Exercise 3

Describe the distribution of the arrival delays of these flights using a histogram and appropriate summary statistics. Hint: The summary statistics you use should depend on the shape of the distribution.

ggplot(sfo_feb_flights, aes(x = arr_delay)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

sfo_stats <- sfo_feb_flights %>% 
  summarise(mean_arr_del = mean(arr_delay),
            med_arr_del = median(arr_delay),
            sd_arr_del = sd(arr_delay),
            var_arr_del = var(arr_delay),
            iqr_arr_del = IQR(arr_delay))

The distribution is unimodal and right-skewed. Most of these flights arrived early. Because of the skew, the median and IQR describe the distribution better than the mean and standard deviation.
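
As a small illustration of why the shape matters, the sketch below uses ten invented arrival delays with one long delay in the right tail. The values are hypothetical and are not the SFO February flights.

```r
# Hypothetical arrival delays in minutes: mostly early, one very late flight
arr_delay <- c(-30, -22, -15, -11, -8, -5, 0, 6, 20, 150)

centre <- c(mean = mean(arr_delay), median = median(arr_delay))
spread <- c(sd = sd(arr_delay), iqr = IQR(arr_delay))
share_early <- mean(arr_delay < 0)  # proportion of flights arriving early

print(centre)       # the mean is pulled above the median by the late flight
print(spread)       # the sd is inflated far more than the IQR
print(share_early)
```

A mean well above the median is a typical sign of right skew, and the single extreme value affects the standard deviation much more than the IQR.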

Exercise 4

Calculate the median and interquartile range for arr_delays of flights in the sfo_feb_flights data frame, grouped by carrier. Which carrier has the most variable arrival delays?

sfo_feb_flights %>%
  group_by(carrier) %>%
  summarize(var_arr_delay = var(arr_delay)) %>%
  arrange(desc(var_arr_delay))
## # A tibble: 5 x 2
##   carrier var_arr_delay
##   <chr>           <dbl>
## 1 UA              2335.
## 2 VX              1669.
## 3 AA               868.
## 4 DL               485.
## 5 B6               121.

United Airlines (UA) has the highest variance in arrival delays, so its arrival delays are the most variable.
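
The exercise asks for the median and IQR, while the answer above ranks carriers by variance. A minimal base R sketch of the median/IQR version follows. It runs on an invented two-carrier table, so the numbers say nothing about the real flights.

```r
# Hypothetical arrival delays for two carriers (invented values)
flights <- data.frame(
  carrier   = c("AA", "AA", "AA", "AA", "UA", "UA", "UA", "UA"),
  arr_delay = c(-10, -5, 0, 5, -40, -10, 20, 90)
)

# Median and IQR of arrival delay within each carrier
med <- tapply(flights$arr_delay, flights$carrier, median)
iqr <- tapply(flights$arr_delay, flights$carrier, IQR)

by_carrier <- data.frame(
  carrier = names(med),
  median  = as.vector(med),
  iqr     = as.vector(iqr)
)

# Carrier with the largest IQR, i.e. the most variable middle half of delays
most_variable <- names(which.max(iqr))
print(by_carrier[order(-by_carrier$iqr), ])
```

The IQR is less sensitive to a few extreme delays than the variance, so the two measures can rank carriers differently.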

Exercise 5

Suppose you really dislike departure delays and you want to schedule your travel in a month that minimizes your potential departure delay leaving NYC. One option is to choose the month with the lowest mean departure delay. Another option is to choose the month with the lowest median departure delay. What are the pros and cons of these two choices?

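The mean reflects the average wait across all flights in a month, including the worst delays, but a few very long delays can pull it upward. The median is robust to those outliers and better describes the delay of a typical flight, although it says nothing about how bad the extreme delays can get.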
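
The two criteria can disagree. The sketch below compares two invented months: one with mostly punctual flights and a single four-hour delay, the other with consistently moderate delays. The vectors `month_a` and `month_b` are hypothetical.

```r
# Hypothetical departure delays (minutes) for two months
month_a <- c(-3, -2, 0, 1, 2, 3, 240)  # typical flight on time, one extreme delay
month_b <- c(5, 6, 7, 8, 9, 10, 11)    # every flight moderately late

means   <- c(a = mean(month_a), b = mean(month_b))
medians <- c(a = median(month_a), b = median(month_b))

best_by_mean   <- names(which.min(means))    # month with the lowest mean delay
best_by_median <- names(which.min(medians))  # month with the lowest median delay

print(means)
print(medians)
```

Here the mean favours month b while the median favours month a, which is exactly the trade-off described above.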

Exercise 6

If you were selecting an airport simply based on on time departure percentage, which NYC airport would you choose to fly out of?

nycflights %>% 
  group_by(origin) %>%
  summarize(avg_departure = mean(dep_delay)) %>%
  arrange(avg_departure)
## # A tibble: 3 x 2
##   origin avg_departure
##   <chr>          <dbl>
## 1 LGA             10.1
## 2 JFK             12.3
## 3 EWR             15.3

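I would select LaGuardia (LGA): it has the lowest mean departure delay of the three airports, which I use here as a proxy for on-time performance.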
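
The question asks about on-time departure percentage rather than mean delay. A minimal sketch of that calculation is shown below on invented data. The 5-minute cutoff for "on time" and the variable names are assumptions made for illustration.

```r
# Hypothetical departure delays (minutes) for the three NYC airports
flights <- data.frame(
  origin    = rep(c("EWR", "JFK", "LGA"), each = 4),
  dep_delay = c(-2, 10, 30, 45,
                -5,  0, 12, 60,
                -4, -1,  3, 25)
)

# Assumption: a flight counts as on time if it leaves less than 5 minutes late
on_time_cutoff <- 5
flights$on_time <- flights$dep_delay < on_time_cutoff

# The mean of a logical vector is the proportion of TRUE values
rate <- tapply(flights$on_time, flights$origin, mean)
best <- names(which.max(rate))

print(rate)
print(best)
```

A mean delay and an on-time rate need not rank airports the same way, because the mean is sensitive to a few very long delays.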

Exercise 7

Mutate the data frame so that it includes a new variable that contains the average speed, avg_speed, traveled by the plane for each flight (in mph). Hint: Average speed can be calculated as distance divided by number of hours of travel, and note that air_time is given in minutes.

nycflights <- nycflights %>%
  mutate(avg_speed = distance / air_time * 60)
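
As a quick check of the unit conversion, the sketch below applies the same formula to two invented flights. Dividing minutes by 60 gives hours, so `distance / (air_time / 60)` equals `distance / air_time * 60`.

```r
# Hypothetical flights: distance in miles, air time in minutes
distance <- c(1000, 2586)
air_time <- c(120, 345)

# Miles per hour: convert minutes to hours before dividing
avg_speed <- distance / (air_time / 60)

# Same result as the form used in the mutate() call
same_as_mutate <- isTRUE(all.equal(avg_speed, distance / air_time * 60))

print(avg_speed)
```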

Exercise 8

Make a scatterplot of avg_speed vs. distance. Describe the relationship between average speed and distance. Hint: Use geom_point().

nycflights %>% ggplot() +
  geom_point(aes(x = distance, y = avg_speed, color = carrier))

The relationship is positive but non-linear: average speed increases quickly with distance on short flights and levels off on longer ones.

Exercise 9

Replicate the following plot. Hint: The data frame plotted only contains flights from American Airlines, Delta Airlines, and United Airlines, and the points are colored by carrier. Once you replicate the plot, determine (roughly) what the cutoff point is for departure delays where you can still expect to get to your destination on time.

special_group <- nycflights %>%
  filter(carrier %in% c('AA', 'DL', 'UA'))

special_group %>%
  ggplot() +
  geom_point(aes(x = dep_delay, y = arr_delay, color = carrier))

special_group %>%
  filter(arr_delay <= 0) %>%
  group_by(carrier) %>%
  summarize(cutoff = mean(dep_delay)) %>%
  arrange(desc(cutoff))
## # A tibble: 3 x 2
##   carrier cutoff
##   <chr>    <dbl>
## 1 UA      -0.274
## 2 DL      -2.17 
## 3 AA      -2.88

Among flights that arrived on time, the mean departure delay is slightly negative for all three carriers, so these airlines tend to arrive on time only when they leave on time or early.
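
When a filter has to match any of several carriers, `==` and `%in%` behave differently. Comparing a column to a length-3 vector with `==` recycles that vector element by element, which is what triggers the "longer object length" warning and silently drops matching rows. `%in%` tests membership instead. The sketch below shows the difference on a short invented vector.

```r
# Hypothetical carrier column (invented values)
carrier <- c("AA", "DL", "UA", "UA", "AA", "B6", "DL")
wanted  <- c("AA", "DL", "UA")

# `==` recycles `wanted` as AA, DL, UA, AA, DL, UA, AA and compares pairwise.
# The warning is suppressed here only to keep the output clean.
recycled <- suppressWarnings(carrier == wanted)

# `%in%` asks, for each element, whether it appears anywhere in `wanted`
member <- carrier %in% wanted

print(sum(recycled))  # rows kept by the recycled comparison
print(sum(member))    # rows kept by the membership test
```

Only the membership test keeps every AA, DL and UA row. The recycled comparison keeps a row only when its carrier happens to line up with the recycled position.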
