library(tidyverse)
library(openintro)
library(ggplot2)

Exercise 1

Comparing the three histograms, the default of 30 bins and a binwidth of 150 both reveal little detail. With a binwidth of 15, however, the first bin is no longer the tallest: the most common departure delays sit around 0, with a smaller group of flights leaving early (delays below 0).

ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram(binwidth = 15)

ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram(binwidth = 150)
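
To make the effect of binwidth concrete, here is a minimal sketch that bins a handful of made-up delays (not taken from nycflights) at widths 15 and 150. The helper count_bins is hypothetical and built on base R's cut(); it only approximates geom_histogram(), since ggplot2 may place its bin edges differently.

```r
# Hypothetical departure delays in minutes (illustrative only, not from nycflights)
delays <- c(-5, -3, -2, 0, 1, 2, 4, 8, 12, 20, 45, 130)

# Hypothetical helper: count values in left-closed bins of a given width
count_bins <- function(x, width) {
  lower <- floor(min(x) / width) * width
  upper <- ceiling(max(x) / width) * width
  breaks <- seq(lower, upper, by = width)
  table(cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE))
}

narrow <- count_bins(delays, 15)   # ten bins: early and near-zero delays separate
wide   <- count_bins(delays, 150)  # two bins: nearly everything lands together

print(narrow)
print(wide)
```

With the narrow width the early departures get their own bin; with the wide one they merge into a single bar covering everything below 0.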

Exercise 2

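3,563 flights meet these criteria as coded. The filter uses | (logical OR), so it keeps every flight that either was headed to SFO or departed in February. Requiring both conditions at once would take a comma or & in place of |, and would give a smaller count.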

sfo_feb_flights <- nycflights %>%
  filter(dest == "SFO" | month == 2)

sfo_feb_flights %>%
  summarise(count = n())
## # A tibble: 1 x 1
##   count
##   <int>
## 1  3563
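
The choice between | and a comma (or &) inside filter() changes which rows survive. A minimal sketch on a made-up four-row data frame (the toy object is an assumption for illustration, not part of the lab data):

```r
library(dplyr)

# Hypothetical toy data: one row per combination of destination and month
toy <- data.frame(
  dest  = c("SFO", "SFO", "LAX", "LAX"),
  month = c(2L, 7L, 2L, 7L)
)

# OR: keeps rows where at least one condition holds
either <- toy %>% filter(dest == "SFO" | month == 2)

# AND: comma-separated conditions must all hold
both <- toy %>% filter(dest == "SFO", month == 2)

nrow(either)  # 3
nrow(both)    # 1
```

Only the SFO flight in month 2 passes the AND version, while the OR version also lets through the July SFO flight and the February LAX flight.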

Exercise 3

Interestingly, the distribution peaks below 0. For summary statistics I want to look at the median, because the long right tail of large arrival delays pulls the mean upward. The median comes out to -4, which is about what I expect.

ggplot(data = sfo_feb_flights, aes(x = arr_delay)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

sfo_feb_flights %>%
  summarise(
    mean_ad = mean(arr_delay),
    median_ad = median(arr_delay)
  )
## # A tibble: 1 x 2
##   mean_ad median_ad
##     <dbl>     <dbl>
## 1    4.60        -4
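
The gap between the mean and the median comes from the long right tail. A small sketch with made-up arrival delays (the values are assumptions, not lab data) shows how a single very late flight moves the mean but not the median:

```r
# Hypothetical arrival delays in minutes: mostly early, one very late flight
arr <- c(-10, -6, -4, -2, 0, 3, 150)

mean_arr   <- mean(arr)    # pulled upward by the 150-minute delay
median_arr <- median(arr)  # middle value; the size of the outlier does not matter

c(mean = mean_arr, median = median_arr)
```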

Exercise 4

ExpressJet Airlines Inc. (EV) has the most variable departure delays, with an IQR of 38 minutes. Endeavor Air Inc. (9E) is a close second with an IQR of 37 minutes.

sfo_feb_flights %>%
  group_by(carrier) %>%
  summarise(
    median_dd = median(dep_delay),
    iqr_dd = IQR(dep_delay),
    n_flights = n()
  )
## # A tibble: 15 x 4
##    carrier median_dd iqr_dd n_flights
##  * <chr>       <dbl>  <dbl>     <int>
##  1 9E           -1    37          133
##  2 AA           -2    13.8        358
##  3 AS           -6     7            6
##  4 B6            0    19          481
##  5 DL           -3     6          504
##  6 EV            0    38          333
##  7 F9           -2.5   6.25         6
##  8 FL           -2    10           28
##  9 HA           11    13            2
## 10 MQ           -4    17          181
## 11 UA            0    14         1067
## 12 US           -5     7          136
## 13 VX           -1    13          237
## 14 WN            0     9           84
## 15 YV           -6    16.5          7
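
As a reminder of what the IQR measures, here is a sketch with two made-up carriers (the vectors are hypothetical, not lab data) that share a median of 0 but differ widely in spread:

```r
# Hypothetical departure delays (minutes) for two made-up carriers
steady  <- c(-2, -1, 0, 1, 2)
erratic <- c(-20, -5, 0, 15, 40)

# IQR() is the 75th percentile minus the 25th percentile
iqr_steady  <- IQR(steady)
iqr_erratic <- IQR(erratic)

# Same result computed by hand from quantile()
manual_erratic <- unname(diff(quantile(erratic, c(0.25, 0.75))))

c(steady = iqr_steady, erratic = iqr_erratic, manual = manual_erratic)
```

Because the IQR ignores the most extreme quarter on each side, it is less sensitive to a few very long delays than the standard deviation would be.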

Exercise 5

Median: The differences in the median across the months aren’t very large. Pro: You can choose essentially any month and plan your travel around the best time to visit a destination. Con: The median says little about the long tail, so there is a higher likelihood you get put on a flight with a large delay.

Mean: Pro: Choosing the month you travel based on the mean will reduce the likelihood you get put on a flight with a long departure delay. Con: You may not get to travel to places during their best months, for example cold locations in the summer.

nycflights %>%
  group_by(month) %>%
  summarise(mean_dd = mean(dep_delay), median_dd = median(dep_delay)) %>%
  arrange(desc(mean_dd))
## # A tibble: 12 x 3
##    month mean_dd median_dd
##    <int>   <dbl>     <dbl>
##  1     7   20.8          0
##  2     6   20.4          0
##  3    12   17.4          1
##  4     4   14.6         -2
##  5     3   13.5         -1
##  6     5   13.3         -1
##  7     8   12.6         -1
##  8     2   10.7         -2
##  9     1   10.2         -2
## 10     9    6.87        -3
## 11    11    6.10        -2
## 12    10    5.88        -3

Exercise 6

I would choose LGA. About 73% of its flights depart “on time”, the highest rate of the three NYC airports.

nycflights <- nycflights %>%
  mutate(dep_type = ifelse(dep_delay < 5, "on time", "delayed"))

nycflights %>%
  group_by(origin) %>%
  summarise(ot_dep_rate = sum(dep_type == "on time") / n()) %>%
  arrange(desc(ot_dep_rate))
## # A tibble: 3 x 2
##   origin ot_dep_rate
##   <chr>        <dbl>
## 1 LGA          0.728
## 2 JFK          0.694
## 3 EWR          0.637
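
The on-time rate is just the share of flights whose label is "on time". A minimal sketch on made-up delays (the dep_delay vector here is hypothetical) shows the labelling rule, including the boundary case of exactly 5 minutes, and two equivalent ways to compute the proportion:

```r
# Hypothetical departure delays in minutes
dep_delay <- c(-3, 0, 4, 5, 12, 60)

# Same rule as above: under 5 minutes late counts as "on time"
dep_type <- ifelse(dep_delay < 5, "on time", "delayed")

# Two equivalent ways to get the on-time proportion
rate_sum  <- sum(dep_type == "on time") / length(dep_type)
rate_mean <- mean(dep_type == "on time")  # mean of TRUE/FALSE is the share of TRUE

dep_type
c(rate_sum = rate_sum, rate_mean = rate_mean)
```

A flight that leaves exactly 5 minutes late is labelled "delayed" because the comparison is strict.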

Exercise 7

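air_time is recorded in minutes, so it needs to be divided by 60 to convert it into hours. With distance in miles, avg_speed then comes out in miles per hour.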

nycflights <- nycflights %>%
  mutate(avg_speed = distance / (air_time / 60))

Exercise 8

Average speed rises with distance and then plateaus for flights longer than roughly 1,000 miles.

ggplot(data = nycflights, aes(x = distance, y = avg_speed)) +
  geom_point()

Exercise 9

Reading off the scatterplot, you can depart roughly 50 minutes late and still arrive on schedule.

nycflights %>%
  filter(carrier %in% c("AA", "DL", "UA")) %>%
  ggplot(aes(x = dep_delay, y = arr_delay, color = carrier)) +
  geom_point()
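
One way to back up a number read off the scatterplot is to compute it: among flights that arrived on time or early, find the largest departure delay. This is a sketch on made-up rows (the flights_toy data frame and its values are assumptions, not lab data). A single maximum is sensitive to outliers, so treat it as a rough check rather than a definitive answer.

```r
library(dplyr)

# Hypothetical flights: delays in minutes, negative means early
flights_toy <- data.frame(
  carrier   = c("AA", "AA", "DL", "DL", "UA", "UA"),
  dep_delay = c(-2, 30, 48, 10, 65, 5),
  arr_delay = c(-10, -1, 0, 20, 40, -8)
)

# Keep flights that arrived on schedule or early, then take the latest departure
latest_on_schedule <- flights_toy %>%
  filter(arr_delay <= 0) %>%
  summarise(max_dep_delay = max(dep_delay))

latest_on_schedule$max_dep_delay  # 48
```

Applied to the real data, the same pipeline could start from the three-carrier filter used for the plot.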
