library(tidyverse)
library(openintro)
library(mosaic)
library(ggformula)
library(nycflights23)
data(flights) 
NYCflights <- flights %>% drop_na()
# drop_na() removed 12,534 rows that had at least one missing value
names(NYCflights)
##  [1] "year"           "month"          "day"            "dep_time"      
##  [5] "sched_dep_time" "dep_delay"      "arr_time"       "sched_arr_time"
##  [9] "arr_delay"      "carrier"        "flight"         "tailnum"       
## [13] "origin"         "dest"           "air_time"       "distance"      
## [17] "hour"           "minute"         "time_hour"
?flights
glimpse(NYCflights)
## Rows: 422,818
## Columns: 19
## $ year           <int> 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2…
## $ month          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ day            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ dep_time       <int> 1, 18, 31, 33, 36, 503, 520, 524, 537, 547, 549, 551, 5…
## $ sched_dep_time <int> 2038, 2300, 2344, 2140, 2048, 500, 510, 530, 520, 545, …
## $ dep_delay      <dbl> 203, 78, 47, 173, 228, 3, 10, -6, 17, 2, -10, -9, -7, -…
## $ arr_time       <int> 328, 228, 500, 238, 223, 808, 948, 645, 926, 845, 905, …
## $ sched_arr_time <int> 3, 135, 426, 2352, 2252, 815, 949, 710, 818, 852, 901, …
## $ arr_delay      <dbl> 205, 53, 34, 166, 211, -7, -1, -25, 68, -7, 4, -13, -14…
## $ carrier        <chr> "UA", "DL", "B6", "B6", "UA", "AA", "B6", "AA", "UA", "…
## $ flight         <int> 628, 393, 371, 1053, 219, 499, 996, 981, 206, 225, 800,…
## $ tailnum        <chr> "N25201", "N830DN", "N807JB", "N265JB", "N17730", "N925…
## $ origin         <chr> "EWR", "JFK", "JFK", "JFK", "EWR", "EWR", "JFK", "EWR",…
## $ dest           <chr> "SMF", "ATL", "BQN", "CHS", "DTW", "MIA", "BQN", "ORD",…
## $ air_time       <dbl> 367, 108, 190, 108, 80, 154, 192, 119, 258, 157, 164, 1…
## $ distance       <dbl> 2500, 760, 1576, 636, 488, 1085, 1576, 719, 1400, 1065,…
## $ hour           <dbl> 20, 23, 23, 21, 20, 5, 5, 5, 5, 5, 5, 6, 5, 6, 6, 6, 6,…
## $ minute         <dbl> 38, 0, 44, 40, 48, 0, 10, 30, 20, 45, 59, 0, 59, 0, 0, …
## $ time_hour      <dttm> 2023-01-01 20:00:00, 2023-01-01 23:00:00, 2023-01-01 2…
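To see which columns account for the rows that drop_na() removed, one could count missing values per column before dropping anything; on the real data that would be colSums(is.na(flights)). The sketch below uses base R on a small hypothetical data frame (toy, with made-up values) so it runs without nycflights23. complete.cases() flags the same rows that drop_na() keeps when it is called with no column arguments.

```r
# Hypothetical toy data standing in for `flights` (values are invented)
toy <- data.frame(
  dep_delay = c(5, NA, -3, 12),
  arr_delay = c(2, NA, NA, 20),
  tailnum   = c("N1", "N2", "N3", "N4")
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Keep only rows with no missing values (what drop_na() does with no arguments)
kept <- toy[complete.cases(toy), ]
n_dropped <- nrow(toy) - nrow(kept)
n_dropped  # 2
```

Because one row can be missing values in several columns, the column counts can add up to more than the number of rows dropped, as they do here (3 missing values, 2 rows dropped).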

Exercise 1

How delayed were the flights to your chosen destination?

Answer: At least half of the flights from NYC to BOS departed early, since the overall median departure delay was -3 minutes. The mean was much larger, 13.99 minutes, because a long right tail of heavily delayed flights pulls it upward. Across all flights from NYC to BOS, 32.1% were delayed by more than 10 minutes. The rate depended on the origin airport: 27.3% of flights from LGA were delayed by more than 10 minutes, compared with 35.0% from EWR and 36.2% from JFK.
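The gap between a negative median and a clearly positive mean is what a long right tail produces. A minimal sketch with ten invented delays (not taken from the data) shows a single very late flight pulling the mean well above the median while leaving the median unchanged:

```r
# Invented departure delays in minutes: nine flights leave on time or early,
# one leaves 150 minutes late
dep_delay <- c(-6, -5, -4, -4, -3, -3, -2, -1, 0, 150)

median(dep_delay)                 # -3: the middle of the sorted values
mean(dep_delay)                   # 12.2: the single long delay pulls it up
mean(dep_delay[dep_delay < 150])  # about -3.1 without that one flight
```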

To answer this question, we first used filter() to keep only the flights from New York to Boston. We then drew a histogram of departure delay to see how many flights were delayed and by how much. The distribution is unimodal and right-skewed, with its peak just below 0. To analyze the data further, we used group_by() and summarise() to split the flights into 3 groups, one for each New York origin airport, and to report the mean, median, and number of flights for each. Next, using group_by(), summarise(), and arrange(), we computed the proportion of flights from each airport that were delayed by more than 10 minutes and sorted these proportions in descending order. Finally, we applied the same calculation to all New York to Boston flights together to get the overall proportion delayed by more than 10 minutes.
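The grouped summaries described above can be sketched in base R on a small hypothetical data frame, mirroring group_by(origin), summarise(), and arrange(desc()). The airport codes are real but the delays are invented, and the column name late_rate is chosen for illustration only. The sketch relies on the fact that the mean of a logical vector is the proportion of TRUE values, so mean(x > 10) equals sum(x > 10) / length(x).

```r
# Hypothetical flights to one destination; delays (minutes) are invented
toy <- data.frame(
  origin    = c("EWR", "EWR", "EWR", "JFK", "JFK", "LGA", "LGA", "LGA", "LGA"),
  dep_delay = c(-2, 15, 40, -5, 11, -4, -3, 10, 60)
)

# split() plays the role of group_by(origin): one delay vector per airport
by_origin <- split(toy$dep_delay, toy$origin)

# One row per airport, like summarise() after group_by()
summary_by_origin <- data.frame(
  origin    = names(by_origin),
  mean_dd   = sapply(by_origin, mean),
  median_dd = sapply(by_origin, median),
  n         = sapply(by_origin, length),
  # mean of a logical vector = share of TRUE values = sum(x > 10) / length(x)
  late_rate = sapply(by_origin, function(x) mean(x > 10)),
  row.names = NULL
)

# Largest late_rate first, like arrange(desc(late_rate))
summary_by_origin[order(summary_by_origin$late_rate, decreasing = TRUE), ]
```

The LGA flight delayed exactly 10 minutes is not counted as late, because the condition is strictly "more than 10 minutes".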

bos_flights <- NYCflights %>%
  filter(dest == "BOS")

gf_histogram(~dep_delay,data=bos_flights,binwidth=5,color="black")

bos_flights %>%
  group_by(origin)%>%
  summarise(mean_dd   = mean(dep_delay), 
            median_dd = median(dep_delay), 
            n         = n())
## # A tibble: 3 × 4
##   origin mean_dd median_dd     n
##   <chr>    <dbl>     <dbl> <int>
## 1 EWR       13.7        -2  4219
## 2 JFK       17.9        -2  6228
## 3 LGA       11.1        -4  7940
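bos_flights %>%
  group_by(origin) %>%
  # Compare with the number 10, not the string "10": a string comparison is
  # alphabetical (e.g. 9 > "10" is TRUE). The rates printed below were
  # computed with > "10", so rerun this chunk to refresh them.
  summarise(ot_dep_rate = sum(dep_delay > 10) / n()) %>%
  arrange(desc(ot_dep_rate))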
## # A tibble: 3 × 2
##   origin ot_dep_rate
##   <chr>        <dbl>
## 1 JFK          0.362
## 2 EWR          0.350
## 3 LGA          0.273
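bos_flights %>%
  group_by(dest) %>%
  # Numeric threshold (not the string "10"): more than 10 minutes late.
  # The rate printed below was computed with > "10"; rerun to refresh it.
  summarise(ot_dep_rate = sum(dep_delay > 10) / n()) %>%
  arrange(desc(ot_dep_rate))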
## # A tibble: 1 × 2
##   dest  ot_dep_rate
##   <chr>       <dbl>
## 1 BOS         0.321
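Whether the delay threshold is written as the number 10 or the string "10" changes the result. When a number is compared with a character string, R converts the number to text, and text is compared alphabetically using the current locale's collating order, so 9 > "10" is TRUE because "9" sorts after "1". A small sketch with invented delays:

```r
# Invented departure delays in minutes
delays <- c(-6, 2, 9, 10, 11, 45)

# String threshold: each delay becomes text ("2", "9", ...) and is compared
# alphabetically, so "2" and "9" count as greater than "10".
# (The text result for -6 can depend on the locale.)
as_text <- delays > "10"

# Numeric threshold: the intended "more than 10 minutes late" test
as_number <- delays > 10

data.frame(delays, as_text, as_number)
mean(as_number)  # share of flights more than 10 minutes late
```

With the string threshold, delays of 2 to 9 minutes are counted as "more than 10", so any rate computed that way overstates the share of long delays.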

Exercise 2

Replicate the plot

Answer: Beyond a departure delay of roughly 100 minutes, the plot shows no flights arriving on time. Below that mark an on-time arrival was not guaranteed, but some flights with departure delays that large still arrived on time. According to the plot, a flight that departed early was very likely to arrive early, and this likelihood decreased as the departure delay grew.
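One way to check the roughly 100-minute threshold is to take the largest departure delay among flights that arrived on time. The sketch below assumes "on time" means an arrival delay of zero or less and uses invented delays; on the real data the analogous expression would be max(NYCscatter$dep_delay[NYCscatter$arr_delay <= 0]).

```r
# Invented delays in minutes for seven flights
toy <- data.frame(
  dep_delay = c(-5, 0, 20, 45, 95, 120, 180),
  arr_delay = c(-12, -4, 8, -1, -3, 90, 150)
)

# Assumption: "on time" means an arrival delay of zero or less
on_time <- toy[toy$arr_delay <= 0, ]

# Largest departure delay that still produced an on-time arrival
max(on_time$dep_delay)  # 95
```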

To create the scatter plot, we first used filter() to keep only the three selected carriers (B6, DL, and WN). We then used gf_point() to plot departure delay in minutes on the x axis against arrival delay in minutes on the y axis, colored by carrier, and gf_labs() to set the plot title, axis labels, and legend title. The plot shows a strong positive correlation between departure delay and arrival delay, with Delta (DL) appearing to have the longest departure and arrival delays.

NYCscatter <- NYCflights %>%
  filter(carrier %in% c("B6", "DL", "WN"))

gf_point(arr_delay ~ dep_delay, data = NYCscatter, color = ~carrier) %>%
  gf_labs(title = "How Departure Delay Impacts Arrival Delay in Minutes",
          x     = "Departure Delay",
          y     = "Arrival Delay",
          color = "Carrier")
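The strength of the positive relationship can be expressed as a number with cor(), computed separately for each carrier. A sketch on a hypothetical data frame with invented delays (three flights per carrier), using split() to group the rows by carrier:

```r
# Invented delays in minutes, three flights per carrier
toy <- data.frame(
  carrier   = c("B6", "B6", "B6", "DL", "DL", "DL", "WN", "WN", "WN"),
  dep_delay = c(-5, 10, 60, -2, 30, 120, 0, 15, 40),
  arr_delay = c(-10, 5, 55, -8, 25, 110, -6, 12, 31)
)

# Split the data frame by carrier, then correlate the two delays within each part
r_by_carrier <- sapply(split(toy, toy$carrier),
                       function(d) cor(d$dep_delay, d$arr_delay))
round(r_by_carrier, 3)
```

cor() measures linear association only; a value near 1 says nothing about which carrier has the longest delays, which a per-carrier mean or median would address.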

Exercise 3

What was the longest and shortest airtime that it took a plane to go from NYC to Boston?

Answer: The shortest air time was 29 minutes and the longest was 79 minutes. On average, flights spent about 40 minutes in the air (mean 39.7 minutes, median 39 minutes).

To answer this question, we examined the air_time variable for the Boston flights. A histogram of air time shows a right-skewed distribution: most flights from New York to Boston spent between about 35 and 45 minutes in the air (half fell between 36 and 43 minutes), the shortest took about 30 minutes, and the longest about 80 minutes. Finally, to get exact values, we used favstats() to find the minimum and maximum.

gf_histogram(~air_time,data=bos_flights,binwidth=1,color="black")

favstats(~air_time, data=bos_flights)
##  min Q1 median Q3 max     mean       sd     n missing
##   29 36     39 43  79 39.68875 5.204413 18387       0
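The favstats() output can be cross-checked with base R, and the quartiles also give a common rule of thumb for flagging unusually long flights: values more than 1.5 times the interquartile range (IQR) above the third quartile. The sketch uses ten invented air times rather than the real data; base R's quantile() and IQR() use their default method (type 7).

```r
# Invented air times in minutes (not from the flights data)
air_time <- c(29, 34, 36, 37, 39, 39, 41, 43, 47, 79)

range(air_time)   # shortest and longest: 29 79
median(air_time)  # 39
mean(air_time)    # 42.4, above the median (right skew)
quartiles <- quantile(air_time, c(0.25, 0.75))  # 36.25 and 42.5

# Rule of thumb: flag values more than 1.5 * IQR above the third quartile
upper_fence <- quartiles[[2]] + 1.5 * IQR(air_time)
air_time[air_time > upper_fence]  # 79
```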