This homework assignment uses the flights dataset from
the nycflights13 package, which contains real-world data on
over 336,000 flights departing from New York City airports (JFK, LGA,
EWR) in 2013. The dataset includes variables such as departure and
arrival times (with date components), airline carrier (categorical),
origin and destination airports (categorical), delays (with missing
values for cancelled flights), distance, and more. It is sourced from
the US Bureau of Transportation Statistics.
This assignment reinforces the Week 4 topics, including lubridate and zoo.
All questions (except the final reflection) require you to write and run R code to solve them. Submit the URL of your RPubs page. Make sure to comment your code and include key outputs (e.g., summaries, plots, or tables). Use the provided setup code to load the data.
Install and load the necessary packages if not already done:
#install.packages(c("nycflights13", "dplyr", "lubridate", "zoo", "forcats")) # If needed
library(nycflights13)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(forcats) # For factor recoding; base R alternatives are acceptable
data(flights) # Load the dataset
Explore the data briefly with str(flights) and
head(flights) to understand the structure. Note: Dates are
in separate year, month, day
columns; times are in dep_time and arr_time
(as integers like 517 for 5:17 AM).
#Explore your data here
str(flights)
## tibble [336,776 × 19] (S3: tbl_df/tbl/data.frame)
## $ year : int [1:336776] 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
## $ month : int [1:336776] 1 1 1 1 1 1 1 1 1 1 ...
## $ day : int [1:336776] 1 1 1 1 1 1 1 1 1 1 ...
## $ dep_time : int [1:336776] 517 533 542 544 554 554 555 557 557 558 ...
## $ sched_dep_time: int [1:336776] 515 529 540 545 600 558 600 600 600 600 ...
## $ dep_delay : num [1:336776] 2 4 2 -1 -6 -4 -5 -3 -3 -2 ...
## $ arr_time : int [1:336776] 830 850 923 1004 812 740 913 709 838 753 ...
## $ sched_arr_time: int [1:336776] 819 830 850 1022 837 728 854 723 846 745 ...
## $ arr_delay : num [1:336776] 11 20 33 -18 -25 12 19 -14 -8 8 ...
## $ carrier : chr [1:336776] "UA" "UA" "AA" "B6" ...
## $ flight : int [1:336776] 1545 1714 1141 725 461 1696 507 5708 79 301 ...
## $ tailnum : chr [1:336776] "N14228" "N24211" "N619AA" "N804JB" ...
## $ origin : chr [1:336776] "EWR" "LGA" "JFK" "JFK" ...
## $ dest : chr [1:336776] "IAH" "IAH" "MIA" "BQN" ...
## $ air_time : num [1:336776] 227 227 160 183 116 150 158 53 140 138 ...
## $ distance : num [1:336776] 1400 1416 1089 1576 762 ...
## $ hour : num [1:336776] 5 5 5 5 6 5 6 6 6 6 ...
## $ minute : num [1:336776] 15 29 40 45 0 58 0 0 0 0 ...
## $ time_hour : POSIXct[1:336776], format: "2013-01-01 05:00:00" "2013-01-01 05:00:00" ...
head(flights)
## # A tibble: 6 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
Create a column dep_datetime by combining year, month, day, and
dep_time into a POSIXct datetime using lubridate. (Hint: Use the
make_datetime() function with year, month, and day; derive the hour and
minute from dep_time with integer division and remainder, e.g.,
hour = dep_time %/% 100, min = dep_time %% 100.)
Show the first 5 rows of flights with dep_datetime.
Output: First 5 rows showing year, month, day, dep_time, and dep_datetime.
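As a quick check of the hint's arithmetic, the sketch below splits a few integer clock times into hours and minutes and builds datetimes from them. The sample values and the names `dep_time_sample`, `hrs`, `mins`, and `sample_datetime` are made up for illustration; only `make_datetime()` and the `%/%` and `%%` operators come from the hint.

```r
library(lubridate)

# Hypothetical sample of integer clock times in HHMM form (not taken from flights)
dep_time_sample <- c(517L, 1004L, 2359L)

hrs  <- dep_time_sample %/% 100  # integer division keeps the hour part: 5, 10, 23
mins <- dep_time_sample %% 100   # remainder keeps the minute part: 17, 4, 59

# make_datetime() returns POSIXct values (UTC unless tz is supplied)
sample_datetime <- make_datetime(year = 2013L, month = 1L, day = 1L,
                                 hour = hrs, min = mins)
format(sample_datetime, "%Y-%m-%d %H:%M")
```

The same two operators work column-wise inside mutate(), which is how they apply to the full dep_time column.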
# Store the augmented data in a new object; flights itself is left unchanged
dep_datetime <- flights %>%
  mutate(dep_datetime = make_datetime(year = year, month = month, day = day,
                                      hour = dep_time %/% 100,
                                      min = dep_time %% 100))
flights
## # A tibble: 336,776 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## 7 2013 1 1 555 600 -5 913 854
## 8 2013 1 1 557 600 -3 709 723
## 9 2013 1 1 557 600 -3 838 846
## 10 2013 1 1 558 600 -2 753 745
## # ℹ 336,766 more rows
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
# head() defaults to 6 rows, so request 5 explicitly
head(dep_datetime %>% select(year, month, day, dep_time, dep_datetime), 5)
## # A tibble: 5 × 5
## year month day dep_time dep_datetime
## <int> <int> <int> <int> <dttm>
## 1 2013 1 1 517 2013-01-01 05:17:00
## 2 2013 1 1 533 2013-01-01 05:33:00
## 3 2013 1 1 542 2013-01-01 05:42:00
## 4 2013 1 1 544 2013-01-01 05:44:00
## 5 2013 1 1 554 2013-01-01 05:54:00
Using dep_datetime from Question 1, create a column weekday with the day of the week (e.g., “Mon”) using wday(dep_datetime, label = TRUE). Use table() to show how many flights occur on each weekday.
Output: The table of flight counts by weekday.
dep_datetime_2 <- dep_datetime %>%
  mutate(weekday = wday(dep_datetime, label = TRUE))
table(dep_datetime_2$weekday)
##
## Sun Mon Tue Wed Thu Fri Sat
## 45643 49468 49273 48858 48654 48703 37922
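wday() numbers days starting from Sunday by default, which is why the table above begins with Sun. The sketch below uses three made-up dates to show the default numbering and, assuming the installed lubridate version supports the `week_start` argument, a Monday-first alternative. The names `sample_dates` and `weekday_mon_first` are illustrative only.

```r
library(lubridate)

# Three hypothetical dates; 2013-01-01 was a Tuesday
sample_dates <- as.Date(c("2013-01-01", "2013-01-05", "2013-01-06"))

# Default numbering: Sunday is day 1, so Tuesday = 3, Saturday = 7, Sunday = 1
wday(sample_dates)

# week_start = 1 makes Monday day 1, so Tuesday = 2, Saturday = 6, Sunday = 7
wday(sample_dates, week_start = 1)

# label = TRUE returns an ordered factor; week_start also reorders its levels
weekday_mon_first <- wday(sample_dates, label = TRUE, week_start = 1)
table(weekday_mon_first)
```

Weekday labels depend on the session locale, so the abbreviations may differ from the English ones shown in the output above.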
Filter for flights from JFK (origin == “JFK”) and create a zoo time series of departure delays (dep_delay) by dep_datetime. Plot the time series (use plot()). (Hint: Use a subset to avoid memory issues, e.g., first 1000 JFK flights.)
Output: The time series plot.
library(zoo)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ ggplot2 4.0.2 ✔ stringr 1.6.0
## ✔ purrr 1.2.1 ✔ tibble 3.3.1
## ✔ readr 2.2.0 ✔ tidyr 1.3.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
jfk_zoo <- flights %>% filter(origin == "JFK") %>% slice_head(n = 1000)
jfk_zoo
## # A tibble: 1,000 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 542 540 2 923 850
## 2 2013 1 1 544 545 -1 1004 1022
## 3 2013 1 1 557 600 -3 838 846
## 4 2013 1 1 558 600 -2 849 851
## 5 2013 1 1 558 600 -2 853 856
## 6 2013 1 1 558 600 -2 924 917
## 7 2013 1 1 559 559 0 702 706
## 8 2013 1 1 606 610 -4 837 845
## 9 2013 1 1 611 600 11 945 931
## 10 2013 1 1 613 610 3 925 921
## # ℹ 990 more rows
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
jfk_time_series <- zoo(jfk_zoo$dep_delay, order.by = jfk_zoo$dep_time)
## Warning in zoo(jfk_zoo$dep_delay, order.by = jfk_zoo$dep_time): some methods
## for "zoo" objects do not work if the index entries in 'order.by' are not unique
jfk_time_series
## 25 32 42 50 106 126 235 535 535 539 542 543 544 546 550 553
## 26 33 43 185 141 156 156 -5 -5 -6 2 -2 -1 6 -10 -7
## 553 553 554 554 554 555 556 556 557 557 557 557 558 558 558 558
## -7 -7 -6 -6 -6 -5 -4 -4 -3 -3 -3 -3 -2 -2 -2 -2
## 559 559 600 604 604 606 606 606 606 608 610 611 611 612 612 613
## 0 -1 0 4 -6 -4 -4 -4 7 23 -5 11 -4 -3 -3 3
## 614 615 615 616 616 617 618 622 627 627 628 628 628 630 637 637
## -1 0 0 16 17 2 3 -8 -3 -3 -2 -2 -2 20 7 -3
## 639 640 641 641 642 643 645 649 650 651 652 653 653 653 654 655
## -1 0 -6 1 -5 -4 -2 -6 -5 -4 -3 -2 -7 -2 -6 0
## 655 655 655 655 655 655 656 656 656 657 658 658 658 658 659 659
## -5 -4 0 -5 -4 0 -3 -4 -4 -2 -2 -2 3 -2 -1 -1
## 700 701 702 704 704 705 707 707 708 709 710 711 712 712 712 712
## 0 1 2 9 9 35 -8 -8 -7 9 -5 -4 -3 -3 -3 -3
## 714 715 715 715 716 716 719 719 720 721 721 721 724 727 728 728
## -1 -6 0 15 1 16 -2 19 -1 21 0 6 -6 -3 -2 -2
## 729 729 729 730 730 732 732 732 732 733 733 733 733 733 734 735
## -1 -1 -6 15 0 -3 2 -4 -4 -3 -2 -4 -3 -2 -3 -2
## 736 737 739 742 742 743 743 743 743 744 745 745 746 747 747 751
## -4 -3 -6 -3 5 13 -6 13 -2 -5 0 -5 1 -2 -3 -9
## 755 756 757 757 758 758 759 759 759 759 759 800 800 801 801 803
## -5 -4 -3 -3 -1 -2 -1 -6 -1 -1 0 -10 -10 -9 -9 -7
## 803 804 804 805 805 805 805 806 806 807 807 807 807 807 808 808
## 3 -6 15 0 5 0 -5 -4 -4 -3 -3 -3 7 -8 -2 -2
## 808 808 809 809 810 810 811 811 811 811 811 813 813 814 815 815
## -2 -7 -1 -6 0 0 -4 1 -4 -4 1 3 8 -6 -10 0
## 817 817 817 818 818 819 819 819 819 820 820 820 820 820 820 821
## 7 -3 7 13 -2 19 -6 -1 -1 0 -10 0 -7 0 -5 1
## 821 821 822 823 824 825 825 825 826 826 826 827 828 828 828 828
## -4 -4 -8 -2 -1 -2 -4 5 71 -4 -4 7 -2 -7 -2 -2
## 829 829 830 830 836 837 837 837 837 837 839 840 840 841 843 844
## -1 -1 -5 20 -4 -8 -8 -3 10 17 39 -5 -5 -4 -2 -1
## 845 846 847 848 853 853 855 855 855 856 857 857 858 858 859 859
## 35 26 2 853 8 -2 -5 -5 -5 -4 -3 -3 -2 -2 -1 39
## 859 901 901 903 904 905 905 907 908 909 909 909 910 912 912 913
## -1 -3 -4 3 -6 0 0 3 -2 59 9 -6 0 12 -7 8
## 914 916 916 917 917 917 920 921 923 925 926 926 926 928 928 932
## 14 -3 71 2 -3 -3 15 21 4 5 4 -4 -4 23 63 2
## 933 933 933 934 936 940 941 941 941 947 948 949 952 953 955 955
## -4 29 -4 -3 51 18 -4 -4 19 -6 -11 -6 -1 -2 -5 -4
## 957 1000 1003 1003 1004 1009 1010 1010 1010 1011 1015 1018 1020 1024 1024 1026
## -3 -5 4 103 19 16 -5 -5 -5 -4 0 48 21 -1 -6 -4
## 1026 1026 1028 1029 1031 1036 1036 1038 1042 1046 1047 1055 1056 1057 1058 1058
## -4 -4 -2 -1 1 -4 -4 8 2 -4 -3 100 -4 -3 -2 -2
## 1059 1100 1100 1104 1109 1111 1112 1122 1123 1124 1124 1125 1125 1127 1127 1127
## -1 0 0 4 -6 -4 12 -8 -2 24 -6 -5 -5 -3 -2 -3
## 1128 1131 1133 1133 1134 1144 1148 1152 1153 1153 1153 1153 1153 1155 1155 1156
## -1 1 4 3 11 29 -7 -8 30 -7 -7 -7 -7 -5 -5 -2
## 1156 1156 1159 1200 1201 1201 1202 1203 1204 1208 1213 1216 1220 1222 1225 1228
## -4 -2 -1 0 1 38 2 3 4 10 -7 -4 -9 -7 0 -7
## 1230 1230 1233 1234 1235 1237 1240 1240 1240 1245 1245 1245 1246 1249 1250 1250
## -5 -5 33 9 -5 -8 5 -5 -5 0 -4 -4 21 -9 5 -10
## 1251 1254 1255 1256 1257 1257 1300 1301 1302 1304 1306 1309 1310 1313 1324 1325
## -1 -5 0 4 -1 -2 0 2 3 5 26 40 10 15 -6 -5
## 1326 1326 1327 1328 1331 1333 1333 1335 1337 1339 1340 1341 1341 1345 1350 1351
## -4 -4 -3 -2 7 -2 34 5 77 4 -5 -4 -4 65 -5 -4
## 1352 1353 1353 1354 1354 1355 1356 1356 1358 1404 1408 1416 1418 1421 1421 1422
## -3 -3 -6 -5 30 -1 -3 1 2 5 13 -13 -4 -1 26 -3
## 1425 1428 1430 1433 1436 1437 1439 1441 1442 1442 1444 1446 1446 1448 1448 1448
## -4 59 -5 -4 1 0 -6 66 -8 -3 -11 -9 -4 -7 3 -2
## 1449 1450 1451 1451 1451 1452 1452 1452 1452 1453 1453 1453 1454 1454 1454 1455
## -1 0 -9 -4 -6 -3 -5 -5 -3 3 -6 23 -6 -6 -6 -2
## 1455 1456 1456 1457 1457 1458 1500 1500 1503 1506 1507 1507 1507 1507 1507 1508
## -4 34 1 -2 1 -2 0 5 6 11 -8 -3 8 -3 -8 11
## 1509 1510 1512 1512 1513 1513 1515 1518 1519 1520 1522 1522 1524 1524 1525 1525
## 19 0 -6 7 -2 13 38 -12 20 -5 -8 -8 27 -6 0 -5
## 1526 1527 1527 1527 1527 1527 1529 1529 1530 1530 1530 1530 1534 1535 1536 1536
## 8 -3 32 -6 27 37 -1 14 0 0 33 0 19 5 -4 18
## 1538 1539 1539 1539 1540 1540 1540 1542 1542 1543 1543 1545 1546 1546 1546 1547
## -2 9 -6 6 122 -5 0 -3 2 -7 -2 5 6 1 -4 2
## 1547 1548 1549 1550 1550 1550 1550 1550 1550 1551 1551 1551 1552 1552 1552 1553
## 7 3 16 0 0 5 0 30 53 3 1 -9 -8 -8 2 -7
## 1554 1554 1554 1554 1556 1557 1557 1557 1558 1558 1559 1600 1600 1600 1601 1602
## -6 -6 -6 -1 -4 -3 -3 7 119 -2 -6 -10 0 0 1 -3
## 1603 1604 1604 1604 1604 1605 1605 1606 1607 1607 1608 1609 1609 1610 1612 1613
## 13 9 4 4 16 -5 0 -4 12 337 -1 19 0 -5 -3 4
## 1614 1615 1617 1619 1619 1621 1622 1623 1624 1625 1625 1626 1626 1626 1626 1627
## 19 25 12 14 54 33 2 3 174 35 15 -4 -4 -4 -4 88
## 1627 1627 1627 1628 1628 1628 1629 1630 1633 1634 1635 1637 1640 1641 1644 1645
## -3 -3 -3 -2 -2 38 -1 -6 -7 19 20 52 0 -9 -6 15
## 1646 1648 1651 1651 1652 1652 1654 1655 1655 1655 1655 1655 1656 1657 1657 1658
## -4 13 1 -4 2 -8 -6 -5 -5 0 -5 0 -4 -3 -3 -2
## 1658 1658 1658 1659 1700 1700 1701 1701 1701 1703 1705 1705 1706 1706 1711 1712
## -2 58 -2 -1 0 0 -9 1 25 13 5 -5 -4 1 36 36
## 1713 1714 1714 1715 1715 1716 1716 1717 1718 1719 1719 1719 1720 1720 1721 1724
## 13 -6 -1 -5 15 91 6 32 8 -1 -6 -1 -5 35 26 -1
## 1725 1725 1726 1726 1727 1727 1730 1730 1730 1732 1732 1734 1735 1736 1738 1739
## -4 0 -4 -3 2 -2 0 1 5 -2 2 9 15 56 -2 -1
## 1739 1741 1743 1744 1744 1745 1746 1749 1749 1750 1751 1751 1751 1753 1755 1755
## -6 7 88 24 -1 -4 -4 -11 0 0 6 181 -2 53 10 10
## 1757 1758 1758 1758 1758 1758 1800 1800 1801 1801 1802 1803 1806 1808 1809 1809
## -3 -2 -2 128 -2 -2 0 0 66 86 102 3 -4 -7 -1 9
## 1810 1811 1811 1812 1814 1821 1823 1823 1824 1825 1825 1825 1826 1828 1828 1830
## 40 1 11 -3 -1 171 -7 13 -6 -4 -10 -5 -4 -2 -1 -5
## 1831 1832 1832 1832 1833 1834 1834 1835 1838 1839 1839 1840 1840 1840 1841 1843
## 16 -3 -3 2 33 -1 -1 5 23 -6 99 4 -5 -5 56 -2
## 1843 1843 1844 1846 1847 1847 1847 1848 1848 1849 1850 1850 1850 1850 1850 1852
## 8 -2 -1 -9 -8 -8 82 63 8 -1 -10 0 0 65 -5 -3
## 1854 1854 1856 1856 1856 1856 1856 1856 1857 1857 1857 1858 1858 1859 1900 1900
## -6 -1 131 -3 1 -4 -4 -3 -2 -3 -3 -1 -2 -1 15 24
## 1902 1904 1904 1904 1905 1905 1906 1909 1909 1909 1910 1910 1910 1910 1911 1914
## -3 -1 -6 125 5 0 -9 -3 -1 4 0 -5 81 -5 12 2
## 1914 1915 1915 1917 1918 1919 1919 1920 1921 1922 1922 1923 1923 1925 1926 1927
## 24 -5 45 7 -2 -1 -1 0 1 22 22 24 68 25 -4 -3
## 1928 1928 1929 1930 1933 1933 1933 1934 1937 1937 1938 1939 1939 1939 1939 1940
## -2 53 -1 105 -7 48 8 129 32 37 18 59 -1 -1 -1 5
## 1942 1944 1945 1946 1946 1946 1949 1949 1950 1950 1951 1951 1952 1952 1954 1955
## 157 -1 5 6 16 56 50 50 65 -9 11 11 -8 53 9 -5
## 1955 1957 1957 1957 1958 1958 1958 1959 2000 2001 2001 2002 2003 2005 2008 2008
## 25 -8 12 12 -2 28 23 29 0 1 86 17 -12 180 56 268
## 2013 2015 2015 2015 2016 2017 2017 2019 2021 2023 2023 2023 2024 2024 2025 2025
## -2 10 60 0 36 42 2 19 -4 38 8 -7 109 -5 -5 -5
## 2025 2026 2027 2027 2027 2028 2030 2030 2030 2030 2031 2032 2033 2036 2036 2038
## -4 56 27 12 -3 -2 -5 0 5 1 -4 97 34 1 6 23
## 2040 2041 2041 2046 2046 2046 2047 2048 2050 2052 2052 2056 2056 2100 2101 2101
## 0 1 -4 11 -4 6 7 78 51 23 7 291 31 0 11 21
## 2103 2107 2108 2110 2113 2113 2114 2114 2115 2116 2116 2119 2128 2128 2128 2129
## -7 27 23 70 3 38 158 14 255 71 -4 -6 -7 3 -2 9
## 2129 2130 2134 2137 2137 2140 2140 2141 2142 2150 2150 2151 2152 2154 2155 2157
## -1 0 -1 2 2 5 0 -4 -3 105 -5 16 32 -1 -4 2
## 2209 2209 2211 2211 2215 2217 2217 2220 2222 2225 2229 2237 2240 2241 2245 2245
## 24 14 26 86 -5 -12 47 35 -7 -4 30 -8 -5 -4 0 50
## 2250 2257 2257 2258 2303 2306 2307 2308 2310 2313 2317 2322 2326 2327 2337 2347
## -5 177 12 8 8 21 22 23 15 108 22 24 116 37 102 62
## 2349 2351 2353 2353 2354 2356 <NA> <NA>
## -10 -8 -6 -6 -5 -3 NA NA
plot(jfk_time_series,
     main = "Departure Delays for First 1000 JFK Flights",
     xlab = "Departure Time",
     ylab = "Delay (minutes)",
     col = "red")
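The question asks for a series indexed by dep_datetime, whereas the code above orders by the integer dep_time, so flights from different days are mixed together on the x-axis and the index contains duplicates and NAs. The following is one possible sketch of a datetime-indexed version, not the only way to do it. It rebuilds dep_datetime inside the pipeline, drops rows with a missing dep_time, and averages delays that share a timestamp so the zoo index is unique. The averaging step and the names `jfk_first`, `jfk_by_time`, and `jfk_datetime_series` are assumptions introduced here.

```r
library(nycflights13)
library(dplyr)
library(lubridate)
library(zoo)

# Same subset as above (first 1000 JFK rows), minus rows with no departure time
jfk_first <- flights %>%
  filter(origin == "JFK") %>%
  slice_head(n = 1000) %>%
  filter(!is.na(dep_time)) %>%
  mutate(dep_datetime = make_datetime(year = year, month = month, day = day,
                                      hour = dep_time %/% 100,
                                      min = dep_time %% 100))

# Assumption: collapse flights that share a departure minute to their mean delay,
# which gives zoo the unique index it expects
jfk_by_time <- jfk_first %>%
  group_by(dep_datetime) %>%
  summarise(dep_delay = mean(dep_delay), .groups = "drop")

jfk_datetime_series <- zoo(jfk_by_time$dep_delay,
                           order.by = jfk_by_time$dep_datetime)

plot(jfk_datetime_series,
     main = "Departure Delays for First 1000 JFK Flights",
     xlab = "Departure datetime",
     ylab = "Delay (minutes)",
     col = "red")
```

With a unique POSIXct index, the "index entries in 'order.by' are not unique" warning should no longer appear, and the x-axis runs chronologically across days.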
Convert the origin column (airports: “JFK”, “LGA”, “EWR”) to a factor called origin_factor. Show the factor levels with levels() and create a frequency table with table(). Make a bar plot of flights by airport using barplot().
Output: The levels, frequency table, and bar plot.
flights$origin_factor <- factor(flights$origin)
levels(flights$origin_factor)
## [1] "EWR" "JFK" "LGA"
origin_counts <- table(flights$origin_factor)
origin_counts
##
## EWR JFK LGA
## 120835 111279 104662
barplot(origin_counts,
        main = "Number of Flights by Airport",
        xlab = "Airport",
        ylab = "Flight Count",
        col = "darkgreen")
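The levels above come out as EWR, JFK, LGA because factor() sorts the unique values alphabetically by default. A minimal sketch, using a made-up vector rather than the real origin column, shows how an explicit `levels` argument changes the order that table() and barplot() then follow. The names `origin_sample`, `default_factor`, and `custom_factor` are illustrative.

```r
# Hypothetical sample of origin codes (not the real flights column)
origin_sample <- c("JFK", "LGA", "EWR", "JFK", "EWR", "EWR")

# Default: levels are the sorted unique values
default_factor <- factor(origin_sample)
levels(default_factor)

# Explicit levels fix the order used by table() and barplot()
custom_factor <- factor(origin_sample, levels = c("JFK", "LGA", "EWR"))
table(custom_factor)
```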
Recode origin_factor from Question 4 into a new column origin_recoded with full names: “JFK” to “Kennedy”, “LGA” to “LaGuardia”, “EWR” to “Newark” using fct_recode() or base R. Create a bar plot of the recoded factor.
Output: The new levels and bar plot.
flights <- flights %>%
  mutate(origin_recoded = fct_recode(origin_factor,
                                     "Newark" = "EWR",
                                     "Kennedy" = "JFK",
                                     "LaGuardia" = "LGA"))
recoded_counts <- table(flights$origin_recoded)
barplot(recoded_counts,
        main = "Flights by Airport (Recoded)",
        col = "darkred",
        ylab = "Number of Flights")
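The question also allows a base R solution. One way to sketch it, on a small made-up factor rather than the full column, is to assign a named list to levels(), where each name is the new label and each value is the old level. The names `origin_small` and `recoded_small` are hypothetical.

```r
# Hypothetical small factor standing in for origin_factor
origin_small <- factor(c("EWR", "JFK", "LGA", "JFK"))

# Base R recode: new label = old level
recoded_small <- origin_small
levels(recoded_small) <- list(Newark = "EWR", Kennedy = "JFK", LaGuardia = "LGA")

levels(recoded_small)
as.character(recoded_small)
```

Calling levels() on the recoded factor is also a simple way to display the new levels that the question asks for.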
Count missing values in dep_delay and arr_delay using colSums(is.na(flights)). Impute missing dep_delay values with 0 (assuming no delay for cancelled flights) in a new column dep_delay_imputed. Create a frequency table of dep_delay_imputed for delays between -20 and 20 minutes (use filter() to subset).
Output: NA counts, and the frequency table for imputed delays.
colSums(is.na(flights[, c("dep_delay", "arr_delay")]))
## dep_delay arr_delay
## 8255 9430
flights <- flights %>%
  mutate(dep_delay_imputed = ifelse(is.na(dep_delay), 0, dep_delay))
delay_subset <- flights %>%
  filter(dep_delay_imputed >= -20 & dep_delay_imputed <= 20)
table(delay_subset$dep_delay_imputed)
##
## -20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10 -9 -8
## 37 19 81 110 162 408 498 901 1594 2727 5891 7875 11791
## -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5
## 16752 20701 24821 24619 24218 21516 18813 24769 8050 6233 5450 4807 4447
## 6 7 8 9 10 11 12 13 14 15 16 17 18
## 3789 3520 3381 3062 2859 2756 2494 2414 2256 2140 2085 1873 1749
## 19 20
## 1730 1704
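To see how zero imputation can shift a summary statistic, the sketch below compares the mean of a small made-up delay vector with and without the imputed zeros. The values and the names `delay_sample` and `delay_imputed` are hypothetical; `coalesce()` from dplyr is used here as an alternative to the ifelse() call above.

```r
library(dplyr)

# Hypothetical delays in minutes; NA stands in for a cancelled flight
delay_sample <- c(10, NA, 30, NA, 20)

# coalesce() replaces each NA with the fallback value, here 0
delay_imputed <- coalesce(delay_sample, 0)

mean(delay_sample, na.rm = TRUE)  # 20: average over flights that actually departed
mean(delay_imputed)               # 12: zeros for cancelled flights pull the mean down
```

In the real data the imputed zeros also inflate the count at 0 in the frequency table, since the 8,255 missing dep_delay values all land in that bin.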
Reflect on the assignment: What was easy or hard about working with flight dates or missing data? How might assuming zero delay for missing values (Question 6) affect conclusions about flight punctuality? What did you learn about NYC flights in 2013? (150-200 words)
The assignment was a great exercise in putting together everything we have learned so far. Questions 1 through 4 were fairly straightforward, as I was able to recall most of what was needed, but Questions 5 and 6 were a bit daunting because I don’t remember having covered some of the coding, and I had to research how to put the code together. The research itself was fruitful: I was able to understand what was being asked and how to turn it into usable code. Assuming zero delay for the missing values in Question 6 would give an unrealistic result, because cancelled flights would count as on time, which pulls the average delay toward zero and makes flights look more punctual than they were. I learned a lot about NYC flights in 2013, but if I had to pick just one takeaway, it would be that the average departure delay was roughly 13 minutes, which is significant to me in an industry where transportation is the core business.