Homework Assignment: Analyzing NYC Flight Data

This homework assignment uses the flights dataset from the nycflights13 package, which contains real-world data on over 336,000 flights departing from New York City airports (JFK, LGA, EWR) in 2013. The dataset includes variables such as departure and arrival times (with date components), airline carrier (categorical), origin and destination airports (categorical), delays (with missing values for cancelled flights), distance, and more. It is sourced from the US Bureau of Transportation Statistics.

Objectives

This assignment reinforces the Week 4 topics:

  • Parsing and manipulating dates/times using lubridate.
  • Creating and analyzing time series with zoo.
  • Working with factors, inspecting levels, and recoding them.
  • Identifying and handling missing data (e.g., removal, imputation).

All questions (except the final reflection) require you to write and run R code. Submit the URL of your published RPubs document. Comment your code and include the key outputs (e.g., summaries, plots, or tables). Use the provided setup code to load the data.

Setup

Install and load the necessary packages if not already done:

#install.packages(c("nycflights13", "dplyr", "lubridate", "zoo", "forcats"))  # If needed
library(nycflights13)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(forcats)  # For factor recoding; base R alternatives are acceptable
data(flights)  # Load the dataset

Explore the data briefly with str(flights) and head(flights) to understand the structure. Note: Dates are in separate year, month, day columns; times are in dep_time and arr_time (as integers like 517 for 5:17 AM).
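A quick base-R check of the HHMM arithmetic used throughout this assignment. The times below are made-up values for illustration. Integer division by 100 keeps the hour and the remainder keeps the minutes. A value such as 2400 (midnight) gives an hour of 24, which is outside the usual 0-23 range and is worth checking for.

```r
# Hypothetical HHMM clock times; NA stands in for a cancelled flight
dep_time <- c(517, 1355, 2400, NA)

hour   <- dep_time %/% 100  # integer division drops the last two digits
minute <- dep_time %% 100   # the remainder keeps the last two digits

print(data.frame(dep_time, hour, minute))
```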

#Explore your data here
str(flights)
## tibble [336,776 × 19] (S3: tbl_df/tbl/data.frame)
##  $ year          : int [1:336776] 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
##  $ month         : int [1:336776] 1 1 1 1 1 1 1 1 1 1 ...
##  $ day           : int [1:336776] 1 1 1 1 1 1 1 1 1 1 ...
##  $ dep_time      : int [1:336776] 517 533 542 544 554 554 555 557 557 558 ...
##  $ sched_dep_time: int [1:336776] 515 529 540 545 600 558 600 600 600 600 ...
##  $ dep_delay     : num [1:336776] 2 4 2 -1 -6 -4 -5 -3 -3 -2 ...
##  $ arr_time      : int [1:336776] 830 850 923 1004 812 740 913 709 838 753 ...
##  $ sched_arr_time: int [1:336776] 819 830 850 1022 837 728 854 723 846 745 ...
##  $ arr_delay     : num [1:336776] 11 20 33 -18 -25 12 19 -14 -8 8 ...
##  $ carrier       : chr [1:336776] "UA" "UA" "AA" "B6" ...
##  $ flight        : int [1:336776] 1545 1714 1141 725 461 1696 507 5708 79 301 ...
##  $ tailnum       : chr [1:336776] "N14228" "N24211" "N619AA" "N804JB" ...
##  $ origin        : chr [1:336776] "EWR" "LGA" "JFK" "JFK" ...
##  $ dest          : chr [1:336776] "IAH" "IAH" "MIA" "BQN" ...
##  $ air_time      : num [1:336776] 227 227 160 183 116 150 158 53 140 138 ...
##  $ distance      : num [1:336776] 1400 1416 1089 1576 762 ...
##  $ hour          : num [1:336776] 5 5 5 5 6 5 6 6 6 6 ...
##  $ minute        : num [1:336776] 15 29 40 45 0 58 0 0 0 0 ...
##  $ time_hour     : POSIXct[1:336776], format: "2013-01-01 05:00:00" "2013-01-01 05:00:00" ...
head(flights)
## # A tibble: 6 × 19
##    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
## 1  2013     1     1      517            515         2      830            819
## 2  2013     1     1      533            529         4      850            830
## 3  2013     1     1      542            540         2      923            850
## 4  2013     1     1      544            545        -1     1004           1022
## 5  2013     1     1      554            600        -6      812            837
## 6  2013     1     1      554            558        -4      740            728
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>

Question 1: Creating Dates with lubridate

Create a column dep_datetime by combining year, month, day, and dep_time into a POSIXct datetime using lubridate. (Hint: use make_datetime() with year, month, and day, and derive the hour and minute from dep_time by integer division: hour = dep_time %/% 100, min = dep_time %% 100.)

Show the first 5 rows of flights with dep_datetime.

Output: First 5 rows showing year, month, day, dep_time, and dep_datetime.

# Build a POSIXct departure datetime; dep_time is HHMM, so %/% 100 gives the hour and %% 100 the minutes
dep_datetime <- flights %>%
  mutate(dep_datetime = make_datetime(year = year, month = month, day = day,
                                      hour = dep_time %/% 100, min = dep_time %% 100))

# Show the first 5 rows with the new column
dep_datetime %>%
  select(year, month, day, dep_time, dep_datetime) %>%
  head(5)
## # A tibble: 5 × 5
##    year month   day dep_time dep_datetime       
##   <int> <int> <int>    <int> <dttm>             
## 1  2013     1     1      517 2013-01-01 05:17:00
## 2  2013     1     1      533 2013-01-01 05:33:00
## 3  2013     1     1      542 2013-01-01 05:42:00
## 4  2013     1     1      544 2013-01-01 05:44:00
## 5  2013     1     1      554 2013-01-01 05:54:00
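The same make_datetime() call can be tried on a few hypothetical HHMM values to see what happens to cancelled flights. This is a small sketch with made-up inputs: a missing dep_time produces a missing datetime, so cancelled flights carry NA into dep_datetime and any later step that uses it.

```r
library(lubridate)

# Hypothetical HHMM departure times; NA stands in for a cancelled flight
dep_time <- c(517, 1355, NA)

# Scalar year/month/day are recycled to the length of dep_time
dt <- make_datetime(year = 2013, month = 1, day = 1,
                    hour = dep_time %/% 100, min = dep_time %% 100)
print(dt)
```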

Question 2: Simple Date Manipulations with lubridate

Using dep_datetime from Question 1, create a column weekday with the day of the week (e.g., “Mon”) using wday(dep_datetime, label = TRUE). Use table() to show how many flights occur on each weekday.

Output: The table of flight counts by weekday.

# Add the abbreviated weekday name as an ordered factor
dep_datetime_2 <- dep_datetime %>% 
  mutate(weekday = wday(dep_datetime, label = TRUE))

# Count flights on each weekday
table(dep_datetime_2$weekday)
## 
##   Sun   Mon   Tue   Wed   Thu   Fri   Sat 
## 45643 49468 49273 48858 48654 48703 37922
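To see how wday() numbers and labels days, here is a sketch on three early-January 2013 dates chosen for illustration (1 January 2013 was a Tuesday). By default Sunday is day 1; the week_start argument shifts that, and the text labels depend on the system locale.

```r
library(lubridate)

# Three dates picked for illustration: a Tuesday, a Saturday, and a Sunday
d <- ymd(c("2013-01-01", "2013-01-05", "2013-01-06"))

wday(d)                  # default numbering, Sunday = 1
wday(d, week_start = 1)  # Monday = 1 instead
wday(d, label = TRUE)    # ordered factor of abbreviated names (locale-dependent)
```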

Question 3: Time Series with zoo

Filter for flights from JFK (origin == “JFK”) and create a zoo time series of departure delays (dep_delay) by dep_datetime. Plot the time series (use plot()). (Hint: Use a subset to avoid memory issues, e.g., first 1000 JFK flights.)

Output: The time series plot.

library(zoo)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ ggplot2 4.0.2     ✔ stringr 1.6.0
## ✔ purrr   1.2.1     ✔ tibble  3.3.1
## ✔ readr   2.2.0     ✔ tidyr   1.3.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Use the dep_datetime data frame from Question 1 (flights itself has no dep_datetime column),
# drop cancelled flights (no departure time), and keep the first 1000 JFK departures
jfk_flights <- dep_datetime %>%
  filter(origin == "JFK", !is.na(dep_datetime)) %>%
  slice_head(n = 1000)

# Several flights can leave in the same minute, and zoo expects a unique index,
# so average the delay of flights that share a timestamp
jfk_by_time <- jfk_flights %>%
  group_by(dep_datetime) %>%
  summarise(dep_delay = mean(dep_delay), .groups = "drop")

# Index the series by the POSIXct departure datetime, not the raw HHMM integer
jfk_time_series <- zoo(jfk_by_time$dep_delay, order.by = jfk_by_time$dep_datetime)

plot(jfk_time_series, 
     main = "Departure Delays for First 1000 JFK Flights", 
     xlab = "Departure Time", 
     ylab = "Delay (minutes)",
     col = "red")
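zoo warns when the index passed to order.by contains duplicates, which happens whenever two flights depart in the same minute. Besides averaging in dplyr, zoo's own aggregate() can collapse duplicate timestamps. This sketch uses three hypothetical departures, two of which share 05:40; the duplicated index is supplied through the by argument so the series never holds repeated timestamps.

```r
library(zoo)

# Hypothetical departures: the first two share the 05:40 timestamp
times  <- as.POSIXct(c("2013-01-01 05:40:00", "2013-01-01 05:40:00",
                       "2013-01-01 05:55:00"), tz = "UTC")
delays <- c(2, -4, 10)

# Group by timestamp and average, giving one value per unique time
z <- aggregate(zoo(delays), by = times, FUN = mean)
print(z)
```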

Question 4: Working with Factors

Convert the origin column (airports: “JFK”, “LGA”, “EWR”) to a factor called origin_factor. Show the factor levels with levels() and create a frequency table with table(). Make a bar plot of flights by airport using barplot().

Output: The levels, frequency table, and bar plot.

# Convert the airport codes to a factor (levels are sorted alphabetically)
flights$origin_factor <- factor(flights$origin)
levels(flights$origin_factor)
## [1] "EWR" "JFK" "LGA"
# Count flights per airport
origin_counts <- table(flights$origin_factor)
origin_counts
## 
##    EWR    JFK    LGA 
## 120835 111279 104662
# Bar plot of flight counts by airport
barplot(origin_counts, 
        main = "Number of Flights by Airport", 
        xlab = "Airport", 
        ylab = "Flight Count", 
        col = "darkgreen")
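factor() sorts levels alphabetically, which is why EWR comes first. If a bar plot should instead run from the busiest to the quietest airport, forcats' fct_infreq() reorders levels by frequency. A sketch with hypothetical codes in which LGA is the most common:

```r
library(forcats)

# Hypothetical origin codes: LGA is the most common, EWR the least
origin <- factor(c("LGA", "LGA", "LGA", "JFK", "JFK", "EWR"))

levels(origin)              # alphabetical by default
levels(fct_infreq(origin))  # most to least frequent
```

Passing table(fct_infreq(origin)) to barplot() would then draw the bars in descending order.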

Question 5: Recoding Factors

Recode origin_factor from Question 4 into a new column origin_recoded with full names: “JFK” to “Kennedy”, “LGA” to “LaGuardia”, “EWR” to “Newark” using fct_recode() or base R. Create a bar plot of the recoded factor.

Output: The new levels and bar plot.

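# Replace the airport codes with full names (new_name = "old_level")
flights <- flights %>%
  mutate(origin_recoded = fct_recode(origin_factor,
    "Newark"    = "EWR",
    "Kennedy"   = "JFK",
    "LaGuardia" = "LGA"
  ))
# Show the new levels (fct_recode keeps the original level order)
levels(flights$origin_recoded)
## [1] "Newark"    "Kennedy"   "LaGuardia"
# Count and plot flights per recoded airport
recoded_counts <- table(flights$origin_recoded)
barplot(recoded_counts, 
        main = "Flights by Airport (Recoded)", 
        col = "darkred",
        xlab = "Airport",
        ylab = "Number of Flights")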
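The question also allows base R. One possible approach, sketched on a few hypothetical codes, is to assign new level names through a named lookup vector; full_names is an illustrative name, not part of the assignment. Because levels are replaced in place, every observation is relabelled at once and the level order is kept.

```r
# Hypothetical origin codes stored as a factor
origin <- factor(c("EWR", "JFK", "LGA", "JFK"))

# Lookup from airport code to full name (names are the old levels)
full_names <- c(EWR = "Newark", JFK = "Kennedy", LGA = "LaGuardia")

# Replace each level with its full name, keeping the level order
origin_recoded <- origin
levels(origin_recoded) <- unname(full_names[levels(origin_recoded)])

levels(origin_recoded)
table(origin_recoded)
```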

Question 6: Handling Missing Data

Count missing values in dep_delay and arr_delay using colSums(is.na(flights)). Impute missing dep_delay values with 0 (assuming no delay for cancelled flights) in a new column dep_delay_imputed. Create a frequency table of dep_delay_imputed for delays between -20 and 20 minutes (use filter() to subset).

Output: NA counts, and the frequency table for imputed delays.

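# Count missing values in the two delay columns
colSums(is.na(flights[, c("dep_delay", "arr_delay")]))
## dep_delay arr_delay 
##      8255      9430
# Impute missing departure delays (cancelled flights) as 0 minutes
flights <- flights %>%
  mutate(dep_delay_imputed = ifelse(is.na(dep_delay), 0, dep_delay))
# Keep delays between -20 and 20 minutes and tabulate them
delay_subset <- flights %>%
  filter(dep_delay_imputed >= -20 & dep_delay_imputed <= 20)

table(delay_subset$dep_delay_imputed)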
## 
##   -20   -19   -18   -17   -16   -15   -14   -13   -12   -11   -10    -9    -8 
##    37    19    81   110   162   408   498   901  1594  2727  5891  7875 11791 
##    -7    -6    -5    -4    -3    -2    -1     0     1     2     3     4     5 
## 16752 20701 24821 24619 24218 21516 18813 24769  8050  6233  5450  4807  4447 
##     6     7     8     9    10    11    12    13    14    15    16    17    18 
##  3789  3520  3381  3062  2859  2756  2494  2414  2256  2140  2085  1873  1749 
##    19    20 
##  1730  1704
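To see how zero-imputation shifts summary statistics, here is a small sketch on hypothetical delays (the numbers are invented for illustration). Dropping the missing values averages only the flights that actually departed; filling them with 0 counts cancellations as perfectly on time and pulls the mean toward zero.

```r
# Hypothetical departure delays in minutes; NA marks a cancelled flight
delays <- c(10, 30, NA, -5, NA, 25)

mean_observed  <- mean(delays, na.rm = TRUE)        # ignore cancelled flights
delays_imputed <- ifelse(is.na(delays), 0, delays)  # treat cancellations as on time
mean_imputed   <- mean(delays_imputed)

c(observed = mean_observed, imputed = mean_imputed)
```

Here the observed mean is 15 minutes and the imputed mean is 10, even though no departed flight changed.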

Question 7: Reflection (No Coding)

Reflect on the assignment: What was easy or hard about working with flight dates or missing data? How might assuming zero delay for missing values (Question 6) affect conclusions about flight punctuality? What did you learn about NYC flights in 2013? (150-200 words)

The assignment was a good exercise in putting together everything we have learned so far. Questions 1 through 4 were fairly straightforward because I could recall most of the material, but Questions 5 and 6 were more daunting: I did not remember covering some of the functions, so I had to research how to put the code together. That research was worthwhile, since it helped me understand what was being asked and how to turn it into working code. Assuming a delay of 0 in Question 6 gives an unrealistic picture: cancelled flights are counted as perfectly on time, which pulls the average delay down and makes the airports look more punctual than they really were. I learned a lot about NYC flights in 2013, but if I had to pick one takeaway it would be how large delays can get: although the average departure delay was only about 12.6 minutes, some flights left many hours late, which is significant in an industry whose whole business is getting people where they need to be on time.