NYC Flights Assignments

Author

N Bellot Norman

Published

June 13, 2024

Loading the libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
data("flights")

Understanding the dataset: Descriptive Statistics

summary(flights)
      year          month             day           dep_time     sched_dep_time
 Min.   :2023   Min.   : 1.000   Min.   : 1.00   Min.   :   1    Min.   : 500  
 1st Qu.:2023   1st Qu.: 3.000   1st Qu.: 8.00   1st Qu.: 931    1st Qu.: 930  
 Median :2023   Median : 6.000   Median :16.00   Median :1357    Median :1359  
 Mean   :2023   Mean   : 6.423   Mean   :15.74   Mean   :1366    Mean   :1364  
 3rd Qu.:2023   3rd Qu.: 9.000   3rd Qu.:23.00   3rd Qu.:1804    3rd Qu.:1759  
 Max.   :2023   Max.   :12.000   Max.   :31.00   Max.   :2400    Max.   :2359  
                                                 NA's   :10738                 
   dep_delay          arr_time     sched_arr_time   arr_delay       
 Min.   : -50.00   Min.   :   1    Min.   :   1   Min.   : -97.000  
 1st Qu.:  -6.00   1st Qu.:1105    1st Qu.:1135   1st Qu.: -22.000  
 Median :  -2.00   Median :1519    Median :1551   Median : -10.000  
 Mean   :  13.84   Mean   :1497    Mean   :1552   Mean   :   4.345  
 3rd Qu.:  10.00   3rd Qu.:1946    3rd Qu.:2007   3rd Qu.:   9.000  
 Max.   :1813.00   Max.   :2400    Max.   :2359   Max.   :1812.000  
 NA's   :10738     NA's   :11453                  NA's   :12534     
   carrier              flight         tailnum             origin         
 Length:435352      Min.   :   1.0   Length:435352      Length:435352     
 Class :character   1st Qu.: 364.0   Class :character   Class :character  
 Mode  :character   Median : 734.0   Mode  :character   Mode  :character  
                    Mean   : 785.2                                        
                    3rd Qu.:1188.0                                        
                    Max.   :1972.0                                        
                                                                          
     dest              air_time        distance           hour      
 Length:435352      Min.   : 18.0   Min.   :  80.0   Min.   : 5.00  
 Class :character   1st Qu.: 77.0   1st Qu.: 479.0   1st Qu.: 9.00  
 Mode  :character   Median :121.0   Median : 762.0   Median :13.00  
                    Mean   :141.8   Mean   : 977.5   Mean   :13.35  
                    3rd Qu.:177.0   3rd Qu.:1182.0   3rd Qu.:17.00  
                    Max.   :701.0   Max.   :4983.0   Max.   :23.00  
                    NA's   :12534                                   
     minute        time_hour                     
 Min.   : 0.00   Min.   :2023-01-01 05:00:00.00  
 1st Qu.:10.00   1st Qu.:2023-03-30 20:00:00.00  
 Median :29.00   Median :2023-06-27 08:00:00.00  
 Mean   :28.53   Mean   :2023-06-29 10:02:22.39  
 3rd Qu.:45.00   3rd Qu.:2023-09-27 11:00:00.00  
 Max.   :59.00   Max.   :2023-12-31 23:00:00.00  
                                                 

Displaying Flight Distances at New York Airports: Bar Graph

# Bar chart of air time by origin airport, shaded by flight distance
plot1 <- flights |>
  filter(origin %in% c("JFK", "EWR", "LGA")) |>
  ggplot() +
  geom_bar(aes(x = origin, y = air_time, fill = distance),
           position = "dodge", stat = "identity") +
  # y is mapped to air_time (minutes); distance is shown by the fill color
  labs(x = "Origin Airport",
       y = "Air Time (minutes)",
       title = "Exploring New York Airports",
       subtitle = "Distance Analysis of 2023 Flights",
       fill = "Flight Distance (miles)",
       caption = "Source: RITA, Bureau of Transportation Statistics")
plot1
Warning: Removed 12534 rows containing missing values or values outside the scale range
(`geom_bar()`).

Summary

The airline industry is a major contributor to the economy. Compared to land transportation, air travel offers efficiency, accessibility, safety, and comfort. Within the United States, New York is a pivotal hub for national and international travel. To gain insights into travel patterns and trends that can inform marketing and strategic planning, the analyst reviewed Bureau of Transportation Statistics data for calendar year 2023.

The three main airports serving the New York City area are John F. Kennedy International Airport (JFK), LaGuardia Airport (LGA), and Newark Liberty International Airport (EWR).

The summary statistics provide important context. The average distance is 977.5 miles, with the data ranging from 80 to 4,983 miles. A deeper review of the data reveals that JFK is the only originating hub that flew the longest distance, 4,983 miles, to Daniel K. Inouye International Airport in Honolulu, Hawaii. The shortest route was 80 miles, to Philadelphia.
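The deeper review mentioned above could be reproduced with a short query. The following is a minimal sketch rather than the author's original code: `route_extremes()` is a hypothetical helper name, and it assumes a data frame with `origin`, `dest`, and `distance` columns, as in `flights`.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis):
# returns the longest and the shortest distinct routes in a flights-like table.
route_extremes <- function(df) {
  # One row per origin/destination pair with its distance
  routes <- df |>
    distinct(origin, dest, distance)

  bind_rows(
    slice_max(routes, distance, n = 1, with_ties = TRUE),  # longest route(s)
    slice_min(routes, distance, n = 1, with_ties = TRUE)   # shortest route(s)
  )
}
```

Calling `route_extremes(flights)` on the data loaded earlier should list the longest and shortest routes, including any ties.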

The bar graph illustrates that JFK and EWR fly longer distances than LGA, as shown by the lighter blue shading, which is consistent with their status as international airports. LGA mainly served shorter distances, as shown by the darker blue shading. It is important to note that the range in miles is quite large. Therefore, expressing distance in thousands of miles would be beneficial for future visualizations.
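One possible way to show distance in thousands of miles is sketched below. It is an illustration under assumptions, not the plot used in this report: `plot_mean_distance()` is a hypothetical helper, and it summarizes the mean distance per origin instead of drawing one bar per flight. The axis labels are rescaled with `scales::label_number()`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper (not part of the original analysis):
# bar chart of mean flight distance per origin, labeled in thousands of miles.
plot_mean_distance <- function(df) {
  df |>
    group_by(origin) |>
    summarize(mean_distance = mean(distance, na.rm = TRUE), .groups = "drop") |>
    ggplot(aes(x = origin, y = mean_distance)) +
    geom_col() +
    # scale = 1e-3 divides the axis labels by 1,000; the data are unchanged
    scale_y_continuous(labels = scales::label_number(scale = 1e-3, accuracy = 0.1)) +
    labs(x = "Origin Airport",
         y = "Mean Distance (thousands of miles)")
}
```

With the data loaded earlier, `plot_mean_distance(flights)` would draw one bar per origin airport.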

Loading the Libraries

library(ggplot2) 
library(dplyr)

Filtering the Four Major Airlines of Interest

# Keep only United (UA), Delta (DL), American (AA), and Southwest (WN)
flights <- flights %>%
  filter(carrier %in% c("UA", "DL", "AA", "WN"))

Using group_by to obtain mean, median, minimum and maximum distance

distance_stats <- flights %>%
  group_by(carrier) %>%
  summarize(
    mean_distance = mean(distance, na.rm = TRUE),
    median_distance = median(distance, na.rm = TRUE),
    min_distance = min(distance, na.rm = TRUE),
    max_distance = max(distance, na.rm = TRUE),
    count_flights = n())

Displaying Flight Distances by US Airline: A Boxplot

ggplot(flights, aes(x = carrier, y = distance, fill = carrier)) +
  geom_boxplot() +
  labs(title = "Distribution of Flight Distances by US Airline",
       x = "US Airlines",
       y = "Distance (miles)",
       fill = "US Airlines") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

The box plot compares flight distances for four major United States air carriers: United Airlines (UA), Delta Air Lines (DL), American Airlines (AA), and Southwest Airlines (WN). The data show that DL and UA fly the longest distances. Interestingly, both DL and UA have major outliers that support their long travel distances, as depicted by the dots in their respective categories.
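By default, `geom_boxplot()` draws as dots the observations that lie more than 1.5 times the interquartile range beyond the first or third quartile. The sketch below counts such points per carrier so the outliers can be checked numerically. `count_outliers()` is a hypothetical helper, and it assumes a data frame with `carrier` and `distance` columns.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis):
# counts distances outside the 1.5 * IQR fences for each carrier.
count_outliers <- function(df) {
  df |>
    group_by(carrier) |>
    summarize(
      q1 = quantile(distance, 0.25, na.rm = TRUE, names = FALSE),
      q3 = quantile(distance, 0.75, na.rm = TRUE, names = FALSE),
      # A point is an outlier if it falls below q1 - 1.5 * IQR
      # or above q3 + 1.5 * IQR
      n_outliers = sum(distance < q1 - 1.5 * (q3 - q1) |
                         distance > q3 + 1.5 * (q3 - q1), na.rm = TRUE),
      .groups = "drop"
    )
}
```

Running `count_outliers(flights)` after the carrier filter would give one row per airline, which can be compared against the dots in the box plot.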

Future Analysis

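The summary statistics show a median month of 6 (June), which means about half of the year's flights had departed by June. The median alone does not identify the busiest month, so a count of flights by month would be needed to confirm when travel peaked. The analyst can explore summer travel destinations in a future study.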
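Whether a given month actually had the most flights can be checked by counting flights per month. The following is a minimal sketch under assumptions: `flights_by_month()` is a hypothetical helper, and it expects a data frame with a `month` column.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis):
# number of flights per month, busiest month first.
flights_by_month <- function(df) {
  df |>
    count(month, name = "n_flights") |>
    arrange(desc(n_flights))
}
```

Because `flights` was narrowed to four carriers earlier in this document, running `data("flights")` again before calling `flights_by_month(flights)` would restore the full dataset for this count.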