library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
data("flights")
Understanding the Dataset: Descriptive Statistics
summary(flights)
year month day dep_time sched_dep_time
Min. :2023 Min. : 1.000 Min. : 1.00 Min. : 1 Min. : 500
1st Qu.:2023 1st Qu.: 3.000 1st Qu.: 8.00 1st Qu.: 931 1st Qu.: 930
Median :2023 Median : 6.000 Median :16.00 Median :1357 Median :1359
Mean :2023 Mean : 6.423 Mean :15.74 Mean :1366 Mean :1364
3rd Qu.:2023 3rd Qu.: 9.000 3rd Qu.:23.00 3rd Qu.:1804 3rd Qu.:1759
Max. :2023 Max. :12.000 Max. :31.00 Max. :2400 Max. :2359
NA's :10738
dep_delay arr_time sched_arr_time arr_delay
Min. : -50.00 Min. : 1 Min. : 1 Min. : -97.000
1st Qu.: -6.00 1st Qu.:1105 1st Qu.:1135 1st Qu.: -22.000
Median : -2.00 Median :1519 Median :1551 Median : -10.000
Mean : 13.84 Mean :1497 Mean :1552 Mean : 4.345
3rd Qu.: 10.00 3rd Qu.:1946 3rd Qu.:2007 3rd Qu.: 9.000
Max. :1813.00 Max. :2400 Max. :2359 Max. :1812.000
NA's :10738 NA's :11453 NA's :12534
carrier flight tailnum origin
Length:435352 Min. : 1.0 Length:435352 Length:435352
Class :character 1st Qu.: 364.0 Class :character Class :character
Mode :character Median : 734.0 Mode :character Mode :character
Mean : 785.2
3rd Qu.:1188.0
Max. :1972.0
dest air_time distance hour
Length:435352 Min. : 18.0 Min. : 80.0 Min. : 5.00
Class :character 1st Qu.: 77.0 1st Qu.: 479.0 1st Qu.: 9.00
Mode :character Median :121.0 Median : 762.0 Median :13.00
Mean :141.8 Mean : 977.5 Mean :13.35
3rd Qu.:177.0 3rd Qu.:1182.0 3rd Qu.:17.00
Max. :701.0 Max. :4983.0 Max. :23.00
NA's :12534
minute time_hour
Min. : 0.00 Min. :2023-01-01 05:00:00.00
1st Qu.:10.00 1st Qu.:2023-03-30 20:00:00.00
Median :29.00 Median :2023-06-27 08:00:00.00
Mean :28.53 Mean :2023-06-29 10:02:22.39
3rd Qu.:45.00 3rd Qu.:2023-09-27 11:00:00.00
Max. :59.00 Max. :2023-12-31 23:00:00.00
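The `NA's` rows in the summary can also be tabulated directly. Below is a minimal sketch, assuming the tidyverse and nycflights23 packages are installed, that counts missing values in every column and keeps only the columns that have any:

```r
library(dplyr)
library(nycflights23)

# Count missing values in each column of flights (one-row tibble)
na_counts <- flights |>
  summarise(across(everything(), ~ sum(is.na(.x))))

# Keep only the columns that actually contain missing values
na_counts |>
  select(where(~ .x > 0))
```

The `air_time` count is the same 12,534 rows that ggplot2 drops, with a warning, when `air_time` is mapped to an axis in the bar graph.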
Displaying Flight Distances at New York Airports: Bar Graph
plot1 <- flights |>
  filter(origin %in% c("JFK", "EWR", "LGA")) |>
  ggplot() +
  # y is air_time (minutes); distance is encoded by the fill colour
  geom_bar(aes(x = origin, y = air_time, fill = distance),
           position = "dodge", stat = "identity") +
  labs(y = "Air Time (minutes)",
       title = "Exploring New York Airports",
       subtitle = "Distance Analysis of 2023 Flights",
       fill = "Legend: Flight Distance",
       caption = "Source: RITA, Bureau of Transportation Statistics")
plot1
Warning: Removed 12534 rows containing missing values or values outside the scale range
(`geom_bar()`).
Summary
The airline industry is a major contributor to the economy. Compared with ground transportation, air travel offers efficiency, accessibility, safety, and comfort. Within the United States, New York is a pivotal hub for domestic and international travel. To gain insight into travel patterns and trends that can inform marketing and strategic planning, the analyst reviewed Bureau of Transportation Statistics data on flights departing New York City airports in calendar year 2023.
The three main airports in New York are John F. Kennedy International Airport (JFK), LaGuardia Airport (LGA), and Newark Liberty International Airport (EWR).
The summary statistics provide important context. The average flight distance is 977.5 miles, and distances range from 80 to 4,983 miles. A closer look at individual routes shows that JFK is the only origin airport serving the longest route, 4,983 miles to Daniel K. Inouye International Airport in Honolulu, Hawaii. The shortest route is 80 miles, to Philadelphia.
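The longest and shortest routes can be read off by listing each distinct origin-destination pair once. This is a minimal sketch, assuming the tidyverse and nycflights23 packages are installed; `routes` is an illustrative name, not part of the original analysis:

```r
library(dplyr)
library(nycflights23)

# Average distance over all flights (distance has no missing values)
mean_distance <- mean(flights$distance)

# One row per origin-destination route, longest first
routes <- flights |>
  distinct(origin, dest, distance) |>
  arrange(desc(distance))

head(routes, 3)  # longest routes
tail(routes, 3)  # shortest routes
```

Working at the route level rather than the flight level keeps a single busy route from hiding the extremes.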
The bar graph shows that JFK and EWR handle longer flights than LGA, consistent with their status as international airports: their bars include the lighter blue shades that ggplot2's default continuous color scale assigns to larger distances. LGA flights are mostly shorter, as its darker blue shading indicates. Because distances span a wide range (80 to 4,983 miles), labeling the distance scale in thousands of miles would make future visualizations easier to read.
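A numeric summary per airport makes the comparison explicit, and the `scales` package (installed with ggplot2) can label an axis in thousands. This is a minimal sketch, assuming the tidyverse and nycflights23 packages are installed; plotting the mean distance with `geom_col()` is one illustrative choice, not the original chart:

```r
library(dplyr)
library(ggplot2)
library(nycflights23)

# Flight count, mean and maximum distance for each New York origin airport
by_origin <- flights |>
  filter(origin %in% c("JFK", "EWR", "LGA")) |>
  group_by(origin) |>
  summarise(
    flights = n(),
    mean_distance = mean(distance),
    max_distance = max(distance)
  )

by_origin

# Mean distance per airport, with y-axis labels shown in thousands (e.g. "1.2k")
p <- ggplot(by_origin, aes(x = origin, y = mean_distance)) +
  geom_col(fill = "steelblue") +
  scale_y_continuous(labels = scales::label_number(scale = 1e-3, suffix = "k")) +
  labs(x = "Origin airport", y = "Mean distance (miles)")
p
```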
Displaying Flight Distances by US Airline: A Boxplot
ggplot(flights, aes(x = carrier, y = distance, fill = carrier)) +
  geom_boxplot() +
  labs(title = "Distribution of Flight Distances by US Airline",
       x = "US Airlines",
       y = "Distance (miles)",
       fill = "US Airlines") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
The box plot covers every carrier in the data; this review focuses on four major United States carriers: United Airlines (UA), Delta Air Lines (DL), American Airlines (AA), and Southwest Airlines (WN). Among these four, DL and UA fly the longest distances. Both also show many outliers, drawn as dots beyond the whiskers, which mark routes much longer than the carrier's typical flight.
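Because the box plot includes every carrier, a version restricted to the four carriers discussed here makes them easier to compare. This is a minimal sketch, assuming the tidyverse and nycflights23 packages are installed; `major_carriers` and `carrier_stats` are illustrative names:

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(nycflights23)

# The four carriers discussed in the text
major_carriers <- c("UA", "DL", "AA", "WN")

major <- flights |>
  filter(carrier %in% major_carriers)

# Median and maximum distance per carrier, longest median first
carrier_stats <- major |>
  group_by(carrier) |>
  summarise(median_distance = median(distance),
            max_distance = max(distance)) |>
  arrange(desc(median_distance))

carrier_stats

# Box plot of the four carriers, ordered by median distance
ggplot(major, aes(x = fct_reorder(carrier, distance, .fun = median),
                  y = distance, fill = carrier)) +
  geom_boxplot() +
  labs(x = "US Airlines", y = "Distance (miles)", fill = "US Airlines") +
  theme_minimal()
```

Ordering the boxes by median distance puts the carriers with the longest typical flights side by side.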
Future Analysis
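The summary statistics show a median month of 6 (June), which means half of the flights departed in June or earlier. The median alone does not identify the busiest month, so a future study could count flights per month to find peak travel periods and then explore popular summer destinations.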
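A median of 6 marks the middle of the year, not necessarily the busiest month. This is a minimal sketch, assuming the tidyverse and nycflights23 packages are installed, that counts flights per month so the peak can be read off directly:

```r
library(dplyr)
library(nycflights23)

# Number of flights per month, busiest month first
monthly <- flights |>
  count(month, name = "flights") |>
  arrange(desc(flights))

monthly

# Name of the busiest month
month.name[monthly$month[1]]
```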