Introduction

This analysis explores airline delay patterns in R using ggplot2 visualizations. By comparing carrier, air traffic control (ATC), and weather delay percentages across airlines, we aim to:

  1. Identify which airlines experience the most delays and their causes.
  2. Examine the relationship between different types of delays (e.g., carrier delays vs. ATC delays).
  3. Compare how punctual the airlines are relative to one another.

The results highlight delay trends and point to areas where airline operations could improve.

Load Necessary Libraries and Data

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data <- read.csv("airline_stats.csv", header = TRUE, stringsAsFactors = TRUE)
head(data)
##   pct_carrier_delay pct_atc_delay pct_weather_delay  airline
## 1          8.153226      1.971774         0.7620968 American
## 2          5.959924      3.706107         1.5858779 American
## 3          7.157270      2.706231         2.0267062 American
## 4         12.100000     11.033333         0.0000000 American
## 5          7.333333      3.365591         1.7741935 American
## 6          6.163889      3.225000         0.9750000 American
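Before summarising, it can help to check each column for missing values, since the plotting warnings at the end of this analysis report 28 rows being removed. The sketch below is illustrative only: `count_missing` is a hypothetical helper, and `sample_rows` simply re-enters the six rows printed by `head(data)` rather than reading the full file.

```r
# The six rows printed by head(data) above, re-entered by hand.
# The full airline_stats.csv file is not reproduced here.
sample_rows <- data.frame(
  pct_carrier_delay = c(8.153226, 5.959924, 7.157270, 12.100000, 7.333333, 6.163889),
  pct_atc_delay     = c(1.971774, 3.706107, 2.706231, 11.033333, 3.365591, 3.225000),
  pct_weather_delay = c(0.7620968, 1.5858779, 2.0267062, 0.0000000, 1.7741935, 0.9750000),
  airline           = factor(rep("American", 6))
)

# count_missing is a hypothetical helper (not part of the original analysis):
# it returns the number of NA values in each column of a data frame.
count_missing <- function(df) {
  colSums(is.na(df))
}

missing_counts <- count_missing(sample_rows)
print(missing_counts)
```

On the full data set the same call would be `count_missing(data)`. Any non-zero counts in the delay columns would likely account for the rows that ggplot2 drops.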
avg_delay <- data %>% 
  group_by(airline) %>%
  summarise(avg_carrier_delay = mean(pct_carrier_delay, na.rm = TRUE))

print(avg_delay)
## # A tibble: 6 × 2
##   airline   avg_carrier_delay
##   <fct>                 <dbl>
## 1 Alaska                 3.52
## 2 American               9.04
## 3 Delta                  6.33
## 4 Jet Blue               8.08
## 5 Southwest              7.52
## 6 United                 7.40
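The summary above covers only carrier delay, while the stated goals also mention ATC and weather delay. One possible way to average all three delay columns in a single step is sketched below. `summarise_delays` is a hypothetical helper name, the `avg_` prefix is an arbitrary choice, and `sample_rows` only re-enters the six `American` rows shown by `head(data)`, so it yields a single group.

```r
library(dplyr)

# The six rows printed by head(data), re-entered by hand (all American).
sample_rows <- data.frame(
  pct_carrier_delay = c(8.153226, 5.959924, 7.157270, 12.100000, 7.333333, 6.163889),
  pct_atc_delay     = c(1.971774, 3.706107, 2.706231, 11.033333, 3.365591, 3.225000),
  pct_weather_delay = c(0.7620968, 1.5858779, 2.0267062, 0.0000000, 1.7741935, 0.9750000),
  airline           = factor(rep("American", 6))
)

# summarise_delays is a hypothetical helper (not part of the original analysis).
# It averages every column whose name starts with "pct_" within each airline,
# ignoring missing values, and names the results avg_<original column>.
summarise_delays <- function(df) {
  df %>%
    group_by(airline) %>%
    summarise(
      across(starts_with("pct_"), ~ mean(.x, na.rm = TRUE), .names = "avg_{.col}"),
      .groups = "drop"
    )
}

avg_all <- summarise_delays(sample_rows)
print(avg_all)
```

Calling `summarise_delays(data)` on the full data set would give one row per airline with three averages, which makes it easier to see which delay type dominates for each carrier.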
## Bar plot for average carrier delay
ggplot(avg_delay, aes(x = reorder(airline, -avg_carrier_delay), y = avg_carrier_delay, fill = airline)) +
  geom_bar(stat = "identity") +
  labs(title = "Average Carrier Delay by Airline",
       x = "Airline",
       y = "Average Carrier Delay (%)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

## Scatter plot to visualize correlation between carrier delay and ATC delay
ggplot(data, aes(x = pct_carrier_delay, y = pct_atc_delay, color = airline)) +
  geom_point(alpha = 0.6, size = 3) +  # Larger points
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") + # Linear (least-squares) trend line
  labs(title = "Carrier Delay vs. ATC Delay",
       x = "Carrier Delay (%)",
       y = "ATC Delay (%)") +
  theme_minimal() +
  theme(legend.position = "bottom",
        text = element_text(size = 16))
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 28 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 28 rows containing missing values or values outside the scale range
## (`geom_point()`).
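The scatter plot shows the relationship visually. A Pearson correlation coefficient puts a number on it, and counting the incomplete rows makes the warnings above explicit. The sketch below is one way to do both in base R. `delay_correlation` is a hypothetical helper, the first six rows come from the `head(data)` output, and the seventh row with an `NA` is artificial, added only to exercise the row-dropping logic.

```r
# Six rows from head(data), plus one ARTIFICIAL row with a missing value
# (the seventh row is not from the data set; it only demonstrates NA handling).
sample_rows <- data.frame(
  pct_carrier_delay = c(8.153226, 5.959924, 7.157270, 12.100000, 7.333333, 6.163889, NA),
  pct_atc_delay     = c(1.971774, 3.706107, 2.706231, 11.033333, 3.365591, 3.225000, 2.0)
)

# delay_correlation is a hypothetical helper (not part of the original analysis).
# It keeps only rows where both delay values are finite, then returns the number
# of dropped rows and the Pearson correlation of the remaining pairs.
delay_correlation <- function(df) {
  complete <- is.finite(df$pct_carrier_delay) & is.finite(df$pct_atc_delay)
  list(
    n_dropped = sum(!complete),
    r = cor(df$pct_carrier_delay[complete], df$pct_atc_delay[complete])
  )
}

result <- delay_correlation(sample_rows)
print(result)
```

Running `delay_correlation(data)` on the full data set should report how many rows are incomplete, which can be compared against the 28 rows mentioned in the warnings, along with the overall correlation between carrier and ATC delay.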