Assigment 4

R Markdown

# Load necessary libraries
library(readr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)

# Read the CSV file (use `skip = 1` to skip the first empty row)
arrival_delays <- read_csv("/Users/aribarazzaq/Desktop/A_D.csv")

## New names:
## • `` -> `...1`
## • `` -> `...2`

## Rows: 6 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): ...1, ...2
## dbl (5): Los Angeles, Phoenix, San Diego, San Franciso , Seattle 
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Check the column names
names(arrival_delays)

## [1] "...1"          "...2"          "Los Angeles"   "Phoenix"      
## [5] "San Diego"     "San Franciso " "Seattle "

# View the first few rows of the dataset to check if it is loaded correctly
head(arrival_delays)

# View the column names
names(arrival_delays)

## [1] "...1"          "...2"          "Los Angeles"   "Phoenix"      
## [5] "San Diego"     "San Franciso " "Seattle "

# Fill in the missing airline names
arrival_delays <- arrival_delays %>%
  fill(...1, .direction = "down") # Replace '...1' if the column name is different

# Pivot to long format: transform destination columns into rows
arrival_delays_long <- arrival_delays %>%
  pivot_longer(cols = starts_with("Los Angeles"), # Adjust based on your actual destination column names
               names_to = "Destination",          # This creates a new 'Destination' column
               values_to = "Flights") %>%         # This contains the flight numbers (delayed or on-time)
  rename(Airline = ...1, Status = ...2) %>%       # Rename the columns for better readability
  filter(!is.na(Flights))                         # Filter out any empty rows

# View the tidy dataset
head(arrival_delays_long)

# Summarize on-time and delayed flights for each airline
summary_delays <- arrival_delays_long %>%
  group_by(Airline, Status) %>%
  summarize(Total_Flights = sum(Flights))

## `summarise()` has grouped output by 'Airline'. You can override using the
## `.groups` argument.

# View the summary
print(summary_delays)

## # A tibble: 4 × 3
## # Groups:   Airline [2]
##   Airline  Status   Total_Flights
##   <chr>    <chr>            <dbl>
## 1 AM West  delayed            117
## 2 AM West  on time            694
## 3 Alaska   delayed             62
## 4 Alaska   on time            497

# Load ggplot2 for plotting
library(ggplot2)

# Create a bar plot comparing on-time and delayed flights by destination and airline
ggplot(arrival_delays_long, aes(x = Destination, y = Flights, fill = Status)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ Airline) +
  labs(title = "Comparison of On-time and Delayed Flights by Airline",
       x = "Destination",
       y = "Number of Flights")

Alaska Airlines had 3274 flights which were on-time where as 501 were delayed. This gives Alaska airline a good on-time performance ratio. AM West Airlines had 6438 flights which were on-time where as 787 were delayed, Eventhough AM West has more number of flights, the ratio of on-time to delayed flights is not as favorable as Alaska’s

So Alaska airline will be my choice since it is on time.

Assigment 4

2024-10-08

R Markdown