This part of this airline analysis includes an analysis of the top ten domestic airlines control of the market share. This analysis is done using all the flight data from the top ten domestic airlines. An explanation of how the flight data was collected is located on the home page of this project.
The first part of this market share analysis included finding out which one of these airlines controlled the most amount of domestic routes by the number of flights. For this analysis the statement “control a route” is defined as the airline with the most number of flights on that route over the past five years.
# Loading Packages
library(flightplot)
library(tidyverse)
library(stringr)
library(tidyr)
library(dplyr)
library(ggplot2)
library(ggthemr)
library(knitr)
# Loading Datasets
read.csv("alljoined_airlines.csv") -> alljoined
read.csv("airline_key.csv")-> airline_key
read.csv("L_AIRPORT_ID.csv") ->aiport_ids
read.csv("L_AIRPORT.csv") -> aiport_code
read.csv("excel_combined_data.csv") -> excel_combined_data
# Loading Color Theme
ggthemr('dust', type = 'outer')
set_swatch(c("#fc8d27", "#f28422", "#e97c1f", "#df741b", "#d66d18", "#cd6517", "#c45d10", "#ba550f", "#b14d0c", "#a74507", "#9e3d03"))
### --------------------------------------------------------------------------------------------------------------------------------------
## Process of Organizing the Data for Flightplot
# First sort airfare report by counting the number of airport routes
alljoined %>%
group_by(ORIGIN_AIRPORT_ID, DEST_AIRPORT_ID, sort = TRUE) %>%
count(OP_UNIQUE_CARRIER) %>%
arrange(desc(n)) -> test2
# Uniting the Origin Airport & Destination Airport so that they can eventually be counted
alljoined %>%
unite('route', ORIGIN_AIRPORT_ID:DEST_AIRPORT_ID, remove = FALSE) -> alljoined_addroute
## IMPORTANT Shows the totals for each route
alljoined_addroute %>%
count(route) %>%
arrange(desc(n)) -> total_count_by_route
# Combines total_count_by_route with original dataset
alljoined_addroute %>%
full_join(total_count_by_route) -> alljoined_expanded
# Counting the number of routes by airlines
alljoined_expanded %>%
group_by(route) %>%
count(OP_UNIQUE_CARRIER) %>%
arrange(desc(n), .by_group = TRUE) %>%
summarise(max_n=first(n), min_n=last(n), largest_carrier=first(OP_UNIQUE_CARRIER)) -> carrier_counted
# Seperating back Orgin Aiport & Destination Airport
carrier_counted %>%
separate(route, c('ORGIN_AIPORT_ID', "DEST_AIPORT_ID")) ->largest_carrier_data
# Combining the Airport Code & ID
aiport_ids %>%
full_join(aiport_code, by = 'Description') -> airport_info
colnames(airport_info)[1] = "ORGIN_AIPORT_ID"
# Merge of Airport Information & Largest Carrier Data in Excel
# write_csv(largest_carrier_data, "largest_carrier_data.csv")
# write_csv(airport_info, "aiport_info.csv")
# These Steps use the newly made dataframe which was combined in excel ---------------------------------------------------
excel_combined_data %>%
unite('route', ORGIN_AIPORT_ID:DEST_AIPORT_ID, remove = FALSE) -> route_united_data
#Combining Excel dataset with the total counts by route
route_united_data %>%
full_join(total_count_by_route) -> route_united_data2
# Arranging the route_united_data
route_united_data2 %>%
arrange(desc(n)) -> route_united_arranged
# Selecting the top 1000 flight routes
route_united_arranged[1:1000,] -> top_1000_routes
# st step in reducing the columns for flightplot
top_1000_routes %>%
select(c(ORGIN_CODE, DEST_CODE, largest_carrier, n)) -> reduced_top_1000
# renaming the n column
colnames(reduced_top_1000)[4] = "flights"
As shown below the airline that controls the most domestic flight routes is Southwest Airlines. This makes sense because as found in the on-time performance section of this project, Southwest Airlines has the largest number of flights in the past five years.
# Getting Rid of the uneeded rows that mess up domestic calculation
route_united_arranged[1:6404,] -> route_united_arranged_reduced
## Who controls the most routes domestically ------------------------------------
route_united_arranged_reduced %>%
group_by(largest_carrier) %>%
summarize(count = n()) %>%
arrange(desc(count)) -> controls_the_most_routes
# Replacing the Airline Codes with Names
colnames(controls_the_most_routes)[1] = "OP_UNIQUE_CARRIER"
controls_the_most_routes %>%
full_join(airline_key, by = "OP_UNIQUE_CARRIER") -> controls_the_most_routes_names
# Plot showing Number of Routes with Largest Market Share by Airline
ggplot(controls_the_most_routes_names, aes(reorder(airline_name, count), count, fill = airline_name)) + geom_col() + coord_flip() + labs(x = "Airlines", y = "Number of Routes Controlled") + labs(title = "Number of Routes with Largest Market Share by Airline") + theme(legend.position = "none")
# Creating a Table for Number of Routes with Largest Market Share by Airline
colnames(controls_the_most_routes_names)[2] = "number_of_routes"
controls_the_most_routes_names[c("airline_name", "number_of_routes")] %>%
arrange(desc(number_of_routes)) -> controls_most_routes_table
kable(controls_most_routes_table)
| airline_name | number_of_routes |
|---|---|
| Southwest | 1434 |
| Allegiant | 1333 |
| Delta | 840 |
| American | 788 |
| United | 769 |
| Frontier | 463 |
| JetBlue | 305 |
| Alaska | 260 |
| Spirit | 168 |
| Hawaiian | 44 |
While the overall domestic flight routes was an important statistic, limiting the analysis to the top 1000 flights by number of flights helped to paint a more influential picture. While Southwest Airlines was still in the lead, there was a significant finding with Allegiant Airlines. In the overall flight routes Allegiant controlled the second most routes while they control zero of the top 1000 routes. This is a interesting result, but can potentially be explained by their business model of flying to airports where other airlines do not typically fly.
## Who controls the most routes of the top 1000 --------------------------------------
reduced_top_1000 %>%
group_by(largest_carrier) %>%
summarize(count = n()) %>%
arrange(desc(count)) -> controls_the_most_routes_1000
# Replacing the Airline Codes with Names
colnames(controls_the_most_routes_1000)[1] = "OP_UNIQUE_CARRIER"
controls_the_most_routes_1000 %>%
full_join(airline_key, by = "OP_UNIQUE_CARRIER") -> controls_the_most_routes_1000_names
controls_the_most_routes_1000_omit <- controls_the_most_routes_1000_names %>% filter(!is.na(count))
# Plot showing Number of Routes with Largest Market Share Top 1000 Routes by Airline
ggplot(controls_the_most_routes_1000_omit, aes(reorder(airline_name, count), count, fill = airline_name)) + geom_col() + coord_flip() + labs(x = "Airlines", y = "Number of Routes Controlled") + labs(title = "Number of Routes with Largest Market Share Top 1000 Routes") + theme(legend.position = "none")
# Creating a Table for Number of Routes with Largest Market Share Top 1000 Routes by Airline
colnames(controls_the_most_routes_1000_names)[2] = "number_of_routes"
controls_the_most_routes_1000_names[c("airline_name", "number_of_routes")] %>%
arrange(desc(number_of_routes)) -> controls_most_routes_1000_table
kable(controls_most_routes_1000_table)
| airline_name | number_of_routes |
|---|---|
| Southwest | 307 |
| Delta | 238 |
| American | 226 |
| United | 142 |
| Alaska | 39 |
| JetBlue | 36 |
| Hawaiian | 8 |
| Frontier | 2 |
| Spirit | 2 |
| Allegiant | NA |
# Writing the theme for flightplot
ian_theme <- theme(axis.text.y = element_blank(),
axis.text.x = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank(),
panel.background = element_rect(fill = '#faf7f2'),
plot.background = element_rect(fill = '#faf7f2'))
# Southwest Analysis
subset(reduced_top_1000, largest_carrier == 'WN') -> southwest
southwest[,-3:-4] -> southwest_flightplot
plot_flights(
southwest_flightplot,
crop = TRUE,
land_color = "#f6e8c3",
water_color = "aliceblue",
dom_color = "#000066",
int_color = "#000066",
alpha = 0.8,
times_as_thickness = FALSE
) + ian_theme
# American Flightplot
subset(reduced_top_1000, largest_carrier == 'AA') -> american
american[,-3:-4] -> american_flightplot
plot_flights(
american_flightplot,
crop = TRUE,
land_color = "#f6e8c3",
water_color = "aliceblue",
dom_color = "#000066",
int_color = "#000066",
alpha = 0.8,
times_as_thickness = FALSE
) + ian_theme
# Jetblue Flightplot
subset(reduced_top_1000, largest_carrier == 'B6') -> jetblue
jetblue[,-3:-4] -> jetblue_flightplot
plot_flights(
jetblue_flightplot,
crop = TRUE,
land_color = "#f6e8c3",
water_color = "aliceblue",
dom_color = "#000066",
int_color = "#000066",
alpha = 0.8,
times_as_thickness = FALSE
) + ian_theme
# Alaska Flightplot
subset(reduced_top_1000, largest_carrier == 'AS') -> alaska
alaska[,-3:-4] -> alaska_flightplot
plot_flights(
alaska_flightplot,
crop = TRUE,
land_color = "#f6e8c3",
water_color = "aliceblue",
dom_color = "#000066",
int_color = "#000066",
alpha = 0.8,
times_as_thickness = FALSE
) + ian_theme
# Delta Flightplot
subset(reduced_top_1000, largest_carrier == 'DL') -> delta
delta[,-3:-4] -> delta_flightplot
plot_flights(
delta_flightplot,
crop = TRUE,
land_color = "#f6e8c3",
water_color = "aliceblue",
dom_color = "#000066",
int_color = "#000066",
alpha = 0.75,
times_as_thickness = FALSE
) + ian_theme
# Hawaiian Flightplot
subset(reduced_top_1000, largest_carrier == 'HA') -> hawaiian
hawaiian[,-3:-4] -> hawaiian_flightplot
#replacing HIK with HNL
replace(hawaiian_flightplot$ORGIN_CODE,hawaiian_flightplot$ORGIN_CODE == "HIK", "HNL") -> hawaiian_flightplot$ORGIN_CODE
replace(hawaiian_flightplot$DEST_CODE,hawaiian_flightplot$DEST_CODE == "HIK", "HNL") -> hawaiian_flightplot$DEST_CODE
plot_flights(
hawaiian_flightplot,
crop = TRUE,
land_color = "#f6e8c3",
water_color = "aliceblue",
dom_color = "#000066",
int_color = "#000066",
alpha = 0.8,
times_as_thickness = FALSE
) + ian_theme
# Frontier Flightplot
subset(reduced_top_1000, largest_carrier == 'F9') -> frontier
frontier[,-3:-4] -> frontier_flightplot
plot_flights(
frontier_flightplot,
crop = '48States',
land_color = "#f6e8c3",
water_color = "aliceblue",
dom_color = "#000066",
int_color = "#000066",
alpha = 0.8,
times_as_thickness = FALSE
) + ian_theme
# Spirit Flightplot
subset(reduced_top_1000, largest_carrier == 'NK') -> spirit
spirit[,-3:-4] -> spirit_flightplot
plot_flights(
spirit_flightplot,
crop = '48States',
land_color = "#f6e8c3",
water_color = "aliceblue",
dom_color = "#000066",
int_color = "#000066",
alpha = 0.8,
times_as_thickness = FALSE
) + ian_theme
# United FLightplot
subset(reduced_top_1000, largest_carrier == 'UA') -> united
united[,-3:-4] -> united_flightplot
plot_flights(
united_flightplot,
crop = TRUE,
land_color = "#f6e8c3",
water_color = "aliceblue",
dom_color = "#000066",
int_color = "#000066",
alpha = 0.8,
times_as_thickness = FALSE
) + ian_theme
Delta Airlines controls the second most top 1000 routes which are again spread out across the contiguous United States. The most important finding from this flightplot is with Delta’s megahub airport in Atlanta (ATL). Not only is it visibly where Delta controls lots of flights, but it make up a significant portion of their top 1000 routes. As shown in the figure below, Delta controls 95% of all flight routes involving Atlanta (ATL). This is an astonishingly high number when considering that Delta controls not only many of the smaller routes, but also the routes to other cities that are considered hubs or focus cities for other airlines. Great examples of this include Alaska Airlines primary hub in Seattle (SEA), American’s primary hub in Dallas/Fort Worth (DFW), United Airlines primary hub at Chicago-O’Hare (ORD), Frontier’s operating base in Denver (DEN), and Jetblue’s operating base in New York - Kennedy (JFK).
kable(atlanta_arranged)
| airline | number_of_flights | percent |
|---|---|---|
| delta | 288 | 0.9536424 |
| southwest | 6 | 0.0198675 |
| frontier | 6 | 0.0198675 |
| spirit | 2 | 0.0066225 |
Overall there were some interesting findings throughout this Market Share analysis. In conjunction with the other parts of this project there is no clear standout except for that Southwest Airlines controls the most amount of routes which makes sense based on their highest number of flights. One other significant finding that ties in with other parts of the project, is the continued theme that Allegiant is an outlier and unique airline in how it is operated which is supported by the on-time performance analysis as well. Delta Airlines, who has the best on-time performance, controls the second most top 1000 routes as well as 95% of Atlanta’s routes.
To continue to the next analysis of on-time performance click here.