This part of this airline analysis includes an analysis of the top ten domestic airlines control of the market share. This analysis is done using all the flight data from the top ten domestic airlines. An explanation of how the flight data was collected is located on the home page of this project.

The first part of this market share analysis included finding out which one of these airlines controlled the most amount of domestic routes by the number of flights. For this analysis the statement “control a route” is defined as the airline with the most number of flights on that route over the past five years.

# Loading Packages
library(flightplot)
library(tidyverse)
library(stringr)
library(tidyr)
library(dplyr)
library(ggplot2)
library(ggthemr)
library(knitr)

# Loading Datasets
read.csv("alljoined_airlines.csv") -> alljoined
read.csv("airline_key.csv")-> airline_key
read.csv("L_AIRPORT_ID.csv") ->aiport_ids
read.csv("L_AIRPORT.csv") -> aiport_code
read.csv("excel_combined_data.csv") -> excel_combined_data

# Loading Color Theme
ggthemr('dust', type = 'outer')
set_swatch(c("#fc8d27", "#f28422", "#e97c1f", "#df741b", "#d66d18", "#cd6517", "#c45d10", "#ba550f", "#b14d0c", "#a74507", "#9e3d03"))

### --------------------------------------------------------------------------------------------------------------------------------------

## Process of Organizing the Data for Flightplot

# First sort airfare report by counting the number of airport routes
alljoined %>% 
  group_by(ORIGIN_AIRPORT_ID, DEST_AIRPORT_ID, sort = TRUE) %>% 
  count(OP_UNIQUE_CARRIER) %>% 
  arrange(desc(n)) -> test2

# Uniting the Origin Airport & Destination Airport so that they can eventually be counted
alljoined %>% 
  unite('route', ORIGIN_AIRPORT_ID:DEST_AIRPORT_ID, remove = FALSE) -> alljoined_addroute

## IMPORTANT Shows the totals for each route
alljoined_addroute %>% 
  count(route) %>% 
  arrange(desc(n)) -> total_count_by_route

# Combines total_count_by_route with original dataset
alljoined_addroute %>% 
  full_join(total_count_by_route) -> alljoined_expanded

# Counting the number of routes by airlines
alljoined_expanded %>% 
  group_by(route) %>% 
  count(OP_UNIQUE_CARRIER) %>% 
  arrange(desc(n), .by_group = TRUE) %>% 
  summarise(max_n=first(n), min_n=last(n), largest_carrier=first(OP_UNIQUE_CARRIER)) -> carrier_counted

# Seperating back Orgin Aiport & Destination Airport
carrier_counted %>% 
  separate(route, c('ORGIN_AIPORT_ID', "DEST_AIPORT_ID")) ->largest_carrier_data

# Combining the Airport Code & ID
aiport_ids %>% 
  full_join(aiport_code, by = 'Description') -> airport_info

colnames(airport_info)[1] = "ORGIN_AIPORT_ID"

# Merge of Airport Information & Largest Carrier Data in Excel
  # write_csv(largest_carrier_data, "largest_carrier_data.csv")
  # write_csv(airport_info, "aiport_info.csv")


# These Steps use the newly made dataframe which was combined in excel --------------------------------------------------- 
excel_combined_data %>% 
  unite('route', ORGIN_AIPORT_ID:DEST_AIPORT_ID, remove = FALSE) -> route_united_data

#Combining Excel dataset with the total counts by route
route_united_data %>% 
  full_join(total_count_by_route) -> route_united_data2
  
# Arranging the route_united_data
route_united_data2 %>% 
  arrange(desc(n)) -> route_united_arranged
  
# Selecting the top 1000 flight routes
route_united_arranged[1:1000,] -> top_1000_routes
  
# st step in reducing the columns for flightplot
top_1000_routes %>% 
  select(c(ORGIN_CODE, DEST_CODE, largest_carrier, n)) -> reduced_top_1000

# renaming the n column
colnames(reduced_top_1000)[4] = "flights"


Control of Market Share by Route

As shown below the airline that controls the most domestic flight routes is Southwest Airlines. This makes sense because as found in the on-time performance section of this project, Southwest Airlines has the largest number of flights in the past five years.

# Getting Rid of the uneeded rows that mess up domestic calculation
route_united_arranged[1:6404,] -> route_united_arranged_reduced

## Who controls the most routes domestically ------------------------------------
route_united_arranged_reduced %>% 
  group_by(largest_carrier) %>% 
  summarize(count = n()) %>% 
  arrange(desc(count)) -> controls_the_most_routes

# Replacing the Airline Codes with Names 
colnames(controls_the_most_routes)[1] = "OP_UNIQUE_CARRIER"

controls_the_most_routes %>% 
  full_join(airline_key, by = "OP_UNIQUE_CARRIER") -> controls_the_most_routes_names


# Plot showing Number of Routes with Largest Market Share by Airline
ggplot(controls_the_most_routes_names, aes(reorder(airline_name, count), count, fill = airline_name)) + geom_col() + coord_flip() + labs(x = "Airlines", y = "Number of Routes Controlled") + labs(title = "Number of Routes with Largest Market Share by Airline") + theme(legend.position = "none")
# Creating a Table for Number of Routes with Largest Market Share by Airline
colnames(controls_the_most_routes_names)[2] = "number_of_routes"

controls_the_most_routes_names[c("airline_name", "number_of_routes")] %>% 
  arrange(desc(number_of_routes)) -> controls_most_routes_table
kable(controls_most_routes_table)
airline_name number_of_routes
Southwest 1434
Allegiant 1333
Delta 840
American 788
United 769
Frontier 463
JetBlue 305
Alaska 260
Spirit 168
Hawaiian 44


Control of Market Share by Top 1000 Routes

While the overall domestic flight routes was an important statistic, limiting the analysis to the top 1000 flights by number of flights helped to paint a more influential picture. While Southwest Airlines was still in the lead, there was a significant finding with Allegiant Airlines. In the overall flight routes Allegiant controlled the second most routes while they control zero of the top 1000 routes. This is a interesting result, but can potentially be explained by their business model of flying to airports where other airlines do not typically fly.

## Who controls the most routes of the top 1000 --------------------------------------
reduced_top_1000 %>% 
  group_by(largest_carrier) %>% 
  summarize(count = n()) %>% 
  arrange(desc(count)) -> controls_the_most_routes_1000

# Replacing the Airline Codes with Names 
colnames(controls_the_most_routes_1000)[1] = "OP_UNIQUE_CARRIER"

controls_the_most_routes_1000 %>% 
  full_join(airline_key, by = "OP_UNIQUE_CARRIER") -> controls_the_most_routes_1000_names

controls_the_most_routes_1000_omit <- controls_the_most_routes_1000_names %>%  filter(!is.na(count))

# Plot showing Number of Routes with Largest Market Share Top 1000 Routes by Airline
ggplot(controls_the_most_routes_1000_omit, aes(reorder(airline_name, count), count, fill = airline_name)) + geom_col() + coord_flip() + labs(x = "Airlines", y = "Number of Routes Controlled") + labs(title = "Number of Routes with Largest Market Share Top 1000 Routes") + theme(legend.position = "none")
# Creating a Table for Number of Routes with Largest Market Share Top 1000 Routes by Airline
colnames(controls_the_most_routes_1000_names)[2] = "number_of_routes"

controls_the_most_routes_1000_names[c("airline_name", "number_of_routes")] %>% 
  arrange(desc(number_of_routes)) -> controls_most_routes_1000_table
kable(controls_most_routes_1000_table)
airline_name number_of_routes
Southwest 307
Delta 238
American 226
United 142
Alaska 39
JetBlue 36
Hawaiian 8
Frontier 2
Spirit 2
Allegiant NA



# Writing the theme for flightplot
ian_theme <- theme(axis.text.y = element_blank(), 
      axis.text.x = element_blank(), 
      axis.ticks = element_blank(), 
      panel.grid = element_blank(), 
      panel.background = element_rect(fill = '#faf7f2'),
      plot.background = element_rect(fill = '#faf7f2'))


# Southwest Analysis
subset(reduced_top_1000, largest_carrier == 'WN') -> southwest

southwest[,-3:-4] -> southwest_flightplot

plot_flights(
  southwest_flightplot,
  crop = TRUE,
  land_color = "#f6e8c3",
  water_color = "aliceblue",
  dom_color = "#000066",
  int_color = "#000066",
  alpha = 0.8,
  times_as_thickness = FALSE
) + ian_theme


# American Flightplot
subset(reduced_top_1000, largest_carrier == 'AA') -> american

american[,-3:-4] -> american_flightplot

plot_flights(
  american_flightplot,
  crop = TRUE,
  land_color = "#f6e8c3",
  water_color = "aliceblue",
  dom_color = "#000066",
  int_color = "#000066",
  alpha = 0.8,
  times_as_thickness = FALSE
) + ian_theme


# Jetblue Flightplot
subset(reduced_top_1000, largest_carrier == 'B6') -> jetblue

jetblue[,-3:-4] -> jetblue_flightplot

plot_flights(
  jetblue_flightplot,
  crop = TRUE,
  land_color = "#f6e8c3",
  water_color = "aliceblue",
  dom_color = "#000066",
  int_color = "#000066",
  alpha = 0.8,
  times_as_thickness = FALSE
) + ian_theme


# Alaska Flightplot
subset(reduced_top_1000, largest_carrier == 'AS') -> alaska

alaska[,-3:-4] -> alaska_flightplot

plot_flights(
  alaska_flightplot,
  crop = TRUE,
  land_color = "#f6e8c3",
  water_color = "aliceblue",
  dom_color = "#000066",
  int_color = "#000066",
  alpha = 0.8,
  times_as_thickness = FALSE
) + ian_theme



# Delta Flightplot
subset(reduced_top_1000, largest_carrier == 'DL') -> delta

delta[,-3:-4] -> delta_flightplot

plot_flights(
  delta_flightplot,
  crop = TRUE,
  land_color = "#f6e8c3",
  water_color = "aliceblue",
  dom_color = "#000066",
  int_color = "#000066",
  alpha = 0.75,
  times_as_thickness = FALSE
) + ian_theme


# Hawaiian Flightplot
subset(reduced_top_1000, largest_carrier == 'HA') -> hawaiian

hawaiian[,-3:-4] -> hawaiian_flightplot

      #replacing HIK with HNL
replace(hawaiian_flightplot$ORGIN_CODE,hawaiian_flightplot$ORGIN_CODE == "HIK", "HNL") -> hawaiian_flightplot$ORGIN_CODE
replace(hawaiian_flightplot$DEST_CODE,hawaiian_flightplot$DEST_CODE == "HIK", "HNL") -> hawaiian_flightplot$DEST_CODE


plot_flights(
  hawaiian_flightplot,
  crop = TRUE,
  land_color = "#f6e8c3",
  water_color = "aliceblue",
  dom_color = "#000066",
  int_color = "#000066",
  alpha = 0.8,
  times_as_thickness = FALSE
) + ian_theme


# Frontier Flightplot
subset(reduced_top_1000, largest_carrier == 'F9') -> frontier

frontier[,-3:-4] -> frontier_flightplot

plot_flights(
  frontier_flightplot,
  crop = '48States',
  land_color = "#f6e8c3",
  water_color = "aliceblue",
  dom_color = "#000066",
  int_color = "#000066",
  alpha = 0.8,
  times_as_thickness = FALSE
) + ian_theme


# Spirit Flightplot
subset(reduced_top_1000, largest_carrier == 'NK') -> spirit

spirit[,-3:-4] -> spirit_flightplot

plot_flights(
  spirit_flightplot,
  crop = '48States',
  land_color = "#f6e8c3",
  water_color = "aliceblue",
  dom_color = "#000066",
  int_color = "#000066",
  alpha = 0.8,
  times_as_thickness = FALSE
) + ian_theme



# United FLightplot
subset(reduced_top_1000, largest_carrier == 'UA') -> united

united[,-3:-4] -> united_flightplot

plot_flights(
  united_flightplot,
  crop = TRUE,
  land_color = "#f6e8c3",
  water_color = "aliceblue",
  dom_color = "#000066",
  int_color = "#000066",
  alpha = 0.8,
  times_as_thickness = FALSE
) + ian_theme

Flighplot Analysis

The second part of this analysis includes using the R package called flightplot to visualize which top 1000 routes the airlines control.


Southwest Airlines

Southwest controls the most top 1000 routes and are spread out across the contiguous United States. While there are some cities that have lots of Southwest controlled routes such as Orlando (MCO) and Denver (DEN) there are not any surprising findings shown in this flightplot.


Delta Airlines

Delta Airlines controls the second most top 1000 routes which are again spread out across the contiguous United States. The most important finding from this flightplot is with Delta’s megahub airport in Atlanta (ATL). Not only is it visibly where Delta controls lots of flights, but it make up a significant portion of their top 1000 routes. As shown in the figure below, Delta controls 95% of all flight routes involving Atlanta (ATL). This is an astonishingly high number when considering that Delta controls not only many of the smaller routes, but also the routes to other cities that are considered hubs or focus cities for other airlines. Great examples of this include Alaska Airlines primary hub in Seattle (SEA), American’s primary hub in Dallas/Fort Worth (DFW), United Airlines primary hub at Chicago-O’Hare (ORD), Frontier’s operating base in Denver (DEN), and Jetblue’s operating base in New York - Kennedy (JFK).


Control of Routes involving Atlanta (ATL)

kable(atlanta_arranged)
airline number_of_flights percent
delta 288 0.9536424
southwest 6 0.0198675
frontier 6 0.0198675
spirit 2 0.0066225


American Airlines

American Airlines controls the third most top 1000 routes and is spread out across the contiguous United States, Hawaii, and Puerto Rico. While there is a focus on Dallas/Fort Worth (DFW), there are no surprising findings from this flightplot except for that American controls two top 1000 flights to Hawaii that Hawaiian Airlines does not.


United Airlines

United Airlines controls the fourth most top 1000 routes and is spread out across the contiguous United States, Hawaii, and Puerto Rico. This flightplot reveals that United controls lots of the longer cross country routes as well as two top 1000 flights to Hawaii that Hawaiian Airlines does not.


Alaska Airlines

Alaska Airlines controls the fifth most top 1000 routes and is relatively focused on the western region of the United States and Alaska except for a few outliers of Boston (BOS), Chicago-O’hare (ORD), and Kahului, Hawaii (OGG). Every single flight they control of the top 1000 flights involves their primary hub in Seattle (SEA). Again like American and United Airlines, they control a flight to the Hawaiian Islands.


Jetblue Airlines

Jetblue Airlines controls the sixth most top 1000 routes and is generally focused in the eastern coast of the United States except for outliers of Los Angeles (LAX), Las Vegas (LAS), and Puerto Rico (SJU).


Hawaiian Airlines

Hawaiian Airlines controls the seventh most top 1000 routes and all take place in the Hawaiian Islands. Something interesting is flights between Honolulu (HNL) and Kahului (OGG) are the second busiest flight route in the United States by number of flights only behind Los Angeles (LAX) and New York - Kennedy (JFK). An interesting takeaway from this flightplot is what is not shown, which is any flights from the Hawaiian Islands to the mainland United States. Three other airlines control five different flight routes in the top 1000 while Hawaiian Airlines controls none.


Frontier Airlines

Frontier Airlines controls only one top 1000 route which is from Cleveland (CLE) to Orlando (MCO). There is no found specific reason why they control this route as they have a hub based out of Denver (DEN).


Spirit Airlines

Spirit Airlines controls only one top 1000 route which is from New Orleans (MSY) to Fort Lauderdale (FLL). There is no found specific reason why they control this route.


Conclusion of Market Share

Overall there were some interesting findings throughout this Market Share analysis. In conjunction with the other parts of this project there is no clear standout except for that Southwest Airlines controls the most amount of routes which makes sense based on their highest number of flights. One other significant finding that ties in with other parts of the project, is the continued theme that Allegiant is an outlier and unique airline in how it is operated which is supported by the on-time performance analysis as well. Delta Airlines, who has the best on-time performance, controls the second most top 1000 routes as well as 95% of Atlanta’s routes.

To continue to the next analysis of on-time performance click here.