How do Delta Airlines’ operating regions compare in terms of revenue versus load factors?
The U.S. commercial airline industry is one of the most diverse, dynamic, and perplexing in the world. The Airline Data Project (ADP) was established by the MIT Global Airline Industry Program to better understand the opportunities, risks, and challenges facing this industry. The data set used here, found on the ADP website [1], describes the traffic, capacity, and revenue for each operating region within Delta Airlines from 1995 to 2017.
This research question is important because airlines are strongly driven by revenue generation. One of the main applications of data science in the airline industry is revenue management [2].
Revenue management is the application of data and analytics to define how to sell a product to those who need it, at a reasonable cost, at the right time, and through the right channel [2].
Revenue management specialists, in the context of airlines, define destinations and adjust prices for specific markets, find efficient distribution channels, and manage seats to keep the airline simultaneously competitive and customer-friendly [2].
Data scientist Konstantin Vandyshev, who worked at Transavia’s Revenue Management department, stresses that data science disciplines come in handy for achieving revenue management tasks [2]. He uses data analysis to find:
the demand for certain flight routes (inspiration for looking at revenue), and
the marginal seat revenue (inspiration for looking at load factors)
In this context, the load factor is defined as:
\[\text{Load Factor} = \frac{RPM}{ASM} \times 100\]
Revenue Passenger Miles (RPM): this figure counts one paying passenger flown one mile, so it measures how much of an airline’s seat capacity was actually sold and flown. For example, if 200 passengers fly 500 miles on a flight, 100,000 RPMs are generated [3].
Available Seat Miles (ASM): this figure counts one aircraft seat flown one mile, whether occupied or not. For example, an aircraft with 100 passenger seats that is flown a distance of 100 miles generates 10,000 ASMs [3].
Essentially, the load factor metric answers the question “how full is the airplane with passengers?”
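As a worked example, consider a single hypothetical flight (the numbers are illustrative and not taken from the ADP data): a 250-seat aircraft flies 500 miles carrying 200 paying passengers.

\[RPM = 200 \times 500 = 100{,}000 \qquad ASM = 250 \times 500 = 125{,}000\]

\[\frac{RPM}{ASM} \times 100 = \frac{100{,}000}{125{,}000} \times 100 = 80\%\]

In other words, 80% of the seat capacity flown on that flight was filled by paying passengers.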
It is interesting to compare each operating region’s raw revenue with its load factor. Delta Airlines may be generating the most revenue from a certain operating region, but are those flights running efficiently in terms of passenger load factors? Is there an operating region that generates less revenue right now but runs more efficient operations?
Below is a list of Delta Airlines’ operating regions.
regions <- unique(delta_data$Region)
Delta Airlines’ operating regions are: Domestic, International, Atlantic, Pacific, and Latin America.
Below is the average passenger revenue from 1995 to 2017.
avg_pass_rev <- mean(delta_data$Pass_Rev)
The average passenger revenue from 1995 to 2017 is $6,552,181.
Below is the average total revenue from 1995 to 2017.
avg_tot_rev <- mean(delta_data$Total_Rev)
The average total revenue from 1995 to 2017 is $8,889,823. It makes sense that this value is higher than the average passenger revenue, because total revenue includes additional revenue sources beyond passenger revenue.
To calculate the load factor, a new column is added to the data using the following equation:
\[\text{Load Factor} = \frac{RPM}{ASM} \times 100\]
delta_data$Load_Factor <- (delta_data$Rev_Pass_Miles / delta_data$Avail_Seat_Miles) * 100
Below is the average load factor from 1995 to 2017.
avg_load_fctr <- mean(delta_data$Load_Factor)
The average load factor from 1995 to 2017 is 77.99%.
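One point worth keeping in mind when reading this average: taking the mean of the row-level load factors gives every row the same weight, regardless of how much capacity it represents, and it also counts any aggregate rows (such as "Total") alongside the individual regions. A capacity-weighted alternative divides total RPM by total ASM. The sketch below contrasts the two on a small toy data frame; the values are hypothetical and only the column names mirror those used in this report.

```r
# Toy data with hypothetical values (not taken from the ADP data set).
toy <- data.frame(
  Region           = c("Domestic", "Pacific"),
  Rev_Pass_Miles   = c(80000, 9000),
  Avail_Seat_Miles = c(100000, 10000),
  stringsAsFactors = FALSE
)

# Row-level load factor, computed the same way as in the report.
toy$Load_Factor <- toy$Rev_Pass_Miles / toy$Avail_Seat_Miles * 100

# Simple mean of the row-level load factors: every row counts equally.
simple_mean <- mean(toy$Load_Factor)

# Capacity-weighted load factor: total RPM divided by total ASM.
weighted_lf <- sum(toy$Rev_Pass_Miles) / sum(toy$Avail_Seat_Miles) * 100

print(c(simple = simple_mean, weighted = weighted_lf))
```

In this toy case the simple mean is higher than the weighted figure because the small region with the fuller planes counts as much as the large one. Which figure is more appropriate depends on the question being asked; the report uses the simple mean.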
Below is a graph of the total revenue for each year from 1995 to 2017.
library(ggplot2)

# Yearly total revenue comes from the rows where Region is "Total";
# Year and Total_Rev are taken from the same rows so they stay aligned
frame <- delta_data[delta_data$Region == "Total", c("Year", "Total_Rev")]
ggplot(frame, aes(Year, Total_Rev)) +
  geom_point() +
  geom_line(color = "blue") +
  theme_minimal() +
  labs(y = "Total Revenue ($)",
       x = "Year",
       title = "Total Revenue from 1995 to 2017") +
  theme(plot.title = element_text(hjust = 0.5))
This graph shows that total revenue rose sharply starting around 2010 and has continued to climb since then.
Below is a graph of the average total revenue for each operating region.
library(dplyr)

# Average total revenue per operating region, excluding the aggregate "Total" rows
frame1 <- delta_data %>%
  filter(Region != "Total") %>%
  group_by(Region) %>%
  summarize(rev = mean(Total_Rev))
ggplot(frame1, aes(Region, rev)) +
  geom_bar(stat = "identity",
           fill = c("dark green", "maroon", "dark blue", "gold", "orange")) +
  theme_minimal() +
  labs(y = "Total Revenue ($)",
       x = "Operating Region",
       title = "Average Total Revenue by Operating Region") +
  theme(plot.title = element_text(hjust = 0.5))
This graph shows that most of Delta’s total revenue comes from its Domestic operating region, followed by International and Atlantic.
Below is a graph of the average load factor for each year from 1995 to 2017.
# Average load factor per year across all rows of the data
frame2 <- delta_data %>%
  group_by(Year) %>%
  summarize(lf = mean(Load_Factor))
ggplot(frame2, aes(Year, lf)) +
  geom_point() +
  geom_line(color = "blue") +
  theme_minimal() +
  labs(y = "Load Factor (%)",
       x = "Year",
       title = "Average Load Factor from 1995 to 2017") +
  theme(plot.title = element_text(hjust = 0.5))
This graph shows that Delta’s load factors have increased greatly since 2001, which is notable because the 9/11 terrorist attacks happened that year. One might expect load factors to have dropped for a while after 2001, but that is not the case. Note that the load factor measures the share of available seat miles that were filled, not the number of people traveling, so it can hold steady or rise even when fewer passengers fly if capacity is reduced as well.
Below is a graph of the average load factor for each region.
# Average load factor per operating region, excluding the aggregate "Total" rows
frame3 <- delta_data %>%
  filter(Region != "Total") %>%
  group_by(Region) %>%
  summarize(lf = mean(Load_Factor))
ggplot(frame3, aes(Region, lf)) +
  geom_bar(stat = "identity",
           fill = c("dark green", "maroon", "dark blue", "gold", "orange")) +
  theme_minimal() +
  labs(y = "Load Factor (%)",
       x = "Operating Region",
       title = "Average Load Factor by Operating Region") +
  theme(plot.title = element_text(hjust = 0.5))
This graph shows that all operating regions have a load factor of around 80%, although Atlantic has the highest load factor and Latin America has the lowest.
To reiterate, the research question is: How do Delta Airlines’ operating regions compare in terms of revenue versus load factors?
To answer the research question, data visualization techniques are used.
# Total revenue over time, one line per region
tot_rev_graph <- ggplot(delta_data, aes(x = Year, y = Total_Rev, color = Region)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(limits = c(1995, 2017), breaks = 1995:2017) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
        panel.grid.major = element_line(color = "light gray"),
        panel.grid.minor = element_line(color = "light gray"),
        panel.background = element_rect(fill = "white"),
        plot.title = element_text(hjust = 0.5)) +
  labs(title = "Delta Airlines' Total Revenue \nby Operating Region \nfrom 1995 to 2017",
       y = "Total Revenue ($)")
tot_rev_graph
This graph shows that the Domestic operating region has been Delta’s leader in driving total revenue since 1995. Specifically, the operating regions ranked from highest to lowest contribution to total revenue are Domestic, International, Atlantic, Pacific, and Latin America.
# Load factor over time, one line per region
load_fctrs_graph <- ggplot(delta_data, aes(x = Year, y = Load_Factor, color = Region)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(limits = c(1995, 2017), breaks = 1995:2017) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
        panel.grid.major = element_line(color = "light gray"),
        panel.grid.minor = element_line(color = "light gray"),
        panel.background = element_rect(fill = "white"),
        plot.title = element_text(hjust = 0.5)) +
  labs(title = "Delta Airlines' Load Factors \nby Operating Region \nfrom 1995 to 2017",
       y = "Load Factor (%)")
load_fctrs_graph
This graph is harder to read. One major trend across most years is that the Latin America operating region has had the lowest load factors. Generally, the load factor order (from highest to lowest) is Pacific, Atlantic, International, Domestic, and Latin America. However, based on the latest data from 2017, the order from highest to lowest is Domestic, Latin America, Pacific, International, and Atlantic.
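An ordering for a single year can also be derived directly from the data rather than read off the graph. The following is a minimal base R sketch of that idea, assuming a data frame with `Region`, `Year`, and `Load_Factor` columns like the one used in this report; the `toy` data frame and its values are hypothetical and are not the ADP figures.

```r
# Hypothetical values for illustration only (not the ADP figures).
toy <- data.frame(
  Region      = rep(c("Domestic", "Atlantic", "Pacific"), times = 2),
  Year        = rep(c(2016, 2017), each = 3),
  Load_Factor = c(84.0, 82.5, 83.0, 86.0, 81.0, 85.0),
  stringsAsFactors = FALSE
)

# Keep only the most recent year in the data.
latest <- toy[toy$Year == max(toy$Year), ]

# Sort the regions from highest to lowest load factor.
ranked <- latest[order(-latest$Load_Factor), c("Region", "Load_Factor")]
rownames(ranked) <- NULL
print(ranked)
```

Applied to the real data, the aggregate "Total" rows would presumably need to be filtered out first so that only operating regions are ranked.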
To answer the research question, below is a summary table that compares Delta’s operating regions in terms of their contribution to revenue and their load factors. Each column is ordered from highest to lowest.
By Revenue | By Load Factor |
---|---|
Domestic | Domestic |
International | Latin America |
Atlantic | Pacific |
Pacific | International |
Latin America | Atlantic |
This table shows that some operating regions, although not generating as much revenue, still fly efficient flights (i.e., they have high load factors).
For example, Latin America generates the lowest amount of revenue but has the second-highest load factor.
This is important information for Delta, as well as other airlines, to consider.
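A side-by-side ranking of this kind can be sketched in base R as shown below. This is only an illustration under stated assumptions: the `toy` data frame, its three regions, and its values are hypothetical, and the column names simply mirror those used earlier in this report.

```r
# Hypothetical per-region figures for illustration only (not the ADP data).
toy <- data.frame(
  Region      = c("Domestic", "Atlantic", "Latin America"),
  Total_Rev   = c(900, 300, 100),
  Load_Factor = c(86, 81, 85),
  stringsAsFactors = FALSE
)

# Order the region names by each metric, highest first.
by_revenue     <- toy$Region[order(-toy$Total_Rev)]
by_load_factor <- toy$Region[order(-toy$Load_Factor)]

# Place the two rankings side by side, one row per rank position.
summary_tbl <- data.frame(
  Rank           = seq_along(by_revenue),
  By_Revenue     = by_revenue,
  By_Load_Factor = by_load_factor,
  stringsAsFactors = FALSE
)
print(summary_tbl)
```

Building the table from the data keeps the two columns in sync with whichever year or averaging choice is used for each metric.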
Current Limitations
Additional Analysis