Research Question

How do Delta Airlines’ operating regions compare in terms of revenue versus load factors?

Background Information

The U.S. commercial airline industry is one of the most diverse, dynamic, and perplexing in the world. The Airline Data Project (ADP) was established by the MIT Global Airline Industry Program to better understand the opportunities, risks, and challenges facing this industry. The data set used here describes the traffic, capacity, and revenue for each operating region within Delta Airlines from 1995 to 2017. It was obtained from the ADP website [1].

This research question matters because airlines are strongly driven by revenue generation. One of the main applications of data science in the airline industry is revenue management [2].

  • Revenue management is the application of data and analytics to decide how to sell a product to those who need it, at a reasonable cost, at the right time, and through the right channel [2].

  • Revenue management specialists, in the context of airlines, define destinations and adjust prices for specific markets, find efficient distribution channels, and manage seats to keep the airline simultaneously competitive and customer-friendly [2].

Data scientist Konstantin Vandyshev, who worked at Transavia’s Revenue Management department, stresses that data science disciplines come in handy for achieving revenue management tasks [2]. He uses data analysis to find:

  • the demand for certain flight routes (inspiration for looking at revenue), and

  • the marginal seat revenue (inspiration for looking at load factors)

In this context, load factors will be defined by:

\[\text{Load Factor} = \frac{\text{RPM}}{\text{ASM}} \times 100\]

  • Revenue Passenger Miles (RPM): this figure refers to one paying passenger flown one mile, so total RPMs measure how much of an airline’s seat capacity was actually sold. For example, if 200 passengers fly 500 miles on a flight, 100,000 RPMs are generated [3].

  • Available Seat Miles (ASM): this figure refers to one aircraft seat flown one mile, whether occupied or not. For example, an aircraft with 100 passenger seats that is flown a distance of 100 miles generates 10,000 ASMs [3].

Essentially, the load factor metric answers the question “how full is the airplane with passengers?”
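As a worked illustration of the definitions above, the following sketch computes RPM, ASM, and the load factor for a single flight. The passenger count, seat count, and distance are hypothetical numbers chosen for the example; they do not come from the ADP data.

```r
# Hypothetical single flight (illustrative numbers, not from the ADP data)
passengers <- 200   # paying passengers on board
seats      <- 250   # seats available on the aircraft
miles      <- 500   # distance flown

rpm <- passengers * miles   # revenue passenger miles
asm <- seats * miles        # available seat miles

# Load factor: share of available seat miles that were sold, in percent
load_factor <- rpm / asm * 100
load_factor
```

Because the distance appears in both RPM and ASM, the load factor for a single flight reduces to passengers divided by seats. The distance only matters once flights of different lengths are combined.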

It is interesting to compare each operating region’s raw revenue with its load factor. Delta Airlines may be generating the most revenue from a certain operating region, but are those flights running efficiently in terms of passenger load factors? Is there an operating region that currently generates less revenue but runs more efficient operations?

Exploratory Data Analysis

Summary Statistics

Below is a list of Delta Airlines’ operating regions.

regions <- unique(delta_data$Region)

Delta Airlines’ operating regions are: Domestic, International, Atlantic, Pacific, and Latin America.




Below is the average passenger revenue from 1995 to 2017.

  • Passenger revenue includes revenue from the air transportation of passengers only. This figure does not include revenue from cargo or ancillary fees. Examples of ancillary fees are checked baggage fees, overweight baggage fees, and food and beverage fees.

avg_pass_rev <- mean(delta_data$Pass_Rev)

The average passenger revenue from 1995 to 2017 is $6,552,181.




Below is the average total revenue from 1995 to 2017.

  • Total revenue includes revenue from passengers, cargo, and ancillary fees. Examples of ancillary fees are described above.
avg_tot_rev <- mean(delta_data$Total_Rev)

The average total revenue from 1995 to 2017 is $8,889,823. It makes sense that this value is higher than the average passenger revenue because total revenue also includes cargo and ancillary revenue.
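The plotting code later in this report filters on Region == "Total", which suggests the data set contains aggregate rows alongside the regional rows. If so, a mean taken over every row mixes the two. The sketch below shows the difference on a small toy data frame. The values are hypothetical, and only the column names mirror those used in this report.

```r
# Toy data with hypothetical values; not taken from the ADP data set
toy <- data.frame(
  Region    = c("Domestic", "Atlantic", "Total", "Domestic", "Atlantic", "Total"),
  Year      = c(2016, 2016, 2016, 2017, 2017, 2017),
  Total_Rev = c(100, 40, 140, 120, 50, 170),
  stringsAsFactors = FALSE
)

# Mean over every row, aggregate "Total" rows included
mean_all <- mean(toy$Total_Rev)

# Mean over regional rows only
regional      <- toy[toy$Region != "Total", ]
mean_regional <- mean(regional$Total_Rev)

c(all_rows = mean_all, regional_only = mean_regional)
```

Which of the two figures is appropriate depends on the question being asked. The point of the sketch is only that the choice changes the result.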




To calculate the load factor, a new column is added using the following equation:

\[\text{Load Factor} = \frac{\text{RPM}}{\text{ASM}} \times 100\]

delta_data$Load_Factor <- (delta_data$Rev_Pass_Miles / delta_data$Avail_Seat_Miles) * 100


Below is the average load factor from 1995 to 2017.

avg_load_fctr <- mean(delta_data$Load_Factor)

The average load factor from 1995 to 2017 is 77.99%.
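The figure above is a simple mean of row-level load factors, so every row counts equally regardless of how much capacity it represents. An alternative is a capacity-weighted load factor, computed as total RPM divided by total ASM. The sketch below contrasts the two on hypothetical numbers. The regions "A" and "B" and their values are made up for illustration.

```r
# Toy data with hypothetical values; column names mirror those in this report
toy <- data.frame(
  Region           = c("A", "B"),
  Rev_Pass_Miles   = c(900, 50),
  Avail_Seat_Miles = c(1000, 100),
  stringsAsFactors = FALSE
)
toy$Load_Factor <- toy$Rev_Pass_Miles / toy$Avail_Seat_Miles * 100

# Simple mean: each row counts equally
simple_mean <- mean(toy$Load_Factor)

# Capacity-weighted: rows with more seat miles count for more
weighted <- sum(toy$Rev_Pass_Miles) / sum(toy$Avail_Seat_Miles) * 100

c(simple_mean = simple_mean, weighted = weighted)
```

In this toy case the large, full region pulls the weighted figure well above the simple mean. The two measures agree only when all rows have similar capacity.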

Graphs

Below is a graph of the total revenue for each year from 1995 to 2017.

# Keep only the "Total" rows so that each year appears exactly once
total_rows <- delta_data[delta_data$Region == "Total", ]
years <- total_rows$Year
tot_rev <- total_rows$Total_Rev

frame <- data.frame(years, tot_rev)

ggplot(frame, aes(years, tot_rev)) + 
  geom_point() + 
  geom_line(color = "blue") + 
  theme_minimal() + 
  labs(y = "Total Revenue ($)", 
       x = "Year", 
       title = "Total Revenue from 1995 to 2017") + 
  theme(plot.title = element_text(hjust = 0.5))

This graph shows that total revenue rose sharply starting around 2010 and has continued to grow since then.




Below is a graph of the average total revenue for each operating region.

# Average total revenue per region, excluding the aggregate "Total" rows
frame1 <- delta_data %>% 
  filter(Region != "Total") %>% 
  group_by(Region) %>% 
  summarize(rev = mean(Total_Rev)) %>% 
  as.data.frame()

ggplot(frame1, aes(Region, rev)) + 
  geom_bar(stat = "identity", 
           fill = c("dark green", "maroon", "dark blue", "gold", "orange")) + 
  theme_minimal() + 
  labs(y = "Total Revenue ($)", 
       x = "Operating Region", 
       title = "Average Total Revenue by Operating Region") + 
  theme(plot.title = element_text(hjust = 0.5))

This graph shows that most of Delta’s total revenue comes from their Domestic operating region, followed by International and Atlantic.




Below is a graph of the average load factor for each year from 1995 to 2017.

group2 <- delta_data %>% group_by(Year)

frame2 <- as.data.frame(group2 %>% summarize(lf = mean(Load_Factor)))

ggplot(frame2, aes(Year, lf)) + 
  geom_point() + 
  geom_line(color = "blue") + 
  theme_minimal() + 
  labs(y = "Load Factor (%)", 
       x = "Year", 
       title = "Average Load Factor from 1995 to 2017") + 
  theme(plot.title = element_text(hjust = 0.5))

This graph shows that Delta’s load factors have risen substantially since 2001, the year of the 9/11 terrorist attacks. One might expect load factors to have fallen for a while after 2001 as fewer people traveled, but the graph does not show that. Load factor measures how full flights are rather than how many passengers flew, so it can hold steady or rise when an airline reduces its seat capacity along with demand.




Below is a graph of the average load factor for each region.

# Average load factor per region, excluding the aggregate "Total" rows
frame3 <- delta_data %>% 
  filter(Region != "Total") %>% 
  group_by(Region) %>% 
  summarize(lf = mean(Load_Factor)) %>% 
  as.data.frame()

ggplot(frame3, aes(Region, lf)) + 
  geom_bar(stat = "identity", 
           fill = c("dark green", "maroon", "dark blue", "gold", "orange")) + 
  theme_minimal() + 
  labs(y = "Load Factor (%)", 
       x = "Operating Region", 
       title = "Average Load Factor by Operating Region") + 
  theme(plot.title = element_text(hjust = 0.5))

This graph shows that all operating regions have a load factor of around 80%, although Atlantic has the highest load factor and Latin America has the lowest.

Methods & Conclusions

To reiterate, the research question is: How do Delta Airlines’ operating regions compare in terms of revenue versus load factors?

To answer the research question, data visualization techniques are used.

Revenue

tot_rev_graph <- ggplot(delta_data, aes(x = Year, y = Total_Rev, color = Region)) + 
  geom_line() + 
  geom_point() + 
  scale_x_continuous(limits = c(1995, 2017), breaks = 1995:2017) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5), 
        panel.grid.major = element_line(color = "light gray"),
        panel.grid.minor = element_line(color = "light gray"),
        panel.background = element_rect(fill = "white"),
        plot.title = element_text(hjust = 0.5)) + 
  labs(title = "Delta Airlines' Total Revenue \nby Operating Region \nfrom 1995 to 2017", 
       y = "Total Revenue ($)") 

tot_rev_graph

This graph shows that the Domestic operating region has been Delta’s leader in driving total revenue since 1995. Specifically, below is an ordered list (from highest to lowest) of operating regions contributing to total revenue.

  1. Domestic
  2. International
  3. Atlantic
  4. Pacific
  5. Latin America

Load Factors

load_fctrs_graph <- ggplot(delta_data, aes(x = Year, y = Load_Factor, color = Region)) + 
  geom_line() + 
  geom_point() + 
  scale_x_continuous(limits = c(1995, 2017), breaks = 1995:2017) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5), 
        panel.grid.major = element_line(color = "light gray"),
        panel.grid.minor = element_line(color = "light gray"), 
        panel.background = element_rect(fill = "white"), 
        plot.title = element_text(hjust = 0.5)) + 
  labs(title = "Delta Airlines' Load Factors \nby Operating Region \nfrom 1995 to 2017", 
       y = "Load Factor (%)")

load_fctrs_graph

This graph is harder to interpret than the revenue graph. One major trend is that the Latin America operating region has had the lowest load factors for most of the period. Generally, the load factor order (from highest to lowest) is Pacific, Atlantic, International, Domestic, and Latin America. However, based on the latest data from 2017, below is an ordered list (from highest to lowest) of load factors by operating region.

  1. Domestic
  2. Latin America
  3. Pacific
  4. International
  5. Atlantic
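A ranking like the one above can be derived by keeping only the most recent year and sorting by load factor. The sketch below shows this with base R on a toy data frame. The load factor values are hypothetical and cover only three regions, so the resulting order is illustrative and is not the 2017 ranking reported here.

```r
# Toy data with hypothetical load factors; not taken from the ADP data set
toy <- data.frame(
  Region      = c("Domestic", "Atlantic", "Pacific",
                  "Domestic", "Atlantic", "Pacific"),
  Year        = c(2016, 2016, 2016, 2017, 2017, 2017),
  Load_Factor = c(84, 82, 83, 86, 81, 85),
  stringsAsFactors = FALSE
)

# Keep the most recent year only
latest <- toy[toy$Year == max(toy$Year), ]

# Sort from highest to lowest load factor
ranking <- latest[order(-latest$Load_Factor), c("Region", "Load_Factor")]
ranked_regions <- ranking$Region
ranked_regions
```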

Conclusion

To answer the research question, the summary table below compares Delta’s operating regions by their contribution to total revenue and by their 2017 load factors. Each column is ordered from highest to lowest.

| By Revenue    | By Load Factor (2017) |
|---------------|-----------------------|
| Domestic      | Domestic              |
| International | Latin America         |
| Atlantic      | Pacific               |
| Pacific       | International         |
| Latin America | Atlantic              |

  • This chart shows that some operating regions, although not generating as much revenue, still fly efficient flights (i.e. they have high load factors).

  • For example, Latin America generates the lowest amount of revenue but had the second-highest load factor in 2017.

    • This means planes are flying fairly full (lots of passengers) on those routes.
  • This is important information for Delta, as well as other airlines, to consider.

    • Airlines can realize that some operating regions fly more efficiently (higher load factors) than others.
    • Perhaps airlines can run marketing campaigns for regions with lower load factors, or expand routes in regions that already have high load factors to generate more revenue.
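The comparison between the two rankings can also be expressed as rank arithmetic. The sketch below takes the two orderings reported in this section and computes, for each region, the gap between its revenue rank and its load factor rank. The column name Rank_Gap is introduced here for illustration only.

```r
# Orderings as reported in this report (highest to lowest)
by_revenue     <- c("Domestic", "International", "Atlantic",
                    "Pacific", "Latin America")
by_load_factor <- c("Domestic", "Latin America", "Pacific",
                    "International", "Atlantic")

comparison <- data.frame(
  Region           = by_revenue,
  Revenue_Rank     = seq_along(by_revenue),
  Load_Factor_Rank = match(by_revenue, by_load_factor),
  stringsAsFactors = FALSE
)

# Positive gap: the region ranks better on load factor than on revenue
comparison$Rank_Gap <- comparison$Revenue_Rank - comparison$Load_Factor_Rank
comparison
```

A large positive gap flags a region whose flights are comparatively full despite a small revenue contribution, which is the pattern described for Latin America above.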

Future Work

  • Current Limitations

    • Delta is an airline partner within the SkyTeam Alliance, which is made up of many other airlines.
    • It is unclear whether or not the data in this dataset includes travel on Delta’s other airline partners.
      • For example, a flight operated by Air France from New York to Paris may be counted in the Atlantic operations category because Air France is a partner within the SkyTeam Alliance.
    • This would, in turn, skew the data to reflect more operations than what Delta Airlines alone generated.


  • Additional Analysis

    • Further work could be done, similar to what was completed above, for other airlines.
    • Airlines could then be compared and ranked based on revenue generation and load factors by operating region.
    • Analysis could also be done to predict revenue and load factor values if airlines formed a partnership or alliance to combine/share routes.