Change in Recorded U.S. Domestic Flight Load Factor, 1998 to 2009, from U.S. Census Data

How did the average load factor of U.S. domestic flights evolve between 1998 and 2009?

This analysis uses a dataset of all recorded U.S. domestic flights from 1998 to 2009, published by the U.S. Census.

The dataset includes origin and destination airports, number of passengers, available seats, and the date of each flight, among other fields.

Focus on Load Factor

The load factor is the number of passengers on a flight divided by the number of available seats, often expressed as a percentage. A higher load factor means more seats are filled, while a lower load factor means many seats are empty, suggesting underutilized flights. Changes in load factor are a useful indicator of the efficiency and profitability of the U.S. domestic air travel system over time.

In this analysis, the load factor shows how well U.S. domestic flights utilized their seating capacity from year to year.
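As a minimal sketch of the definition above, the following base R snippet computes a per-flight load factor and a pooled load factor. The passenger and seat counts are made-up illustrative values, not figures from the dataset.

```r
# Hypothetical passenger and seat counts for three flights (illustrative only)
passengers <- c(120, 45, 150)
seats <- c(150, 50, 200)

# Per-flight load factor: passengers divided by available seats
load_factor <- passengers / seats

# Pooled load factor: total passengers divided by total seats
pooled_load_factor <- sum(passengers) / sum(seats)

# Simple average of the per-flight ratios, shown for comparison
mean_load_factor <- mean(load_factor)

print(load_factor)
print(pooled_load_factor)
print(mean_load_factor)
```

The pooled figure weights each flight by its seat count, so large flights count for more than small ones. The monthly and yearly summaries in this analysis use the pooled form (total passengers over total seats).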

install.packages("tidyverse")  # run once, if the package is not already installed

library(tidyverse)  # loads readr (read_csv), dplyr, and the %>% pipe

setwd("~/Library/Mobile Documents/com~apple~CloudDocs/Data 101/flightlogs")

flights <- read_csv("flight_edges.csv")

colnames(flights) <- c("Origin", "Destination", "OriginCity", "DestinationCity", "Passengers", "Seats", "Flights", "Distance", "FlyDate", "OriginPopulation", "DestinationPopulation")

print(flights)

colnames(flights) # check column names

Convert FlyDate to Date object (first day of the month)

# The paste0() call assumes FlyDate is stored as YYYYMM, so "01" is appended before parsing.
flights <- flights %>%
  mutate(
    Passengers = as.numeric(Passengers),
    Seats = as.numeric(Seats),
    FlyDate = as.Date(paste0(as.character(FlyDate), "01"), format = "%Y%m%d")
  )
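To illustrate the date conversion on its own, here is a small base R sketch. It assumes the raw values are year-month codes in YYYYMM form; the three codes are hypothetical examples rather than rows from the dataset.

```r
# Hypothetical year-month codes in YYYYMM form (illustrative only)
fly_date_raw <- c(199801L, 200109L, 200912L)

# Append "01" so each code names the first day of its month, then parse
fly_date <- as.Date(paste0(as.character(fly_date_raw), "01"), format = "%Y%m%d")

print(fly_date)

# The year can then be pulled out as text for grouping
print(format(fly_date, "%Y"))
```

Anchoring every record to the first day of its month keeps the values as true Date objects, which makes range filters and year extraction straightforward.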

# Clean data: keep rows with a known, positive passenger count and seat count
flights_clean <- flights %>%
  filter(!is.na(Passengers), Passengers > 0, !is.na(Seats), Seats > 0)

# Create a per-row load factor
flights_clean <- flights_clean %>%
  mutate(LoadFactor = Passengers / Seats)

# Optional: a flight duration (arr_time - dep_time) could be added here,
# but only if departure and arrival times are available; this dataset's columns do not include them.

print(flights_clean)

monthly_load <- flights_clean %>%
  group_by(FlyDate) %>%
  summarise(
    TotalPassengers = sum(Passengers, na.rm = TRUE),
    TotalSeats = sum(Seats, na.rm = TRUE)
  ) %>%
  mutate(LoadFactor = TotalPassengers / TotalSeats) %>%
  arrange(FlyDate)

print(monthly_load)

colnames(flights_clean)

head(flights_clean)

Check data

head(monthly_load)

# Clean up rows: keep only flights dated 1998 through 2009
flights_clean <- flights_clean %>%
  filter(FlyDate >= as.Date("1998-01-01") & FlyDate <= as.Date("2009-12-31"))

# Summarize yearly load from cleaned data
yearly_load <- flights_clean %>%
  mutate(Year = format(FlyDate, "%Y")) %>%
  group_by(Year) %>%
  summarise(
    TotalPassengers = sum(Passengers, na.rm = TRUE),
    TotalSeats = sum(Seats, na.rm = TRUE)
  ) %>%
  mutate(LoadFactor = TotalPassengers / TotalSeats) %>%
  arrange(Year)

print(yearly_load)
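The yearly aggregation can be checked by hand on a tiny example. The sketch below uses base R's aggregate() as a stand-in for the group_by() and summarise() steps, so it runs without extra packages. The `toy` data frame and its values are hypothetical.

```r
# Hypothetical monthly records for two years (illustrative only)
toy <- data.frame(
  FlyDate = as.Date(c("1998-01-01", "1998-02-01", "1999-01-01", "1999-02-01")),
  Passengers = c(100, 200, 150, 250),
  Seats = c(200, 300, 200, 300)
)

# Extract the year as text, mirroring format(FlyDate, "%Y") in the analysis
toy$Year <- format(toy$FlyDate, "%Y")

# Sum passengers and seats within each year
yearly_toy <- aggregate(cbind(Passengers, Seats) ~ Year, data = toy, FUN = sum)

# Yearly load factor: total passengers divided by total seats
yearly_toy$LoadFactor <- yearly_toy$Passengers / yearly_toy$Seats

print(yearly_toy)
```

Summing first and dividing afterwards gives 300 / 500 for 1998 and 400 / 500 for 1999, which is easy to verify against the printed table.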

# Use lag() to calculate the year-to-year difference in load factor and the % change
yearly_load <- yearly_load %>%
  mutate(
    LoadFactorChange = LoadFactor - lag(LoadFactor),
    LoadFactorPctChange = LoadFactorChange / lag(LoadFactor) * 100
  )

print(yearly_load)
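For readers unfamiliar with lagged comparisons, the sketch below reproduces the same year-over-year calculation in base R, building the "previous year" vector by hand instead of calling dplyr's lag(). The three load factor values are hypothetical.

```r
# Hypothetical yearly load factors (illustrative only)
load_factor <- c(0.50, 0.55, 0.44)

# Shift the series down by one position; the first year has no predecessor
previous <- c(NA, head(load_factor, -1))

# Absolute change and percent change relative to the previous year
change <- load_factor - previous
pct_change <- change / previous * 100

print(data.frame(load_factor, previous, change, pct_change))
```

The first row is NA by construction, because there is no earlier year to compare against. The same holds for the first year in the yearly table.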


Summary of Key Findings

The load factor rose from 0.66 in 1998 to 0.76 in 2009, a gain of 10 percentage points, or roughly 15% in relative terms, in average seat utilization over the period. Major dips align with major crises:

2001 → 9/11 terrorist attacks.

2008 → Global financial crisis.

Future research can build on this exploration by examining how different social and economic climates influence airline load factors.

By continuing to monitor changes over time, researchers can identify how events such as economic recessions and pandemics affect flight occupancy rates. This type of analysis could be expanded by adding more variables, such as ticket prices, route distances, and airline type, to study consumer habits in various social climates.

Citation:

Perkins, Jacob. 3.5 Million+ US Domestic Flights from 1990 to 2009. Infochimps, 2010, http://infochimps.org/datasets/d35-million-us-domestic-flights-from-1990-to-2009.