Analyzing the Influence of Airline Delays on Overall Sentiment in Customer Reviews

Author

Nate Holtz

Published

May 3, 2024

Introduction

In the realm of air travel, delays are an inevitable reality. However, their impact on passenger sentiment towards airlines remains a subject of interest. This report explores the correlation between average airline delay times and the overall sentiment expressed in customer reviews. By examining this relationship, valuable insights can be gleaned to enhance customer satisfaction and loyalty within the aviation industry.

Data Sources

The primary data source for this report comprises delay times of major airlines operating in the United States, sourced from Kaggle. This dataset delineates delay times for numerous prominent US airlines, categorized by respective arrival airports, spanning the years 2019 and 2020.

Additionally, sentiment analysis of reviews retrieved from airlinequality.com for each airline featured in the Kaggle dataset serves as the secondary data source for this report.

By harnessing insights from both datasets, this report endeavors to assess the extent to which delay scores attributed to each airline influence public perception.

Delay Data

The primary delay data sourced from Kaggle is well structured and needed minimal cleaning or formatting. It is organized by both airline and airport, and it reports delays alongside the reason for each delay. The dataset also covers cancelled flights and records the duration of delays in minutes, broken down by cause.

Summary Statistics

To gain a broad understanding of flight delay times, generating summary statistics and visuals offers a foundational overview of the data before delving into deeper analysis.

Average Delay per Flight by Airline

To highlight airlines with the longest average flight delays, a visual representation is crafted to illustrate the average flight delay time per flight, categorized by airline.

According to the visual, ExpressJet Airlines emerges with the longest average flight delay time, closely trailed by JetBlue Airways.
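As a rough illustration of the calculation behind this kind of chart, the sketch below divides total delay minutes by total arriving flights for each airline. The column names (`carrier_name`, `arr_flights`, `arr_delay`) and the toy rows are assumptions made for the example, not values taken from the Kaggle file.

```r
# Toy stand-in for the delay table. Column names and values are assumptions.
delays <- data.frame(
  carrier_name = c("Airline A", "Airline A", "Airline B", "Airline B"),
  airport      = c("EWR", "BOS", "EWR", "BOS"),
  arr_flights  = c(100, 300, 200, 200),      # arriving flights in each row
  arr_delay    = c(2000, 3000, 1000, 3000),  # total delay minutes in each row
  stringsAsFactors = FALSE
)

avg_delay_by_airline <- function(df) {
  # Sum minutes and flights per airline first, then divide, so that
  # busy airports weigh more than quiet ones.
  totals <- aggregate(cbind(arr_delay, arr_flights) ~ carrier_name,
                      data = df, FUN = sum)
  totals$avg_delay_per_flight <- totals$arr_delay / totals$arr_flights
  totals[order(-totals$avg_delay_per_flight), ]
}

airline_avg <- avg_delay_by_airline(delays)
print(airline_avg)
```

Summing before dividing is a design choice: averaging the per-airport averages instead would give a small airport the same weight as a large hub.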

Heatmap of Average Delay Time

To gain deeper insights into how delay times vary across different arrival cities for each airline, a heatmap visual is constructed. In this visual, airlines are presented on the left side, and arrival cities are depicted along the bottom axis. Darker shades of blue in the heatmap signify longer average delay times.

Looking at the top 25 cities for airline traffic, some standout points emerge. Take Newark, NJ, for example, where ExpressJet Airlines has an average delay time of almost 50 minutes over 1,400 arriving flights. JetBlue Airways and ExpressJet Airlines also show the darkest shades of blue overall, which is consistent with the previous visual.
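The grid behind a heatmap like this can be sketched as an airline-by-city matrix of average delay per flight, restricted to the busiest arrival cities. The column names, the toy rows, and the cutoff of 2 cities (the report uses 25) are assumptions for illustration only.

```r
# Toy data. Column names and values are assumptions.
delays <- data.frame(
  carrier_name = c("Airline A", "Airline A", "Airline A",
                   "Airline B", "Airline B", "Airline B"),
  city         = c("Newark, NJ", "Newark, NJ", "Boston, MA",
                   "Newark, NJ", "Boston, MA", "Anchorage, AK"),
  arr_flights  = c(100, 100, 300, 200, 200, 50),
  arr_delay    = c(2000, 3000, 3000, 1000, 3000, 500),
  stringsAsFactors = FALSE
)

top_n_cities <- 2  # the report uses 25

# Rank cities by total arriving flights and keep the busiest ones.
city_traffic <- sort(tapply(delays$arr_flights, delays$city, sum),
                     decreasing = TRUE)
keep    <- names(city_traffic)[seq_len(min(top_n_cities, length(city_traffic)))]
busiest <- delays[delays$city %in% keep, ]

# Airline x city totals, then average delay per flight in each cell.
groups     <- list(busiest$carrier_name, busiest$city)
delay_sum  <- tapply(busiest$arr_delay, groups, sum)
flight_sum <- tapply(busiest$arr_flights, groups, sum)
avg_matrix <- delay_sum / flight_sum

print(avg_matrix)
```

Cells for airline and city pairs with no flights come out as `NA`, which a plotting function can render as blank tiles.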

Investigating Factors Behind Delays

Exploring factors such as weather, security breaches, and late aircraft can provide insight into the root causes of flight delays. A detailed look at each factor gives a clearer picture of why delays occur in air travel.

Weather Delay Relationship

To investigate whether weather delays contribute significantly to flight delays, a visual was generated that plots total delay time against the count of weather delays.

The visual indicates a minimal correlation between the total delay time and the count of weather delays. This suggests that factors other than weather-related delays may have a more substantial influence on total delay times.
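A correlation coefficient is one way to put a number on what the plot shows. The sketch below computes the Pearson correlation between total delay minutes and the count of weather delays. The column names `arr_delay` and `weather_ct` and the toy values are assumptions, so the printed number says nothing about the real data.

```r
# Pearson correlation between total delay minutes and weather delay counts.
# Rows with a missing value in either column are dropped.
weather_delay_cor <- function(df) {
  cor(df$arr_delay, df$weather_ct, use = "complete.obs")
}

# Toy data. Column names and values are assumptions.
delays <- data.frame(
  arr_delay  = c(100, 400, 250, 900, 300),  # total delay minutes
  weather_ct = c(1, 0, 3, 2, 0)             # number of weather delays
)

r <- weather_delay_cor(delays)
print(r)
```

A value near 0 would match the "minimal correlation" reading of the plot, while values near 1 or -1 would indicate a strong linear relationship.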

Flights Delayed by 15 minutes or more

For most people, occasional flight delays are expected and accepted as part of air travel. Delays of less than 15 minutes are generally manageable. However, when delays reach 15 minutes or more, passengers often become impatient. The upcoming visual illustrates the number of flights delayed by 15 minutes or more per 100 flights, categorized by city and airline.
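The per-100-flights rate described above can be sketched as follows. The column `arr_del15` (flights delayed by 15 minutes or more), the other column names, and the toy rows are assumptions made for the example.

```r
# Toy data. Column names and values are assumptions.
delays <- data.frame(
  carrier_name = c("Airline A", "Airline A", "Airline B"),
  city         = c("Newark, NJ", "Boston, MA", "Newark, NJ"),
  arr_flights  = c(200, 100, 400),  # arriving flights
  arr_del15    = c(50, 10, 60),     # flights delayed 15 minutes or more
  stringsAsFactors = FALSE
)

delayed_per_100 <- function(df) {
  # Total delayed flights and total flights per airline and city.
  totals <- aggregate(cbind(arr_del15, arr_flights) ~ carrier_name + city,
                      data = df, FUN = sum)
  # Scale to a rate per 100 flights so large and small routes are comparable.
  totals$delayed_per_100 <- 100 * totals$arr_del15 / totals$arr_flights
  totals[order(-totals$delayed_per_100), ]
}

rates <- delayed_per_100(delays)
print(rates)
```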

Security Comes First

Since 2001, airline security has become a paramount concern. Consequently, flights are often delayed due to security risks, which may not necessarily be within the airline’s control but could be influenced by the city. The following visual aims to highlight cities whose airports experience the most delays associated with security risks.

It is natural to assume that large cities like New York or Los Angeles would see the most security delays, but the data shows otherwise. When sorted by security delay time per hundred flights, Anchorage, AK ranks first for 2019 and 2020. This may be the result of a single incident, or Anchorage’s security protocols may also play a role. Further investigation is needed before drawing firm conclusions.
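One way to start that further investigation is to compute the ranking metric and, next to it, the share of each city's security delay that comes from its single largest row. A share close to 1 would point toward a one-off incident. The sketch below assumes columns named `city`, `arr_flights`, and `security_delay`, and uses invented values.

```r
# Toy data. Column names and values are assumptions.
delays <- data.frame(
  city           = c("Anchorage, AK", "Anchorage, AK",
                     "New York, NY", "New York, NY"),
  arr_flights    = c(100, 100, 1000, 1000),
  security_delay = c(90, 10, 50, 50),  # security delay minutes per row
  stringsAsFactors = FALSE
)

security_by_city <- function(df) {
  totals <- aggregate(cbind(security_delay, arr_flights) ~ city,
                      data = df, FUN = sum)
  totals$security_min_per_100 <- 100 * totals$security_delay / totals$arr_flights

  # Fraction of each city's security delay contributed by its largest row.
  largest <- tapply(df$security_delay, df$city, max)
  totals$largest_row_share <-
    as.numeric(largest[as.character(totals$city)]) / totals$security_delay

  totals[order(-totals$security_min_per_100), ]
}

city_security <- security_by_city(delays)
print(city_security)
```

In this toy data the top-ranked city gets most of its security delay from one row, which is the pattern a single incident would produce.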

Sentiment Analysis

For the second part of this report, sentiment analysis was conducted on reviews sourced from airlinequality.com using the AFINN sentiment scoring method. This analysis aimed to gauge the overall sentiment expressed towards different airlines based on the content of the reviews. These sentiment scores will be compared to the overall delay scores obtained from flight data to explore potential correlations between passenger sentiment and flight delays. This comparison will help determine if there is any relationship between customer satisfaction, as reflected in sentiment scores, and the frequency or severity of flight delays experienced by different airlines.

Data Scrape

To gather the reviews needed for this report, data was scraped from airlinequality.com. Several airlines in the delay dataset were not available on the site, so they were excluded from the final results. The following script outlines the procedure used to extract reviews from each airline's page.

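library(rvest)    # read_html(), html_elements(), html_text2(), %>%
library(dplyr)    # bind_rows(), tibble()
library(stringr)  # str_remove()

# Assumes all_airline_urls and airline_names are parallel character vectors
# defined earlier (one review-page URL and one display name per airline).
all_comments <- tibble()  # accumulator for the scraped reviews

for (i in seq_along(all_airline_urls)) {
  airline_url <- all_airline_urls[i]
  print(paste("Collecting page", i, "of", length(all_airline_urls)))
  Sys.sleep(runif(1, 10, 15))  # wait 10 to 15 seconds between requests

  comments <- read_html(airline_url) %>%
    html_elements("div.tc_mobile") %>%
    html_elements("div.text_content") %>%
    html_text2() %>%
    str_remove(".*\\|\\s*") %>%  # strip any leading prefix that ends in "|"
    tibble(comment = ., airline = airline_names[i])

  all_comments <- bind_rows(all_comments, comments)

  print(paste(airline_url, "collected"))
  print(paste(nrow(all_comments), "total reviews collected so far!"))
}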

Scoring the scraped reviews with AFINN produces a clean table that lists each airline with its corresponding AFINN score. This table is presented below:

airline             afinn_score
Alaska Airlines       -5.700000
Allegiant Air         -7.222222
American Airlines     -8.666667
Delta Air Lines       -1.600000
Frontier Airlines     -5.888889
Hawaiian Airlines     -4.100000
Jetblue Airways       -4.000000
Skywest Airlines       2.555556
Southwest Airlines    -5.333333
Spirit Air Lines      -4.900000
United Air Lines       1.500000
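The report does not show the scoring step itself, so the following is only a sketch of how a table like the one above could be produced. It assumes each review is scored by summing the lexicon values of its words, and that an airline's score is the mean of its review scores. The small `lexicon` vector is a hypothetical stand-in in the AFINN style (integer weights per word); its values are placeholders and are not guaranteed to match the published AFINN list. The reviews are invented.

```r
# Hypothetical AFINN-style lexicon: placeholder weights, not the real list.
lexicon <- c(good = 3, great = 3, friendly = 2, delayed = -1,
             rude = -2, bad = -3, terrible = -3, lost = -3)

# Score one review: lower-case it, split into words, and sum the weights
# of the words found in the lexicon. Unknown words contribute nothing.
score_review <- function(text, lexicon) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  sum(lexicon[words], na.rm = TRUE)
}

# Invented reviews for illustration.
reviews <- data.frame(
  airline = c("Airline A", "Airline A", "Airline B", "Airline B"),
  comment = c("Great crew and friendly service",
              "Flight delayed and bag lost",
              "Terrible and rude staff",
              "Bad food, good seat"),
  stringsAsFactors = FALSE
)

reviews$review_score <- vapply(reviews$comment, score_review, numeric(1),
                               lexicon = lexicon, USE.NAMES = FALSE)

# Assumed aggregation: mean review score per airline.
afinn_by_airline <- aggregate(review_score ~ airline, data = reviews, FUN = mean)
names(afinn_by_airline)[2] <- "afinn_score"
print(afinn_by_airline)
```

Because the per-review score is a sum, longer reviews can reach larger magnitudes than short ones, which is worth keeping in mind when comparing airlines with different review lengths.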

Sentiment Results

To compare sentiment analysis scores with the overall average delay time of each airline, the two measures were plotted against each other in the point plot below. This visualization aims to uncover any potential patterns or correlations between delay times and sentiment scores.

The plot shows little to no correlation between AFINN sentiment scores and the average delay time of an airline. This suggests that factors other than delay times shape the sentiment of airline reviews. Passenger perceptions are complex, and many factors contribute to overall airline satisfaction.
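The visual comparison can be backed by a correlation coefficient. The sketch below joins a per-airline sentiment table to a per-airline delay table and reports Pearson and Spearman correlations. The column names `afinn_score` and `avg_delay`, the helper name `sentiment_delay_cor`, and all values are hypothetical, so the printed numbers do not describe the real airlines.

```r
sentiment_delay_cor <- function(sentiment, delay) {
  # Inner join: airlines missing from either table drop out, mirroring
  # the exclusion of airlines that had no reviews available.
  joined <- merge(sentiment, delay, by = "airline")
  c(n        = nrow(joined),
    pearson  = cor(joined$afinn_score, joined$avg_delay),
    spearman = cor(joined$afinn_score, joined$avg_delay, method = "spearman"))
}

# Invented values for illustration.
sentiment <- data.frame(
  airline     = c("Airline A", "Airline B", "Airline C", "Airline D", "Airline E"),
  afinn_score = c(-5, -7, 2, -1, -4),
  stringsAsFactors = FALSE
)
delay <- data.frame(
  airline   = c("Airline A", "Airline B", "Airline C", "Airline D", "Airline F"),
  avg_delay = c(12, 9, 15, 10, 20),  # average delay minutes per flight
  stringsAsFactors = FALSE
)

result <- sentiment_delay_cor(sentiment, delay)
print(result)
```

With only a handful of airlines in the comparison, any coefficient carries wide uncertainty, so a rank-based measure such as Spearman is a useful cross-check against a single outlier.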

Summary

In this report, an investigation was conducted to examine the relationship between airline delay times and passenger sentiment scores obtained from airlinequality.com using AFINN sentiment analysis. While analyzing the data, it became apparent that there is little to no correlation between AFINN sentiment scores and average delay times among airlines. This suggests that factors beyond flight delays significantly influence passenger sentiment towards airlines. The findings underscore the intricate nature of passenger perceptions and emphasize the importance of considering various factors beyond operational metrics in assessing overall airline satisfaction.