Explanation

This report presents an analysis of the US Airline Passenger Satisfaction survey.

Data Preparation

# Data Input and Checking

airlinesat <- read.csv(file = "airlinesatistest.csv")

str(airlinesat)
## 'data.frame':    25976 obs. of  25 variables:
##  $ X                                : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ id                               : int  19556 90035 12360 77959 36875 39177 79433 97286 27508 62482 ...
##  $ Gender                           : chr  "Female" "Female" "Male" "Male" ...
##  $ Customer.Type                    : chr  "Loyal Customer" "Loyal Customer" "disloyal Customer" "Loyal Customer" ...
##  $ Age                              : int  52 36 20 44 49 16 77 43 47 46 ...
##  $ Type.of.Travel                   : chr  "Business travel" "Business travel" "Business travel" "Business travel" ...
##  $ Class                            : chr  "Eco" "Business" "Eco" "Business" ...
##  $ Flight.Distance                  : int  160 2863 192 3377 1182 311 3987 2556 556 1744 ...
##  $ Inflight.wifi.service            : int  5 1 2 0 2 3 5 2 5 2 ...
##  $ Departure.Arrival.time.convenient: int  4 1 0 0 3 3 5 2 2 2 ...
##  $ Ease.of.Online.booking           : int  3 3 2 0 4 3 5 2 2 2 ...
##  $ Gate.location                    : int  4 1 4 2 3 3 5 2 2 2 ...
##  $ Food.and.drink                   : int  3 5 2 3 4 5 3 4 5 3 ...
##  $ Online.boarding                  : int  4 4 2 4 1 5 5 4 5 4 ...
##  $ Seat.comfort                     : int  3 5 2 4 2 3 5 5 5 4 ...
##  $ Inflight.entertainment           : int  5 4 2 1 2 5 5 4 5 4 ...
##  $ On.board.service                 : int  5 4 4 1 2 4 5 4 2 4 ...
##  $ Leg.room.service                 : int  5 4 1 1 2 3 5 4 2 4 ...
##  $ Baggage.handling                 : int  5 4 3 1 2 1 5 4 5 4 ...
##  $ Checkin.service                  : int  2 3 2 3 4 1 4 5 3 5 ...
##  $ Inflight.service                 : int  5 4 2 1 2 2 5 4 3 4 ...
##  $ Cleanliness                      : int  5 5 2 4 4 5 3 3 5 4 ...
##  $ Departure.Delay.in.Minutes       : int  50 0 0 0 0 0 0 77 1 28 ...
##  $ Arrival.Delay.in.Minutes         : num  44 0 0 6 20 0 0 65 0 14 ...
##  $ satisfaction                     : chr  "satisfied" "satisfied" "neutral or dissatisfied" "satisfied" ...
# Inspecting Data & Data Cleaning

airlinesat$Gender <- as.factor(airlinesat$Gender)
airlinesat$Customer.Type <- as.factor(airlinesat$Customer.Type)
airlinesat$Type.of.Travel <- as.factor(airlinesat$Type.of.Travel)
airlinesat$Class <- as.factor(airlinesat$Class)
airlinesat$satisfaction <- as.factor(airlinesat$satisfaction)
str(airlinesat)
## 'data.frame':    25976 obs. of  25 variables:
##  $ X                                : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ id                               : int  19556 90035 12360 77959 36875 39177 79433 97286 27508 62482 ...
##  $ Gender                           : Factor w/ 2 levels "Female","Male": 1 1 2 2 1 2 1 1 2 1 ...
##  $ Customer.Type                    : Factor w/ 2 levels "disloyal Customer",..: 2 2 1 2 2 2 2 2 2 2 ...
##  $ Age                              : int  52 36 20 44 49 16 77 43 47 46 ...
##  $ Type.of.Travel                   : Factor w/ 2 levels "Business travel",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Class                            : Factor w/ 3 levels "Business","Eco",..: 2 1 2 1 2 2 1 1 2 1 ...
##  $ Flight.Distance                  : int  160 2863 192 3377 1182 311 3987 2556 556 1744 ...
##  $ Inflight.wifi.service            : int  5 1 2 0 2 3 5 2 5 2 ...
##  $ Departure.Arrival.time.convenient: int  4 1 0 0 3 3 5 2 2 2 ...
##  $ Ease.of.Online.booking           : int  3 3 2 0 4 3 5 2 2 2 ...
##  $ Gate.location                    : int  4 1 4 2 3 3 5 2 2 2 ...
##  $ Food.and.drink                   : int  3 5 2 3 4 5 3 4 5 3 ...
##  $ Online.boarding                  : int  4 4 2 4 1 5 5 4 5 4 ...
##  $ Seat.comfort                     : int  3 5 2 4 2 3 5 5 5 4 ...
##  $ Inflight.entertainment           : int  5 4 2 1 2 5 5 4 5 4 ...
##  $ On.board.service                 : int  5 4 4 1 2 4 5 4 2 4 ...
##  $ Leg.room.service                 : int  5 4 1 1 2 3 5 4 2 4 ...
##  $ Baggage.handling                 : int  5 4 3 1 2 1 5 4 5 4 ...
##  $ Checkin.service                  : int  2 3 2 3 4 1 4 5 3 5 ...
##  $ Inflight.service                 : int  5 4 2 1 2 2 5 4 3 4 ...
##  $ Cleanliness                      : int  5 5 2 4 4 5 3 3 5 4 ...
##  $ Departure.Delay.in.Minutes       : int  50 0 0 0 0 0 0 77 1 28 ...
##  $ Arrival.Delay.in.Minutes         : num  44 0 0 6 20 0 0 65 0 14 ...
##  $ satisfaction                     : Factor w/ 2 levels "neutral or dissatisfied",..: 2 2 1 2 2 2 2 2 2 2 ...
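The five `as.factor()` calls above convert exactly the five character columns shown by the first `str()`. The same step can be written once for every character column. This is a minimal base-R sketch on a small hypothetical data frame (`toy` and its values are invented for illustration):

```r
# Hypothetical toy data standing in for airlinesat
toy <- data.frame(
  Gender = c("Female", "Male", "Male"),
  Age = c(52L, 36L, 20L),
  satisfaction = c("satisfied", "satisfied", "neutral or dissatisfied"),
  stringsAsFactors = FALSE
)

# Find the character columns and convert only those to factors;
# numeric columns such as Age are left unchanged
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], as.factor)

str(toy)
```

Converting by type rather than by name avoids missing a column, but it also converts any character column you might want to keep as text, so check `str()` afterwards as the report does.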
# Check for missing values (NA)

anyNA(airlinesat)
## [1] TRUE
# Count missing values per column
colSums(is.na(airlinesat))
##                                 X                                id 
##                                 0                                 0 
##                            Gender                     Customer.Type 
##                                 0                                 0 
##                               Age                    Type.of.Travel 
##                                 0                                 0 
##                             Class                   Flight.Distance 
##                                 0                                 0 
##             Inflight.wifi.service Departure.Arrival.time.convenient 
##                                 0                                 0 
##            Ease.of.Online.booking                     Gate.location 
##                                 0                                 0 
##                    Food.and.drink                   Online.boarding 
##                                 0                                 0 
##                      Seat.comfort            Inflight.entertainment 
##                                 0                                 0 
##                  On.board.service                  Leg.room.service 
##                                 0                                 0 
##                  Baggage.handling                   Checkin.service 
##                                 0                                 0 
##                  Inflight.service                       Cleanliness 
##                                 0                                 0 
##        Departure.Delay.in.Minutes          Arrival.Delay.in.Minutes 
##                                 0                                83 
##                      satisfaction 
##                                 0

Only 83 of the 25,976 rows (about 0.3%) contain a missing value, all of them in Arrival.Delay.in.Minutes. Because this is such a small share of the data, we can drop those rows directly.
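The share of incomplete rows can be computed rather than estimated by eye. A minimal sketch: `missing_row_share()` is a hypothetical helper (not part of the original analysis), applied first to invented toy data and then to the counts reported above.

```r
# Hypothetical helper: fraction of rows that contain at least one NA
missing_row_share <- function(df) {
  mean(!complete.cases(df))
}

# Hypothetical toy data: one of four rows has a missing arrival delay
toy <- data.frame(
  Departure.Delay.in.Minutes = c(50, 0, 0, 77),
  Arrival.Delay.in.Minutes   = c(44, NA, 6, 65)
)
missing_row_share(toy)

# Same ratio for the survey data, as a percentage, using the reported counts
pct_dropped <- round(100 * 83 / 25976, 2)
pct_dropped
```

On the real data, `missing_row_share(airlinesat)` should give the same ratio, since all 83 NAs sit in a single column.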

# Drop rows with a missing arrival delay

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
airlinesat_clean <- drop_na(data = airlinesat, Arrival.Delay.in.Minutes)
anyNA(airlinesat_clean)
## [1] FALSE
# Keep only the columns needed (drop the row index X and the passenger id)

airlinesat_clean2 <- airlinesat_clean[, 3:25]
airlinesat_clean2 # data frame that we will use
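Selecting columns `3:25` works because X and id are the first two of the 25 columns. Dropping them by name keeps the code correct if the column order ever changes. A sketch on hypothetical toy data with the same leading columns:

```r
# Hypothetical toy data with the same first columns as airlinesat_clean
toy <- data.frame(
  X = 0:1,
  id = c(19556L, 90035L),
  Gender = c("Female", "Male"),
  Age = c(52L, 36L),
  stringsAsFactors = FALSE
)

# Drop the row index and passenger id by name instead of by position;
# drop = FALSE keeps a data frame even if only one column remains
keep <- setdiff(names(toy), c("X", "id"))
toy_clean <- toy[, keep, drop = FALSE]
names(toy_clean)
```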

Data Summary

summary(airlinesat_clean2)
##     Gender                Customer.Type        Age       
##  Female:13127   disloyal Customer: 4782   Min.   : 7.00  
##  Male  :12766   Loyal Customer   :21111   1st Qu.:27.00  
##                                           Median :40.00  
##                                           Mean   :39.62  
##                                           3rd Qu.:51.00  
##                                           Max.   :85.00  
##          Type.of.Travel       Class       Flight.Distance Inflight.wifi.service
##  Business travel:17980   Business:12457   Min.   :  31    Min.   :0.000        
##  Personal Travel: 7913   Eco     :11524   1st Qu.: 414    1st Qu.:2.000        
##                          Eco Plus: 1912   Median : 849    Median :3.000        
##                                           Mean   :1194    Mean   :2.724        
##                                           3rd Qu.:1744    3rd Qu.:4.000        
##                                           Max.   :4983    Max.   :5.000        
##  Departure.Arrival.time.convenient Ease.of.Online.booking Gate.location  
##  Min.   :0.000                     Min.   :0.000          Min.   :1.000  
##  1st Qu.:2.000                     1st Qu.:2.000          1st Qu.:2.000  
##  Median :3.000                     Median :3.000          Median :3.000  
##  Mean   :3.046                     Mean   :2.756          Mean   :2.976  
##  3rd Qu.:4.000                     3rd Qu.:4.000          3rd Qu.:4.000  
##  Max.   :5.000                     Max.   :5.000          Max.   :5.000  
##  Food.and.drink  Online.boarding  Seat.comfort   Inflight.entertainment
##  Min.   :0.000   Min.   :0.000   Min.   :1.000   Min.   :0.000         
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000         
##  Median :3.000   Median :4.000   Median :4.000   Median :4.000         
##  Mean   :3.215   Mean   :3.262   Mean   :3.449   Mean   :3.357         
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:4.000         
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000         
##  On.board.service Leg.room.service Baggage.handling Checkin.service
##  Min.   :0.000    Min.   :0.00     Min.   :1.000    Min.   :1.000  
##  1st Qu.:2.000    1st Qu.:2.00     1st Qu.:3.000    1st Qu.:3.000  
##  Median :4.000    Median :4.00     Median :4.000    Median :3.000  
##  Mean   :3.386    Mean   :3.35     Mean   :3.633    Mean   :3.314  
##  3rd Qu.:4.000    3rd Qu.:4.00     3rd Qu.:5.000    3rd Qu.:4.000  
##  Max.   :5.000    Max.   :5.00     Max.   :5.000    Max.   :5.000  
##  Inflight.service  Cleanliness    Departure.Delay.in.Minutes
##  Min.   :0.000    Min.   :0.000   Min.   :   0.00           
##  1st Qu.:3.000    1st Qu.:2.000   1st Qu.:   0.00           
##  Median :4.000    Median :3.000   Median :   0.00           
##  Mean   :3.649    Mean   :3.286   Mean   :  14.23           
##  3rd Qu.:5.000    3rd Qu.:4.000   3rd Qu.:  12.00           
##  Max.   :5.000    Max.   :5.000   Max.   :1128.00           
##  Arrival.Delay.in.Minutes                  satisfaction  
##  Min.   :   0.00          neutral or dissatisfied:14528  
##  1st Qu.:   0.00          satisfied              :11365  
##  Median :   0.00                                         
##  Mean   :  14.74                                         
##  3rd Qu.:  13.00                                         
##  Max.   :1115.00

General summary from the data

  • Most G Airlines passengers are loyal customers (81%).
  • On average, G Airlines passengers are middle-aged (mean age 39.6 years), and the majority (69%) fly for business.
  • G Airlines has three seat classes: Business is the most used, followed by Eco and then Eco Plus.
  • Among the individual service criteria, in-flight service has the highest mean rating, while in-flight wifi service and ease of online booking have the lowest.
  • More than half of G Airlines passengers (56%) are neutral or dissatisfied with the overall service.
  • The maximum delay experienced by a G Airlines passenger exceeds 1,100 minutes (more than 18 hours) for both departure (1,128 min) and arrival (1,115 min).
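The percentages in these bullets can be reproduced from the counts printed by `summary()` (25,893 passengers after cleaning). A small arithmetic check:

```r
# Counts copied from the summary() output above
n_total <- 25893
shares <- c(
  loyal                   = 21111,
  business_travel         = 17980,
  neutral_or_dissatisfied = 14528
) / n_total
pct <- round(100 * shares, 1)
pct

# Maximum departure and arrival delays converted from minutes to hours
hours <- round(c(departure = 1128, arrival = 1115) / 60, 1)
hours
```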

Data Manipulation & Transformation

1. Overall, which factors were rated lowest in the satisfaction survey?

# Reshape to long format (one row per passenger per rating factor)

airlinesatis.long <- pivot_longer(data =  airlinesat_clean2, 
             cols = c(Inflight.wifi.service, 
             Departure.Arrival.time.convenient, 
             Ease.of.Online.booking, 
             Gate.location, 
             Food.and.drink, 
             Online.boarding, 
             Seat.comfort, 
             Inflight.entertainment, 
             On.board.service, 
             Leg.room.service, 
             Baggage.handling, 
             Checkin.service, 
             Inflight.service, 
             Cleanliness),
             names_to = "satis.factor", 
             values_to = "satis.level")
airlinesatis.long
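`pivot_longer()` turns each passenger's 14 rating columns into 14 rows, so the long table has 14 × 25,893 = 362,502 rows. The base-R sketch below does the same reshape on hypothetical toy data (two passengers, two rating columns) to show why every passenger appears once per rating factor. It is an illustration only; the report itself uses `pivot_longer()`, and the rows here come out grouped by factor rather than by passenger.

```r
# Hypothetical toy data: two passengers, two rating columns
toy <- data.frame(
  id = c(1L, 2L),
  satisfaction = c("satisfied", "neutral or dissatisfied"),
  Seat.comfort = c(3L, 5L),
  Cleanliness = c(5L, 2L),
  stringsAsFactors = FALSE
)
rating_cols <- c("Seat.comfort", "Cleanliness")

# Stack the rating columns: passenger attributes are repeated once per column,
# and unlist() reads the ratings column by column
toy_long <- data.frame(
  id           = rep(toy$id, times = length(rating_cols)),
  satisfaction = rep(toy$satisfaction, times = length(rating_cols)),
  satis.factor = rep(rating_cols, each = nrow(toy)),
  satis.level  = unlist(toy[rating_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
toy_long

# Rows in long format = passengers x rating columns
nrow(toy_long) == nrow(toy) * length(rating_cols)
```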
# Transforming satis.factor to factor

airlinesatis.long$satis.factor <- as.factor(airlinesatis.long$satis.factor)
# Keep only the 'neutral or dissatisfied' responses for a closer look

airlinedisat <- airlinesatis.long[airlinesatis.long$satisfaction == "neutral or dissatisfied", ]
# Mean rating of each factor among neutral or dissatisfied passengers

aggdisat <- aggregate(formula = satis.level ~ satisfaction + satis.factor, 
          data = airlinedisat, 
          FUN = mean)
aggdisat
# Sort factors from lowest to highest mean rating

aggdisat[order(aggdisat$satis.level, decreasing = FALSE), ]
mean(aggdisat$satis.level)
## [1] 2.955151

Analysis:

Among neutral or dissatisfied passengers, four factors score worse than the others:

  • Inflight wifi service
  • Ease of online booking
  • Online boarding
  • Inflight entertainment

Each of these four factors has a mean rating below the average across all factors (2.955).
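The selection rule behind this list (keep factors whose mean is below the mean of all factor means) can be written directly instead of being read off the sorted table. A sketch on hypothetical numbers: `agg` stands in for `aggdisat`, and its values are invented, not the survey results.

```r
# Hypothetical mean ratings per factor (invented values, not survey results)
agg <- data.frame(
  satis.factor = c("Inflight.wifi.service", "Seat.comfort",
                   "Online.boarding", "Baggage.handling"),
  satis.level  = c(2.4, 3.2, 2.5, 3.7),
  stringsAsFactors = FALSE
)

# Unweighted average of the factor means, as mean(aggdisat$satis.level) computes
avg <- mean(agg$satis.level)

# Keep the factors rated below that average, lowest first
below <- agg[agg$satis.level < avg, ]
below[order(below$satis.level), ]
```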

2. Which passenger class is most disappointed with the airline's services?

# Keep only the four low-rated factors

worse <- c("Inflight.wifi.service", "Ease.of.Online.booking", "Online.boarding", "Inflight.entertainment")

worsefactor <- airlinedisat[airlinedisat$satis.factor %in% worse,  ]
worsefactor
# Mean rating of these four factors in each passenger class

aggregate(formula = satis.level ~ Class, 
          data = worsefactor, 
          FUN = mean)

Analysis:

Eco Plus passengers are the most disappointed group, with the lowest mean rating on these four factors.
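The conclusion rests on which class has the lowest mean in the `aggregate()` output. The same comparison can be made programmatically with base R's `tapply()` and `which.min()`. A sketch on hypothetical long-format rows (the ratings are invented, not survey data):

```r
# Hypothetical long-format rows (invented ratings, not survey data)
toy <- data.frame(
  Class       = c("Business", "Business", "Eco", "Eco", "Eco Plus", "Eco Plus"),
  satis.level = c(3, 2, 2, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Mean rating per class, then the class with the lowest mean
class_means <- tapply(toy$satis.level, toy$Class, mean)
class_means
names(which.min(class_means))
```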

3. What is the overall satisfaction level of each travel-type group?

prop.table(table(airlinesatis.long$satisfaction, airlinesatis.long$Type.of.Travel))
##                          
##                           Business travel Personal Travel
##   neutral or dissatisfied      0.28606187      0.27501641
##   satisfied                    0.40833430      0.03058742
table(airlinesatis.long$satisfaction, airlinesatis.long$Type.of.Travel)
##                          
##                           Business travel Personal Travel
##   neutral or dissatisfied          103698           99694
##   satisfied                        148022           11088

Analysis:

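Most Personal travel passengers are disappointed: about 90% of them are neutral or dissatisfied, compared with about 41% of Business travel passengers. The counts above come from the long table, which has 14 rows per passenger, so they are 14 times the number of passengers; the proportions are unaffected.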
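`prop.table()` without a `margin` argument returns shares of the grand total, so the 90% and 41% figures come from dividing each cell by its column total instead. A sketch that rebuilds the per-passenger counts (the long-table counts above divided by 14) and applies `margin = 2`. On the wide data, `prop.table(table(airlinesat_clean2$satisfaction, airlinesat_clean2$Type.of.Travel), margin = 2)` should give the same shares.

```r
# Per-passenger counts: long-table counts divided by 14 rows per passenger
counts <- matrix(
  c(103698, 148022, 99694, 11088) / 14,
  nrow = 2,
  dimnames = list(
    satisfaction = c("neutral or dissatisfied", "satisfied"),
    travel = c("Business travel", "Personal Travel")
  )
)
counts

# margin = 2 divides each column by its own total,
# giving the satisfaction split within each travel type
round(prop.table(counts, margin = 2), 3)
```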

aggregate(formula = satis.level ~ satis.factor + Type.of.Travel, 
          data = worsefactor, 
          FUN = mean)

Analysis:

The two lowest-rated factors differ between the two travel types:

  • Business travel: Inflight wifi service, Online boarding
  • Personal travel: Ease of online booking, Inflight wifi service

4. Which factors are rated best by passengers?

# Keep only the 'satisfied' responses

airlinesatis1 <- airlinesatis.long[airlinesatis.long$satisfaction == "satisfied", ]
airlinesatis1
# Mean rating of each factor among satisfied passengers

aggsatis <- aggregate(formula = satis.level ~ satisfaction + satis.factor, 
          data = airlinesatis1, 
          FUN = mean)
aggsatis
# Order from the highest score

aggsatis[order(aggsatis$satis.level, decreasing = TRUE), ]

Analysis:

Top 5 factors with the best satisfaction rating:

  • Online boarding (also rated poorly by the dissatisfied group)
  • Inflight service
  • Baggage handling
  • Seat comfort
  • Inflight entertainment (also rated poorly by the dissatisfied group)
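The overlap between this top-5 list and the four low-rated factors from question 1 can be checked with base R's `intersect()`. `worse` matches the vector defined earlier in the report; `best5` is a hypothetical name for the list above.

```r
# Four low-rated factors (as defined earlier) and the top-5 factors listed above
worse <- c("Inflight.wifi.service", "Ease.of.Online.booking",
           "Online.boarding", "Inflight.entertainment")
best5 <- c("Online.boarding", "Inflight.service", "Baggage.handling",
           "Seat.comfort", "Inflight.entertainment")

# Factors rated well by satisfied passengers but poorly by dissatisfied ones
intersect(best5, worse)
```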

Analysis Summary & Recommendation

Based on the survey data, four factors receive poor ratings from passengers, and most of them relate to supporting infrastructure such as the wifi service and the online booking system. In contrast, the airline's core infrastructure and services, such as seat comfort and in-flight service, are rated very well.

Recommendations:

  • Maintain the factors that are rated well: in-flight service, baggage handling and seat comfort.
  • Focus first on improving in-flight wifi service and the online boarding system, followed by the online booking system and in-flight entertainment. Since most of the airline's passengers travel for business, improving the factors rated poorly by business-travel passengers is especially important.
  • Carry out further analysis to identify which factors have the greatest impact on passengers' overall satisfaction.