This analysis explores the results of the US Airline Passenger Satisfaction survey.
# Data Input and Checking
airlinesat <- read.csv(file = "airlinesatistest.csv")
str(airlinesat)
## 'data.frame': 25976 obs. of 25 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ id : int 19556 90035 12360 77959 36875 39177 79433 97286 27508 62482 ...
## $ Gender : chr "Female" "Female" "Male" "Male" ...
## $ Customer.Type : chr "Loyal Customer" "Loyal Customer" "disloyal Customer" "Loyal Customer" ...
## $ Age : int 52 36 20 44 49 16 77 43 47 46 ...
## $ Type.of.Travel : chr "Business travel" "Business travel" "Business travel" "Business travel" ...
## $ Class : chr "Eco" "Business" "Eco" "Business" ...
## $ Flight.Distance : int 160 2863 192 3377 1182 311 3987 2556 556 1744 ...
## $ Inflight.wifi.service : int 5 1 2 0 2 3 5 2 5 2 ...
## $ Departure.Arrival.time.convenient: int 4 1 0 0 3 3 5 2 2 2 ...
## $ Ease.of.Online.booking : int 3 3 2 0 4 3 5 2 2 2 ...
## $ Gate.location : int 4 1 4 2 3 3 5 2 2 2 ...
## $ Food.and.drink : int 3 5 2 3 4 5 3 4 5 3 ...
## $ Online.boarding : int 4 4 2 4 1 5 5 4 5 4 ...
## $ Seat.comfort : int 3 5 2 4 2 3 5 5 5 4 ...
## $ Inflight.entertainment : int 5 4 2 1 2 5 5 4 5 4 ...
## $ On.board.service : int 5 4 4 1 2 4 5 4 2 4 ...
## $ Leg.room.service : int 5 4 1 1 2 3 5 4 2 4 ...
## $ Baggage.handling : int 5 4 3 1 2 1 5 4 5 4 ...
## $ Checkin.service : int 2 3 2 3 4 1 4 5 3 5 ...
## $ Inflight.service : int 5 4 2 1 2 2 5 4 3 4 ...
## $ Cleanliness : int 5 5 2 4 4 5 3 3 5 4 ...
## $ Departure.Delay.in.Minutes : int 50 0 0 0 0 0 0 77 1 28 ...
## $ Arrival.Delay.in.Minutes : num 44 0 0 6 20 0 0 65 0 14 ...
## $ satisfaction : chr "satisfied" "satisfied" "neutral or dissatisfied" "satisfied" ...
# Inspecting Data & Data Cleaning
airlinesat$Gender <- as.factor(airlinesat$Gender)
airlinesat$Customer.Type <- as.factor(airlinesat$Customer.Type)
airlinesat$Type.of.Travel <- as.factor(airlinesat$Type.of.Travel)
airlinesat$Class <- as.factor(airlinesat$Class)
airlinesat$satisfaction <- as.factor(airlinesat$satisfaction)
str(airlinesat)
## 'data.frame': 25976 obs. of 25 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ id : int 19556 90035 12360 77959 36875 39177 79433 97286 27508 62482 ...
## $ Gender : Factor w/ 2 levels "Female","Male": 1 1 2 2 1 2 1 1 2 1 ...
## $ Customer.Type : Factor w/ 2 levels "disloyal Customer",..: 2 2 1 2 2 2 2 2 2 2 ...
## $ Age : int 52 36 20 44 49 16 77 43 47 46 ...
## $ Type.of.Travel : Factor w/ 2 levels "Business travel",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Class : Factor w/ 3 levels "Business","Eco",..: 2 1 2 1 2 2 1 1 2 1 ...
## $ Flight.Distance : int 160 2863 192 3377 1182 311 3987 2556 556 1744 ...
## $ Inflight.wifi.service : int 5 1 2 0 2 3 5 2 5 2 ...
## $ Departure.Arrival.time.convenient: int 4 1 0 0 3 3 5 2 2 2 ...
## $ Ease.of.Online.booking : int 3 3 2 0 4 3 5 2 2 2 ...
## $ Gate.location : int 4 1 4 2 3 3 5 2 2 2 ...
## $ Food.and.drink : int 3 5 2 3 4 5 3 4 5 3 ...
## $ Online.boarding : int 4 4 2 4 1 5 5 4 5 4 ...
## $ Seat.comfort : int 3 5 2 4 2 3 5 5 5 4 ...
## $ Inflight.entertainment : int 5 4 2 1 2 5 5 4 5 4 ...
## $ On.board.service : int 5 4 4 1 2 4 5 4 2 4 ...
## $ Leg.room.service : int 5 4 1 1 2 3 5 4 2 4 ...
## $ Baggage.handling : int 5 4 3 1 2 1 5 4 5 4 ...
## $ Checkin.service : int 2 3 2 3 4 1 4 5 3 5 ...
## $ Inflight.service : int 5 4 2 1 2 2 5 4 3 4 ...
## $ Cleanliness : int 5 5 2 4 4 5 3 3 5 4 ...
## $ Departure.Delay.in.Minutes : int 50 0 0 0 0 0 0 77 1 28 ...
## $ Arrival.Delay.in.Minutes : num 44 0 0 6 20 0 0 65 0 14 ...
## $ satisfaction : Factor w/ 2 levels "neutral or dissatisfied",..: 2 2 1 2 2 2 2 2 2 2 ...
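The five `as.factor()` calls above can also be written as a single step that converts every character column. Below is a minimal base R sketch that uses a small hypothetical data frame in place of the survey file. In this data set the character columns are exactly the five converted above, so the result should be the same.

```r
# Hypothetical toy data standing in for the survey file
toy <- data.frame(
  Gender = c("Female", "Male", "Male"),
  Class  = c("Eco", "Business", "Eco Plus"),
  Age    = c(52L, 36L, 20L),
  stringsAsFactors = FALSE
)

# Flag the character columns, then convert each of them to a factor
chr_cols <- vapply(toy, is.character, logical(1))
toy[chr_cols] <- lapply(toy[chr_cols], factor)

str(toy)
```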
# Check for missing values (NA)
anyNA(airlinesat)
## [1] TRUE
# Count missing values per column
colSums(is.na(airlinesat))
## X id
## 0 0
## Gender Customer.Type
## 0 0
## Age Type.of.Travel
## 0 0
## Class Flight.Distance
## 0 0
## Inflight.wifi.service Departure.Arrival.time.convenient
## 0 0
## Ease.of.Online.booking Gate.location
## 0 0
## Food.and.drink Online.boarding
## 0 0
## Seat.comfort Inflight.entertainment
## 0 0
## On.board.service Leg.room.service
## 0 0
## Baggage.handling Checkin.service
## 0 0
## Inflight.service Cleanliness
## 0 0
## Departure.Delay.in.Minutes Arrival.Delay.in.Minutes
## 0 83
## satisfaction
## 0
Only Arrival.Delay.in.Minutes has missing values: 83 out of 25,976 rows (about 0.3%). Because this share is so small, we can drop those rows directly.
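A quick way to judge whether dropping is safe is to look at the share of missing values per column rather than the raw counts. Here is a minimal sketch on hypothetical toy values. It relies on the fact that the mean of a logical vector is the proportion of `TRUE` values. The last line reproduces the share for the survey from the counts printed above.

```r
# Hypothetical toy data with one missing arrival delay
toy <- data.frame(
  Departure.Delay.in.Minutes = c(50, 0, 0, 77),
  Arrival.Delay.in.Minutes   = c(44, NA, 0, 65)
)

# Proportion of NA values in each column
missing_share <- colMeans(is.na(toy))
missing_share

# Survey: 83 missing arrival delays out of 25976 rows, as a percentage
round(83 / 25976 * 100, 2)
```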
# Drop rows with a missing arrival delay (drop_na() comes from tidyr, part of the tidyverse)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
airlinesat_clean <- drop_na(data = airlinesat, Arrival.Delay.in.Minutes)
anyNA(airlinesat_clean)
## [1] FALSE
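`drop_na()` keeps only the rows where the named column is not `NA`. If you are working without the tidyverse, the following base R sketch applies the same filter to a hypothetical toy data frame.

```r
# Hypothetical toy data: row 2 has a missing arrival delay
toy <- data.frame(
  id = 1:4,
  Arrival.Delay.in.Minutes = c(44, NA, 0, 65)
)

# Keep rows whose arrival delay is present
# (base R analogue of tidyr::drop_na(toy, Arrival.Delay.in.Minutes))
toy_clean <- toy[!is.na(toy$Arrival.Delay.in.Minutes), ]

nrow(toy_clean)
anyNA(toy_clean)
```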
# Keep columns 3 to 25 (drop the row index X and the passenger id)
airlinesat_clean2 <- airlinesat_clean[, 3:25]
airlinesat_clean2 # data frame used for the rest of the analysis
summary(airlinesat_clean2)
## Gender Customer.Type Age
## Female:13127 disloyal Customer: 4782 Min. : 7.00
## Male :12766 Loyal Customer :21111 1st Qu.:27.00
## Median :40.00
## Mean :39.62
## 3rd Qu.:51.00
## Max. :85.00
## Type.of.Travel Class Flight.Distance Inflight.wifi.service
## Business travel:17980 Business:12457 Min. : 31 Min. :0.000
## Personal Travel: 7913 Eco :11524 1st Qu.: 414 1st Qu.:2.000
## Eco Plus: 1912 Median : 849 Median :3.000
## Mean :1194 Mean :2.724
## 3rd Qu.:1744 3rd Qu.:4.000
## Max. :4983 Max. :5.000
## Departure.Arrival.time.convenient Ease.of.Online.booking Gate.location
## Min. :0.000 Min. :0.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :3.000 Median :3.000 Median :3.000
## Mean :3.046 Mean :2.756 Mean :2.976
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000
## Food.and.drink Online.boarding Seat.comfort Inflight.entertainment
## Min. :0.000 Min. :0.000 Min. :1.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :3.000 Median :4.000 Median :4.000 Median :4.000
## Mean :3.215 Mean :3.262 Mean :3.449 Mean :3.357
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## On.board.service Leg.room.service Baggage.handling Checkin.service
## Min. :0.000 Min. :0.00 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:2.00 1st Qu.:3.000 1st Qu.:3.000
## Median :4.000 Median :4.00 Median :4.000 Median :3.000
## Mean :3.386 Mean :3.35 Mean :3.633 Mean :3.314
## 3rd Qu.:4.000 3rd Qu.:4.00 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.00 Max. :5.000 Max. :5.000
## Inflight.service Cleanliness Departure.Delay.in.Minutes
## Min. :0.000 Min. :0.000 Min. : 0.00
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.: 0.00
## Median :4.000 Median :3.000 Median : 0.00
## Mean :3.649 Mean :3.286 Mean : 14.23
## 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.: 12.00
## Max. :5.000 Max. :5.000 Max. :1128.00
## Arrival.Delay.in.Minutes satisfaction
## Min. : 0.00 neutral or dissatisfied:14528
## 1st Qu.: 0.00 satisfied :11365
## Median : 0.00
## Mean : 14.74
## 3rd Qu.: 13.00
## Max. :1115.00
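General summary of the cleaned data: 25,893 passengers remain after dropping the 83 rows with a missing arrival delay. Of these, 14,528 are neutral or dissatisfied and 11,365 are satisfied.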
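The class balance can be read directly from the satisfaction counts in the summary. This short check uses only those printed counts and shows that the two outcomes are reasonably balanced.

```r
# Satisfaction counts copied from summary(airlinesat_clean2)
counts <- c("neutral or dissatisfied" = 14528, satisfied = 11365)

sum(counts)                   # passengers left after dropping NAs
round(prop.table(counts), 3)  # share of each outcome
```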
# Reshape to long format: one row per passenger per service factor
airlinesatis.long <- pivot_longer(data = airlinesat_clean2,
cols = c(Inflight.wifi.service,
Departure.Arrival.time.convenient,
Ease.of.Online.booking,
Gate.location,
Food.and.drink,
Online.boarding,
Seat.comfort,
Inflight.entertainment,
On.board.service,
Leg.room.service,
Baggage.handling,
Checkin.service,
Inflight.service,
Cleanliness),
names_to = "satis.factor",
values_to = "satis.level")
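To make the reshape concrete, here is a minimal sketch with two hypothetical passengers and two factors. Each passenger becomes one row per rated factor. With 14 factors in the real data, `airlinesatis.long` should therefore have 25,893 * 14 = 362,502 rows.

```r
library(tidyr)

# Two hypothetical passengers rating two service factors
toy <- data.frame(
  satisfaction = c("satisfied", "neutral or dissatisfied"),
  Seat.comfort = c(5L, 2L),
  Cleanliness  = c(4L, 3L)
)

# Same pivot_longer() call pattern as above, on the toy columns
toy_long <- pivot_longer(
  data = toy,
  cols = c(Seat.comfort, Cleanliness),
  names_to = "satis.factor",
  values_to = "satis.level"
)
toy_long
```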
airlinesatis.long
# Convert satis.factor to a factor
airlinesatis.long$satis.factor <- as.factor(airlinesatis.long$satis.factor)
# Keep only the 'neutral or dissatisfied' passengers for a closer look
airlinedisat <- airlinesatis.long[airlinesatis.long$satisfaction == "neutral or dissatisfied", ]
# Mean rating of each service factor among these passengers
aggdisat <- aggregate(formula = satis.level ~ satisfaction + satis.factor,
data = airlinedisat,
FUN = mean)
aggdisat
# Sort factors from the lowest to the highest mean rating
aggdisat[order(aggdisat$satis.level, decreasing = FALSE), ]
mean(aggdisat$satis.level)
## [1] 2.955151
Analysis:
Overall, four factors score below the average of all factor means (2.955) among neutral or dissatisfied passengers:
* Inflight wifi service
* Ease of online booking
* Online boarding
* Inflight entertainment
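The "below average" rule used above can be sketched in base R in three steps. First, aggregate the mean rating per factor. Second, sort those means. Third, keep the factors whose mean falls below the mean of all factor means. The toy ratings are hypothetical and only illustrate the mechanics.

```r
# Hypothetical long-format ratings from dissatisfied passengers
toy_long <- data.frame(
  satis.factor = rep(c("Inflight.wifi.service", "Seat.comfort", "Cleanliness"), each = 2),
  satis.level  = c(1, 2, 4, 4, 3, 3)
)

# Mean rating per factor, sorted from worst to best
agg <- aggregate(satis.level ~ satis.factor, data = toy_long, FUN = mean)
agg <- agg[order(agg$satis.level), ]

# Factors whose mean is below the average of all factor means
below_avg <- agg$satis.factor[agg$satis.level < mean(agg$satis.level)]
below_avg
```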
# Keep only the four lowest-rated factors
worse <- c("Inflight.wifi.service", "Ease.of.Online.booking", "Online.boarding", "Inflight.entertainment")
worsefactor <- airlinedisat[airlinedisat$satis.factor %in% worse, ]
worsefactor
# Mean rating on these four factors for each passenger class
aggregate(formula = satis.level ~ Class,
data = worsefactor,
FUN = mean)
Analysis:
Eco Plus passengers are the most disappointed group, with the lowest mean rating on these four factors.
prop.table(table(airlinesatis.long$satisfaction, airlinesatis.long$Type.of.Travel))
##
## Business travel Personal Travel
## neutral or dissatisfied 0.28606187 0.27501641
## satisfied 0.40833430 0.03058742
table(airlinesatis.long$satisfaction, airlinesatis.long$Type.of.Travel)
##
## Business travel Personal Travel
## neutral or dissatisfied 103698 99694
## satisfied 148022 11088
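Analysis:
Most personal travellers are dissatisfied. About 90% of the Personal Travel group is neutral or dissatisfied (0.275 / (0.275 + 0.031)), compared with about 41% of business travellers. Because airlinesatis.long holds 14 rows per passenger (one per service factor), the counts above are 14 times the passenger counts (e.g., 103698 / 14 = 7407). The proportions are unaffected.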
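`prop.table()` on the whole table gives shares of the grand total. Column proportions (`margin = 2`) answer the question more directly: what share of each travel type is dissatisfied? The sketch below uses the counts printed above. It also divides them by 14 to recover passenger counts.

```r
# Counts from the table above (long format, 14 rows per passenger)
tab <- matrix(c(103698, 148022, 99694, 11088), nrow = 2,
              dimnames = list(c("neutral or dissatisfied", "satisfied"),
                              c("Business travel", "Personal Travel")))

# Passenger counts
tab / 14

# Share of each travel type (column) in each satisfaction group
round(prop.table(tab, margin = 2), 2)
```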
aggregate(formula = satis.level ~ satis.factor + Type.of.Travel,
data = worsefactor,
FUN = mean)
Analysis:
The lowest-rated factors differ between the two types of traveller:
Business travel:
- Inflight wifi service
- Online boarding
Personal travel:
- Ease of online booking
- Inflight wifi service
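Comparing the two travel types is easier when factors are in rows and travel types are in columns. Below is a base R sketch using `tapply()` on hypothetical toy ratings. It then picks the lowest-rated factor in each column.

```r
# Hypothetical long-format ratings from dissatisfied passengers
toy <- data.frame(
  satis.factor   = rep(c("Inflight.wifi.service", "Online.boarding"), times = 2),
  Type.of.Travel = rep(c("Business travel", "Personal Travel"), each = 2),
  satis.level    = c(2, 2.5, 1.5, 3)
)

# Rows = factor, columns = travel type, cells = mean rating
wide <- tapply(toy$satis.level, list(toy$satis.factor, toy$Type.of.Travel), mean)
wide

# Lowest-rated factor for each travel type
lowest <- apply(wide, 2, function(x) names(which.min(x)))
lowest
```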
# Subset the 'satisfied' group
airlinesatis1 <- airlinesatis.long[airlinesatis.long$satisfaction == "satisfied", ]
airlinesatis1
# Aggregate to find the mean rating of each factor in the satisfied group
aggsatis <- aggregate(formula = satis.level ~ satisfaction + satis.factor,
data = airlinesatis1,
FUN = mean)
aggsatis
# Order from the highest score
aggsatis[order(aggsatis$satis.level, decreasing = TRUE), ]
Analysis:
Top 5 factors with the best ratings among satisfied passengers:
* Online boarding (also rated poorly by the dissatisfied group)
* Inflight service
* Baggage handling
* Seat comfort
* Inflight entertainment (also rated poorly by the dissatisfied group)
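Online boarding and inflight entertainment appear in both rankings, so the two groups disagree most on these factors. One way to make that disagreement explicit is to merge the two aggregated tables and compute a per-factor gap. The sketch below uses hypothetical means, not the survey's results. In the real analysis the inputs would be `aggdisat` and `aggsatis` after dropping their `satisfaction` column.

```r
# Hypothetical per-factor means (illustrative numbers only)
agg_dis <- data.frame(satis.factor = c("Online.boarding", "Seat.comfort"),
                      satis.level  = c(2.5, 3.0))
agg_sat <- data.frame(satis.factor = c("Online.boarding", "Seat.comfort"),
                      satis.level  = c(4.5, 4.0))

# Join on the factor name; suffixes label each group's mean
comp <- merge(agg_dis, agg_sat, by = "satis.factor",
              suffixes = c(".dissatisfied", ".satisfied"))

# Gap between groups: larger means stronger disagreement
comp$gap <- comp$satis.level.satisfied - comp$satis.level.dissatisfied
comp[order(comp$gap, decreasing = TRUE), ]
```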
Based on the survey data, four factors receive poor ratings from dissatisfied passengers, and they mostly relate to supporting services such as the wifi service and the online booking system. In contrast, the airline's core product and service, such as seat comfort and inflight service, are rated highly.
Recommendation: