Homework 2 - Hypothesis Testing

Airline Passenger Satisfaction Data Set

Data Source: Kaggle.com (2025)

Link: https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction/data

1) Raw Data Importation

#train.csv will be used as it is 80% of the full dataset

raw_cloudrate <- read.table("./train.csv", header = TRUE, sep = ",", dec = ".")
head(raw_cloudrate)

##   X     id Gender     Customer.Type Age  Type.of.Travel    Class
## 1 0  70172   Male    Loyal Customer  13 Personal Travel Eco Plus
## 2 1   5047   Male disloyal Customer  25 Business travel Business
## 3 2 110028 Female    Loyal Customer  26 Business travel Business
## 4 3  24026 Female    Loyal Customer  25 Business travel Business
## 5 4 119299   Male    Loyal Customer  61 Business travel Business
## 6 5 111157 Female    Loyal Customer  26 Personal Travel      Eco
##   Flight.Distance Inflight.wifi.service Departure.Arrival.time.convenient
## 1             460                     3                                 4
## 2             235                     3                                 2
## 3            1142                     2                                 2
## 4             562                     2                                 5
## 5             214                     3                                 3
## 6            1180                     3                                 4
##   Ease.of.Online.booking Gate.location Food.and.drink Online.boarding
## 1                      3             1              5               3
## 2                      3             3              1               3
## 3                      2             2              5               5
## 4                      5             5              2               2
## 5                      3             3              4               5
## 6                      2             1              1               2
##   Seat.comfort Inflight.entertainment On.board.service Leg.room.service
## 1            5                      5                4                3
## 2            1                      1                1                5
## 3            5                      5                4                3
## 4            2                      2                2                5
## 5            5                      3                3                4
## 6            1                      1                3                4
##   Baggage.handling Checkin.service Inflight.service Cleanliness
## 1                4               4                5           5
## 2                3               1                4           1
## 3                4               4                4           5
## 4                3               1                4           2
## 5                4               3                3           3
## 6                4               4                4           1
##   Departure.Delay.in.Minutes Arrival.Delay.in.Minutes            satisfaction
## 1                         25                       18 neutral or dissatisfied
## 2                          1                        6 neutral or dissatisfied
## 3                          0                        0               satisfied
## 4                         11                        9 neutral or dissatisfied
## 5                          0                        0               satisfied
## 6                          0                        0 neutral or dissatisfied

Explanation of Raw Data

Target population: All airline passengers

Sample: Passengers who took part in the airline passenger satisfaction survey (n = 103,904 observations; 103,594 after removing rows with missing values)

Unit of observation: One of the 103,904 passengers who participated in the survey

Number of variables: 25 (after cleaning = 23 variables)

Definition & unit of measurement of all initial variables (before clean-up)

id (categorical - nominal): Customer unique identification number
Gender (categorical - nominal): Gender of the passengers - “Female” or “Male”
Customer.Type (categorical - nominal): Customer type - “Loyal” or “disloyal” customer
Age (numerical - ratio): The actual age of the passengers
Type.of.Travel (categorical - nominal): Purpose of the flight of the passengers - “Personal Travel” or “Business Travel”
Class (categorical - ordinal): Travel class in the plane of the passengers - “Business”, “Eco”, “Eco Plus”
Flight.distance (numerical - ratio): The flight distance of this journey
Inflight.wifi.service (numerical - interval): Satisfaction level of the inflight wifi service (0:Not Applicable; 1-5: Satisfaction level)
Departure/Arrival time convenient (numerical - interval): Satisfaction level with the convenience of departure/arrival times
Ease of Online booking (numerical - interval): Satisfaction level of online booking
Gate location (numerical - interval): Satisfaction level of Gate location
Food and drink (numerical - interval): Satisfaction level of Food and drink
Online boarding (numerical - interval): Satisfaction level of online boarding
Seat comfort (numerical - interval): Satisfaction level of Seat comfort
Inflight entertainment (numerical - interval): Satisfaction level of inflight entertainment
On-board service (numerical - interval): Satisfaction level of On-board service
Leg room service (numerical - interval): Satisfaction level of Leg room service
Baggage handling (numerical - interval): Satisfaction level of baggage handling
Check-in service (numerical - interval): Satisfaction level of Check-in service
Inflight service (numerical - interval): Satisfaction level of inflight service
Cleanliness (numerical - interval): Satisfaction level of Cleanliness
Departure Delay in Minutes (numerical - ratio): Departure delay in minutes
Arrival Delay in Minutes (numerical - ratio): Arrival delay in minutes
Satisfaction (categorical - ordinal): Overall airline satisfaction level - “satisfied” or “neutral or dissatisfied”

2) Data Manipulation / Cleaning

Load dplyr & tidyr packages

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)

Initial cleaning of raw data

raw_cloudrate <- raw_cloudrate %>% 
                 select(-1, -Customer.Type) #Remove the first column (serial number) and Customer.Type column

Cleaning observations

raw_cloudrate <- raw_cloudrate %>% drop_na() #Drop all observations with NA values in their records
raw_cloudrate$id <- sprintf("%06d", raw_cloudrate$id) #Zero-pad IDs to a consistent 6 digits

Rename variables

raw_cloudrate <- raw_cloudrate %>%
                 rename (cust_id = id, 
                         gender = Gender,
                         age = Age,
                         travel_type = Type.of.Travel,
                         class = Class,
                         flight_dist = Flight.Distance,
                         plane_wifi = Inflight.wifi.service,
                         dep_arr_conv = Departure.Arrival.time.convenient,
                         online_book = Ease.of.Online.booking,
                         gate_loc = Gate.location,
                         food_drink = Food.and.drink,
                         online_board = Online.boarding,
                         seat_comf = Seat.comfort,
                         plane_ent = Inflight.entertainment,
                         onboard_srv = On.board.service,
                         legroom = Leg.room.service,
                         baggage = Baggage.handling,
                         checkin_srv = Checkin.service,
                         plane_srv = Inflight.service,
                         clean = Cleanliness,
                         dep_delay = Departure.Delay.in.Minutes,
                         arr_delay = Arrival.Delay.in.Minutes,
                         overall_sat = satisfaction)

Converting categorical variables into factor variables

raw_cloudrate <- raw_cloudrate %>%
  mutate(
    gender = factor(gender, levels = c("Female", "Male"), labels = c("F", "M")),
    travel_type = factor(travel_type, levels = c("Personal Travel", "Business travel"), labels = c("Personal", "Work-related")),
    class = factor(class, levels = c("Business", "Eco", "Eco Plus"), labels = c("Business", "Eco/EcoPlus", "Eco/EcoPlus")), #Merge Eco and Eco Plus, since Eco Plus has too few passengers for a meaningful separate comparison
    overall_sat = factor(overall_sat, levels = c("neutral or dissatisfied", "satisfied"))
  )

Create new dataframes & variables/columns based on conditions

#Create age group column
raw_cloudrate <- raw_cloudrate %>%
  mutate(age_grp = cut(age, breaks = seq(0, 90, by = 10), right = FALSE, labels = FALSE)) #Integer codes 1-9 for [0,10), [10,20), ..., [80,90)

raw_cloudrate <- raw_cloudrate %>% select(1:3, age_grp, everything())

raw_cloudrate <- raw_cloudrate %>%
  mutate(age_grp = factor(age_grp, 
                          levels = c("1", "2", "3", "4", "5", "6", "7", "8", "9"), 
                          labels = c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89")))
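The two-step approach above (integer codes from `cut()`, then relabelling with `factor()`) can likely be collapsed into a single `cut()` call by passing the labels directly. A minimal sketch on a few made-up ages (not taken from the dataset); `age_labels` is an illustrative name:

```r
# Hypothetical ages, for illustration only
ages <- c(7, 13, 25, 40, 61, 85)

# Labels "0-9", "10-19", ..., "80-89" for the nine left-closed 10-year bins
age_labels <- paste0(seq(0, 80, by = 10), "-", seq(9, 89, by = 10))

# right = FALSE gives [0,10), [10,20), ..., [80,90), matching the bins above
age_grp <- cut(ages, breaks = seq(0, 90, by = 10), right = FALSE, labels = age_labels)

print(data.frame(age = ages, age_grp = age_grp))
```

This keeps the age groups as an ordered set of factor levels in one step, without the intermediate integer column.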


#Inflight Services Ratings By Passengers
indiv_inflight_ratings <- raw_cloudrate %>% 
  select(cust_id, starts_with("plane"), onboard_srv, seat_comf, legroom) %>%
  mutate(avg_score = round(rowMeans(select(., -cust_id), na.rm = TRUE), 2))

#Average Inflight Services Rating Based on Class
avg_inflight_srv_ratings <- raw_cloudrate %>%
  group_by(class) %>%
  summarise(class_count = n(),
            across(c("plane_wifi", "seat_comf", "food_drink", 
                     "plane_ent", "onboard_srv", "legroom"),
                   ~ round(mean(.x, na.rm = TRUE), 2)))

#Delay Statistics Based on Class and Travel Type
delay_stats_class <- raw_cloudrate %>%
  group_by(class, travel_type) %>%
  summarise(avg_dep_delay = mean(dep_delay, na.rm = TRUE),
            avg_arr_delay = mean(arr_delay, na.rm = TRUE),
            .groups = "drop")

#Satisfaction Breakdown Based on Travel Type
satisfaction_travel <- raw_cloudrate %>%
  group_by(travel_type, overall_sat) %>%
  summarise(count = n(), .groups = "drop_last") %>% #Keep the grouping by travel_type
  mutate(percentage = count / sum(count) * 100) %>% #Percentage within each travel type
  ungroup()
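Whether `percentage` is a share of all passengers or a share within each travel type depends on which rows `sum(count)` sees. The base-R sketch below, on made-up data (`toy`, `counts` and the `pct_*` names are illustrative), contrasts the two denominators using `prop.table()`:

```r
# Hypothetical toy data standing in for raw_cloudrate
toy <- data.frame(
  travel_type = c("Personal", "Personal", "Personal", "Personal",
                  "Work-related", "Work-related", "Work-related", "Work-related"),
  overall_sat = c("neutral or dissatisfied", "neutral or dissatisfied",
                  "neutral or dissatisfied", "satisfied",
                  "neutral or dissatisfied", "satisfied", "satisfied", "satisfied")
)

counts <- table(toy$travel_type, toy$overall_sat)

# margin = 1: each row (travel type) sums to 100%
pct_within_type <- round(prop.table(counts, margin = 1) * 100, 1)

# no margin: all four cells together sum to 100%
pct_of_total <- round(prop.table(counts) * 100, 1)

pct_within_type
pct_of_total
```

For a "breakdown based on travel type", the within-type version is usually the one that answers the question, since it compares satisfaction rates between travel types of different sizes.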

3) Descriptive Statistics

#Load psych package
library(psych)

#Overall summary of the raw database (after data manipulation)
summary(raw_cloudrate)

##    cust_id          gender         age           age_grp     
##  Length:103594      F:52576   Min.   : 7.00   40-49  :23632  
##  Class :character   M:51018   1st Qu.:27.00   20-29  :20854  
##  Mode  :character             Median :40.00   30-39  :20593  
##                               Mean   :39.38   50-59  :19053  
##                               3rd Qu.:51.00   60-69  : 8313  
##                               Max.   :85.00   10-19  : 7896  
##                                               (Other): 3253  
##        travel_type            class        flight_dist     plane_wifi  
##  Personal    :32129   Business   :49533   Min.   :  31   Min.   :0.00  
##  Work-related:71465   Eco/EcoPlus:54061   1st Qu.: 414   1st Qu.:2.00  
##                                           Median : 842   Median :3.00  
##                                           Mean   :1189   Mean   :2.73  
##                                           3rd Qu.:1743   3rd Qu.:4.00  
##                                           Max.   :4983   Max.   :5.00  
##                                                                        
##   dep_arr_conv   online_book       gate_loc       food_drink     online_board 
##  Min.   :0.00   Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.00  
##  1st Qu.:2.00   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.00  
##  Median :3.00   Median :3.000   Median :3.000   Median :3.000   Median :3.00  
##  Mean   :3.06   Mean   :2.757   Mean   :2.977   Mean   :3.202   Mean   :3.25  
##  3rd Qu.:4.00   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.00  
##  Max.   :5.00   Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.00  
##                                                                               
##    seat_comf      plane_ent      onboard_srv       legroom         baggage     
##  Min.   :0.00   Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :1.000  
##  1st Qu.:2.00   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:3.000  
##  Median :4.00   Median :4.000   Median :4.000   Median :4.000   Median :4.000  
##  Mean   :3.44   Mean   :3.358   Mean   :3.383   Mean   :3.351   Mean   :3.632  
##  3rd Qu.:5.00   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:5.000  
##  Max.   :5.00   Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##                                                                                
##   checkin_srv      plane_srv         clean         dep_delay      
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :   0.00  
##  1st Qu.:3.000   1st Qu.:3.000   1st Qu.:2.000   1st Qu.:   0.00  
##  Median :3.000   Median :4.000   Median :3.000   Median :   0.00  
##  Mean   :3.304   Mean   :3.641   Mean   :3.286   Mean   :  14.75  
##  3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:  12.00  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :1592.00  
##                                                                   
##    arr_delay                        overall_sat   
##  Min.   :   0.00   neutral or dissatisfied:58697  
##  1st Qu.:   0.00   satisfied              :44897  
##  Median :   0.00                                  
##  Mean   :  15.18                                  
##  3rd Qu.:  13.00                                  
##  Max.   :1584.00                                  
##

#Descriptive statistics for selected variables
raw_cloudrate %>%
  select(age, flight_dist, dep_delay, arr_delay, plane_wifi, food_drink, seat_comf, onboard_srv) %>%
  describe() #describe() avoids describeBy()'s "no grouping variable requested" warning

##             vars      n    mean     sd median trimmed    mad min  max range
## age            1 103594   39.38  15.11     40   39.40  17.79   7   85    78
## flight_dist    2 103594 1189.33 997.30    842 1042.61 766.50  31 4983  4952
## dep_delay      3 103594   14.75  38.12      0    5.79   0.00   0 1592  1592
## arr_delay      4 103594   15.18  38.70      0    6.09   0.00   0 1584  1584
## plane_wifi     5 103594    2.73   1.33      3    2.70   1.48   0    5     5
## food_drink     6 103594    3.20   1.33      3    3.25   1.48   0    5     5
## seat_comf      7 103594    3.44   1.32      4    3.55   1.48   0    5     5
## onboard_srv    8 103594    3.38   1.29      4    3.48   1.48   0    5     5
##              skew kurtosis   se
## age          0.00    -0.72 0.05
## flight_dist  1.11     0.27 3.10
## dep_delay    6.77   101.46 0.12
## arr_delay    6.60    94.53 0.12
## plane_wifi   0.04    -0.85 0.00
## food_drink  -0.15    -1.15 0.00
## seat_comf   -0.48    -0.92 0.00
## onboard_srv -0.42    -0.89 0.00

Explanation of Some Parameters:

age (median): Half of the passengers in the sample are aged 40 or younger, and half are aged 40 or older.
dep_delay (mean): Passengers experienced an average departure delay of 14.75 minutes. Since the median is 0 minutes and the skew is 6.77, this mean is driven by a minority of heavily delayed flights.
arr_delay (mean): Passengers experienced an average arrival delay of 15.18 minutes (median 0 minutes).
onboard_srv (mean): Passengers rated their on-board service experience 3.38 out of 5 on average.
flight_dist (sd): Flight distances vary considerably, with a standard deviation of 997.30. The median (842) is lower than the mean (1189.33) and the skew is positive (1.11), which indicates a right-skewed distribution: most flights are relatively short, while a few long-distance flights pull the mean higher.
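The mean-versus-median reasoning above can be checked with a simple moment-based skewness. The sketch below uses a hypothetical helper `skewness_g1()` and made-up distances; note that `psych::describe()` reports a slightly different small-sample variant by default, so its values will not match this formula exactly.

```r
# Moment-based sample skewness g1 = m3 / m2^(3/2)
# (hypothetical helper; psych's default skew uses a slightly different form)
skewness_g1 <- function(x) {
  m  <- mean(x)
  m2 <- mean((x - m)^2)  # second central moment
  m3 <- mean((x - m)^3)  # third central moment
  m3 / m2^1.5
}

# Hypothetical right-skewed distances: many short flights, a few long ones
dist <- c(300, 400, 450, 500, 600, 800, 1200, 3500, 4500)

c(mean = mean(dist), median = median(dist), skew = skewness_g1(dist))
```

A positive skew together with a mean above the median is the pattern described for `flight_dist`; a symmetric vector such as `c(1, 2, 3)` gives a skew of 0.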

4) Hypothesis Testing

#Load the necessary packages for hypothesis testing
library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

library(ggpubr)
library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

## The following object is masked from 'package:dplyr':
## 
##     recode

library(pastecs)

## 
## Attaching package: 'pastecs'

## The following object is masked from 'package:tidyr':
## 
##     extract

## The following objects are masked from 'package:dplyr':
## 
##     first, last

library(rstatix)

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

library(effectsize)

## 
## Attaching package: 'effectsize'

## The following objects are masked from 'package:rstatix':
## 
##     cohens_d, eta_squared

## The following object is masked from 'package:psych':
## 
##     phi

#Convert satisfaction levels into a binary numeric variable (1 = Satisfied, 0 = Neutral or Dissatisfied)
raw_cloudrate$sat_binary <- ifelse(raw_cloudrate$overall_sat == "satisfied", 1, 0)

#Convert class into binary numeric variable and rearrange column
raw_cloudrate$class_binary <- ifelse(raw_cloudrate$class == "Business", 1, 0)
raw_cloudrate <- raw_cloudrate %>% select(1:6, class_binary, everything())

#Draw a random subsample of 500 passengers, since shapiro.test() only accepts 3 to 5000 observations
set.seed(123)
sample_size <- raw_cloudrate[sample(nrow(raw_cloudrate), 500), ]
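The 5000-observation limit is enforced by base R's `shapiro.test()` itself. A small sketch on simulated normal data (`x_large`, `too_large` and `ok` are illustrative names) shows the error on a large vector and a successful run on a subsample of 500, as done above:

```r
set.seed(1)
x_large <- rnorm(6000)

# shapiro.test() stops with an error when n is outside 3..5000
too_large <- tryCatch(shapiro.test(x_large), error = function(e) e)
inherits(too_large, "error")  # TRUE

# A random subsample of 500 is accepted
ok <- shapiro.test(sample(x_large, 500))
ok$p.value
```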

Paired T-test

Conditions & Assumptions:

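The variable is numeric and the observations are paired (each passenger has both a departure and an arrival delay)
The paired differences are normally distributed in the population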

Research Question: Is there a significant difference between departure delays (dd) and arrival delays (ad)?

Research Hypotheses:

H0: The mean difference between departure and arrival delays is zero (mean dd = mean ad)
H1: The mean difference between departure and arrival delays is not zero (mean dd ≠ mean ad)
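Because the hypotheses concern the mean of the differences d = dd - ad, a paired t-test is the same as a one-sample t-test of those differences against 0. A sketch on simulated delays (`dep`, `arr`, `paired` and `one_sam` are illustrative names, not the report's data):

```r
set.seed(42)
# Hypothetical paired delays (minutes) for 50 flights
dep <- rexp(50, rate = 1 / 15)
arr <- dep + rnorm(50, mean = 0, sd = 5)

paired  <- t.test(dep, arr, paired = TRUE)  # paired t-test
one_sam <- t.test(dep - arr, mu = 0)        # one-sample t-test on the differences

c(paired = unname(paired$statistic), one_sample = unname(one_sam$statistic))
```

Both calls return the same t statistic and p-value, which is why the normality assumption applies to the differences rather than to each delay variable separately.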

#--------------------------------------
#Create a separate dataframe
paired_t_data <- sample_size %>% 
  select(c(dep_delay, arr_delay)) %>%
  mutate(difference = dep_delay - arr_delay)

describe(paired_t_data) #No grouping variable is needed here

##            vars   n  mean    sd median trimmed  mad min max range  skew
## dep_delay     1 500 15.53 37.93      0    6.17 0.00   0 310   310  4.23
## arr_delay     2 500 15.30 37.69      0    6.10 0.00   0 307   307  4.30
## difference    3 500  0.23  9.61      0    0.58 1.48 -62  37    99 -1.23
##            kurtosis   se
## dep_delay     21.66 1.70
## arr_delay     21.87 1.69
## difference     7.76 0.43

#Visualisation for normality check
ggplot(paired_t_data, aes(x=difference)) +
  geom_histogram(binwidth = 10, color = "black") +
  xlab("Differences") +
  ylab("Count")

Explanation of Histogram

At a glance, most of the differences between departure and arrival delays cluster around 0, which suggests that for many passengers the two delays are nearly identical. The distribution is roughly centred on zero but left-skewed (skew = -1.23) and heavy-tailed (kurtosis = 7.76), with some large positive and negative differences. The histogram alone cannot settle whether the differences are normally distributed, so a Shapiro-Wilk test is used as a formal check.

#Using Shapiro Test
shapiro.test(paired_t_data$difference)

## 
##  Shapiro-Wilk normality test
## 
## data:  paired_t_data$difference
## W = 0.80737, p-value < 2.2e-16

Explanation of Shapiro Test

We reject the null hypothesis (H0) that the differences are normally distributed, as p < 2.2e-16. The differences therefore do not meet the normality assumption of the paired t-test, so a non-parametric test is more appropriate.

#T-test
t.test(paired_t_data$dep_delay, paired_t_data$arr_delay,
       paired = TRUE,
       alternative = "two.sided")

## 
##  Paired t-test
## 
## data:  paired_t_data$dep_delay and paired_t_data$arr_delay
## t = 0.53493, df = 499, p-value = 0.5929
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.6147604  1.0747604
## sample estimates:
## mean difference 
##            0.23

Explanation of T-test (Assuming there is normality)

If we were to assume that normality is met: we fail to reject the null hypothesis (H0), as p = 0.593 > 0.05. The observed mean difference of 0.23 minutes between departure and arrival delays is not statistically significant.
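As a sanity check on the units and the arithmetic, the t statistic, p-value and 95% confidence interval can be recomputed from the rounded summary statistics of `difference` printed earlier (mean 0.23 minutes, sd 9.61, n = 500). This is a sketch using only base R's `pt()` and `qt()`; small discrepancies from the printed output come from rounding.

```r
# Rounded summary statistics of `difference` (minutes) from the describe output
n      <- 500
mean_d <- 0.23
sd_d   <- 9.61

se_d    <- sd_d / sqrt(n)                    # standard error of the mean difference
t_stat  <- mean_d / se_d                     # t = mean(d) / (sd(d) / sqrt(n))
p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
ci_95   <- mean_d + c(-1, 1) * qt(0.975, df = n - 1) * se_d

round(c(t = t_stat, p = p_value, lower = ci_95[1], upper = ci_95[2]), 3)
```

The results land close to the reported t = 0.535, p = 0.593 and interval of about -0.61 to 1.07 minutes, confirming that the mean difference is a fraction of a minute.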

#Wilcoxon Signed Rank Test (Non-parametric Test)
wilcox_rank <- wilcox.test(paired_t_data$dep_delay, paired_t_data$arr_delay,
            paired = TRUE,
            correct = FALSE,
            exact = FALSE,
            alternative = "two.sided")

wilcox_rank

## 
##  Wilcoxon signed rank test
## 
## data:  paired_t_data$dep_delay and paired_t_data$arr_delay
## V = 20331, p-value = 0.08863
## alternative hypothesis: true location shift is not equal to 0

Explanation of Wilcoxon Signed Rank Test

Because the differences are not normally distributed, the Wilcoxon signed-rank test is the appropriate test here. We fail to reject the null hypothesis, as p = 0.089 > 0.05: there is no significant evidence that the location shift (roughly, the median) of the differences between departure and arrival delays differs from 0. Differences may still exist in the data, but they are not statistically significant at the 0.05 level.

Since the Wilcoxon signed-rank test is robust to non-normality, its agreement with the paired t-test above strengthens the conclusion: neither test finds evidence of a difference between departure and arrival delays. Failing to reject H0 does not, however, prove that the two delays are equal.
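The V statistic reported by `wilcox.test()` is the sum of the ranks of the positive differences, after dropping zero differences and ranking the absolute values with average ranks for ties. A sketch on simulated delays with many ties (all names are illustrative) reproduces V by hand:

```r
set.seed(7)
# Hypothetical paired delays with many zeros and ties, similar in spirit to the real data
dep <- c(rep(0, 10), sample(1:60, 30, replace = TRUE))
arr <- pmax(dep + sample(-5:5, 40, replace = TRUE), 0)

d <- dep - arr
d_nonzero <- d[d != 0]                 # zero differences are dropped
ranks <- rank(abs(d_nonzero))          # ties receive average ranks
V_manual <- sum(ranks[d_nonzero > 0])  # V = sum of ranks of positive differences

V_r <- wilcox.test(dep, arr, paired = TRUE, exact = FALSE, correct = FALSE)$statistic
c(manual = V_manual, wilcox.test = unname(V_r))
```

Because many real delay differences are exactly 0, only the non-zero pairs contribute to V, which is worth keeping in mind when reading the test on this dataset.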

#Calculating effect size
effectsize(wilcox_rank)

## r (rank biserial) |        95% CI
## ---------------------------------
## 0.12              | [-0.03, 0.27]

interpret_rank_biserial(0.12, rules = "funder2019")

## [1] "small"
## (Rules: funder2019)

Interpretation of effect size

According to the Funder (2019) rules, a rank-biserial correlation of 0.12 is small, indicating that any difference between departure and arrival delays is minor. Its 95% confidence interval (-0.03 to 0.27) includes zero, which is consistent with the non-significant Wilcoxon result.
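For paired data, the rank-biserial correlation is commonly computed as the difference between the rank sums of the positive and negative differences, divided by their total. The sketch below implements that textbook formula as a hypothetical helper `rank_biserial_paired()`; it illustrates the idea and is not a re-implementation of `effectsize`, which may differ in details such as the handling of zero differences.

```r
# Matched-pairs rank-biserial correlation:
#   r = (R_plus - R_minus) / (R_plus + R_minus)
# R_plus / R_minus: rank sums of the positive / negative non-zero differences,
# ranked by absolute value (hypothetical helper for illustration)
rank_biserial_paired <- function(x, y) {
  d <- x - y
  d <- d[d != 0]
  r <- rank(abs(d))
  r_plus  <- sum(r[d > 0])
  r_minus <- sum(r[d < 0])
  (r_plus - r_minus) / (r_plus + r_minus)
}

# Toy example: differences 1, -2, 3 -> ranks 1, 2, 3 -> (4 - 2) / 6 = 1/3
rank_biserial_paired(c(2, 1, 5), c(1, 3, 2))
```

Since V equals R_plus, this is the same as 2V / (R_plus + R_minus) - 1, so a small positive r such as 0.12 means the positive ranks only slightly outweigh the negative ones.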

Going back to the research question (1)

Is there a significant difference between departure delays (dd) and arrival delays (ad)?

No. Both the paired t-test (p = 0.593) and the Wilcoxon signed-rank test (p = 0.089) fail to reject H0 at the 0.05 level, and the effect size is small, so there is no evidence of a significant difference between departure delays and arrival delays.

Independent T-test

Conditions & Assumptions:

The variable is numeric
The variable is normally distributed in both populations
The variable has the same variance in both populations; if this is violated, Welch's correction is applied

Research Question: Do on-board service ratings differ significantly between satisfied and dissatisfied passengers?

Research Hypotheses:

H0: On-board service ratings do not differ between satisfied and dissatisfied passengers
H1: On-board service ratings differ between satisfied and dissatisfied passengers

#Create a separate dataframe
indep_test <- sample_size %>% 
  select(c(onboard_srv, overall_sat))

describeBy(indep_test$onboard_srv, indep_test$overall_sat)

## 
##  Descriptive statistics by group 
## group: neutral or dissatisfied
##    vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 294 2.95 1.29      3    2.94 1.48   1   5     4 -0.15    -1.13 0.07
## ------------------------------------------------------------ 
## group: satisfied
##    vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 206 3.82 1.11      4    3.96 1.48   1   5     4 -0.87     0.14 0.08

#Visualisation for normality check
ggqqplot(indep_test,
         "onboard_srv",
         facet.by = "overall_sat")

Explanation of graph

In the Q-Q plots above for both groups (“neutral or dissatisfied” and “satisfied”), the points deviate noticeably from the diagonal reference line, especially at the ends, which suggests skewness or heavy tails. The stepped pattern in the middle reflects the discrete nature of the ratings, which take only whole values (1 to 5 in this subsample) rather than varying continuously. The points far from the line at both ends of each plot further indicate a departure from normality.

Hence, the visualisation suggests that on-board service ratings are not normally distributed in either group, so a non-parametric test is more suitable for the rest of the analysis.

#Using Shapiro Test
indep_test %>%
  group_by(overall_sat) %>%
  shapiro_test(onboard_srv)

## # A tibble: 2 × 4
##   overall_sat             variable    statistic        p
##   <fct>                   <chr>           <dbl>    <dbl>
## 1 neutral or dissatisfied onboard_srv     0.897 2.85e-13
## 2 satisfied               onboard_srv     0.848 2.12e-13

Explanation of Shapiro Test

We reject the null hypothesis (H0) that the data are normally distributed, as p < 0.0001 in both groups. On-board service ratings are therefore not normally distributed in either satisfaction group (“satisfied” and “neutral or dissatisfied”), so the normality assumption of the standard t-test is not met. Since normality is violated, a non-parametric test is more appropriate.

#Test for same variance
leveneTest(indep_test$onboard_srv, indep_test$overall_sat)

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group   1  13.082 0.0003285 ***
##       498                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Explanation of Levene’s Test

We reject the null hypothesis of equal variances, as p = 0.0003 < 0.05. The variances of the two groups differ significantly, so the equal-variance assumption of the standard (Student's) t-test is not met, and per the assumptions above Welch's correction would be needed for a t-test. Combined with the violated normality assumption, a non-parametric test is the better choice.
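Welch's correction is what `t.test()` applies when `var.equal = FALSE` (its default); on the report's data that would be `t.test(onboard_srv ~ overall_sat, data = indep_test)`, whose output is not reproduced here. The sketch below uses simulated 1-5 ratings (all names are illustrative) and checks the Welch-Satterthwaite degrees of freedom and t statistic by hand:

```r
set.seed(2024)
# Hypothetical 1-5 ratings for two groups with different spreads
g1 <- sample(1:5, 294, replace = TRUE, prob = c(0.20, 0.20, 0.25, 0.20, 0.15))
g2 <- sample(1:5, 206, replace = TRUE, prob = c(0.05, 0.05, 0.20, 0.35, 0.35))

welch <- t.test(g1, g2, var.equal = FALSE)

# Squared standard errors of each group mean
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch-Satterthwaite degrees of freedom and Welch t statistic
df_manual <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))
t_manual  <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)

c(df_r = unname(welch$parameter), df_manual = df_manual,
  t_r = unname(welch$statistic), t_manual = t_manual)
```

Unlike the pooled test, the Welch version does not assume equal variances, and its degrees of freedom are usually non-integer.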

#Independent T-test (For parametric testing)
t.test(indep_test$onboard_srv ~ indep_test$overall_sat,
       var.equal = TRUE,
       alternative = "two.sided")

## 
##  Two Sample t-test
## 
## data:  indep_test$onboard_srv by indep_test$overall_sat
## t = -7.8751, df = 498, p-value = 2.148e-14
## alternative hypothesis: true difference in means between group neutral or dissatisfied and group satisfied is not equal to 0
## 95 percent confidence interval:
##  -1.0888137 -0.6540038
## sample estimates:
## mean in group neutral or dissatisfied               mean in group satisfied 
##                              2.948980                              3.820388

Explanation of Independent T-test

If we were to assume that normality and equal variances hold: we reject the null hypothesis that on-board service ratings are equal for satisfied and dissatisfied passengers, as p < 0.0001. There is strong evidence that satisfied passengers rate on-board service significantly higher (mean 3.82) than neutral or dissatisfied passengers (mean 2.95).

#Wilcoxon Rank Sum Test (For non-parametric testing)
wilcox.test(indep_test$onboard_srv ~ indep_test$overall_sat,
            correct = FALSE,
            exact = FALSE,
            alternative = "two.sided")

## 
##  Wilcoxon rank sum test
## 
## data:  indep_test$onboard_srv by indep_test$overall_sat
## W = 18691, p-value = 6.25e-14
## alternative hypothesis: true location shift is not equal to 0

Explanation of Wilcoxon Rank Sum Test

Because the data do not meet the normality assumption, the Wilcoxon rank-sum test is the appropriate test. We reject the null hypothesis, as p < 0.0001, which indicates a significant difference in on-board service ratings between the satisfied and the neutral or dissatisfied groups. As the Wilcoxon test does not assume normality, this reinforces the result of the t-test above that on-board service ratings differ between the two satisfaction groups.
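The W statistic from `wilcox.test()` is the rank sum of the first group in the pooled ranking, minus its minimum possible value n1(n1 + 1)/2. With the formula interface, the first factor level ("neutral or dissatisfied") plays the role of the first group, so W = 18691 refers to that group. A sketch on simulated, heavily tied ratings (names are illustrative):

```r
set.seed(11)
# Hypothetical 1-5 ratings for two groups (many ties, as with the real ratings)
grp_a <- sample(1:5, 30, replace = TRUE)
grp_b <- sample(1:5, 25, replace = TRUE)

all_ranks <- rank(c(grp_a, grp_b))       # pooled ranking, midranks for ties
R_a <- sum(all_ranks[seq_along(grp_a)])  # rank sum of the first group
W_manual <- R_a - length(grp_a) * (length(grp_a) + 1) / 2

W_r <- wilcox.test(grp_a, grp_b, exact = FALSE, correct = FALSE)$statistic
c(manual = W_manual, wilcox.test = unname(W_r))
```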

#Calculating effect size
effectsize::cohens_d(indep_test$onboard_srv ~ indep_test$overall_sat,
                     pooled_sd = FALSE)

## Cohen's d |         95% CI
## --------------------------
## -0.72     | [-0.91, -0.54]
## 
## - Estimated using un-pooled SD.

interpret_cohens_d(0.72, rules = "sawilowsky2009")

## [1] "medium"
## (Rules: sawilowsky2009)

Interpretation of effect size

According to Sawilowsky (2009), an effect size of |d| = 0.72 is medium, lying between the thresholds for a medium (0.5) and a large (0.8) effect. The negative sign reflects the group order (neutral or dissatisfied minus satisfied): passengers who are “satisfied” tend to give notably higher on-board service ratings than those who are “neutral or dissatisfied”. Although the effect is not very large, it is substantial enough to suggest that on-board service is associated with overall satisfaction, although this comparison alone cannot show that it causes it.
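The reported d can be approximately reproduced from the rounded group summaries printed by `describeBy()` earlier, using the un-pooled standard deviation sqrt((s1^2 + s2^2) / 2); this form matches the printed -0.72. Since the ratings are not normal, a rank-based effect size is arguably a better companion to the Wilcoxon rank-sum test; the sketch also computes the rank-biserial correlation from the reported W with the formula 2W / (n1 n2) - 1. That second value is an illustration of the formula; it has not been checked against `effectsize::rank_biserial()`.

```r
# Rounded group summaries from the 500-passenger subsample
n_neutral   <- 294; mean_neutral   <- 2.95; sd_neutral   <- 1.29
n_satisfied <- 206; mean_satisfied <- 3.82; sd_satisfied <- 1.11

# Cohen's d with an un-pooled SD, sqrt((s1^2 + s2^2) / 2)
d_unpooled <- (mean_neutral - mean_satisfied) /
  sqrt((sd_neutral^2 + sd_satisfied^2) / 2)

# Rank-biserial correlation from the Wilcoxon rank-sum W (first group = neutral)
W <- 18691
r_rb <- 2 * W / (n_neutral * n_satisfied) - 1

round(c(cohens_d = d_unpooled, rank_biserial = r_rb), 2)
```

Both values are negative for the same reason: the neutral or dissatisfied group tends to give lower on-board service ratings than the satisfied group.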

Going back to the research question (2)

Do on-board service ratings differ significantly between satisfied and dissatisfied passengers?

Yes. Both the independent t-test and the Wilcoxon rank-sum test reject H0 (p < 0.0001), with a medium effect size, so there is sufficient evidence that on-board service ratings differ significantly between satisfied and dissatisfied passengers.

Conclusion

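Based on both tests, I believe that non-parametric methods are more appropriate for this dataset. The Shapiro-Wilk tests rejected normality in every case, the service ratings are discrete scores on a 0-5 scale, and the delay variables are heavily right-skewed with many zeros. The data also reflects subjective factors such as satisfaction, which can vary greatly from person to person depending on individual priorities and needs.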