The task encompassed several key components, including data cleaning, demographic analysis, Likert scale analysis, predictive modelling using ordinal logistic regression, and drawing conclusions based on the findings.
Data Cleaning: The data cleaning phase prepared the dataset for analysis by addressing issues such as missing values, outliers, and inconsistencies.
Demographic Analysis: The demographic analysis examined the characteristics of the hotel customers, such as age and gender.
Likert Analysis: The Likert scale analysis examined customer opinions and satisfaction levels by assessing how satisfied or dissatisfied respondents were with each rated service.
Predictive Modelling with Ordinal Logistic Regression: The task involved building a predictive model using ordinal logistic regression to understand the factors influencing customer satisfaction in European hotels.
Conclusion: Based on the analyses conducted, conclusions were drawn about the factors that significantly influence customer satisfaction in European hotels.
By integrating data cleaning, demographic analysis, Likert scale analysis, and predictive modelling using ordinal logistic regression, the task aimed to provide a comprehensive understanding of customer satisfaction in European hotels and derive actionable insights from the findings.
The dataset is from Kaggle: Europe Hotel Customer Satisfaction.
Age - 7 to 85
purpose_of_travel - aviation, academic, personal, business, tourism.
Type of Travel - Group travel, Personal Travel.
Type Of Booking - Group bookings, Individual/Couple.
Hotel wifi service - Ratings out of 5.
Departure/Arrival convenience - Ratings out of 5.
Ease of On-line booking - Ratings out of 5.
Hotel location - Ratings out of 5.
Food and drink - Ratings out of 5.
Stay comfort - Ratings out of 5.
Common Room entertainment - Ratings out of 5.
Check-in/Checkout service - Ratings out of 5.
Other service - Ratings out of 5.
Cleanliness - Ratings out of 5.
satisfaction - satisfied, neutral or dissatisfied.
id | Gender | Age | purpose_of_travel | Type.of.Travel | Type.Of.Booking | Hotel.wifi.service | Departure.Arrival..convenience | Ease.of.Online.booking | Hotel.location | Food.and.drink | Stay.comfort | Common.Room.entertainment | Checkin.Checkout.service | Other.service | Cleanliness | satisfaction |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
70172 | Male | 13 | aviation | Personal Travel | Not defined | 3 | 4 | 3 | 1 | 5 | 5 | 5 | 4 | 5 | 5 | neutral or dissatisfied |
5047 | Male | 25 | tourism | Group Travel | Group bookings | 3 | 2 | 3 | 3 | 1 | 1 | 1 | 1 | 4 | 1 | neutral or dissatisfied |
110028 | Female | 26 | tourism | Group Travel | Group bookings | 2 | 2 | 2 | 2 | 5 | 5 | 5 | 4 | 4 | 5 | satisfied |
24026 | Female | 25 | tourism | Group Travel | Group bookings | 2 | 5 | 5 | 5 | 2 | 2 | 2 | 1 | 4 | 2 | neutral or dissatisfied |
119299 | Male | 61 | aviation | Group Travel | Group bookings | 3 | 3 | 3 | 3 | 4 | 5 | 3 | 3 | 3 | 3 | satisfied |
111157 | Female | 26 | business | Personal Travel | Individual/Couple | 3 | 4 | 2 | 1 | 1 | 1 | 1 | 4 | 4 | 1 | neutral or dissatisfied |
82113 | Male | 47 | academic | Personal Travel | Individual/Couple | 2 | 4 | 2 | 3 | 2 | 2 | 2 | 3 | 5 | 2 | neutral or dissatisfied |
96462 | Female | 52 | aviation | Group Travel | Group bookings | 4 | 3 | 4 | 4 | 5 | 5 | 5 | 4 | 5 | 4 | satisfied |
79485 | Female | 41 | tourism | Group Travel | Group bookings | 1 | 2 | 2 | 2 | 4 | 3 | 1 | 4 | 1 | 2 | neutral or dissatisfied |
65725 | Male | 20 | academic | Group Travel | Individual/Couple | 3 | 3 | 3 | 4 | 2 | 3 | 2 | 4 | 3 | 2 | neutral or dissatisfied |
library(janitor)
europe=clean_names(europe)
europe %>% names %>% as.data.frame() %>%
rename("column names"=".") %>% kable()
column names |
---|
id |
gender |
age |
purpose_of_travel |
type_of_travel |
type_of_booking |
hotel_wifi_service |
departure_arrival_convenience |
ease_of_online_booking |
hotel_location |
food_and_drink |
stay_comfort |
common_room_entertainment |
checkin_checkout_service |
other_service |
cleanliness |
satisfaction |
## 'data.frame': 103904 obs. of 16 variables:
## $ gender : chr "Male" "Male" "Female" "Female" ...
## $ age : int 13 25 26 25 61 26 47 52 41 20 ...
## $ purpose_of_travel : chr "aviation" "tourism" "tourism" "tourism" ...
## $ type_of_travel : chr "Personal Travel" "Group Travel" "Group Travel" "Group Travel" ...
## $ type_of_booking : chr "Not defined" "Group bookings" "Group bookings" "Group bookings" ...
## $ hotel_wifi_service : int 3 3 2 2 3 3 2 4 1 3 ...
## $ departure_arrival_convenience: int 4 2 2 5 3 4 4 3 2 3 ...
## $ ease_of_online_booking : int 3 3 2 5 3 2 2 4 2 3 ...
## $ hotel_location : int 1 3 2 5 3 1 3 4 2 4 ...
## $ food_and_drink : int 5 1 5 2 4 1 2 5 4 2 ...
## $ stay_comfort : int 5 1 5 2 5 1 2 5 3 3 ...
## $ common_room_entertainment : int 5 1 5 2 3 1 2 5 1 2 ...
## $ checkin_checkout_service : int 4 1 4 1 3 4 3 4 4 4 ...
## $ other_service : int 5 4 4 4 3 4 5 5 1 3 ...
## $ cleanliness : int 5 1 5 2 3 1 2 4 2 2 ...
## $ satisfaction : chr "neutral or dissatisfied" "neutral or dissatisfied" "satisfied" "neutral or dissatisfied" ...
Column | Missing values |
---|---|
gender | 0 |
age | 0 |
purpose_of_travel | 0 |
type_of_travel | 0 |
type_of_booking | 0 |
hotel_wifi_service | 0 |
departure_arrival_convenience | 0 |
ease_of_online_booking | 0 |
hotel_location | 0 |
food_and_drink | 0 |
stay_comfort | 0 |
common_room_entertainment | 0 |
checkin_checkout_service | 0 |
other_service | 0 |
cleanliness | 0 |
satisfaction | 0 |
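The missing-value and duplicate checks behind this table can be sketched on a small example. The `toy` data frame below is made up for illustration and is not part of the hotel dataset; it deliberately contains one missing age and one repeated row so that both checks report something.

```r
# Toy data frame (hypothetical values) with one NA and one duplicated row
toy <- data.frame(
  gender = c("Male", "Female", "Female", "Male"),
  age    = c(13, 25, NA, 13),
  wifi   = c(3, 2, 4, 3)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))
missing_per_column

# Index of the first row that repeats an earlier row (0 if there is none)
first_duplicate <- anyDuplicated(toy)
first_duplicate

# Keep complete rows only, then drop repeated rows
toy_clean <- unique(toy[complete.cases(toy), ])
toy_clean
```

On the hotel data every column reports zero missing values, so the `complete.cases()` step would leave the data unchanged.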
library(ggplot2)
library(ggthemes)
invisible(ggplot(europe_dem,aes(x=age,fill=factor(age)))+
geom_bar(stat="count",width=0.5,show.legend = F)+
theme_bw()+labs(title="Age Distribution",y="Frequency"))+
ggthemes::theme_calc()
gender | Freq |
---|---|
Female | 52727 |
Male | 51177 |
library(plotly)
td=ggplot(europe_dem,aes(purpose_of_travel,fill=type_of_travel))+
geom_bar(position = "dodge",stat="count")+theme_bw() +
labs(fill="Type of Travel",title = "Purpose of Travel Distribution",
y="Frequency",x="Purpose of Travel")+
ggthemes::theme_economist(base_size = 5)
td
Most customers travel as part of a group rather than on personal travel.
Tourism is the most common purpose of travel, and tourists travelling as a group form the largest segment.
europe_dem %>% dplyr::select(type_of_booking) %>%
table() %>% as.data.frame() %>% rename("Type of booking"="type_of_booking",
"Frequency"="Freq") %>%
kable()
Type of booking | Frequency |
---|---|
Group bookings | 49665 |
Individual/Couple | 46745 |
Not defined | 7494 |
europe_dem %>% dplyr::select(satisfaction) %>%
table() %>% as.data.frame() %>% rename("Satisfaction"="satisfaction",
"Frequency"="Freq") %>%
kable()
Satisfaction | Frequency |
---|---|
neutral or dissatisfied | 58879 |
satisfied | 45025 |
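The counts in the table are easier to compare as percentages. A minimal sketch, with the two frequencies typed in by hand from the table above rather than read from the data:

```r
# Frequencies copied from the satisfaction table
satisfaction_counts <- c("neutral or dissatisfied" = 58879, "satisfied" = 45025)

# Convert the counts to percentages of all respondents
satisfaction_share <- round(100 * prop.table(satisfaction_counts), 1)
satisfaction_share
```

This gives roughly 56.7% neutral or dissatisfied and 43.3% satisfied, so the two response classes are reasonably balanced.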
library(likert)
library(ggthemes)
hws=as.factor(europe_num$hotel_wifi_service)
dac=as.factor(europe_num$departure_arrival_convenience)
eoob=as.factor(europe_num$ease_of_online_booking)
hl=as.factor(europe_num$hotel_location)
fad=as.factor(europe_num$food_and_drink)
sc=as.factor(europe_num$stay_comfort)
cre=as.factor(europe_num$common_room_entertainment)
ccs=as.factor(europe_num$checkin_checkout_service)
os=as.factor(europe_num$other_service)
c=as.factor(europe_num$cleanliness)
new_lik=data.frame(hotel_wifi_service=hws,
departure_arrival_convenience=dac,
ease_of_online_booking=eoob,
hotel_location=hl,
food_and_drink=fad,
stay_comfort=sc,
common_room_entertainment=cre,
checkin_checkout_service=ccs,
other_service=os,
cleanliness=c)
lik=likert(new_lik)
invisible(likert.bar.plot(lik)+theme(legend.position = "bottom")+
theme_bw(base_size = 10)+
labs(title = "Respondents Distribution",
subtitle="0=not applicable,1=very dissatisfied,2=dissatisfied,3=neutral,4=satisfied,5=very satisfied"))+
ggthemes::theme_fivethirtyeight()
Customers are satisfied with the following services:
other services, check-in check-out service, stay comfort, cleanliness and common room entertainment
Customers are neither dissatisfied nor satisfied with the following services:
food and drink, hotel location, departure or arrival convenience, hotel WiFi and ease of online booking
Suggestion: Hotel management should improve food and drink, hotel location, departure or arrival convenience, hotel WiFi and ease of online booking in order to increase customer satisfaction and loyalty.
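The grouping of services into "satisfied" and "neutral" can also be checked numerically by computing, for each item, the share of low (1 to 2), neutral (3) and high (4 to 5) ratings. The sketch below uses base R on simulated ratings; the `ratings` data frame, its probabilities and the helper `summarise_item()` are illustrative assumptions, not the hotel data or a function from the likert package.

```r
# Simulated 0-5 ratings for two items (hypothetical probabilities)
set.seed(7)
ratings <- data.frame(
  cleanliness = sample(0:5, 500, replace = TRUE,
                       prob = c(.02, .08, .10, .20, .35, .25)),
  hotel_wifi  = sample(0:5, 500, replace = TRUE,
                       prob = c(.03, .17, .25, .25, .20, .10))
)

# Percentage of low, neutral and high ratings for one item
summarise_item <- function(x) {
  x <- x[x != 0]  # 0 means "not applicable", so it is dropped
  c(low = mean(x <= 2), neutral = mean(x == 3), high = mean(x >= 4)) * 100
}

# One row per item, one column per rating group
shares <- t(sapply(ratings, summarise_item))
round(shares, 1)
```

An item whose "high" share clearly exceeds its "low" share would fall into the satisfied group; items where the three shares are close sit in the neutral group.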
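NB: Ordinal logistic regression requires a response variable with at least three ordered levels. The satisfaction variable in this dataset has only two levels, so a binary-response model is fitted instead. Note that glm() is called below without family = binomial, so R fits a Gaussian (linear probability) model rather than a logistic regression, as the "gaussian family" line in the output confirms.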
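For comparison, here is a minimal sketch of what an ordinal logistic regression would look like if the response had three ordered levels. It uses `polr()` from MASS on simulated data; the `sim` data frame, the cut points and the effect size of 0.8 are assumptions chosen for illustration and do not come from the hotel survey.

```r
library(MASS)  # provides polr() for proportional-odds models

set.seed(1)
n <- 1500
sim <- data.frame(comfort = sample(1:5, n, replace = TRUE))

# Latent satisfaction score with logistic noise, cut into three ordered levels
latent <- 0.8 * sim$comfort + rlogis(n)
sim$satisfaction <- cut(latent,
                        breaks = c(-Inf, 1.5, 3.5, Inf),
                        labels = c("dissatisfied", "neutral", "satisfied"),
                        ordered_result = TRUE)

# Hess = TRUE stores the Hessian so summary() can report standard errors
ord_fit <- polr(satisfaction ~ comfort, data = sim, Hess = TRUE)
summary(ord_fit)
```

The fitted object has one slope per predictor and one intercept per boundary between adjacent levels, which is why a two-level response cannot be used with this model.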
library(MASS)
europe_model=europe %>%
dplyr::select(6:16)
europe_model$satisfaction=ifelse(europe_model$satisfaction=="satisfied",
1,0)
model=glm(satisfaction ~ .,data=europe_model)
summary(model)
##
## Call:
## glm(formula = satisfaction ~ ., data = europe_model)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.444257 0.006781 -65.519 < 2e-16 ***
## hotel_wifi_service 0.079337 0.001450 54.709 < 2e-16 ***
## departure_arrival_convenience -0.056248 0.001007 -55.868 < 2e-16 ***
## ease_of_online_booking 0.032373 0.001448 22.358 < 2e-16 ***
## hotel_location -0.012301 0.001206 -10.204 < 2e-16 ***
## food_and_drink -0.038266 0.001410 -27.147 < 2e-16 ***
## stay_comfort 0.072424 0.001432 50.583 < 2e-16 ***
## common_room_entertainment 0.082391 0.001743 47.266 < 2e-16 ***
## checkin_checkout_service 0.062746 0.001097 57.184 < 2e-16 ***
## other_service 0.038369 0.001355 28.325 < 2e-16 ***
## cleanliness 0.009213 0.001644 5.605 2.09e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1751888)
##
## Null deviance: 25514 on 103903 degrees of freedom
## Residual deviance: 18201 on 103893 degrees of freedom
## AIC: 113890
##
## Number of Fisher Scoring iterations: 2
All predictor variables are statistically significant (p < 0.001), and seven of the ten have a positive relationship with customer satisfaction.
The estimate of 0.0793 for the “hotel wifi service” predictor indicates that a one-point increase in the hotel wifi rating is associated with an increase of about 0.079 (7.9 percentage points) in the predicted probability of a customer being satisfied, holding all other variables constant. Because the model was fitted with the default Gaussian family, the coefficients are changes in probability, not changes in log odds.
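To illustrate how the interpretation depends on the `family` argument of `glm()`, the sketch below fits both a logistic and a Gaussian model to the same simulated 0/1 outcome. The `sim` data frame, its variable names and the true effect sizes are assumptions for illustration only.

```r
# Simulated stand-in for the survey data (hypothetical names and values)
set.seed(42)
n <- 2000
sim <- data.frame(
  wifi    = sample(0:5, n, replace = TRUE),
  comfort = sample(0:5, n, replace = TRUE)
)

# Generate a 0/1 outcome whose log odds rise with both ratings
log_odds <- -3 + 0.5 * sim$wifi + 0.4 * sim$comfort
sim$satisfied <- rbinom(n, size = 1, prob = plogis(log_odds))

# family = binomial requests a logit link; omitting it gives a Gaussian fit
logit_fit  <- glm(satisfied ~ wifi + comfort, data = sim, family = binomial)
linear_fit <- glm(satisfied ~ wifi + comfort, data = sim)

coef(logit_fit)       # change in log odds per one-point rating increase
exp(coef(logit_fit))  # the same effects expressed as odds ratios
coef(linear_fit)      # change in probability per one-point rating increase
```

The two sets of coefficients are on different scales, so statements about log odds or odds ratios apply only to the fit that specifies `family = binomial`.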
# Put the coefficient names in their own column, then sort largest first
data.frame(Service=names(model$coefficients),
Magnitude=unname(model$coefficients)) %>%
arrange(desc(Magnitude)) %>% kable()
Service | Magnitude |
---|---|
common_room_entertainment | 0.0823914 |
hotel_wifi_service | 0.0793373 |
stay_comfort | 0.0724241 |
checkin_checkout_service | 0.0627459 |
other_service | 0.0383686 |
ease_of_online_booking | 0.0323731 |
cleanliness | 0.0092125 |
hotel_location | -0.0123009 |
food_and_drink | -0.0382663 |
departure_arrival_convenience | -0.0562483 |
(Intercept) | -0.4442572 |
The services with a positive effect on customer satisfaction, in descending order of coefficient size, are common room entertainment, hotel wifi service, stay comfort, check-in/check-out service, other service, ease of online booking and cleanliness.
Suggestion: The hotel's customer service department should continue to improve its delivery of the services above to raise customer satisfaction. It should also put more effort into the services with a negative relationship (hotel location, food and drink, and departure/arrival convenience) so that they too support customer satisfaction.
library(knitr)
include_graphics("europe.jpg")
knitr::opts_chunk$set(echo = T, message = F, warning = F)
europe=read.csv(file.choose())
library(knitr)
library(dplyr)
europe %>% head(10) %>% kable(caption = "First 10 rows")
library(janitor)
europe=clean_names(europe)
europe %>% names %>% as.data.frame() %>%
rename("column names"=".") %>% kable()
europe=europe[2:17] #dropped id column
europe %>% str()
colSums(is.na(europe)) %>% kable()
anyDuplicated(europe) # dispatches to the data.frame method; 0 means no duplicated rows
europe_dem=europe %>%
dplyr::select(1:5,16)
europe_num=europe %>%
dplyr::select(6:15)
library(ggplot2)
library(ggthemes)
invisible(ggplot(europe_dem,aes(x=age,fill=factor(age)))+
geom_bar(stat="count",width=0.5,show.legend = F)+
theme_bw()+labs(title="Age Distribution",y="Frequency"))+
ggthemes::theme_calc()
europe_dem %>% dplyr::select(gender) %>%
table() %>% kable()
library(plotly)
td=ggplot(europe_dem,aes(purpose_of_travel,fill=type_of_travel))+
geom_bar(position = "dodge",stat="count")+theme_bw() +
labs(fill="Type of Travel",title = "Purpose of Travel Distribution",
y="Frequency",x="Purpose of Travel")+
ggthemes::theme_economist(base_size = 5)
td
europe_dem %>% dplyr::select(type_of_booking) %>%
table() %>% as.data.frame() %>% rename("Type of booking"="type_of_booking",
"Frequency"="Freq") %>%
kable()
europe_dem %>% dplyr::select(satisfaction) %>%
table() %>% as.data.frame() %>% rename("Satisfaction"="satisfaction",
"Frequency"="Freq") %>%
kable()
library(likert)
library(ggthemes)
hws=as.factor(europe_num$hotel_wifi_service)
dac=as.factor(europe_num$departure_arrival_convenience)
eoob=as.factor(europe_num$ease_of_online_booking)
hl=as.factor(europe_num$hotel_location)
fad=as.factor(europe_num$food_and_drink)
sc=as.factor(europe_num$stay_comfort)
cre=as.factor(europe_num$common_room_entertainment)
ccs=as.factor(europe_num$checkin_checkout_service)
os=as.factor(europe_num$other_service)
c=as.factor(europe_num$cleanliness)
new_lik=data.frame(hotel_wifi_service=hws,
departure_arrival_convenience=dac,
ease_of_online_booking=eoob,
hotel_location=hl,
food_and_drink=fad,
stay_comfort=sc,
common_room_entertainment=cre,
checkin_checkout_service=ccs,
other_service=os,
cleanliness=c)
lik=likert(new_lik)
invisible(likert.bar.plot(lik)+theme(legend.position = "bottom")+
theme_bw(base_size = 10)+
labs(title = "Respondents Distribution",
subtitle="0=not applicable,1=very dissatisfied,2=dissatisfied,3=neutral,4=satisfied,5=very satisfied"))+
ggthemes::theme_fivethirtyeight()
library(MASS)
europe_model=europe %>%
dplyr::select(6:16)
europe_model$satisfaction=ifelse(europe_model$satisfaction=="satisfied",
1,0)
model=glm(satisfaction ~ .,data=europe_model)
summary(model)
# Put the coefficient names in their own column, then sort largest first
data.frame(Service=names(model$coefficients),
Magnitude=unname(model$coefficients)) %>%
arrange(desc(Magnitude)) %>% kable()