Hotels are considered to be the most volatile industry,since there is a lot of uncertainty in demand. Hence they need to plan the best strategies always to keep the numbers moving and keep the business growing.
The objective of this project is to analyze the open hotel booking demand dataset from Antonio, Almeida and Nunes, 2019 and find the critical factors from the past data which may help for the future planning of the business and thus to improve their revenue.
By this analysis, I would like to help the hotels understand the following
1. Which is the most preferred Hotel type by the guests ?
2. Which is the Busiest Month?
3. Which is the most preferred Meal Type?
4. Which country are most of the visitors from ?
5. What is the most preferred Room Type?
6. What are the factors to concentrate on to increase the Revenue?
7. What has been the Cancellation percentage?
I would start by Importing the data, Cleaning it and checking for completeness and then find the relationships between various factors, visually analyze the relationship between various columns and build models utilizing various functions and packages available in R.
This analysis would help the consumer to understand all the factors affecting the Business, prepare for the future bookings accordingly and thus improve their revenue.
The packages used for the analysis are
library(readxl)
library(tidyverse)
library(lubridate)
library(psych)
library(DT)
Data Source
data_dictionary <- read_excel("D:/UC BANA/Fall 2020/Data Wrangling/Final Project/data_dictionary_hotels.xlsx")
attach(data_dictionary)
datatable(
head(data_dictionary,40),
extensions = 'FixedColumns',
options = list(
scrollY = "400px",
scrollX = TRUE,
fixedColumns = TRUE
)
)
To import the data
data <- read.csv("D:/UC BANA/Fall 2020/Data Wrangling/Final Project/hotels.csv", stringsAsFactors = FALSE )
To check the structure of the data
str(data)
## 'data.frame': 119390 obs. of 32 variables:
## $ hotel : chr "Resort Hotel" "Resort Hotel" "Resort Hotel" "Resort Hotel" ...
## $ is_canceled : int 0 0 0 0 0 0 0 0 1 1 ...
## $ lead_time : int 342 737 7 13 14 14 0 9 85 75 ...
## $ arrival_date_year : int 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
## $ arrival_date_month : chr "July" "July" "July" "July" ...
## $ arrival_date_week_number : int 27 27 27 27 27 27 27 27 27 27 ...
## $ arrival_date_day_of_month : int 1 1 1 1 1 1 1 1 1 1 ...
## $ stays_in_weekend_nights : int 0 0 0 0 0 0 0 0 0 0 ...
## $ stays_in_week_nights : int 0 0 1 1 2 2 2 2 3 3 ...
## $ adults : int 2 2 1 1 2 2 2 2 2 2 ...
## $ children : int 0 0 0 0 0 0 0 0 0 0 ...
## $ babies : int 0 0 0 0 0 0 0 0 0 0 ...
## $ meal : chr "BB" "BB" "BB" "BB" ...
## $ country : chr "PRT" "PRT" "GBR" "GBR" ...
## $ market_segment : chr "Direct" "Direct" "Direct" "Corporate" ...
## $ distribution_channel : chr "Direct" "Direct" "Direct" "Corporate" ...
## $ is_repeated_guest : int 0 0 0 0 0 0 0 0 0 0 ...
## $ previous_cancellations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ previous_bookings_not_canceled: int 0 0 0 0 0 0 0 0 0 0 ...
## $ reserved_room_type : chr "C" "C" "A" "A" ...
## $ assigned_room_type : chr "C" "C" "C" "A" ...
## $ booking_changes : int 3 4 0 0 0 0 0 0 0 0 ...
## $ deposit_type : chr "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
## $ agent : chr "NULL" "NULL" "NULL" "304" ...
## $ company : chr "NULL" "NULL" "NULL" "NULL" ...
## $ days_in_waiting_list : int 0 0 0 0 0 0 0 0 0 0 ...
## $ customer_type : chr "Transient" "Transient" "Transient" "Transient" ...
## $ adr : num 0 0 75 75 98 ...
## $ required_car_parking_spaces : int 0 0 0 0 0 0 0 0 0 0 ...
## $ total_of_special_requests : int 0 0 0 0 1 1 0 1 1 0 ...
## $ reservation_status : chr "Check-Out" "Check-Out" "Check-Out" "Check-Out" ...
## $ reservation_status_date : chr "2015-07-01" "2015-07-01" "2015-07-02" "2015-07-02" ...
There are 119390 observations of 32 variables. All the Data types seem appropriate. Hence I am not changing any Data types.
To view the raw data
head(data, 10)
## hotel is_canceled lead_time arrival_date_year arrival_date_month
## 1 Resort Hotel 0 342 2015 July
## 2 Resort Hotel 0 737 2015 July
## 3 Resort Hotel 0 7 2015 July
## 4 Resort Hotel 0 13 2015 July
## 5 Resort Hotel 0 14 2015 July
## 6 Resort Hotel 0 14 2015 July
## 7 Resort Hotel 0 0 2015 July
## 8 Resort Hotel 0 9 2015 July
## 9 Resort Hotel 1 85 2015 July
## 10 Resort Hotel 1 75 2015 July
## arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
## 1 27 1 0
## 2 27 1 0
## 3 27 1 0
## 4 27 1 0
## 5 27 1 0
## 6 27 1 0
## 7 27 1 0
## 8 27 1 0
## 9 27 1 0
## 10 27 1 0
## stays_in_week_nights adults children babies meal country market_segment
## 1 0 2 0 0 BB PRT Direct
## 2 0 2 0 0 BB PRT Direct
## 3 1 1 0 0 BB GBR Direct
## 4 1 1 0 0 BB GBR Corporate
## 5 2 2 0 0 BB GBR Online TA
## 6 2 2 0 0 BB GBR Online TA
## 7 2 2 0 0 BB PRT Direct
## 8 2 2 0 0 FB PRT Direct
## 9 3 2 0 0 BB PRT Online TA
## 10 3 2 0 0 HB PRT Offline TA/TO
## distribution_channel is_repeated_guest previous_cancellations
## 1 Direct 0 0
## 2 Direct 0 0
## 3 Direct 0 0
## 4 Corporate 0 0
## 5 TA/TO 0 0
## 6 TA/TO 0 0
## 7 Direct 0 0
## 8 Direct 0 0
## 9 TA/TO 0 0
## 10 TA/TO 0 0
## previous_bookings_not_canceled reserved_room_type assigned_room_type
## 1 0 C C
## 2 0 C C
## 3 0 A C
## 4 0 A A
## 5 0 A A
## 6 0 A A
## 7 0 C C
## 8 0 C C
## 9 0 A A
## 10 0 D D
## booking_changes deposit_type agent company days_in_waiting_list
## 1 3 No Deposit NULL NULL 0
## 2 4 No Deposit NULL NULL 0
## 3 0 No Deposit NULL NULL 0
## 4 0 No Deposit 304 NULL 0
## 5 0 No Deposit 240 NULL 0
## 6 0 No Deposit 240 NULL 0
## 7 0 No Deposit NULL NULL 0
## 8 0 No Deposit 303 NULL 0
## 9 0 No Deposit 240 NULL 0
## 10 0 No Deposit 15 NULL 0
## customer_type adr required_car_parking_spaces total_of_special_requests
## 1 Transient 0.0 0 0
## 2 Transient 0.0 0 0
## 3 Transient 75.0 0 0
## 4 Transient 75.0 0 0
## 5 Transient 98.0 0 1
## 6 Transient 98.0 0 1
## 7 Transient 107.0 0 0
## 8 Transient 103.0 0 1
## 9 Transient 82.0 0 1
## 10 Transient 105.5 0 0
## reservation_status reservation_status_date
## 1 Check-Out 2015-07-01
## 2 Check-Out 2015-07-01
## 3 Check-Out 2015-07-02
## 4 Check-Out 2015-07-02
## 5 Check-Out 2015-07-03
## 6 Check-Out 2015-07-03
## 7 Check-Out 2015-07-03
## 8 Check-Out 2015-07-03
## 9 Canceled 2015-05-06
## 10 Canceled 2015-04-22
To check for the missing values in the dataset
colSums(is.na(data))
## hotel is_canceled
## 0 0
## lead_time arrival_date_year
## 0 0
## arrival_date_month arrival_date_week_number
## 0 0
## arrival_date_day_of_month stays_in_weekend_nights
## 0 0
## stays_in_week_nights adults
## 0 0
## children babies
## 4 0
## meal country
## 0 0
## market_segment distribution_channel
## 0 0
## is_repeated_guest previous_cancellations
## 0 0
## previous_bookings_not_canceled reserved_room_type
## 0 0
## assigned_room_type booking_changes
## 0 0
## deposit_type agent
## 0 0
## company days_in_waiting_list
## 0 0
## customer_type adr
## 0 0
## required_car_parking_spaces total_of_special_requests
## 0 0
## reservation_status reservation_status_date
## 0 0
It shows that there are 4 missing values in only one column “Number of children”. The number of incomplete rows are very less and can be neglected. Here I am planning to assume the number of children is 0.
data$children[is.na(data$children)] <- 0
Let us check for the missing values after imputing
sum(is.na(data))
## [1] 0
To check the summary of the data
summary(data)
## hotel is_canceled lead_time arrival_date_year
## Length:119390 Min. :0.0000 Min. : 0 Min. :2015
## Class :character 1st Qu.:0.0000 1st Qu.: 18 1st Qu.:2016
## Mode :character Median :0.0000 Median : 69 Median :2016
## Mean :0.3704 Mean :104 Mean :2016
## 3rd Qu.:1.0000 3rd Qu.:160 3rd Qu.:2017
## Max. :1.0000 Max. :737 Max. :2017
## arrival_date_month arrival_date_week_number arrival_date_day_of_month
## Length:119390 Min. : 1.00 Min. : 1.0
## Class :character 1st Qu.:16.00 1st Qu.: 8.0
## Mode :character Median :28.00 Median :16.0
## Mean :27.17 Mean :15.8
## 3rd Qu.:38.00 3rd Qu.:23.0
## Max. :53.00 Max. :31.0
## stays_in_weekend_nights stays_in_week_nights adults
## Min. : 0.0000 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 1.0 1st Qu.: 2.000
## Median : 1.0000 Median : 2.0 Median : 2.000
## Mean : 0.9276 Mean : 2.5 Mean : 1.856
## 3rd Qu.: 2.0000 3rd Qu.: 3.0 3rd Qu.: 2.000
## Max. :19.0000 Max. :50.0 Max. :55.000
## children babies meal country
## Min. : 0.0000 Min. : 0.000000 Length:119390 Length:119390
## 1st Qu.: 0.0000 1st Qu.: 0.000000 Class :character Class :character
## Median : 0.0000 Median : 0.000000 Mode :character Mode :character
## Mean : 0.1039 Mean : 0.007949
## 3rd Qu.: 0.0000 3rd Qu.: 0.000000
## Max. :10.0000 Max. :10.000000
## market_segment distribution_channel is_repeated_guest
## Length:119390 Length:119390 Min. :0.00000
## Class :character Class :character 1st Qu.:0.00000
## Mode :character Mode :character Median :0.00000
## Mean :0.03191
## 3rd Qu.:0.00000
## Max. :1.00000
## previous_cancellations previous_bookings_not_canceled reserved_room_type
## Min. : 0.00000 Min. : 0.0000 Length:119390
## 1st Qu.: 0.00000 1st Qu.: 0.0000 Class :character
## Median : 0.00000 Median : 0.0000 Mode :character
## Mean : 0.08712 Mean : 0.1371
## 3rd Qu.: 0.00000 3rd Qu.: 0.0000
## Max. :26.00000 Max. :72.0000
## assigned_room_type booking_changes deposit_type agent
## Length:119390 Min. : 0.0000 Length:119390 Length:119390
## Class :character 1st Qu.: 0.0000 Class :character Class :character
## Mode :character Median : 0.0000 Mode :character Mode :character
## Mean : 0.2211
## 3rd Qu.: 0.0000
## Max. :21.0000
## company days_in_waiting_list customer_type adr
## Length:119390 Min. : 0.000 Length:119390 Min. : -6.38
## Class :character 1st Qu.: 0.000 Class :character 1st Qu.: 69.29
## Mode :character Median : 0.000 Mode :character Median : 94.58
## Mean : 2.321 Mean : 101.83
## 3rd Qu.: 0.000 3rd Qu.: 126.00
## Max. :391.000 Max. :5400.00
## required_car_parking_spaces total_of_special_requests reservation_status
## Min. :0.00000 Min. :0.0000 Length:119390
## 1st Qu.:0.00000 1st Qu.:0.0000 Class :character
## Median :0.00000 Median :0.0000 Mode :character
## Mean :0.06252 Mean :0.5714
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :8.00000 Max. :5.0000
## reservation_status_date
## Length:119390
## Class :character
## Mode :character
##
##
##
From the summary of the data I could only understand the integer columns, To understand the character columns I will further do the exploratory data analysis.
To show the clean data
head(data,10)
## hotel is_canceled lead_time arrival_date_year arrival_date_month
## 1 Resort Hotel 0 342 2015 July
## 2 Resort Hotel 0 737 2015 July
## 3 Resort Hotel 0 7 2015 July
## 4 Resort Hotel 0 13 2015 July
## 5 Resort Hotel 0 14 2015 July
## 6 Resort Hotel 0 14 2015 July
## 7 Resort Hotel 0 0 2015 July
## 8 Resort Hotel 0 9 2015 July
## 9 Resort Hotel 1 85 2015 July
## 10 Resort Hotel 1 75 2015 July
## arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
## 1 27 1 0
## 2 27 1 0
## 3 27 1 0
## 4 27 1 0
## 5 27 1 0
## 6 27 1 0
## 7 27 1 0
## 8 27 1 0
## 9 27 1 0
## 10 27 1 0
## stays_in_week_nights adults children babies meal country market_segment
## 1 0 2 0 0 BB PRT Direct
## 2 0 2 0 0 BB PRT Direct
## 3 1 1 0 0 BB GBR Direct
## 4 1 1 0 0 BB GBR Corporate
## 5 2 2 0 0 BB GBR Online TA
## 6 2 2 0 0 BB GBR Online TA
## 7 2 2 0 0 BB PRT Direct
## 8 2 2 0 0 FB PRT Direct
## 9 3 2 0 0 BB PRT Online TA
## 10 3 2 0 0 HB PRT Offline TA/TO
## distribution_channel is_repeated_guest previous_cancellations
## 1 Direct 0 0
## 2 Direct 0 0
## 3 Direct 0 0
## 4 Corporate 0 0
## 5 TA/TO 0 0
## 6 TA/TO 0 0
## 7 Direct 0 0
## 8 Direct 0 0
## 9 TA/TO 0 0
## 10 TA/TO 0 0
## previous_bookings_not_canceled reserved_room_type assigned_room_type
## 1 0 C C
## 2 0 C C
## 3 0 A C
## 4 0 A A
## 5 0 A A
## 6 0 A A
## 7 0 C C
## 8 0 C C
## 9 0 A A
## 10 0 D D
## booking_changes deposit_type agent company days_in_waiting_list
## 1 3 No Deposit NULL NULL 0
## 2 4 No Deposit NULL NULL 0
## 3 0 No Deposit NULL NULL 0
## 4 0 No Deposit 304 NULL 0
## 5 0 No Deposit 240 NULL 0
## 6 0 No Deposit 240 NULL 0
## 7 0 No Deposit NULL NULL 0
## 8 0 No Deposit 303 NULL 0
## 9 0 No Deposit 240 NULL 0
## 10 0 No Deposit 15 NULL 0
## customer_type adr required_car_parking_spaces total_of_special_requests
## 1 Transient 0.0 0 0
## 2 Transient 0.0 0 0
## 3 Transient 75.0 0 0
## 4 Transient 75.0 0 0
## 5 Transient 98.0 0 1
## 6 Transient 98.0 0 1
## 7 Transient 107.0 0 0
## 8 Transient 103.0 0 1
## 9 Transient 82.0 0 1
## 10 Transient 105.5 0 0
## reservation_status reservation_status_date
## 1 Check-Out 2015-07-01
## 2 Check-Out 2015-07-01
## 3 Check-Out 2015-07-02
## 4 Check-Out 2015-07-02
## 5 Check-Out 2015-07-03
## 6 Check-Out 2015-07-03
## 7 Check-Out 2015-07-03
## 8 Check-Out 2015-07-03
## 9 Canceled 2015-05-06
## 10 Canceled 2015-04-22
To show the distribution of the data
Histograms are used to show the distribution of the data for the numeric variables.
multi.hist(data[,sapply(data, is.numeric)])
The observations made from the above histograms and the numerical data summary are :
To check for the outliers visually
Boxplots are used to identify the outliers.
par(mfrow = c(5,4), oma = c(1,1,0,0) + 0.1, mar = c(3,3,1,1) + 0.1)
boxplot(data$is_canceled, main = "is_canceled")
boxplot(data$lead_time, main = "lead_time")
boxplot(data$arrival_date_year, main = "arrival_date_year")
boxplot(data$arrival_date_week_number, main = "arrival_date_week_number")
boxplot(data$arrival_date_day_of_month, main = "arrival_date_day_of_month")
boxplot(data$stays_in_weekend_nights, main = "stays_in_weekend_nights")
boxplot(data$stays_in_week_nights, main = "stays_in_week_nights")
boxplot(data$children, main = "children")
boxplot(data$adults, main = "adults")
boxplot(data$babies, main = "babies")
boxplot(data$is_repeated_guest, main = "is_repeated_gues")
boxplot(data$previous_cancellations, main = "previous_cancellation")
boxplot(data$previous_bookings_not_canceled, main ="previous_bookings_not_canceled")
boxplot(data$booking_changes, main = "booking_changes")
boxplot(data$days_in_waiting_list, main = "days_in_waiting_list")
boxplot(data$adr, main = "adr")
boxplot(data$required_car_parking_spaces, main = "required_car_parking_spaces")
boxplot(data$total_of_special_requests, main = "total_of_special_requests")
The following outliers can be seen:
lead_time : There are lead times in the range 500 - 700. I would check this with the hotel to know if there are guests who made reservations 2 years prior to the arrival before doing anything to the data since there are many such observations.
days_in_waiting_list : There are observations with days in waiting list above 350. I would check with the hotel if they are really not able to assign a reservation to a guest for more than a year before doing anything to the data since there are many such observations.
adr : There is an observation with a negative adr and one with adr of 5400. The observation with negative adr can be removed without any questions. I would remove the adr of 5400 also since we are using the data for the analysis and not for reporting.
To remove the outliers
data <- data[ !(data$adr < 0 | data$adr == 5400), ]
I would now go through each variable of concern and understand how it affects the revenue
Most Preferred Hotel Type
To find the total number of nights the guest stayed including weekday nights and weekend nights
data$total_stay_nights <- rowSums(cbind(data$stays_in_weekend_nights, data$stays_in_week_nights), na.rm = TRUE)
To find the total revenue earned by each booking , this can be calculated by multiplying total number of nights and average daily price
data <- data %>% mutate(revenue = (total_stay_nights * adr))
To make a table of Total booking and Revenue for each Hotel type. Here I am only considering the reservations which are not canceled.
data_hotel_revenue <- data %>% filter(is_canceled == 0) %>% group_by(hotel) %>% summarise(total_bookings= n(), total_revenue = sum(revenue))
To visually understand this observation
ggplot(data_hotel_revenue, aes(y=total_bookings, x=hotel)) +
geom_bar(position="dodge", stat = "identity" , width=0.25, fill = "blue" )
ggplot(data_hotel_revenue, aes(y=total_revenue, x=hotel)) +
geom_bar(position="dodge", stat = "identity", width=0.25, fill = "blue")
This observation shows that more bookings were made in City Hotel than the Resort Hotel and hence City Hotel got more Revenue between 2015 and 2017.
Cancellation
To see the cancellation trends over the years
ggplot(data, aes(x = is_canceled, fill = factor(hotel))) +
geom_histogram(binwidth = 0.2) +
scale_x_continuous(breaks = seq(0, 1, 1))
This shows that more cancellations are made in City Hotel. Also, the cancellation percentage is more than 50%.
To find which year gave the most revenue
For this calculation , I am considering only the bookings which were not canceled
data_not_canceled <- data %>% filter(is_canceled == 0)
ggplot(data_not_canceled, aes(x = arrival_date_year, y = revenue, fill = factor(hotel))) +
geom_bar(stat = "identity" , width=0.25) +
scale_x_continuous(breaks = seq(2015,2017,1))
This shows the most revenue was made in 2016 and the most revenue was made through City Hotel.
To find which month gave the most revenue
For this calculation , I am considering only the bookings which were not canceled
data_not_canceled$arrival_date_month <- factor(data_not_canceled$arrival_date_month, levels = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"))
ggplot(data_not_canceled, aes(x = arrival_date_month, y = revenue, fill = factor(hotel))) +
geom_bar(stat = "identity" , width=0.25)
This shows August has most bookings and hence gives most revenue.
To find the most preferred meal type
Considering the whole data here (including cancellations) since this analysis is to understand the guest preference.
ggplot(data, aes(meal)) +
geom_bar(width=0.25, fill = "blue")
This shows “Bed & Breakfast” is the most preferred meal type
To find the most preferred Room type
Considering the whole data here (including cancellations) since this analysis is to understand the guest preference.
ggplot(data, aes(reserved_room_type , fill = factor(hotel))) +
geom_bar(width=0.25, fill = "blue")
Most preferred room type is “A”.
To find the most common customer (guest) type
Considering the whole data here (including cancellations) since this analysis is to understand the guest preference.
ggplot(data, aes(customer_type)) +
geom_bar(width=0.25, fill = "blue")
Most common customer type is “Transient”.
To find which country most of the Guests are from
Considering the whole data here (including cancellations) since this analysis is to understand the guest preference. Since there are guests from almost 182 countries, Just showing the top 10 countries the guests are from, here.
data_country <- data %>% group_by(country) %>% summarise(booking_count = n()) %>% arrange(desc(booking_count))
top_n(data_country,10,booking_count) %>%
ggplot(.,aes(country, booking_count)) +
geom_bar(stat = "identity", width = 0.25, fill ="blue")
Most common country is “PRT”.
To build the model , I would first prepare the data week wise. I would like to include Week number of arrival, hotel type, customer type, meal type, is_repeated_guest, total number of bookings, total revenue for the week, average adr and average stay nights.
data_model <- data %>% filter(is_canceled == 0) %>% group_by(arrival_date_week_number,hotel,customer_type, meal, is_repeated_guest) %>% summarise(total_bookings = n(), total_guests = sum(adults + children + babies),average_stay_nights = mean(total_stay_nights, na.rm = TRUE), week_adr = mean(adr, na.rm = TRUE) ,total_revenue = sum(revenue))
model_1 <- lm(total_revenue ~ as.factor(hotel) + as.factor(customer_type) + as.factor(meal) + as.factor(is_repeated_guest) + average_stay_nights + week_adr, data = data_model)
summary(model_1)
##
## Call:
## lm(formula = total_revenue ~ as.factor(hotel) + as.factor(customer_type) +
## as.factor(meal) + as.factor(is_repeated_guest) + average_stay_nights +
## week_adr, data = data_model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -55215 -20804 -5367 10432 242978
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7434.26 3706.77 2.006 0.0451
## as.factor(hotel)Resort Hotel -1644.98 2212.95 -0.743 0.4574
## as.factor(customer_type)Group -3607.78 3597.32 -1.003 0.3161
## as.factor(customer_type)Transient 41462.05 3017.20 13.742 < 2e-16
## as.factor(customer_type)Transient-Party 14504.96 3108.08 4.667 3.36e-06
## as.factor(meal)FB -59550.92 4570.39 -13.030 < 2e-16
## as.factor(meal)HB -32974.17 2495.85 -13.212 < 2e-16
## as.factor(meal)SC -32095.51 2730.19 -11.756 < 2e-16
## as.factor(meal)Undefined -53234.85 4497.99 -11.835 < 2e-16
## as.factor(is_repeated_guest)1 -33540.02 2258.14 -14.853 < 2e-16
## average_stay_nights 656.51 345.97 1.898 0.0580
## week_adr 208.88 20.75 10.068 < 2e-16
##
## (Intercept) *
## as.factor(hotel)Resort Hotel
## as.factor(customer_type)Group
## as.factor(customer_type)Transient ***
## as.factor(customer_type)Transient-Party ***
## as.factor(meal)FB ***
## as.factor(meal)HB ***
## as.factor(meal)SC ***
## as.factor(meal)Undefined ***
## as.factor(is_repeated_guest)1 ***
## average_stay_nights .
## week_adr ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34580 on 1378 degrees of freedom
## Multiple R-squared: 0.3765, Adjusted R-squared: 0.3716
## F-statistic: 75.66 on 11 and 1378 DF, p-value: < 2.2e-16
The Hotels ultimately want to increase their revenue and they would like to understand and concentrate on each and everything that improves their revenue.
The conclusions from my analysis of the data from 2015 to 2017 are :
From the Exploratory Data Analysis
From the Linear Model