Initial setup: configure the data set and load the data file into the
variable hotel_data.
Data set - Hotels: This data comes from an open hotel booking demand
data set covering two hotel types, City Hotel and Resort Hotel.
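The load step itself is not shown. A minimal sketch, assuming the data is stored as a CSV file: the real file name is unknown, so a small temporary file with hypothetical rows stands in for it and the snippet runs on its own.

```r
# Hypothetical stand-in for the real data file: write a tiny CSV to a temp path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "hotel,adr,lead_time,reserved_room_type",
  "City Hotel,94.5,30,A",
  "Resort Hotel,120.0,7,D"
), csv_path)

# Load the file into hotel_data. Text columns stay as character,
# matching the "Class :character" columns in the summary output.
hotel_data <- read.csv(csv_path, stringsAsFactors = FALSE)
str(hotel_data)
```

With the real file, csv_path would simply be replaced by its path.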
summary(hotel_data)
## hotel is_canceled lead_time arrival_date_year
## Length:119390 Min. :0.0000 Min. : 0 Min. :2015
## Class :character 1st Qu.:0.0000 1st Qu.: 18 1st Qu.:2016
## Mode :character Median :0.0000 Median : 69 Median :2016
## Mean :0.3704 Mean :104 Mean :2016
## 3rd Qu.:1.0000 3rd Qu.:160 3rd Qu.:2017
## Max. :1.0000 Max. :737 Max. :2017
##
## arrival_date_month arrival_date_week_number arrival_date_day_of_month
## Length:119390 Min. : 1.00 Min. : 1.0
## Class :character 1st Qu.:16.00 1st Qu.: 8.0
## Mode :character Median :28.00 Median :16.0
## Mean :27.17 Mean :15.8
## 3rd Qu.:38.00 3rd Qu.:23.0
## Max. :53.00 Max. :31.0
##
## stays_in_weekend_nights stays_in_week_nights adults
## Min. : 0.0000 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 1.0 1st Qu.: 2.000
## Median : 1.0000 Median : 2.0 Median : 2.000
## Mean : 0.9276 Mean : 2.5 Mean : 1.856
## 3rd Qu.: 2.0000 3rd Qu.: 3.0 3rd Qu.: 2.000
## Max. :19.0000 Max. :50.0 Max. :55.000
##
## children babies meal country
## Min. : 0.0000 Min. : 0.000000 Length:119390 Length:119390
## 1st Qu.: 0.0000 1st Qu.: 0.000000 Class :character Class :character
## Median : 0.0000 Median : 0.000000 Mode :character Mode :character
## Mean : 0.1039 Mean : 0.007949
## 3rd Qu.: 0.0000 3rd Qu.: 0.000000
## Max. :10.0000 Max. :10.000000
## NA's :4
## market_segment distribution_channel is_repeated_guest
## Length:119390 Length:119390 Min. :0.00000
## Class :character Class :character 1st Qu.:0.00000
## Mode :character Mode :character Median :0.00000
## Mean :0.03191
## 3rd Qu.:0.00000
## Max. :1.00000
##
## previous_cancellations previous_bookings_not_canceled reserved_room_type
## Min. : 0.00000 Min. : 0.0000 Length:119390
## 1st Qu.: 0.00000 1st Qu.: 0.0000 Class :character
## Median : 0.00000 Median : 0.0000 Mode :character
## Mean : 0.08712 Mean : 0.1371
## 3rd Qu.: 0.00000 3rd Qu.: 0.0000
## Max. :26.00000 Max. :72.0000
##
## assigned_room_type booking_changes deposit_type agent
## Length:119390 Min. : 0.0000 Length:119390 Length:119390
## Class :character 1st Qu.: 0.0000 Class :character Class :character
## Mode :character Median : 0.0000 Mode :character Mode :character
## Mean : 0.2211
## 3rd Qu.: 0.0000
## Max. :21.0000
##
## company days_in_waiting_list customer_type adr
## Length:119390 Min. : 0.000 Length:119390 Min. : -6.38
## Class :character 1st Qu.: 0.000 Class :character 1st Qu.: 69.29
## Mode :character Median : 0.000 Mode :character Median : 94.58
## Mean : 2.321 Mean : 101.83
## 3rd Qu.: 0.000 3rd Qu.: 126.00
## Max. :391.000 Max. :5400.00
##
## required_car_parking_spaces total_of_special_requests reservation_status
## Min. :0.00000 Min. :0.0000 Length:119390
## 1st Qu.:0.00000 1st Qu.:0.0000 Class :character
## Median :0.00000 Median :0.0000 Mode :character
## Mean :0.06252 Mean :0.5714
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :8.00000 Max. :5.0000
##
## reservation_status_date
## Length:119390
## Class :character
## Mode :character
##
##
##
##
#head(hotel_data)
# Arrange hotel data according to adr (Average Daily Rate), highest first.
hotel_data <- hotel_data[order(hotel_data$adr, decreasing = TRUE), ]
# Display the top 10 rows for columns hotel and adr.
head(hotel_data[, c("hotel", "adr")], 10)
## hotel adr
## 48516 City Hotel 5400.00
## 111404 City Hotel 510.00
## 15084 Resort Hotel 508.00
## 103913 City Hotel 451.50
## 13143 Resort Hotel 450.00
## 13392 Resort Hotel 437.00
## 39156 Resort Hotel 426.25
## 39569 Resort Hotel 402.00
## 39119 Resort Hotel 397.38
## 13324 Resort Hotel 392.00
In the hotel dataset, the column “adr” (Average Daily Rate, a numeric field) is one of the key continuous variables. ADR is defined as the sum of all lodging transactions divided by the total number of staying nights, that is, the average amount paid for a room per night. It is a key metric for observing the financial performance of a hotel, and hotel stakeholders such as hotel managers and investigators use it to evaluate revenue generation and pricing strategies for future business. ADR can therefore be treated as a valuable continuous variable in the analysis of my hotel dataset.
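The definition of ADR can be illustrated with a short calculation. This is only a sketch: the revenue and night counts below are hypothetical and are not taken from the hotel dataset.

```r
# Hypothetical figures for illustration only (not from the hotel dataset).
room_revenue <- c(180, 240, 95)  # total lodging amount paid per booking
nights       <- c(2, 3, 1)       # nights stayed per booking

# ADR = sum of all lodging transactions / total number of staying nights
adr_overall <- sum(room_revenue) / sum(nights)
adr_overall
```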
In my hotel dataset, the categorical column reserved_room_type can be
taken as an explanatory variable that might influence the response
variable adr (Average Daily Rate).
Null Hypothesis: The average daily rate does not vary significantly
across different reserved room types.
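# Determine the unique categories of reserved_room_type.
# Caution: unique() returns one value per category (10 here), not one per row.
# R recycles those 10 values across all 119390 rows, so this column does not
# hold each booking's own room type; factor(hotel_data$reserved_room_type) would.
hotel_data$List_category_room_type <- unique(hotel_data$reserved_room_type)
# Perform the ANOVA test on a model fitted with lm.
anova_result_table <- anova(lm(hotel_data$adr ~ hotel_data$List_category_room_type, hotel_data))
# Display a summary of the ANOVA table.
summary(anova_result_table)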
## Df Sum Sq Mean Sq F value
## Min. : 9 Min. : 2021 Min. : 224.6 Min. :0.08794
## 1st Qu.: 29852 1st Qu.: 76226891 1st Qu.: 807.0 1st Qu.:0.08794
## Median : 59695 Median :152451760 Median :1389.3 Median :0.08794
## Mean : 59695 Mean :152451760 Mean :1389.3 Mean :0.08794
## 3rd Qu.: 89537 3rd Qu.:228676629 3rd Qu.:1971.7 3rd Qu.:0.08794
## Max. :119380 Max. :304901498 Max. :2554.0 Max. :0.08794
## NA's :1
## Pr(>F)
## Min. :0.9998
## 1st Qu.:0.9998
## Median :0.9998
## Mean :0.9998
## 3rd Qu.:0.9998
## Max. :0.9998
## NA's :1
#Perform AOV test
aov_result <- aov(adr ~ hotel_data$List_category_room_type, data = hotel_data)
summary_aov <- summary.aov(aov_result)
print(summary_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## hotel_data$List_category_room_type 9 2021 224.6 0.088 1
## Residuals 119380 304901498 2554.0
#Box plot for adr and reserved_room_type
boxplot(hotel_data$adr ~ hotel_data$reserved_room_type, ylim = c(0, 350), col = "light gray", ylab = "ADR", xlab = "Reserved Room Type")
Interpretation
1. Determine the unique categories of room type.
2. Perform the ANOVA test, fitted with lm, on adr and the
List_category_room_type column built from reserved_room_type.
3. Report: F(9, 119380) = 0.08794 (relatively low) and p-value = 0.9998
(well above 0.05).
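As a check on the reported figures, the F value and p-value can be recomputed from the mean squares and degrees of freedom printed in the ANOVA table above. The sketch below only uses those printed values.

```r
# Values copied from the printed ANOVA table.
ms_group    <- 224.6
ms_residual <- 2554.0
df_group    <- 9
df_residual <- 119380

# F is the ratio of the between-group mean square to the residual mean square.
f_value <- ms_group / ms_residual

# The upper-tail probability of the F distribution gives Pr(>F).
p_value <- pf(f_value, df_group, df_residual, lower.tail = FALSE)
c(F = f_value, p = p_value)
```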
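Conclusion: The p-value is very high, approximately 1 (> 0.05), which
means we fail to reject the null hypothesis: this test gives no evidence
that the average daily rate differs across the groups it compared. The
result must be read with caution, however. The grouping column
List_category_room_type holds the 10 unique room types recycled down the
rows, not each booking's own reserved_room_type, so the test does not
actually address the assumption “reserved_room_type has an impact on
ADR”. The boxplot is drawn from reserved_room_type directly, so any
differences it shows between room types are not contradicted by this
test. A valid test would use reserved_room_type itself as the grouping
factor.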
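A note on the grouping variable: unique(hotel_data$reserved_room_type) returns one value per category, and assigning it to a column recycles those values down the rows. The sketch below uses simulated data with hypothetical values (not the hotel dataset) to show how a recycled column differs from the per-row room type, and how a one-way ANOVA could be fitted with the room type itself as a factor.

```r
set.seed(1)
# Simulated stand-in data (hypothetical values, not the hotel dataset).
demo <- data.frame(
  reserved_room_type = rep(c("A", "D", "E"), each = 40),
  adr = c(rnorm(40, 90, 15), rnorm(40, 120, 15), rnorm(40, 125, 15))
)

# unique() yields one value per category; recycling it down the rows
# produces a label that mostly disagrees with the real room type.
recycled <- rep_len(unique(demo$reserved_room_type), nrow(demo))
mean(recycled == demo$reserved_room_type)  # share of rows that still match

# Use the per-row room type, converted to a factor, as the grouping variable.
demo$reserved_room_type <- factor(demo$reserved_room_type)
fit <- aov(adr ~ reserved_room_type, data = demo)
summary(fit)
```

On the real data, the same pattern would be aov(adr ~ factor(reserved_room_type), data = hotel_data). Its output is not shown here because it has not been run.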
In my hotel dataset, one continuous column, “lead_time”, might influence the response variable “adr” (Average Daily Rate). Lead time is the number of days that elapse between booking and arrival. It is reasonable to assume that there might be a linear relationship between lead time and ADR. For example, customers who book well in advance might get better rates than those who book closer to their arrival date.
# lm: lm is used to fit linear models, including multivariate ones. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance. (Reference: RStudio help library)
# Fit the linear regression model.
lm_model_lead_time_adr <- lm(adr ~ lead_time, data = hotel_data)
# Summary of the model
summary(lm_model_lead_time_adr)
##
## Call:
## lm(formula = adr ~ lead_time, data = hotel_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -105.5 -31.4 -7.2 23.9 5296.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 104.933697 0.203692 515.16 <2e-16 ***
## lead_time -0.029829 0.001366 -21.84 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 50.44 on 119388 degrees of freedom
## Multiple R-squared: 0.003979, Adjusted R-squared: 0.00397
## F-statistic: 476.9 on 1 and 119388 DF, p-value: < 2.2e-16
Interpret the coefficients:
The lead_time coefficient is the slope of the regression line, and the
intercept is the value at which the line crosses the Y axis.
Y variable = dependent variable = ADR
X variable = independent variable = Lead Time
✔ The intercept is 104.933697, the estimated ADR when lead time is zero,
that is, for a booking made on the day of arrival. The summary shows a
minimum lead_time of 0, so this case does occur in the data, although it
is not the typical booking (the median lead time is 69 days).
✔ The coefficient for lead time is -0.029829. It is the estimated change
in the average daily rate for a one-day increase in lead time.
In our case, it suggests that for each additional day of lead time, the
average daily rate decreases by about $0.02983. The effect is
statistically significant (p < 2e-16) but small: the R-squared of
0.003979 means lead time explains only about 0.4% of the variation in ADR.
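The fitted line can be applied by hand to see the size of the effect. The sketch below uses only the two coefficients printed in the model summary. The lead times chosen are illustrative.

```r
# Coefficients copied from the printed model summary.
intercept <- 104.933697
slope     <- -0.029829

# Estimated ADR for a few illustrative lead times (in days).
lead_time <- c(0, 30, 100)
predicted_adr <- intercept + slope * lead_time
data.frame(lead_time, predicted_adr)
```

With the fitted model object available, predict(lm_model_lead_time_adr, newdata = data.frame(lead_time = c(0, 30, 100))) should give the same estimates up to rounding of the printed coefficients.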
Thank you!