A hotel is an establishment that provides paid lodging on a short-term basis. Facilities range from a modest-quality mattress in a small room to large suites with bigger, higher-quality beds, a dresser, a refrigerator and other kitchen facilities, upholstered chairs, a flatscreen television and en-suite bathrooms. Small, lower-priced hotels may offer only the most basic guest services and facilities. Larger, higher-priced hotels may provide additional facilities such as a swimming pool, a business centre (with computers, printers and other office equipment), childcare, conference and event facilities, tennis or basketball courts, a gymnasium, restaurants, a day spa, social function services, Wi-Fi and breakfast. The price of a hotel room also varies with circumstances: whether the date is New Year's Eve, whether the hotel is in a metro city or at a tourist destination, and whether the night falls on a weekend or a regular weekday.
Our study concerns the pricing strategy of hotel rooms in light of various factors. The hotel industry appears glamorous and exciting: top-class hospitality, well-equipped rooms and fancy facilities all aim to create a seamlessly pleasant experience for guests. Needless to say, all this comes at a price that the guest has to pay.
Hotels need technology-driven revenue management systems and processes, along with capable people and procedures, to ensure that the customer gets the best offering, price being the most important component.
Setting prices without a thorough grasp of overall objectives can undo brand-building efforts. Pricing is critical to the success of every hotel, and a well-designed pricing strategy can give revenues a strong push. In this study, we have a total of 19 features of hotels, and on the basis of these features we try to identify those that have a significant effect on the pricing of hotel rooms.
Hypotheses:
H1: The price of a hotel room is higher when the hotel's star rating is high than when it is low.
H2: The price of a hotel room is higher at a tourist destination than at a non-tourist destination.
H3: The price of a hotel room is higher when the hotel's capacity is high than when it is low.
The data was collected from https://in.hotels.com/ in October 2016.
The dataset tracks hotel prices on 8 different dates at different hotels across different cities.
Many external factors can potentially influence RoomRent. The dataset captures 8 external factors, 10 internal factors and, of course, RoomRent itself.
hotel <- read.csv("Cities42.csv", stringsAsFactors = TRUE) # text columns are read as factors, matching the str() output below
View(hotel)
library(psych)
describe(hotel)[,c(3,4,5,8,9)]
mean sd median min max
CityName* 18.07 11.72 16 1.0 42
Population 4416836.87 4258386.00 3046163 8096.0 12442373
CityRank 14.83 13.51 9 0.0 44
IsMetroCity 0.28 0.45 0 0.0 1
IsTouristDestination 0.70 0.46 1 0.0 1
IsWeekend 0.62 0.48 1 0.0 1
IsNewYearEve 0.12 0.33 0 0.0 1
Date* 14.30 2.69 14 1.0 20
HotelName* 841.19 488.16 827 1.0 1670
RoomRent 5473.99 7333.12 4000 299.0 322500
StarRating 3.46 0.76 3 0.0 5
Airport 21.16 22.76 15 0.2 124
HotelAddress* 1202.53 582.17 1261 1.0 2108
HotelPincode 397430.26 259837.50 395003 100025.0 7000157
HotelDescription* 581.34 363.26 567 1.0 1226
FreeWifi 0.93 0.26 1 0.0 1
FreeBreakfast 0.65 0.48 1 0.0 1
HotelCapacity 62.51 76.66 34 0.0 600
HasSwimmingPool 0.36 0.48 0 0.0 1
#IsMetrocity
hotel$IsMetroCity[hotel$IsMetroCity == 1] <- 'Yes MetroCity'
hotel$IsMetroCity[hotel$IsMetroCity == 0] <- 'No MetroCity'
hotel$IsMetroCity <- as.factor(hotel$IsMetroCity)
#IsWeekend
hotel$IsWeekend[hotel$IsWeekend == 1] <- 'Yes Weekend'
hotel$IsWeekend[hotel$IsWeekend == 0] <- 'No Weekend'
hotel$IsWeekend <- as.factor(hotel$IsWeekend)
#IsTouristDestination
hotel$IsTouristDestination[hotel$IsTouristDestination == 1] <- 'Yes TouristDestination'
hotel$IsTouristDestination[hotel$IsTouristDestination == 0] <- 'No ToursitDestination'
hotel$IsTouristDestination <- as.factor(hotel$IsTouristDestination)
#IsNewYearEve
hotel$IsNewYearEve[hotel$IsNewYearEve == 1] <- 'Yes NewYear'
hotel$IsNewYearEve[hotel$IsNewYearEve == 0] <- 'No NewYear'
hotel$IsNewYearEve <- as.factor(hotel$IsNewYearEve)
#FreeWifi
hotel$FreeWifi[hotel$FreeWifi == 1] <- 'Yes Wifi'
hotel$FreeWifi[hotel$FreeWifi == 0] <- 'No Wifi'
hotel$FreeWifi <- as.factor(hotel$FreeWifi)
#FreeBreakfast
hotel$FreeBreakfast[hotel$FreeBreakfast == 1] <- 'Yes Breakfast'
hotel$FreeBreakfast[hotel$FreeBreakfast == 0] <- 'No Breakfast'
hotel$FreeBreakfast <- as.factor(hotel$FreeBreakfast)
#HasSwimmingPool
hotel$HasSwimmingPool[hotel$HasSwimmingPool == 1] <- 'Yes SwimmingPool'
hotel$HasSwimmingPool[hotel$HasSwimmingPool == 0] <- 'No SwimmingPool'
hotel$HasSwimmingPool <- as.factor(hotel$HasSwimmingPool)
str(hotel)
'data.frame': 13232 obs. of 19 variables:
$ CityName : Factor w/ 42 levels "Agra","Ahmedabad",..: 26 26 26 26 26 26 26 26 26 26 ...
$ Population : int 12442373 12442373 12442373 12442373 12442373 12442373 12442373 12442373 12442373 12442373 ...
$ CityRank : int 0 0 0 0 0 0 0 0 0 0 ...
$ IsMetroCity : Factor w/ 2 levels "No MetroCity",..: 2 2 2 2 2 2 2 2 2 2 ...
$ IsTouristDestination: Factor w/ 2 levels "No ToursitDestination",..: 2 2 2 2 2 2 2 2 2 2 ...
$ IsWeekend : Factor w/ 2 levels "No Weekend","Yes Weekend": 2 1 2 2 1 2 1 2 2 1 ...
$ IsNewYearEve : Factor w/ 2 levels "No NewYear","Yes NewYear": 1 1 1 1 1 2 1 1 1 1 ...
$ Date : Factor w/ 20 levels "04-Jan-16","04-Jan-17",..: 11 12 13 14 15 16 17 18 11 12 ...
$ HotelName : Factor w/ 1670 levels "14 Square Amanora",..: 1635 1635 1635 1635 1635 1635 1635 1635 1409 1409 ...
$ RoomRent : int 12375 10250 9900 10350 12000 11475 11220 9225 6800 9350 ...
$ StarRating : num 5 5 5 5 5 5 5 5 4 4 ...
$ Airport : num 21 21 21 21 21 21 21 21 20 20 ...
$ HotelAddress : Factor w/ 2108 levels " H.P. High Court Mall Road, Shimla",..: 925 928 930 933 935 937 940 941 699 746 ...
$ HotelPincode : int 400005 400006 400007 400008 400009 400010 400011 400012 400039 400040 ...
$ HotelDescription : Factor w/ 1226 levels "#NAME?","10 star hotel near Queensroad, Amritsar",..: 1030 1030 1030 1030 1030 1030 1030 1030 1006 1006 ...
$ FreeWifi : Factor w/ 2 levels "No Wifi","Yes Wifi": 2 2 2 2 2 2 2 2 2 2 ...
$ FreeBreakfast : Factor w/ 2 levels "No Breakfast",..: 1 1 1 1 1 1 1 1 2 2 ...
$ HotelCapacity : int 287 287 287 287 287 287 287 287 28 28 ...
$ HasSwimmingPool : Factor w/ 2 levels "No SwimmingPool",..: 2 2 2 2 2 2 2 2 1 1 ...
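The seven recoding steps above all follow the same pattern, so they could be condensed into one small helper. The sketch below is only an illustration: `recode_flag` and the `toy` data frame are hypothetical names that do not appear in the original analysis, and the toy values are made up.

```r
# Hypothetical helper (not part of the original analysis): turn a 0/1
# indicator into a two-level factor labelled "No <label>" / "Yes <label>".
recode_flag <- function(x, label) {
  factor(x, levels = c(0, 1), labels = paste(c("No", "Yes"), label))
}

# Toy stand-in for the hotel data frame; the values are invented.
toy <- data.frame(IsWeekend = c(1, 0, 1, 1), FreeWifi = c(0, 1, 1, 1))

# Map each indicator column to the label used in its factor levels.
flags <- c(IsWeekend = "Weekend", FreeWifi = "Wifi")
for (col in names(flags)) {
  toy[[col]] <- recode_flag(toy[[col]], flags[[col]])
}
str(toy)
```

Because the levels are given explicitly as `c(0, 1)`, "No ..." is always the first level and "Yes ..." the second, regardless of which value appears first in the data.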
table(hotel$CityName)
Agra Ahmedabad Amritsar Bangalore Bhubaneswar Chandigarh Chennai Darjeeling
432 424 136 656 120 336 416 136
Delhi Gangtok Goa Guwahati Haridwar Hyderabad Indore Jaipur
2048 128 624 48 48 536 160 768
Jaisalmer Jodhpur Kanpur Kochi Kolkata Lucknow Madurai Manali
264 224 16 608 512 128 112 288
Mangalore Mumbai Munnar Mysore Nainital Ooty Panchkula Pune
104 712 328 160 144 136 64 600
Puri Rajkot Rishikesh Shimla Srinagar Surat Thiruvanthipuram Thrissur
56 128 88 280 40 80 392 32
Udaipur Varanasi
456 264
count <- aggregate(hotel$Population,by=list(CItyname=hotel$CityName),mean)
min(count$x) #minimum population
[1] 8096
count$CItyname[count$x==8096] # City name with minimum population
[1] Manali
42 Levels: Agra Ahmedabad Amritsar Bangalore Bhubaneswar Chandigarh Chennai Darjeeling Delhi Gangtok Goa Guwahati Haridwar ... Varanasi
max(count$x) #maximum population
[1] 12442373
count$CItyname[count$x==12442373] #city name with maximum population
[1] Mumbai
42 Levels: Agra Ahmedabad Amritsar Bangalore Bhubaneswar Chandigarh Chennai Darjeeling Delhi Gangtok Goa Guwahati Haridwar ... Varanasi
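The lookups above hard-code the minimum and maximum values (8096 and 12442373). A minimal sketch of the same idea with `which.min()` and `which.max()`, which avoids retyping the numbers; `city_pop` and its contents are invented for illustration and are not the Cities42 data.

```r
# Toy city-level table; names and populations are illustrative only.
city_pop <- data.frame(
  CityName   = c("A", "B", "C"),
  Population = c(5000, 120000, 43000)
)

# which.min()/which.max() return the row position of the extreme value,
# so the city name can be looked up without hard-coding the population.
smallest <- city_pop$CityName[which.min(city_pop$Population)]
largest  <- city_pop$CityName[which.max(city_pop$Population)]
smallest
largest
```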
plot(hotel$CityRank,hotel$Population,col="darkblue",main="City rank by population",xlab = "City Rank",ylab = "Population")
table(hotel$IsMetroCity)
No MetroCity Yes MetroCity
9472 3760
barplot(table(hotel$IsMetroCity), main="Distribution of IsMetroCity", xlab="Is a Metro City?", col="blue")
count <- aggregate(hotel$RoomRent,by=list(Metrocity=hotel$IsMetroCity),mean)
count
table(hotel$IsTouristDestination) #Number of Tourist destination or not
No ToursitDestination Yes TouristDestination
4007 9225
count <- aggregate(hotel$RoomRent,by=list(ToursitDestination = hotel$IsTouristDestination),mean)
count # Room Rent according to tourist destination or not
table(hotel$IsWeekend) #Number of Weekends and week days(No Weekend)
No Weekend Yes Weekend
4991 8241
count <- aggregate(hotel$RoomRent,by=list(Weekend = hotel$IsWeekend),mean)
count #Room Rent according to weekend or not
library(ggplot2)
count <- aggregate(RoomRent~IsWeekend+IsMetroCity+IsTouristDestination,data=hotel,mean)
count
#Graph of difference in Money on Weekend or not
qplot(RoomRent, data = count, geom = "density",color = IsWeekend, linetype = IsWeekend,xlab="Room Rent in Indian rupees",
ylab="Density", main="RoomRent VS IsWeekend")
#Graph of difference in Money in MetroCity or not
qplot(RoomRent, data = count, geom = "density",color = IsMetroCity, linetype = IsMetroCity,xlab="Room Rent in Indian rupees",
ylab="Density",main="RoomRent VS IsMetroCity")
#Graph of difference in Money at Tourist Destination or not
qplot(RoomRent, data = count, geom = "density",color = IsTouristDestination, linetype = IsTouristDestination,
xlab="Room Rent in Indian rupees",ylab="Density",main="RoomRent VS IsTouristDestination")
#Graph of difference in Money on the basis of weekend, metrocity and touristdestination
qplot(RoomRent, data = count, facets = IsWeekend~IsMetroCity+IsTouristDestination)
table(hotel$IsNewYearEve) #Number of New Year Eve or not
No NewYear Yes NewYear
11586 1646
count <- aggregate(hotel$RoomRent,by=list(IsNewYearEve = hotel$IsNewYearEve),mean)
count #Room Rent on the basis of New Year Eve or not
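#Density of Room Rent on New Year's Eve versus other dates (drawn from the raw data, because the two-row aggregate above has only one value per group)
qplot(RoomRent, data = hotel, geom = "density",color = IsNewYearEve, linetype = IsNewYearEve,xlab="Room Rent in Indian rupees",ylab="Density",main="RoomRent VS IsNewYearEve")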
min(hotel$RoomRent) # minimum room rent
[1] 299
max(hotel$RoomRent) # maximum room rent
[1] 322500
library(car)
library(lattice)
table(hotel$StarRating) #Number of hotel on the basis of StarRating
0 1 2 2.5 3 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4 4.1 4.3 4.4 4.5 4.7 4.8 5
16 8 440 632 5953 8 16 8 1752 8 24 16 32 2463 24 16 8 376 8 16 1408
count <- aggregate(cbind(Population,RoomRent)~StarRating,data=hotel,mean)
count # StarRating on the basis of population and roomrent
#Scatter plot of Population against Room Rent
scatterplot(count$Population,count$RoomRent , main="Population & RoomRent",labels=row.names(count),xlab="Population",ylab="RoomRent")
#Scatter plot of StarRating against Room Rent
scatterplot(count$StarRating,count$RoomRent , main="StarRating & RoomRent",labels=row.names(count),xlab="StarRating",ylab="RoomRent")
count <- aggregate(cbind(Airport,RoomRent)~StarRating,data=hotel,mean)
count # StarRating of Hotels on the basis of Airport distance from hotel and Room Rent
#Scatterplot of Airport distance from hotel against Room Rent
scatterplot(count$Airport,count$RoomRent , main="Airport distance & RoomRent",labels=row.names(count),xlab="Airport distance from hotel",ylab="RoomRent")
#Graph(Combined jitterplot and boxplot) of Airport distance from hotel against StarRating of hotel
qplot(StarRating, Airport, data = count,geom=c("boxplot", "jitter"))
count <- xtabs(~FreeWifi+FreeBreakfast+HasSwimmingPool,data=hotel)
ftable(prop.table(count))*100 #Proportion of hotel on the basis of wifi, breakfast and swimmingpool
HasSwimmingPool No SwimmingPool Yes SwimmingPool
FreeWifi FreeBreakfast
No Wifi No Breakfast 2.4788392 2.1009674
Yes Breakfast 1.9951632 0.8388755
Yes Wifi No Breakfast 18.7197703 11.7896010
Yes Breakfast 41.2258162 20.8509674
count <- aggregate(RoomRent~FreeBreakfast+FreeWifi+HasSwimmingPool,data=hotel,mean)
count
#Graph of difference in money on the basis of breakfast, wifi and swimmingpool
qplot(RoomRent, data = count, facets = FreeBreakfast~FreeWifi+HasSwimmingPool)
#Box and Whisker Plot of Swimming pool or not in hotel according to Room Rent
bwplot(HasSwimmingPool ~ RoomRent, data=count, horizontal=TRUE,
xlab = "Room Rent",ylab="Swimming pool",main="Room Rent & Swimming pool")
#Box and Whisker Plot of Free Breakfast or not in hotel according to Room Rent
bwplot(FreeBreakfast ~ RoomRent, data=count, horizontal=TRUE,
xlab = "Room Rent",ylab="Free Breakfast",main="Room Rent & Free Breakfast")
#Box and Whisker Plot of Free Wifi or not in hotel according to Room Rent
bwplot(FreeWifi ~ RoomRent, data=count, horizontal=TRUE,
xlab = "Room Rent",ylab="Free Wifi",main="Room Rent & Free Wifi")
#plot of room rent according to hotel capacity
plot(hotel$HotelCapacity,hotel$RoomRent,xlab="Hotel Capacity",ylab="Room Rent",main="Room Rent according to hotel capacity ")
#Graph(Histogram + Density) of distribution of Hotel capacity in the data
ggplot(data=hotel) +geom_histogram( aes(HotelCapacity, ..density..) ) +
geom_density( aes(HotelCapacity, ..density..) ) + geom_rug( aes(HotelCapacity) )
scatterplotMatrix(~RoomRent+IsWeekend+IsMetroCity+IsTouristDestination,data=hotel,cex=0.6,spread=FALSE, smoother.args=list(lty=2)
,main="Room Rent versus other variables")
scatterplotMatrix(~RoomRent+IsNewYearEve+Population,data=hotel,cex=0.6,spread=FALSE, smoother.args=list(lty=2)
,main="Room Rent versus other variables")
scatterplotMatrix(~RoomRent+StarRating+FreeWifi+FreeBreakfast,data=hotel,cex=0.6,spread=FALSE, smoother.args=list(lty=2)
,main="Room Rent versus other variables")
scatterplotMatrix(~RoomRent+HotelCapacity+HasSwimmingPool,data=hotel,cex=0.6,spread=FALSE, smoother.args=list(lty=2)
,main="Room Rent versus other variables")
# Convert the indicator factors back to numbers for the correlation and regression steps.
# as.integer() on a factor returns its internal level codes, so each column becomes 1 ("No ...") or 2 ("Yes ..."), not 0/1.
hotel$IsMetroCity <- as.integer(hotel$IsMetroCity)
hotel$IsWeekend <- as.integer(hotel$IsWeekend)
hotel$IsTouristDestination <- as.integer(hotel$IsTouristDestination)
hotel$IsNewYearEve <- as.integer(hotel$IsNewYearEve)
hotel$FreeWifi <- as.integer(hotel$FreeWifi)
hotel$FreeBreakfast <- as.integer(hotel$FreeBreakfast)
hotel$HasSwimmingPool <- as.integer(hotel$HasSwimmingPool)
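`as.integer()` applied to a factor returns the internal level codes, so the indicator columns above end up coded 1/2 rather than 0/1 (this agrees with the mean of about 1.70 reported for IsTouristDestination in the t-test output further down). The small sketch below, on a made-up factor `f`, shows the behaviour and two ways to obtain a 0/1 coding if that is preferred.

```r
# Made-up factor for illustration; levels sort alphabetically: "No Wifi", "Yes Wifi".
f <- factor(c("No Wifi", "Yes Wifi", "Yes Wifi"))

codes    <- as.integer(f)                 # internal codes: 1 2 2
zero_one <- as.integer(f) - 1L            # shifted to 0 1 1
by_label <- as.integer(f == "Yes Wifi")   # explicit comparison: 0 1 1

codes
zero_one
by_label
```

In a linear model a 1/2 coding gives the same slope as a 0/1 coding; only the intercept shifts.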
library(Hmisc)
colhotel3 <- c("Population","IsMetroCity","IsWeekend","IsTouristDestination","RoomRent")
corMatrix <- rcorr(as.matrix(hotel[,colhotel3]))
corMatrix
Population IsMetroCity IsWeekend IsTouristDestination RoomRent
Population 1.00 0.77 0.01 -0.05 -0.09
IsMetroCity 0.77 1.00 0.00 0.18 -0.07
IsWeekend 0.01 0.00 1.00 -0.02 0.00
IsTouristDestination -0.05 0.18 -0.02 1.00 0.12
RoomRent -0.09 -0.07 0.00 0.12 1.00
n= 13232
P
Population IsMetroCity IsWeekend IsTouristDestination RoomRent
Population 0.0000 0.1824 0.0000 0.0000
IsMetroCity 0.0000 0.8349 0.0000 0.0000
IsWeekend 0.1824 0.8349 0.0250 0.5983
IsTouristDestination 0.0000 0.0000 0.0250 0.0000
RoomRent 0.0000 0.0000 0.5983 0.0000
colhotel3 <- c("IsNewYearEve","FreeWifi","FreeBreakfast","HasSwimmingPool","StarRating","RoomRent","HotelCapacity")
corMatrix <- rcorr(as.matrix(hotel[,colhotel3]))
corMatrix
IsNewYearEve FreeWifi FreeBreakfast HasSwimmingPool StarRating RoomRent HotelCapacity
IsNewYearEve 1.00 0.00 0.00 0.00 0.00 0.04 0.00
FreeWifi 0.00 1.00 0.16 -0.02 0.02 0.00 -0.01
FreeBreakfast 0.00 0.16 1.00 -0.06 -0.03 -0.01 -0.09
HasSwimmingPool 0.00 -0.02 -0.06 1.00 0.62 0.31 0.51
StarRating 0.00 0.02 -0.03 0.62 1.00 0.37 0.64
RoomRent 0.04 0.00 -0.01 0.31 0.37 1.00 0.16
HotelCapacity 0.00 -0.01 -0.09 0.51 0.64 0.16 1.00
n= 13232
P
IsNewYearEve FreeWifi FreeBreakfast HasSwimmingPool StarRating RoomRent HotelCapacity
IsNewYearEve 0.9974 0.7643 0.8973 0.7860 0.0000 0.8764
FreeWifi 0.9974 0.0000 0.0056 0.0383 0.6765 0.3168
FreeBreakfast 0.7643 0.0000 0.0000 0.0002 0.2497 0.0000
HasSwimmingPool 0.8973 0.0056 0.0000 0.0000 0.0000 0.0000
StarRating 0.7860 0.0383 0.0002 0.0000 0.0000 0.0000
RoomRent 0.0000 0.6765 0.2497 0.0000 0.0000 0.0000
HotelCapacity 0.8764 0.3168 0.0000 0.0000 0.0000 0.0000
library(corrgram)
corrgram(hotel, order=TRUE, upper.panel=panel.cor, main="Variables")
H1: The price of a hotel room is higher when the hotel's star rating is high than when it is low.
t.test(hotel$StarRating,hotel$RoomRent)
Welch Two Sample t-test
data: hotel$StarRating and hotel$RoomRent
t = -85.813, df = 13231, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-5595.491 -5345.575
sample estimates:
mean of x mean of y
3.458933 5473.991838
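The p-value is less than 0.05, but this test compares the mean of StarRating (3.46) with the mean of RoomRent (5473.99), two quantities on different scales, so on its own it does not test H1. The evidence for H1 comes from the positive and significant correlation between StarRating and RoomRent (r = 0.37) in the correlation matrix above.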
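A more direct check of H1 asks whether room rent rises with star rating, rather than whether the two variables share a mean. A minimal sketch using `cor.test()` with a one-sided alternative; the `toy` data frame and its values are simulated for illustration and are not the Cities42 data.

```r
set.seed(1)

# Simulated data: rent increases with star rating, plus noise.
toy <- data.frame(StarRating = rep(1:5, each = 20))
toy$RoomRent <- 1000 * toy$StarRating + rnorm(100, sd = 800)

# H0: correlation <= 0; H1: correlation > 0 (higher rating, higher rent).
res <- cor.test(toy$StarRating, toy$RoomRent, alternative = "greater")
res$estimate   # sample correlation
res$p.value    # one-sided p-value
```

On the real data the equivalent call would presumably be `cor.test(hotel$StarRating, hotel$RoomRent, alternative = "greater")`; the same approach applies to any numeric predictor such as HotelCapacity.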
H2: The price of a hotel room is higher at a tourist destination than at a non-tourist destination.
t.test(hotel$IsTouristDestination,hotel$RoomRent)
Welch Two Sample t-test
data: hotel$IsTouristDestination and hotel$RoomRent
t = -85.841, df = 13231, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-5597.253 -5347.337
sample estimates:
mean of x mean of y
1.697174 5473.991838
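The p-value is less than 0.05, but this test compares the mean of the IsTouristDestination indicator (1.70) with the mean of RoomRent, so it does not test H2 directly. The correlation between IsTouristDestination and RoomRent is positive and significant (r = 0.12), which is consistent with H2.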
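For a two-level predictor such as IsTouristDestination, a natural test of H2 compares the mean room rent of the two groups. The sketch below uses the formula interface of `t.test()` on simulated data; `toy`, its group labels and its numbers are assumptions for illustration only.

```r
set.seed(2)

# Simulated rents for non-tourist ("No") and tourist ("Yes") destinations.
toy <- data.frame(
  IsTouristDestination = factor(rep(c("No", "Yes"), each = 50)),
  RoomRent = c(rnorm(50, mean = 4000, sd = 1500),
               rnorm(50, mean = 6000, sd = 1500))
)

# The difference is taken as mean("No") - mean("Yes"), following level order,
# so alternative = "less" encodes "tourist destinations cost more".
res <- t.test(RoomRent ~ IsTouristDestination, data = toy, alternative = "less")
res$estimate   # group means
res$p.value    # one-sided Welch p-value
```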
H3: The price of a hotel room is higher when the hotel's capacity is high than when it is low.
t.test(hotel$RoomRent,hotel$HotelCapacity)
Welch Two Sample t-test
data: hotel$RoomRent and hotel$HotelCapacity
t = 84.882, df = 13234, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
5286.515 5536.445
sample estimates:
mean of x mean of y
5473.99184 62.51164
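The p-value is less than 0.05, but this test compares the mean of RoomRent with the mean of HotelCapacity (62.51), so it does not test H3 directly. The correlation between HotelCapacity and RoomRent is positive and significant (r = 0.16), which is consistent with H3 when capacity is considered on its own.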
The response variable is “RoomRent”.
We proposed the following model:
\[RoomRent = \alpha_0 + \alpha_1 Population + \alpha_2 CityRank + \alpha_3 IsMetroCity + \alpha_4 IsTouristDestination + \alpha_5 IsWeekend + \alpha_6 IsNewYearEve\] \[+ \alpha_7 StarRating + \alpha_8 Airport + \alpha_9 FreeWifi + \alpha_{10} FreeBreakfast + \alpha_{11} HotelCapacity + \alpha_{12} HasSwimmingPool + \epsilon\]
We start with all of these predictors in the model. Predictors that do not have a significant effect on room rent are removed later.
fit <- lm(RoomRent~Population+CityRank+IsMetroCity+IsTouristDestination+IsWeekend+IsNewYearEve+StarRating+Airport+FreeWifi+
FreeBreakfast+HotelCapacity+HasSwimmingPool, data=hotel)
summary(fit)
Call:
lm(formula = RoomRent ~ Population + CityRank + IsMetroCity +
IsTouristDestination + IsWeekend + IsNewYearEve + StarRating +
Airport + FreeWifi + FreeBreakfast + HotelCapacity + HasSwimmingPool,
data = hotel)
Residuals:
Min 1Q Median 3Q Max
-11845 -2356 -690 1030 309689
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.353e+04 6.586e+02 -20.542 < 2e-16 ***
Population -1.188e-04 3.592e-05 -3.307 0.000945 ***
CityRank 1.821e+00 1.035e+01 0.176 0.860302
IsMetroCity -6.640e+02 2.164e+02 -3.068 0.002158 **
IsTouristDestination 1.925e+03 1.481e+02 13.001 < 2e-16 ***
IsWeekend -9.076e+01 1.239e+02 -0.733 0.463709
IsNewYearEve 8.826e+02 1.818e+02 4.855 1.22e-06 ***
StarRating 3.592e+03 1.108e+02 32.434 < 2e-16 ***
Airport 9.510e+00 3.171e+00 2.999 0.002709 **
FreeWifi 5.498e+02 2.242e+02 2.452 0.014214 *
FreeBreakfast 1.688e+02 1.233e+02 1.369 0.171163
HotelCapacity -1.028e+01 1.033e+00 -9.945 < 2e-16 ***
HasSwimmingPool 2.153e+03 1.616e+02 13.327 < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 6601 on 13219 degrees of freedom
Multiple R-squared: 0.1906, Adjusted R-squared: 0.1898
F-statistic: 259.3 on 12 and 13219 DF, p-value: < 2.2e-16
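Each coefficient in the output above is the expected change in RoomRent for a one-unit increase in that predictor, with the other predictors held fixed. A short worked example using two estimates copied from the table as printed (3.592e+03 for StarRating and 2.153e+03 for HasSwimmingPool); the variable names in the sketch are chosen here for illustration.

```r
# Estimates copied from the summary(fit) output above (rounded as printed).
b_star <- 3592   # StarRating: rupees per additional star
b_pool <- 2153   # HasSwimmingPool: rupees for a one-unit step in the indicator

# Predicted rent gap between a 5-star and a 3-star hotel, all else equal.
delta_star <- b_star * (5 - 3)
delta_star   # 7184

# Predicted rent gap between a hotel with a pool and one without, all else equal.
delta_pool <- b_pool * 1
delta_pool   # 2153
```

These are model-based differences, not observed averages, and with a multiple R-squared of about 0.19 individual hotels can deviate from them considerably.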
library(leaps)
model1 <- RoomRent~Population+CityRank+IsMetroCity+IsTouristDestination+IsWeekend+IsNewYearEve+StarRating+Airport+FreeWifi+
FreeBreakfast+HotelCapacity+HasSwimmingPool
leap1 <- regsubsets(model1, data = hotel, nbest=1)
plot(leap1, scale="adjr2")
The response variable is “RoomRent”.
fit1 <- lm(RoomRent~Population+IsMetroCity+IsTouristDestination+IsNewYearEve+StarRating+Airport+HotelCapacity
+HasSwimmingPool, data=hotel)
summary(fit1)
Call:
lm(formula = RoomRent ~ Population + IsMetroCity + IsTouristDestination +
IsNewYearEve + StarRating + Airport + HotelCapacity + HasSwimmingPool,
data = hotel)
Residuals:
Min 1Q Median 3Q Max
-11798 -2358 -704 1030 309571
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.229e+04 4.509e+02 -27.252 < 2e-16 ***
Population -1.214e-04 2.261e-05 -5.372 7.92e-08 ***
IsMetroCity -6.274e+02 2.132e+02 -2.943 0.003258 **
IsTouristDestination 1.900e+03 1.373e+02 13.843 < 2e-16 ***
IsNewYearEve 8.429e+02 1.739e+02 4.847 1.27e-06 ***
StarRating 3.613e+03 1.103e+02 32.742 < 2e-16 ***
Airport 9.544e+00 2.711e+00 3.520 0.000432 ***
HotelCapacity -1.055e+01 1.027e+00 -10.268 < 2e-16 ***
HasSwimmingPool 2.133e+03 1.598e+02 13.354 < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 6602 on 13223 degrees of freedom
Multiple R-squared: 0.19, Adjusted R-squared: 0.1895
F-statistic: 387.6 on 8 and 13223 DF, p-value: < 2.2e-16
summary(fit)$adj.r.squared # model1
[1] 0.1898256
summary(fit1)$adj.r.squared # model 2
[1] 0.1894769
AIC(fit) # model1
[1] 270314.1
AIC(fit1) # model2
[1] 270315.8
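Here, the adjusted R-squared of model 1 is slightly larger than that of model 2 (0.1898 versus 0.1895), and the AIC of model 1 is slightly lower than that of model 2 (270314.1 versus 270315.8). So model 1 is preferred, although the margins are small.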
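When one model is nested inside the other, as here, the AIC comparison can be complemented by a partial F-test. The sketch below shows both on simulated data; `toy`, `full` and `reduced` are hypothetical names and the numbers are not related to the hotel dataset.

```r
set.seed(3)
n <- 200

# Simulated data: y depends on x1 and x2; 'noise' is an irrelevant predictor.
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
toy$y <- 2 + 3 * toy$x1 - 1.5 * toy$x2 + rnorm(n)

full    <- lm(y ~ x1 + x2 + noise, data = toy)
reduced <- lm(y ~ x1 + x2, data = toy)

# Side-by-side AIC table (lower is better).
cmp <- AIC(full, reduced)
cmp

# Partial F-test: do the extra terms in 'full' explain significantly more variance?
ftest <- anova(reduced, full)
ftest
```

For the models in this report the analogous calls would presumably be `AIC(fit, fit1)` and `anova(fit1, fit)`.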
library(coefplot)
coefplot(fit,intercept= FALSE, outerCI=1.96)
newfit <- glm(RoomRent~Population+CityRank+IsMetroCity+IsTouristDestination+IsWeekend+IsNewYearEve+StarRating+Airport+FreeWifi+
FreeBreakfast+HotelCapacity+HasSwimmingPool ,data=hotel)
summary(newfit)
Call:
glm(formula = RoomRent ~ Population + CityRank + IsMetroCity +
IsTouristDestination + IsWeekend + IsNewYearEve + StarRating +
Airport + FreeWifi + FreeBreakfast + HotelCapacity + HasSwimmingPool,
data = hotel)
Deviance Residuals:
Min 1Q Median 3Q Max
-11845 -2356 -690 1030 309689
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.353e+04 6.586e+02 -20.542 < 2e-16 ***
Population -1.188e-04 3.592e-05 -3.307 0.000945 ***
CityRank 1.821e+00 1.035e+01 0.176 0.860302
IsMetroCity -6.640e+02 2.164e+02 -3.068 0.002158 **
IsTouristDestination 1.925e+03 1.481e+02 13.001 < 2e-16 ***
IsWeekend -9.076e+01 1.239e+02 -0.733 0.463709
IsNewYearEve 8.826e+02 1.818e+02 4.855 1.22e-06 ***
StarRating 3.592e+03 1.108e+02 32.434 < 2e-16 ***
Airport 9.510e+00 3.171e+00 2.999 0.002709 **
FreeWifi 5.498e+02 2.242e+02 2.452 0.014214 *
FreeBreakfast 1.688e+02 1.233e+02 1.369 0.171163
HotelCapacity -1.028e+01 1.033e+00 -9.945 < 2e-16 ***
HasSwimmingPool 2.153e+03 1.616e+02 13.327 < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
(Dispersion parameter for gaussian family taken to be 43566804)
Null deviance: 7.1149e+11 on 13231 degrees of freedom
Residual deviance: 5.7591e+11 on 13219 degrees of freedom
AIC: 270314
Number of Fisher Scoring iterations: 2
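This fit uses glm() with its default Gaussian family, so it is the same linear model as fit above (the coefficients and the AIC are identical); it is not a logistic regression. FreeWifi is significant at the 5% level (p = 0.014), whereas CityRank, IsWeekend and FreeBreakfast are not. We now remove those three non-significant predictors, keeping FreeWifi, to try to improve the model.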
newfit1 <- glm(RoomRent~Population+IsMetroCity+IsTouristDestination+IsNewYearEve+StarRating+Airport+FreeWifi+
HotelCapacity+HasSwimmingPool ,data=hotel)
summary(newfit1)
Call:
glm(formula = RoomRent ~ Population + IsMetroCity + IsTouristDestination +
IsNewYearEve + StarRating + Airport + FreeWifi + HotelCapacity +
HasSwimmingPool, data = hotel)
Deviance Residuals:
Min 1Q Median 3Q Max
-11839 -2385 -691 1045 309532
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.343e+04 6.187e+02 -21.698 < 2e-16 ***
Population -1.244e-04 2.263e-05 -5.499 3.88e-08 ***
IsMetroCity -6.369e+02 2.132e+02 -2.988 0.00282 **
IsTouristDestination 1.918e+03 1.374e+02 13.958 < 2e-16 ***
IsNewYearEve 8.430e+02 1.739e+02 4.849 1.26e-06 ***
StarRating 3.598e+03 1.104e+02 32.582 < 2e-16 ***
Airport 1.001e+01 2.716e+00 3.684 0.00023 ***
FreeWifi 5.952e+02 2.217e+02 2.685 0.00726 **
HotelCapacity -1.040e+01 1.029e+00 -10.115 < 2e-16 ***
HasSwimmingPool 2.147e+03 1.598e+02 13.434 < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
(Dispersion parameter for gaussian family taken to be 43565102)
Null deviance: 7.1149e+11 on 13231 degrees of freedom
Residual deviance: 5.7602e+11 on 13222 degrees of freedom
AIC: 270311
Number of Fisher Scoring iterations: 2
coefplot(newfit1,intercept= FALSE, outerCI=1.96)
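The full linear model (model 1) has AIC = 270314.1. The reduced Gaussian GLM above (newfit1), which keeps FreeWifi, has AIC = 270311.
So the reduced model that keeps FreeWifi is preferred to the full linear model. Both are linear models for RoomRent; neither is a logistic regression.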
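One point worth checking about the comparison above: `glm()` called without a `family` argument fits a Gaussian model with an identity link, which is ordinary linear regression. The sketch below, on simulated data with hypothetical names, shows that `lm()` and the default `glm()` agree, and what an actual logistic regression call looks like (it needs a binary response and `family = binomial()`).

```r
set.seed(4)

# Simulated data for illustration only.
toy <- data.frame(x = rnorm(50))
toy$y <- 1 + 2 * toy$x + rnorm(50)

fit_lm  <- lm(y ~ x, data = toy)
fit_glm <- glm(y ~ x, data = toy)   # family = gaussian() is the default

# Same coefficients and same AIC: the two calls fit the same model.
same_coef <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))
same_aic  <- isTRUE(all.equal(AIC(fit_lm), AIC(fit_glm)))
same_coef
same_aic

# A logistic regression models a 0/1 outcome instead.
toy$expensive <- rbinom(50, 1, plogis(toy$x))
fit_logit <- glm(expensive ~ x, data = toy, family = binomial())
family(fit_logit)$family
```

Since RoomRent is a continuous amount in rupees, a logistic model would only apply after defining a binary outcome, which the original analysis does not do.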
Hypotheses:
H1: The price of a hotel room is higher when the hotel's star rating is high than when it is low.
H2: The price of a hotel room is higher at a tourist destination than at a non-tourist destination.
H3: The price of a hotel room is higher when the hotel's capacity is high than when it is low.
The t-tests gave p-values below 0.05 for all three hypotheses, although they compared each predictor's mean with the mean of RoomRent rather than comparing room rents between groups. The correlations of RoomRent with StarRating (0.37), IsTouristDestination (0.12) and HotelCapacity (0.16) are all positive and significant, which is consistent with the three hypotheses. In the regression, however, the HotelCapacity coefficient is negative once the other factors are held fixed, so the support for H3 is weaker.
Of the 18 factors in the dataset, we modelled 12 as predictors of room rent (the response variable) to see how the pricing of hotel rooms changes, and found that not all of them have a significant effect. Room rent is higher when the hotel has a swimming pool, offers free Wi-Fi or has a higher star rating, and it is higher on New Year's Eve and at tourist destinations. It is lower in metro cities, in more populous cities and, other factors held fixed, for hotels with larger capacity; it increases slightly with distance from the airport. CityRank, IsWeekend and FreeBreakfast showed no significant effect. Overall, these are the factors that affect the pricing strategy of a hotel room.