This project examines hotel room pricing in the Indian market across different cities. It looks at what affects the price of a hotel room, for example whether the city is a tourist destination or whether the date falls on a weekend. Many other factors can also affect the price, such as free breakfast, a swimming pool, or a special occasion (like New Year's Eve).
The objective of this project is to identify the factors that matter the most. The dataset consists of data from different hotels located in different cities.
The data consist of many variables: the city name, the population of the city, whether the city is a tourist destination, whether it is a metro city, the room rent, the star rating of the hotel, the hotel address and description, whether the date falls on a weekend or on New Year's Eve, the capacity of the hotel, whether free breakfast and Wi-Fi are included, and whether there is a swimming pool for guests.
# Read the hotel data; stringsAsFactors = TRUE keeps text columns as factors
hotels.df <- read.csv("Cities42.csv", stringsAsFactors = TRUE)
head(hotels.df)
## CityName Population CityRank IsMetroCity IsTouristDestination IsWeekend
## 1 Mumbai 12442373 0 1 1 1
## 2 Mumbai 12442373 0 1 1 0
## 3 Mumbai 12442373 0 1 1 1
## 4 Mumbai 12442373 0 1 1 1
## 5 Mumbai 12442373 0 1 1 0
## 6 Mumbai 12442373 0 1 1 1
## IsNewYearEve Date HotelName RoomRent StarRating Airport
## 1 0 Dec 18 2016 Vivanta by Taj 12375 5 21
## 2 0 Dec 21 2016 Vivanta by Taj 10250 5 21
## 3 0 Dec 24 2016 Vivanta by Taj 9900 5 21
## 4 0 Dec 25 2016 Vivanta by Taj 10350 5 21
## 5 0 Dec 28 2016 Vivanta by Taj 12000 5 21
## 6 1 Dec 31 2016 Vivanta by Taj 11475 5 21
## HotelAddress HotelPincode
## 1 90 Cuffe Parade, Colaba, Mumbai, Maharashtra 400005
## 2 91 Cuffe Parade, Colaba, Mumbai, Maharashtra 400006
## 3 92 Cuffe Parade, Colaba, Mumbai, Maharashtra 400007
## 4 93 Cuffe Parade, Colaba, Mumbai, Maharashtra 400008
## 5 94 Cuffe Parade, Colaba, Mumbai, Maharashtra 400009
## 6 95 Cuffe Parade, Colaba, Mumbai, Maharashtra 400010
## HotelDescription FreeWifi FreeBreakfast
## 1 Luxury hotel with spa, near Gateway of India 1 0
## 2 Luxury hotel with spa, near Gateway of India 1 0
## 3 Luxury hotel with spa, near Gateway of India 1 0
## 4 Luxury hotel with spa, near Gateway of India 1 0
## 5 Luxury hotel with spa, near Gateway of India 1 0
## 6 Luxury hotel with spa, near Gateway of India 1 0
## HotelCapacity HasSwimmingPool
## 1 287 1
## 2 287 1
## 3 287 1
## 4 287 1
## 5 287 1
## 6 287 1
dim(hotels.df)
## [1] 13232 19
library(psych)
describe(hotels.df)
## vars n mean sd median trimmed
## CityName* 1 13232 18.07 11.72 16 17.29
## Population 2 13232 4416836.87 4258386.00 3046163 4040816.22
## CityRank 3 13232 14.83 13.51 9 13.30
## IsMetroCity 4 13232 0.28 0.45 0 0.23
## IsTouristDestination 5 13232 0.70 0.46 1 0.75
## IsWeekend 6 13232 0.62 0.48 1 0.65
## IsNewYearEve 7 13232 0.12 0.33 0 0.03
## Date* 8 13232 14.30 2.69 14 14.39
## HotelName* 9 13232 841.19 488.16 827 841.18
## RoomRent 10 13232 5473.99 7333.12 4000 4383.33
## StarRating 11 13232 3.46 0.76 3 3.40
## Airport 12 13232 21.16 22.76 15 16.39
## HotelAddress* 13 13232 1202.53 582.17 1261 1233.25
## HotelPincode 14 13232 397430.26 259837.50 395003 388540.47
## HotelDescription* 15 13224 581.34 363.26 567 575.37
## FreeWifi 16 13232 0.93 0.26 1 1.00
## FreeBreakfast 17 13232 0.65 0.48 1 0.69
## HotelCapacity 18 13232 62.51 76.66 34 46.03
## HasSwimmingPool 19 13232 0.36 0.48 0 0.32
## mad min max range skew
## CityName* 11.86 1.0 42 41.0 0.48
## Population 3846498.95 8096.0 12442373 12434277.0 0.68
## CityRank 11.86 0.0 44 44.0 0.69
## IsMetroCity 0.00 0.0 1 1.0 0.96
## IsTouristDestination 0.00 0.0 1 1.0 -0.86
## IsWeekend 0.00 0.0 1 1.0 -0.51
## IsNewYearEve 0.00 0.0 1 1.0 2.28
## Date* 2.97 1.0 20 19.0 -0.77
## HotelName* 641.97 1.0 1670 1669.0 0.01
## RoomRent 2653.85 299.0 322500 322201.0 16.75
## StarRating 0.74 0.0 5 5.0 0.48
## Airport 11.12 0.2 124 123.8 2.73
## HotelAddress* 668.65 1.0 2108 2107.0 -0.37
## HotelPincode 257975.37 100025.0 7000157 6900132.0 9.99
## HotelDescription* 472.95 1.0 1226 1225.0 0.11
## FreeWifi 0.00 0.0 1 1.0 -3.25
## FreeBreakfast 0.00 0.0 1 1.0 -0.62
## HotelCapacity 28.17 0.0 600 600.0 2.95
## HasSwimmingPool 0.00 0.0 1 1.0 0.60
## kurtosis se
## CityName* -0.88 0.10
## Population -1.08 37019.65
## CityRank -0.76 0.12
## IsMetroCity -1.08 0.00
## IsTouristDestination -1.26 0.00
## IsWeekend -1.74 0.00
## IsNewYearEve 3.18 0.00
## Date* 1.92 0.02
## HotelName* -1.25 4.24
## RoomRent 582.06 63.75
## StarRating 0.25 0.01
## Airport 7.89 0.20
## HotelAddress* -0.88 5.06
## HotelPincode 249.76 2258.86
## HotelDescription* -1.25 3.16
## FreeWifi 8.57 0.00
## FreeBreakfast -1.61 0.00
## HotelCapacity 11.39 0.67
## HasSwimmingPool -1.64 0.00
str(hotels.df)
## 'data.frame': 13232 obs. of 19 variables:
## $ CityName : Factor w/ 42 levels "Agra","Ahmedabad",..: 26 26 26 26 26 26 26 26 26 26 ...
## $ Population : int 12442373 12442373 12442373 12442373 12442373 12442373 12442373 12442373 12442373 12442373 ...
## $ CityRank : int 0 0 0 0 0 0 0 0 0 0 ...
## $ IsMetroCity : int 1 1 1 1 1 1 1 1 1 1 ...
## $ IsTouristDestination: int 1 1 1 1 1 1 1 1 1 1 ...
## $ IsWeekend : int 1 0 1 1 0 1 0 1 1 0 ...
## $ IsNewYearEve : int 0 0 0 0 0 1 0 0 0 0 ...
## $ Date : Factor w/ 20 levels "04-Jan-16","04-Jan-17",..: 11 12 13 14 15 16 17 18 11 12 ...
## $ HotelName : Factor w/ 1670 levels "14 Square Amanora",..: 1635 1635 1635 1635 1635 1635 1635 1635 1409 1409 ...
## $ RoomRent : int 12375 10250 9900 10350 12000 11475 11220 9225 6800 9350 ...
## $ StarRating : num 5 5 5 5 5 5 5 5 4 4 ...
## $ Airport : num 21 21 21 21 21 21 21 21 20 20 ...
## $ HotelAddress : Factor w/ 2108 levels " H.P. High Court Mall Road, Shimla",..: 925 928 930 933 935 937 940 941 699 746 ...
## $ HotelPincode : int 400005 400006 400007 400008 400009 400010 400011 400012 400039 400040 ...
## $ HotelDescription : Factor w/ 1226 levels "#NAME?","10 star hotel near Queensroad, Amritsar",..: 1030 1030 1030 1030 1030 1030 1030 1030 1006 1006 ...
## $ FreeWifi : int 1 1 1 1 1 1 1 1 1 1 ...
## $ FreeBreakfast : int 0 0 0 0 0 0 0 0 1 1 ...
## $ HotelCapacity : int 287 287 287 287 287 287 287 287 28 28 ...
## $ HasSwimmingPool : int 1 1 1 1 1 1 1 1 0 0 ...
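The describe() output reports n = 13224 for HotelDescription against 13232 for every other column, and the str() output lists "#NAME?" among its levels, so a quick data-quality check may be worthwhile. The following is a minimal sketch on a small made-up data frame (the object `toy` and its values are hypothetical and do not come from Cities42.csv):

```r
# Toy data frame standing in for hotels.df (hypothetical values)
toy <- data.frame(
  RoomRent = c(4000, 5500, NA, 3200),
  HotelDescription = c("Luxury hotel", NA, "#NAME?", "Budget stay"),
  FreeWifi = c(1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Count missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Count spreadsheet error strings such as "#NAME?" that survive as plain text
bad_text <- sum(toy$HotelDescription == "#NAME?", na.rm = TRUE)
print(bad_text)
```

The same two calls could be pointed at hotels.df to see which columns need cleaning before modelling.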
We fit a linear model to describe how city-level variables affect the room rent.
The model has the form y = B0 + B1*x1 + B2*x2 + B3*x3 + E
y - Room Rent (dependent variable). B0 - intercept. B1, B2, B3 - beta coefficients for the variables x1, x2, x3. x1, x2, x3 - CityRank, IsMetroCity, IsTouristDestination (independent variables). E - error term.
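To make the model form concrete, here is a minimal sketch that simulates data from y = B0 + B1*x1 + B2*x2 + B3*x3 + E with known coefficients and checks that lm() recovers them. All values (the coefficients, sample size, and noise level) are assumptions chosen for illustration, not estimates from the hotel data:

```r
set.seed(42)
n <- 500

# Hypothetical predictors: a rank-like integer and two 0/1 indicators
x1 <- sample(0:44, n, replace = TRUE)
x2 <- rbinom(n, 1, 0.3)
x3 <- rbinom(n, 1, 0.7)

# Response generated from known coefficients plus a random error term E
y <- 4000 + 5 * x1 - 1500 * x2 + 2000 * x3 + rnorm(n, sd = 500)

toy <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, data = toy)

# Estimated B0, B1, B2, B3 should be close to 4000, 5, -1500, 2000
b <- coef(fit)
print(round(b, 1))
```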
m2 <- lm(RoomRent ~ CityRank + IsMetroCity + IsTouristDestination, data = hotels.df)
summary(m2)
##
## Call:
## lm(formula = RoomRent ~ CityRank + IsMetroCity + IsTouristDestination,
## data = hotels.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6239 -2875 -1285 1052 315988
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4309.432 140.413 30.691 < 2e-16 ***
## CityRank 3.627 6.393 0.567 0.57
## IsMetroCity -1415.327 186.746 -7.579 3.72e-14 ***
## IsTouristDestination 2170.105 157.659 13.765 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7249 on 13228 degrees of freedom
## Multiple R-squared: 0.0231, Adjusted R-squared: 0.02288
## F-statistic: 104.3 on 3 and 13228 DF, p-value: < 2.2e-16
Next, we fit a linear model to describe how hotel-level features affect the room rent.
The model has the form y = B0 + B1*x1 + B2*x2 + B3*x3 + ... + E
y - Room Rent (dependent variable). B0 - intercept. B1, B2, B3, ... - beta coefficients for the variables x1, x2, x3, ... x1, x2, x3, ... - StarRating, Airport (distance to the airport), FreeWifi, etc. (independent variables). E - error term.
m3 <- lm(RoomRent ~ StarRating + Airport + FreeWifi + FreeBreakfast + HotelCapacity + HasSwimmingPool, data = hotels.df)
summary(m3)
##
## Call:
## lm(formula = RoomRent ~ StarRating + Airport + FreeWifi + FreeBreakfast +
## HotelCapacity + HasSwimmingPool, data = hotels.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10783 -2286 -875 967 310387
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7455.073 396.215 -18.816 <2e-16 ***
## StarRating 3519.285 111.754 31.491 <2e-16 ***
## Airport 25.627 2.604 9.840 <2e-16 ***
## FreeWifi 227.843 226.177 1.007 0.314
## FreeBreakfast -59.313 123.964 -0.478 0.632
## HotelCapacity -14.786 1.009 -14.660 <2e-16 ***
## HasSwimmingPool 2714.146 158.681 17.104 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6687 on 13225 degrees of freedom
## Multiple R-squared: 0.1689, Adjusted R-squared: 0.1685
## F-statistic: 447.9 on 6 and 13225 DF, p-value: < 2.2e-16
FreeWifi and FreeBreakfast do not significantly affect the room rent, as their p-values (0.314 and 0.632) are greater than 0.05. The remaining factors (StarRating, Airport, HotelCapacity, HasSwimmingPool) affect the room rent significantly, as their p-values are below 0.05.
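Reading significance off the printed summary can also be done programmatically. The sketch below pulls the p-value column out of summary(fit)$coefficients and flags predictors at the 0.05 level. It uses simulated data in which `wifi` has no effect by construction; the variable names and numbers are hypothetical:

```r
set.seed(1)
n <- 300

# Simulated predictors: star drives rent, wifi does not (by construction)
star <- sample(1:5, n, replace = TRUE)
wifi <- rbinom(n, 1, 0.9)
rent <- 1000 + 3000 * star + rnorm(n, sd = 2000)

fit <- lm(rent ~ star + wifi)

# The coefficient table is a matrix; the last column holds the p-values
coefs <- summary(fit)$coefficients
p_values <- coefs[, "Pr(>|t|)"]
significant <- p_values < 0.05

print(data.frame(p_value = signif(p_values, 3), significant))
```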
Finally, we combine date, external, and internal factors in a single linear model of the room rent.
The model has the form y = B0 + B1*x1 + B2*x2 + B3*x3 + ... + E
y - Room Rent (dependent variable). B0 - intercept. B1, B2, B3, ... - beta coefficients for the variables x1, x2, x3, ... x1, x2, x3, ... - date (IsNewYearEve), external factors (IsMetroCity, IsTouristDestination), and internal factors (StarRating, Airport, HotelCapacity, HasSwimmingPool) (independent variables). E - error term.
m4 <- lm(RoomRent ~ IsNewYearEve + IsMetroCity + IsTouristDestination + StarRating + Airport + HotelCapacity + HasSwimmingPool, data = hotels.df)
summary(m4)
##
## Call:
## lm(formula = RoomRent ~ IsNewYearEve + IsMetroCity + IsTouristDestination +
## StarRating + Airport + HotelCapacity + HasSwimmingPool, data = hotels.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11621 -2342 -706 1039 309463
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8362.318 351.306 -23.804 < 2e-16 ***
## IsNewYearEve 843.295 174.085 4.844 1.29e-06 ***
## IsMetroCity -1502.844 137.569 -10.924 < 2e-16 ***
## IsTouristDestination 2074.969 133.499 15.543 < 2e-16 ***
## StarRating 3583.270 110.317 32.482 < 2e-16 ***
## Airport 11.057 2.699 4.096 4.22e-05 ***
## HotelCapacity -11.252 1.020 -11.030 < 2e-16 ***
## HasSwimmingPool 2211.800 159.259 13.888 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6609 on 13224 degrees of freedom
## Multiple R-squared: 0.1882, Adjusted R-squared: 0.1878
## F-statistic: 438 on 7 and 13224 DF, p-value: < 2.2e-16
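The adjusted R-squared values reported for the three models can be reproduced from the multiple R-squared, the number of observations, and the number of predictors, using adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1). The sketch below plugs in the rounded figures printed in the summaries, so small rounding differences are expected:

```r
# Figures copied from the printed summaries of m2, m3 and m4
n  <- 13232                                       # observations
r2 <- c(m2 = 0.0231, m3 = 0.1689, m4 = 0.1882)    # multiple R-squared
p  <- c(m2 = 3, m3 = 6, m4 = 7)                   # number of predictors

# Adjusted R-squared penalises each extra predictor
adj <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj, 4))
```

With more than 13,000 observations the penalty is tiny, which is why the adjusted and unadjusted values are almost identical in every summary.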
Room Rent = f(IsNewYearEve, IsMetroCity, IsTouristDestination, StarRating, Airport (distance from the airport), HotelCapacity, HasSwimmingPool)
StarRating
IsTouristDestination
HotelCapacity
HasSwimmingPool
IsNewYearEve
Airport
IsMetroCity
mytable <- with(hotels.df, table(StarRating))
mytable
## StarRating
## 0 1 2 2.5 3 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4 4.1
## 16 8 440 632 5953 8 16 8 1752 8 24 16 32 2463 24
## 4.3 4.4 4.5 4.7 4.8 5
## 16 8 376 8 16 1408
mytable1 <- with(hotels.df, table(IsMetroCity))
mytable1
## IsMetroCity
## 0 1
## 9472 3760
mytable2 <- with(hotels.df, table(FreeBreakfast))
mytable2
## FreeBreakfast
## 0 1
## 4643 8589
mytable3 <- with(hotels.df, table(CityRank))
mytable3
## CityRank
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
## 712 2048 656 416 536 424 512 80 600 768 32 128 16 136 160
## 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
## 432 448 624 128 264 40 224 336 392 48 160 120 272 104 456
## 32 33 34 35 36 37 38 39 40 42 43 44
## 48 56 280 64 136 88 128 136 264 144 328 288
mytable4 <- with(hotels.df, table(IsTouristDestination))
mytable4
## IsTouristDestination
## 0 1
## 4007 9225
mytable <- xtabs(~ FreeBreakfast+StarRating, data=hotels.df)
mytable
## StarRating
## FreeBreakfast 0 1 2 2.5 3 3.2 3.3 3.4 3.5 3.6 3.7 3.8
## 0 16 0 216 296 1789 0 8 0 661 8 0 8
## 1 0 8 224 336 4164 8 8 8 1091 0 24 8
## StarRating
## FreeBreakfast 3.9 4 4.1 4.3 4.4 4.5 4.7 4.8 5
## 0 16 783 0 16 0 224 8 0 594
## 1 16 1680 24 0 8 152 0 16 814
mytable1 <- xtabs(~ IsMetroCity+StarRating, data=hotels.df)
mytable1
## StarRating
## IsMetroCity 0 1 2 2.5 3 3.2 3.3 3.4 3.5 3.6 3.7 3.8
## 0 16 8 344 456 4336 8 16 8 1312 0 24 16
## 1 0 0 96 176 1617 0 0 0 440 8 0 0
## StarRating
## IsMetroCity 3.9 4 4.1 4.3 4.4 4.5 4.7 4.8 5
## 0 32 1696 24 16 8 288 8 16 840
## 1 0 767 0 0 0 88 0 0 568
mytable2 <- xtabs(~ IsMetroCity+IsTouristDestination, data=hotels.df)
mytable2
## IsTouristDestination
## IsMetroCity 0 1
## 0 3352 6120
## 1 655 3105
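Raw counts are easier to compare as row proportions. As a small sketch, the counts from the IsMetroCity by IsTouristDestination table above are re-entered by hand here (so the block runs without the CSV) and converted with prop.table():

```r
# Counts copied from the printed cross-tabulation
counts <- matrix(
  c(3352, 655, 6120, 3105),   # filled column by column
  nrow = 2,
  dimnames = list(IsMetroCity = c("0", "1"),
                  IsTouristDestination = c("0", "1"))
)

# Share of tourist / non-tourist rows within each city type
row_share <- prop.table(counts, margin = 1)
print(round(row_share, 3))
```

On the full data the same result should come from prop.table(mytable2, margin = 1).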
boxplot(hotels.df$CityRank, horizontal = TRUE, main = "Rank of the cities", col = "lightblue")
boxplot(hotels.df$Population, horizontal = TRUE, main = "Population", col = "yellow")
boxplot(hotels.df$StarRating ~ hotels.df$FreeBreakfast, horizontal = TRUE,
        ylab = "Breakfast availability", xlab = "Star ratings", las = 1,
        main = "Analysis of star rating and breakfast availability",
        col = c("pink", "yellow"))
boxplot(hotels.df$RoomRent ~ hotels.df$IsMetroCity, horizontal = TRUE,
        ylab = "City (metro = 1, other = 0)", xlab = "Room rent", las = 1,
        main = "Analysis of type of city and room rent of hotels",
        col = c("red", "blue"))
# freq = FALSE plots densities, so the y-axis is labelled accordingly
hist(hotels.df$RoomRent,
     main = "Analysis of room rents of hotels",
     xlab = "Rents of room", ylab = "Density",
     breaks = 30, col = "lightblue", freq = FALSE)
hist(hotels.df$StarRating,
     main = "Analysis of star ratings of hotels",
     xlab = "Star ratings", ylab = "Density",
     breaks = 30, col = "red", freq = FALSE)
hist(hotels.df$Population, main= "Population" ,xlab="Population" ,col = "peachpuff")
hist(hotels.df$HotelCapacity, main = "Capacity of hotels", xlab = "Hotel Capacity", col = "blue")
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplot(RoomRent~StarRating, data=hotels.df,
spread=FALSE, smoother.args=list(lty=2),
main="Scatter plot of Star Rating vs Room rent",
ylab="Room Rent",
xlab="Star Rating")
scatterplotMatrix(formula = ~ RoomRent + IsWeekend + IsNewYearEve +Airport , data = hotels.df, pch = 16)
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth
scatterplot(x = hotels.df$Population, y = hotels.df$CityRank, main = "Population vs City Rank", xlab = "Population", ylab = "City rank")
cor(hotels.df[, c(2,3,4,5,6,7,10,11,18)])
## Population CityRank IsMetroCity
## Population 1.0000000000 -0.8353204432 0.7712260105
## CityRank -0.8353204432 1.0000000000 -0.5643937903
## IsMetroCity 0.7712260105 -0.5643937903 1.0000000000
## IsTouristDestination -0.0482029722 0.2807134520 0.1763717063
## IsWeekend 0.0115926802 -0.0072564766 0.0018118005
## IsNewYearEve 0.0007332482 -0.0006326444 0.0006464753
## RoomRent -0.0887280632 0.0939855292 -0.0668397705
## StarRating 0.1341365933 -0.1333810133 0.0776028661
## HotelCapacity 0.2599830516 -0.2561197059 0.1871502153
## IsTouristDestination IsWeekend IsNewYearEve
## Population -0.048202972 0.011592680 0.0007332482
## CityRank 0.280713452 -0.007256477 -0.0006326444
## IsMetroCity 0.176371706 0.001811801 0.0006464753
## IsTouristDestination 1.000000000 -0.019481101 -0.0022663884
## IsWeekend -0.019481101 1.000000000 0.2923820508
## IsNewYearEve -0.002266388 0.292382051 1.0000000000
## RoomRent 0.122502963 0.004580134 0.0384912269
## StarRating -0.040554998 0.006378436 0.0023608970
## HotelCapacity -0.094356091 0.006306507 0.0013526790
## RoomRent StarRating HotelCapacity
## Population -0.088728063 0.134136593 0.259983052
## CityRank 0.093985529 -0.133381013 -0.256119706
## IsMetroCity -0.066839771 0.077602866 0.187150215
## IsTouristDestination 0.122502963 -0.040554998 -0.094356091
## IsWeekend 0.004580134 0.006378436 0.006306507
## IsNewYearEve 0.038491227 0.002360897 0.001352679
## RoomRent 1.000000000 0.369373425 0.157873308
## StarRating 0.369373425 1.000000000 0.637430337
## HotelCapacity 0.157873308 0.637430337 1.000000000
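To see at a glance which variables are most strongly correlated with RoomRent, the RoomRent column of the matrix above can be sorted by absolute value. The sketch below re-enters those correlations by hand so that it runs without the data file:

```r
# Correlations with RoomRent, copied from the matrix above
r_rent <- c(
  Population           = -0.0887280632,
  CityRank             =  0.0939855292,
  IsMetroCity          = -0.0668397705,
  IsTouristDestination =  0.122502963,
  IsWeekend            =  0.004580134,
  IsNewYearEve         =  0.0384912269,
  StarRating           =  0.369373425,
  HotelCapacity        =  0.157873308
)

# Order by strength of association, ignoring the sign
ranked <- r_rent[order(abs(r_rent), decreasing = TRUE)]
print(round(ranked, 3))
```

These are pairwise correlations only; they do not adjust for the other variables the way the regression coefficients do.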
library(corrgram)
corrgram(hotels.df, lower.panel = panel.shade, upper.panel = panel.pie, text.panel = panel.txt, main = "Corrgram of all variables")
cor.test(hotels.df$RoomRent, hotels.df$StarRating)
##
## Pearson's product-moment correlation
##
## data: hotels.df$RoomRent and hotels.df$StarRating
## t = 45.719, df = 13230, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3545660 0.3839956
## sample estimates:
## cor
## 0.3693734
cor.test(hotels.df$RoomRent, hotels.df$IsMetroCity)
##
## Pearson's product-moment correlation
##
## data: hotels.df$RoomRent and hotels.df$IsMetroCity
## t = -7.7053, df = 13230, p-value = 1.399e-14
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08378329 -0.04985761
## sample estimates:
## cor
## -0.06683977
cor.test(hotels.df$RoomRent, hotels.df$CityRank)
##
## Pearson's product-moment correlation
##
## data: hotels.df$RoomRent and hotels.df$CityRank
## t = 10.858, df = 13230, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.07707001 0.11084696
## sample estimates:
## cor
## 0.09398553
cor.test(hotels.df$RoomRent, hotels.df$IsNewYearEve)
##
## Pearson's product-moment correlation
##
## data: hotels.df$RoomRent and hotels.df$IsNewYearEve
## t = 4.4306, df = 13230, p-value = 9.472e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.02146637 0.05549377
## sample estimates:
## cor
## 0.03849123
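The t statistic that cor.test() reports follows from the correlation and the degrees of freedom through t = r * sqrt(df) / sqrt(1 - r^2). As a worked check, the sketch below plugs in the RoomRent and StarRating figures printed above (r = 0.3693734, df = 13230), which should reproduce the reported t = 45.719 up to rounding:

```r
# Values copied from the cor.test output for RoomRent and StarRating
r  <- 0.3693734
df <- 13230

# t statistic for testing whether the true correlation is zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

print(round(t_stat, 3))
print(p_value)
```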
Null Hypothesis - There is no difference between the room rent on New Year's Eve and on other days.
t.test(hotels.df$RoomRent ~ hotels.df$IsNewYearEve)
##
## Welch Two Sample t-test
##
## data: hotels.df$RoomRent by hotels.df$IsNewYearEve
## t = -4.1793, df = 2065, p-value = 3.046e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1256.5297 -453.9099
## sample estimates:
## mean in group 0 mean in group 1
## 5367.606 6222.826
The p-value is 3.046e-05 (< 0.05), which is small enough to reject the null hypothesis. Hence there is a significant difference between the room rent on New Year's Eve and on other days: the mean rent is 6222.83 on New Year's Eve versus 5367.61 on other days.
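For readers who want to see what the Welch two-sample t-test computes, the sketch below rebuilds the t statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with t.test(). The two samples are simulated; their sizes, means, and spreads are assumptions for illustration and are not the hotel data:

```r
set.seed(7)

# Simulated rents for two groups (hypothetical values)
other_days <- rnorm(200, mean = 5400, sd = 7000)
new_year   <- rnorm(40,  mean = 6200, sd = 7500)

# Per-group variance of the mean
v1 <- var(other_days) / length(other_days)
v2 <- var(new_year) / length(new_year)

# Welch t statistic: difference in means over its standard error
t_manual <- (mean(other_days) - mean(new_year)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(other_days) - 1) + v2^2 / (length(new_year) - 1))

# t.test() uses the Welch version by default (var.equal = FALSE)
res <- t.test(other_days, new_year)
print(c(manual = t_manual, t.test = unname(res$statistic)))
print(c(manual = df_manual, t.test = unname(res$parameter)))
```

The fractional degrees of freedom in the outputs above (for example df = 1804.7) come from this approximation.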
Null Hypothesis - There is no difference between the room rent where Wi-Fi is free and other rooms.
t.test(hotels.df$RoomRent ~ hotels.df$FreeWifi)
##
## Welch Two Sample t-test
##
## data: hotels.df$RoomRent by hotels.df$FreeWifi
## t = -0.76847, df = 1804.7, p-value = 0.4423
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -360.5977 157.5701
## sample estimates:
## mean in group 0 mean in group 1
## 5380.004 5481.518
The p-value is 0.44 (> 0.05), so we fail to reject the null hypothesis. There is no significant difference between the room rent where Wi-Fi is free and other rooms.
Null Hypothesis - There is no difference in the mean room rent between hotels with and without free breakfast.