Notice that the dataset tracks hotel prices on 8 different dates at different hotels across different cities. Please browse the dataset.
Dependent Variable
RoomRent <- Rent for the cheapest room, double occupancy, in Indian Rupees.
External Factors
Date <- We have hotel room rent data for the following 8 dates for each hotel: {Dec 31, Dec 25, Dec 24, Dec 18, Dec 21, Dec 28, Jan 4, Jan 8}
IsWeekend <- ‘0’ for weekdays, ‘1’ for weekend dates (Sat / Sun)
IsNewYearEve <- ‘1’ for Dec 31, ‘0’ otherwise
CityName <- Name of the city where the hotel is located, e.g. Mumbai
Population <- Population of the city in 2011
CityRank <- Rank order of the city by population (e.g. Mumbai = 0, Delhi = 1, and so on)
IsMetroCity <- ‘1’ if CityName is {Mumbai, Delhi, Kolkata, Chennai}, ‘0’ otherwise
IsTouristDestination <- ‘1’ if the city is primarily a tourist destination, ‘0’ otherwise
Internal Factors
Many hotel features can influence the RoomRent. The dataset captures some of these internal factors, as explained below.
HotelName <- e.g. Park Hyatt Goa Resort and Spa
StarRating <- e.g. 5
Airport <- Distance between the hotel and the closest major airport
HotelAddress <- e.g. Arrossim Beach, Cansaulim, Goa
HotelPincode <- e.g. 403712
HotelDescription <- e.g. 5-star beachfront resort with spa, near Arossim Beach
FreeWifi <- ‘1’ if the hotel offers free Wifi, ‘0’ otherwise
FreeBreakfast <- ‘1’ if the hotel offers free breakfast, ‘0’ otherwise
HotelCapacity <- e.g. 242 (‘0’ if not available)
HasSwimmingPool <- ‘1’ if the hotel has a swimming pool, ‘0’ otherwise
hotelpricing = read.csv("Cities42.csv")
View(hotelpricing)
dim(hotelpricing)
## [1] 13232 19
library(psych)
describe(hotelpricing)
## vars n mean sd median trimmed
## CityName* 1 13232 18.07 11.72 16 17.29
## Population 2 13232 4416836.87 4258386.00 3046163 4040816.22
## CityRank 3 13232 14.83 13.51 9 13.30
## IsMetroCity 4 13232 0.28 0.45 0 0.23
## IsTouristDestination 5 13232 0.70 0.46 1 0.75
## IsWeekend 6 13232 0.62 0.48 1 0.65
## IsNewYearEve 7 13232 0.12 0.33 0 0.03
## Date* 8 13232 14.30 2.69 14 14.39
## HotelName* 9 13232 841.19 488.16 827 841.18
## RoomRent 10 13232 5473.99 7333.12 4000 4383.33
## StarRating 11 13232 3.46 0.76 3 3.40
## Airport 12 13232 21.16 22.76 15 16.39
## HotelAddress* 13 13232 1202.53 582.17 1261 1233.25
## HotelPincode 14 13232 397430.26 259837.50 395003 388540.47
## HotelDescription* 15 13224 581.34 363.26 567 575.37
## FreeWifi 16 13232 0.93 0.26 1 1.00
## FreeBreakfast 17 13232 0.65 0.48 1 0.69
## HotelCapacity 18 13232 62.51 76.66 34 46.03
## HasSwimmingPool 19 13232 0.36 0.48 0 0.32
## mad min max range skew
## CityName* 11.86 1.0 42 41.0 0.48
## Population 3846498.95 8096.0 12442373 12434277.0 0.68
## CityRank 11.86 0.0 44 44.0 0.69
## IsMetroCity 0.00 0.0 1 1.0 0.96
## IsTouristDestination 0.00 0.0 1 1.0 -0.86
## IsWeekend 0.00 0.0 1 1.0 -0.51
## IsNewYearEve 0.00 0.0 1 1.0 2.28
## Date* 2.97 1.0 20 19.0 -0.77
## HotelName* 641.97 1.0 1670 1669.0 0.01
## RoomRent 2653.85 299.0 322500 322201.0 16.75
## StarRating 0.74 0.0 5 5.0 0.48
## Airport 11.12 0.2 124 123.8 2.73
## HotelAddress* 668.65 1.0 2108 2107.0 -0.37
## HotelPincode 257975.37 100025.0 7000157 6900132.0 9.99
## HotelDescription* 472.95 1.0 1226 1225.0 0.11
## FreeWifi 0.00 0.0 1 1.0 -3.25
## FreeBreakfast 0.00 0.0 1 1.0 -0.62
## HotelCapacity 28.17 0.0 600 600.0 2.95
## HasSwimmingPool 0.00 0.0 1 1.0 0.60
## kurtosis se
## CityName* -0.88 0.10
## Population -1.08 37019.65
## CityRank -0.76 0.12
## IsMetroCity -1.08 0.00
## IsTouristDestination -1.26 0.00
## IsWeekend -1.74 0.00
## IsNewYearEve 3.18 0.00
## Date* 1.92 0.02
## HotelName* -1.25 4.24
## RoomRent 582.06 63.75
## StarRating 0.25 0.01
## Airport 7.89 0.20
## HotelAddress* -0.88 5.06
## HotelPincode 249.76 2258.86
## HotelDescription* -1.25 3.16
## FreeWifi 8.57 0.00
## FreeBreakfast -1.61 0.00
## HotelCapacity 11.39 0.67
## HasSwimmingPool -1.64 0.00
table(hotelpricing$CityName)
##
## Agra Ahmedabad Amritsar Bangalore
## 432 424 136 656
## Bhubaneswar Chandigarh Chennai Darjeeling
## 120 336 416 136
## Delhi Gangtok Goa Guwahati
## 2048 128 624 48
## Haridwar Hyderabad Indore Jaipur
## 48 536 160 768
## Jaisalmer Jodhpur Kanpur Kochi
## 264 224 16 608
## Kolkata Lucknow Madurai Manali
## 512 128 112 288
## Mangalore Mumbai Munnar Mysore
## 104 712 328 160
## Nainital Ooty Panchkula Pune
## 144 136 64 600
## Puri Rajkot Rishikesh Shimla
## 56 128 88 280
## Srinagar Surat Thiruvanthipuram Thrissur
## 40 80 392 32
## Udaipur Varanasi
## 456 264
table(hotelpricing$IsWeekend)
##
## 0 1
## 4991 8241
table(hotelpricing$IsNewYearEve)
##
## 0 1
## 11586 1646
table(hotelpricing$FreeWifi)
##
## 0 1
## 981 12251
table(hotelpricing$FreeBreakfast)
##
## 0 1
## 4643 8589
table(hotelpricing$HasSwimmingPool)
##
## 0 1
## 8524 4708
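For the 0/1 variables, the `mean` column reported by describe() is simply the share of rows coded 1. The short sketch below re-derives those shares from the counts printed by the table() calls above; the vector names `counts` and `shares` are only illustrative.

```r
# Counts of rows coded 1, copied from the table() output above
counts <- c(IsWeekend = 8241, IsNewYearEve = 1646, FreeWifi = 12251,
            FreeBreakfast = 8589, HasSwimmingPool = 4708)
n <- 13232  # total number of rows, from dim(hotelpricing)

# Share of ones, rounded the same way describe() prints the mean
shares <- round(counts / n, 2)
print(shares)
```

These values should agree with the means shown for the same columns in the describe() output.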
hist(hotelpricing$StarRating, main = "Star Rating Distribution",col="Blue",xlab = "Stars")
hist(hotelpricing$Airport, main = "Distance to the nearest Airport",col = "Green", xlab = "Distance to the nearest Airport")
hist(hotelpricing$HotelCapacity, main = "Capacity of hotels",col = "Sky Blue", xlab = "Capacity of hotels")
aggregate(hotelpricing$RoomRent, by=list(Weekend = hotelpricing$IsWeekend, NewYearEve = hotelpricing$IsNewYearEve), mean)
## Weekend NewYearEve x
## 1 0 0 5429.473
## 2 1 0 5320.820
## 3 0 1 8829.500
## 4 1 1 6219.655
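The same kind of grouped mean can also be written with the formula interface of aggregate(), which keeps the original column names in the result. This is a minimal sketch on a made-up data frame (`toy` is hypothetical and only mimics the column names of hotelpricing).

```r
# Toy data standing in for hotelpricing (values are made up for illustration)
toy <- data.frame(
  RoomRent     = c(1000, 3000, 2000, 4000, 6000, 8000),
  IsWeekend    = c(0, 0, 1, 1, 0, 1),
  IsNewYearEve = c(0, 0, 0, 0, 1, 1)
)

# Formula interface: one row per (IsWeekend, IsNewYearEve) combination
means <- aggregate(RoomRent ~ IsWeekend + IsNewYearEve, data = toy, FUN = mean)
print(means)
```

On the real data the equivalent call would use `data = hotelpricing`; unlike the `by = list(...)` form, the formula form drops rows with missing values in the listed columns by default.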
aggregate(hotelpricing$RoomRent, by=list(Metro_City = hotelpricing$IsMetroCity, Tourist_Place = hotelpricing$IsTouristDestination), mean)
## Metro_City Tourist_Place x
## 1 0 0 4006.435
## 2 1 0 4646.136
## 3 0 1 6755.728
## 4 1 1 4706.608
aggregate(hotelpricing$RoomRent, by=list(Free_Wifi = hotelpricing$FreeWifi, Free_Breakfast = hotelpricing$FreeBreakfast, SwimmingPool = hotelpricing$HasSwimmingPool), mean)
## Free_Wifi Free_Breakfast SwimmingPool x
## 1 0 0 0 3538.085
## 2 1 0 0 3148.628
## 3 0 1 0 5636.617
## 4 1 1 0 3984.457
## 5 0 0 1 7378.590
## 6 1 0 1 9530.906
## 7 0 1 1 5207.000
## 8 1 1 1 8246.284
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplotMatrix(formula = ~ RoomRent + IsWeekend + IsNewYearEve, data = hotelpricing, pch = 16)
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth
#Correlation of Room Rent with IsWeekend and IsNewYearEve
cor(hotelpricing$RoomRent, hotelpricing[ ,c("IsWeekend","IsNewYearEve")])
## IsWeekend IsNewYearEve
## [1,] 0.004580134 0.03849123
library(corrgram)
corrgram(hotelpricing[c("RoomRent","IsWeekend","IsNewYearEve")], upper.panel = panel.pie)
aggregate(hotelpricing$RoomRent, by=list(hotelpricing$IsWeekend),mean)
## Group.1 x
## 1 0 5430.835
## 2 1 5500.129
boxplot(hotelpricing$RoomRent ~ hotelpricing$IsWeekend, ylim = c(0,30000), col = "blue")
corrgram(hotelpricing[c("RoomRent","IsWeekend")], upper.panel = panel.pie, lower.panel = panel.cor)
#T-Test
#Null Hypothesis - There is no difference between the mean Room Rent on weekdays and on weekends.
t.test(hotelpricing$RoomRent ~ hotelpricing$IsWeekend)
##
## Welch Two Sample t-test
##
## data: hotelpricing$RoomRent by hotelpricing$IsWeekend
## t = -0.51853, df = 9999.4, p-value = 0.6041
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -331.2427 192.6559
## sample estimates:
## mean in group 0 mean in group 1
## 5430.835 5500.129
#The p-value is 0.60 (> 0.05), so we fail to reject the Null Hypothesis: mean Room Rent does not differ significantly between weekdays and weekends.
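To make the Welch output easier to read, the sketch below recomputes the t statistic, the degrees of freedom and the p-value by hand and compares them with t.test(). It uses simulated numbers rather than the hotel data, and all variable names (`g0`, `g1`, `t_manual`, and so on) are illustrative assumptions.

```r
set.seed(1)
# Simulated rents for two groups (made-up numbers, not the hotel data)
g0 <- rnorm(50, mean = 5400, sd = 3000)
g1 <- rnorm(80, mean = 5500, sd = 4000)

# Each group contributes its own variance to the standard error
v0 <- var(g0) / length(g0)
v1 <- var(g1) / length(g1)
t_manual <- (mean(g0) - mean(g1)) / sqrt(v0 + v1)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (v0 + v1)^2 /
  (v0^2 / (length(g0) - 1) + v1^2 / (length(g1) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# t.test() uses the Welch version by default (var.equal = FALSE)
res <- t.test(g0, g1)
print(c(t = t_manual, df = df_manual, p = p_manual))
print(res)
```

The non-integer degrees of freedom in the outputs (for example df = 9999.4) come from this approximation, which does not assume equal variances in the two groups.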
aggregate(hotelpricing$RoomRent, by=list(hotelpricing$IsNewYearEve),mean)
## Group.1 x
## 1 0 5367.606
## 2 1 6222.826
boxplot(hotelpricing$RoomRent ~ hotelpricing$IsNewYearEve, col = "grey")
corrgram(hotelpricing[c("RoomRent","IsNewYearEve")], upper.panel = panel.pie, lower.panel = panel.cor)
boxplot(hotelpricing$RoomRent ~ hotelpricing$IsNewYearEve, ylim = c(0,30000), col = "yellow")
#T-test
#Null Hypothesis - There is no difference between the mean Room Rent on New Year's Eve and on other days.
t.test(hotelpricing$RoomRent ~ hotelpricing$IsNewYearEve)
##
## Welch Two Sample t-test
##
## data: hotelpricing$RoomRent by hotelpricing$IsNewYearEve
## t = -4.1793, df = 2065, p-value = 3.046e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1256.5297 -453.9099
## sample estimates:
## mean in group 0 mean in group 1
## 5367.606 6222.826
#The p-value is 3.046e-05 (< 0.05), so we reject the Null Hypothesis: mean Room Rent is higher on New Year's Eve.
#Making a linear model
model1 <- lm(RoomRent ~ IsWeekend + IsNewYearEve, data = hotelpricing)
summary(model1)
##
## Call:
## lm(formula = RoomRent ~ IsWeekend + IsNewYearEve, data = hotelpricing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5874 -3031 -1436 808 317180
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5430.5 103.7 52.353 < 2e-16 ***
## IsWeekend -110.4 137.4 -0.803 0.422
## IsNewYearEve 902.6 201.8 4.472 7.82e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7328 on 13229 degrees of freedom
## Multiple R-squared: 0.00153, Adjusted R-squared: 0.001379
## F-statistic: 10.14 on 2 and 13229 DF, p-value: 3.987e-05
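When a regression has a single 0/1 predictor, the intercept is the mean of the group coded 0 and the slope is the difference between the two group means. The sketch below shows this on a made-up data frame (`toy` is hypothetical). With two dummies and no interaction term, as in model1, the coefficients are no longer exact subgroup means, which is likely why the intercept 5430.5 is close to, but not equal to, the 5429.473 reported by the earlier aggregate() call.

```r
# Made-up rents: three ordinary days and two New Year's Eve observations
toy <- data.frame(
  RoomRent     = c(4000, 5000, 6000, 7000, 9000),
  IsNewYearEve = c(0, 0, 0, 1, 1)
)

fit <- lm(RoomRent ~ IsNewYearEve, data = toy)

# Group means computed directly, for comparison with the coefficients
group_means <- tapply(toy$RoomRent, toy$IsNewYearEve, mean)

print(coef(fit))     # intercept = mean of group 0, slope = difference in means
print(group_means)
```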
#Effect of External factors on Room Rent
scatterplotMatrix(formula = ~ RoomRent + CityRank + IsMetroCity + IsTouristDestination, data = hotelpricing, pch = 16)
cor(hotelpricing$RoomRent, hotelpricing[ ,c("CityRank","IsMetroCity","IsTouristDestination")])
## CityRank IsMetroCity IsTouristDestination
## [1,] 0.09398553 -0.06683977 0.122503
corrgram(hotelpricing[c("RoomRent","CityRank","IsMetroCity","IsTouristDestination")], upper.panel = panel.pie)
aggregate(hotelpricing$RoomRent, by=list(hotelpricing$IsMetroCity),mean)
## Group.1 x
## 1 0 5782.794
## 2 1 4696.073
boxplot(hotelpricing$RoomRent ~ hotelpricing$IsMetroCity, col = "purple")
corrgram(hotelpricing[c("RoomRent","IsMetroCity")], upper.panel = panel.pie,lower.panel = panel.cor)
boxplot(hotelpricing$RoomRent ~ hotelpricing$IsMetroCity, ylim = c(0,15000),col = "green")
#T-Test
#Null Hypothesis - There is no difference between the Room Rent of Metro Cities and other cities
t.test(hotelpricing$RoomRent ~ hotelpricing$IsMetroCity)
##
## Welch Two Sample t-test
##
## data: hotelpricing$RoomRent by hotelpricing$IsMetroCity
## t = 10.721, df = 13224, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 888.0308 1285.4102
## sample estimates:
## mean in group 0 mean in group 1
## 5782.794 4696.073
#The p-value is < 2.2e-16 (< 0.05), so we reject the Null Hypothesis: mean Room Rent is lower in Metro Cities.
#Room rents based on Tourist Destination
aggregate(hotelpricing$RoomRent, by=list(hotelpricing$IsTouristDestination),mean)
## Group.1 x
## 1 0 4111.003
## 2 1 6066.024
boxplot(hotelpricing$RoomRent ~ hotelpricing$IsTouristDestination, col = "Pink")
corrgram(hotelpricing[c("RoomRent","IsTouristDestination")], upper.panel = panel.pie, lower.panel = panel.cor)
boxplot(hotelpricing$RoomRent ~ hotelpricing$IsTouristDestination, ylim = c(0,30000), col = "green")
#T-Test
#Null Hypothesis - There is no Difference between the Room Rent of Tourist Destination Cities and other Cities.
t.test(hotelpricing$RoomRent ~ hotelpricing$IsTouristDestination)
##
## Welch Two Sample t-test
##
## data: hotelpricing$RoomRent by hotelpricing$IsTouristDestination
## t = -19.449, df = 12888, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2152.059 -1757.983
## sample estimates:
## mean in group 0 mean in group 1
## 4111.003 6066.024
#The p-value is < 2.2e-16 (< 0.05), so we reject the Null Hypothesis: mean Room Rent is higher in Tourist Destination Cities.
#Linear Model
model2 <- lm(RoomRent ~ CityRank + IsMetroCity + IsTouristDestination, data = hotelpricing)
summary(model2)
##
## Call:
## lm(formula = RoomRent ~ CityRank + IsMetroCity + IsTouristDestination,
## data = hotelpricing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6239 -2875 -1285 1052 315988
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4309.432 140.413 30.691 < 2e-16 ***
## CityRank 3.627 6.393 0.567 0.57
## IsMetroCity -1415.327 186.746 -7.579 3.72e-14 ***
## IsTouristDestination 2170.105 157.659 13.765 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7249 on 13228 degrees of freedom
## Multiple R-squared: 0.0231, Adjusted R-squared: 0.02288
## F-statistic: 104.3 on 3 and 13228 DF, p-value: < 2.2e-16
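The R-squared values in these summaries are small, so it helps to see how they are computed. The sketch below rebuilds R-squared and adjusted R-squared from the residuals of a simulated regression (not the hotel data), then applies the adjusted R-squared formula to the rounded figures printed for model2. All names in the block are illustrative.

```r
set.seed(42)
# Simulated data with a weak linear signal
x <- rnorm(200)
y <- 2 + 0.3 * x + rnorm(200, sd = 2)
fit <- lm(y ~ x)

# R-squared = 1 - residual sum of squares / total sum of squares
rss <- sum(residuals(fit)^2)
tss <- sum((y - mean(y))^2)
r2_manual <- 1 - rss / tss

# Adjusted R-squared penalises the number of predictors p
n <- length(y)
p <- 1
adj_r2_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p - 1)

# Same formula with the rounded model2 values: R2 = 0.0231, n = 13232, p = 3
adj_r2_model2 <- 1 - (1 - 0.0231) * (13232 - 1) / (13232 - 3 - 1)

print(c(r2 = r2_manual, adj_r2 = adj_r2_manual, model2_adj = adj_r2_model2))
```

With more than 13,000 rows the adjustment barely changes R-squared, which matches the nearly identical multiple and adjusted values in the summaries.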
#Effect of internal factors on room rents
scatterplotMatrix(formula = ~ RoomRent + StarRating + Airport + FreeWifi + FreeBreakfast + HotelCapacity + HasSwimmingPool, data = hotelpricing)
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit positive part of the spread
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth
#Correlation matrix of Room Rent with Internal Factors
cor(hotelpricing$RoomRent, hotelpricing[,c("StarRating", "Airport", "FreeWifi", "FreeBreakfast", "HotelCapacity", "HasSwimmingPool")])
## StarRating Airport FreeWifi FreeBreakfast HotelCapacity
## [1,] 0.3693734 0.04965324 0.003627002 -0.01000637 0.1578733
## HasSwimmingPool
## [1,] 0.3116577
corrgram(hotelpricing[c("RoomRent", "StarRating", "Airport", "FreeWifi", "FreeBreakfast", "HotelCapacity", "HasSwimmingPool")], upper.panel = panel.pie)
aggregate(hotelpricing$RoomRent, by=list(Ratings = hotelpricing$StarRating), mean)
## Ratings x
## 1 0.0 7237.125
## 2 1.0 686.625
## 3 2.0 2783.166
## 4 2.5 2520.816
## 5 3.0 3694.811
## 6 3.2 15937.500
## 7 3.3 2841.062
## 8 3.4 23437.500
## 9 3.5 4843.346
## 10 3.6 7769.500
## 11 3.7 6701.958
## 12 3.8 5400.062
## 13 3.9 13062.750
## 14 4.0 6393.105
## 15 4.1 19075.000
## 16 4.3 7423.125
## 17 4.4 5563.500
## 18 4.5 8699.920
## 19 4.7 10125.000
## 20 4.8 46752.812
## 21 5.0 12398.221
boxplot(hotelpricing$RoomRent ~ hotelpricing$StarRating, ylim = c(0,50000), col = "grey")
plot(hotelpricing$RoomRent, hotelpricing$StarRating, xlab = "Room Rents", ylab = "Star Ratings", main = "Plot of ratings vs rent")
#Testing Correlation
cor(hotelpricing$RoomRent , hotelpricing$StarRating)
## [1] 0.3693734
corrgram(hotelpricing[c("RoomRent","StarRating")], upper.panel = panel.pie, lower.panel = panel.cor)
cor.test(hotelpricing$RoomRent , hotelpricing$StarRating)
##
## Pearson's product-moment correlation
##
## data: hotelpricing$RoomRent and hotelpricing$StarRating
## t = 45.719, df = 13230, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3545660 0.3839956
## sample estimates:
## cor
## 0.3693734
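The t statistic reported by cor.test() follows directly from the correlation and the sample size, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below plugs in the values printed above (r = 0.3693734, n = 13232); the variable names are illustrative.

```r
r <- 0.3693734   # Pearson correlation between RoomRent and StarRating
n <- 13232       # number of rows, so df = n - 2 = 13230

# Test statistic for H0: true correlation is 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

print(t_stat)
print(p_val)
```

The result should match the reported t = 45.719 up to rounding. Because n is so large, even the much weaker Airport correlation (0.0497) is statistically significant, so the size of r matters more here than the p-value.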
#Distance from Airport and Room Rent
plot(hotelpricing$RoomRent, hotelpricing$Airport)
cor(hotelpricing$RoomRent, hotelpricing$Airport)
## [1] 0.04965324
corrgram(hotelpricing[c("RoomRent","Airport")], upper.panel = panel.pie, lower.panel = panel.cor)
cor.test(hotelpricing$RoomRent, hotelpricing$Airport)
##
## Pearson's product-moment correlation
##
## data: hotelpricing$RoomRent and hotelpricing$Airport
## t = 5.7183, df = 13230, p-value = 1.099e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.03264192 0.06663581
## sample estimates:
## cor
## 0.04965324
#Free Wifi and Room Rents
aggregate(hotelpricing$RoomRent, by=list(freewifi = hotelpricing$FreeWifi), mean)
## freewifi x
## 1 0 5380.004
## 2 1 5481.518
boxplot(hotelpricing$RoomRent ~ hotelpricing$FreeWifi, ylim = c(0,30000))
corrgram(hotelpricing[c("RoomRent","FreeWifi")], upper.panel = panel.pie, lower.panel = panel.cor)
#Null Hypothesis: There is no difference in the means of Room Rent between hotels with and without free Wifi
t.test(hotelpricing$RoomRent~ hotelpricing$FreeWifi)
##
## Welch Two Sample t-test
##
## data: hotelpricing$RoomRent by hotelpricing$FreeWifi
## t = -0.76847, df = 1804.7, p-value = 0.4423
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -360.5977 157.5701
## sample estimates:
## mean in group 0 mean in group 1
## 5380.004 5481.518
#As p-value = 0.44 (>0.05) so we fail to reject the Null hypothesis.
#Free Breakfast and Room Rent
aggregate(hotelpricing$RoomRent, by=list(freebreakfast = hotelpricing$FreeBreakfast), mean)
## freebreakfast x
## 1 0 5573.790
## 2 1 5420.044
boxplot(hotelpricing$RoomRent ~ hotelpricing$FreeBreakfast, ylim = c(0,30000))
corrgram(hotelpricing[c("RoomRent","FreeBreakfast")], upper.panel = panel.pie, lower.panel = panel.cor)
#T-Testing
#Null Hypothesis: There is no difference in the means of Room Rent where free Breakfast is available or not
t.test(hotelpricing$RoomRent~ hotelpricing$FreeBreakfast)
##
## Welch Two Sample t-test
##
## data: hotelpricing$RoomRent by hotelpricing$FreeBreakfast
## t = 0.98095, df = 6212.3, p-value = 0.3267
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -153.5017 460.9935
## sample estimates:
## mean in group 0 mean in group 1
## 5573.790 5420.044
#As p-value = 0.33 (>0.05), we fail to reject the Null hypothesis.
#capacity of the hotel and Room Rent
plot(hotelpricing$HotelCapacity, hotelpricing$RoomRent, xlab = "Hotel Capacity", ylab = "Room Rent")
cor(hotelpricing$RoomRent, hotelpricing$HotelCapacity)
## [1] 0.1578733
corrgram(hotelpricing[c("RoomRent","HotelCapacity")], upper.panel = panel.pie, lower.panel = panel.cor)
cor.test(hotelpricing$RoomRent, hotelpricing$HotelCapacity)
##
## Pearson's product-moment correlation
##
## data: hotelpricing$RoomRent and hotelpricing$HotelCapacity
## t = 18.389, df = 13230, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1412142 0.1744430
## sample estimates:
## cor
## 0.1578733
#Since the p-value <0.05, we reject the Null hypothesis of zero correlation.
#Effect of swimming pool on Room Rent
aggregate(hotelpricing$RoomRent, by=list(HasSwimmingPool = hotelpricing$HasSwimmingPool), mean)
## HasSwimmingPool x
## 1 0 3775.566
## 2 1 8549.052
boxplot(hotelpricing$RoomRent ~ hotelpricing$HasSwimmingPool, ylim = c(0,15000), col = "blue")
corrgram(hotelpricing[c("RoomRent","HasSwimmingPool")], upper.panel = panel.pie, lower.panel = panel.cor)
#T-Testing
#Null Hypothesis: There is no difference in the means of Room Rent between hotels with and without a Swimming Pool.
t.test(hotelpricing$RoomRent~ hotelpricing$HasSwimmingPool)
##
## Welch Two Sample t-test
##
## data: hotelpricing$RoomRent by hotelpricing$HasSwimmingPool
## t = -29.013, df = 5011.3, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5096.030 -4450.942
## sample estimates:
## mean in group 0 mean in group 1
## 3775.566 8549.052
#Since the p-value <0.05, we reject the Null hypothesis.
#Creating a model for Internal Factors
model3 <- lm(RoomRent ~ StarRating + Airport + FreeWifi + FreeBreakfast + HotelCapacity + HasSwimmingPool, data = hotelpricing)
summary(model3)
##
## Call:
## lm(formula = RoomRent ~ StarRating + Airport + FreeWifi + FreeBreakfast +
## HotelCapacity + HasSwimmingPool, data = hotelpricing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10783 -2286 -875 967 310387
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7455.073 396.215 -18.816 <2e-16 ***
## StarRating 3519.285 111.754 31.491 <2e-16 ***
## Airport 25.627 2.604 9.840 <2e-16 ***
## FreeWifi 227.843 226.177 1.007 0.314
## FreeBreakfast -59.313 123.964 -0.478 0.632
## HotelCapacity -14.786 1.009 -14.660 <2e-16 ***
## HasSwimmingPool 2714.146 158.681 17.104 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6687 on 13225 degrees of freedom
## Multiple R-squared: 0.1689, Adjusted R-squared: 0.1685
## F-statistic: 447.9 on 6 and 13225 DF, p-value: < 2.2e-16
model4 <- lm(RoomRent ~ IsNewYearEve + IsMetroCity + IsTouristDestination + StarRating + Airport + HotelCapacity + HasSwimmingPool, data = hotelpricing)
summary(model4)
##
## Call:
## lm(formula = RoomRent ~ IsNewYearEve + IsMetroCity + IsTouristDestination +
## StarRating + Airport + HotelCapacity + HasSwimmingPool, data = hotelpricing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11621 -2342 -706 1039 309463
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8362.318 351.306 -23.804 < 2e-16 ***
## IsNewYearEve 843.295 174.085 4.844 1.29e-06 ***
## IsMetroCity -1502.844 137.569 -10.924 < 2e-16 ***
## IsTouristDestination 2074.969 133.499 15.543 < 2e-16 ***
## StarRating 3583.270 110.317 32.482 < 2e-16 ***
## Airport 11.057 2.699 4.096 4.22e-05 ***
## HotelCapacity -11.252 1.020 -11.030 < 2e-16 ***
## HasSwimmingPool 2211.800 159.259 13.888 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6609 on 13224 degrees of freedom
## Multiple R-squared: 0.1882, Adjusted R-squared: 0.1878
## F-statistic: 438 on 7 and 13224 DF, p-value: < 2.2e-16
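To read the model4 coefficients concretely, the sketch below computes the fitted rent for one hypothetical hotel by multiplying each coefficient by the hotel's value and summing. The coefficients are copied from the summary above; the hotel profile (5 stars, tourist destination, non-metro, 20 units from the airport, capacity 100, with pool, not New Year's Eve) is an invented example, not a row from the dataset.

```r
# Coefficients copied from summary(model4)
coefs <- c(`(Intercept)` = -8362.318, IsNewYearEve = 843.295,
           IsMetroCity = -1502.844, IsTouristDestination = 2074.969,
           StarRating = 3583.270, Airport = 11.057,
           HotelCapacity = -11.252, HasSwimmingPool = 2211.800)

# Hypothetical hotel profile (the intercept term always gets the value 1)
hotel <- c(`(Intercept)` = 1, IsNewYearEve = 0, IsMetroCity = 0,
           IsTouristDestination = 1, StarRating = 5, Airport = 20,
           HotelCapacity = 100, HasSwimmingPool = 1)

# Fitted value = sum of coefficient * predictor value
predicted_rent <- sum(coefs * hotel[names(coefs)])
print(predicted_rent)

# Same hotel on New Year's Eve: the fitted rent rises by the IsNewYearEve coefficient
print(predicted_rent + coefs[["IsNewYearEve"]])
```

With the fitted model object available, predict(model4, newdata = ...) would give the same number up to the rounding of the printed coefficients. Note that the large negative intercept means the model can return negative rents for low-rated hotels, so fitted values at the low end should be read with caution.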
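After analyzing all the variables, seven predictors have statistically significant coefficients (p < 0.001) in the combined model (model4), which explains about 19% of the variance in RoomRent (R-squared = 0.188). IsWeekend, CityRank, FreeWifi and FreeBreakfast were not significant in the earlier models and were left out. The seven variables are:
StarRating, IsTouristDestination, HotelCapacity, HasSwimmingPool, IsNewYearEve, Airport, IsMetroCity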
End Of Project