Problem Definition

  1. Analyze hotel price data for hotels in 42 cities (summaries, plots, etc.).
  2. Visualize the data.
  3. Run t-tests and chi-square tests on the data.
  4. Examine the correlation between the dependent and independent variables.
  5. Find out which columns / features impact the price of a hotel room.
  6. Predict hotel prices for some dummy values.

Attributes:

Note that the dataset tracks hotel prices on 8 different dates at different hotels across different cities. Please browse the dataset.

Dependent Variable

RoomRent <- Rent for the cheapest room, double occupancy, in Indian Rupees.

Independent Variables

External Factors

Date <- We have hotel room rent data for the following 8 dates for each hotel: {Dec 31, Dec 25, Dec 24, Dec 18, Dec 21, Dec 28, Jan 4, Jan 8}
IsWeekend <- ‘0’ indicates weekdays, ‘1’ indicates weekend dates (Sat / Sun)
IsNewYearEve <- ‘1’ for Dec 31, ‘0’ otherwise
CityName <- Name of the city where the hotel is located, e.g. Mumbai
Population <- Population of the city in 2011
CityRank <- Rank order of the city by population (e.g. Mumbai = 0, Delhi = 1, and so on)
IsMetroCity <- ‘1’ if CityName is {Mumbai, Delhi, Kolkata, Chennai}, ‘0’ otherwise
IsTouristDestination <- ‘1’ if the city is primarily a tourist destination, ‘0’ otherwise

Internal Factors

Many hotel features can influence the RoomRent. The dataset captures some of these internal factors, as explained below.

HotelName <- e.g. Park Hyatt Goa Resort and Spa
StarRating <- e.g. 5
Airport <- Distance between the hotel and the closest major airport
HotelAddress <- e.g. Arrossim Beach, Cansaulim, Goa
HotelPincode <- e.g. 403712
HotelDescription <- e.g. 5-star beachfront resort with spa, near Arossim Beach
FreeWifi <- ‘1’ if the hotel offers free Wifi, ‘0’ otherwise
FreeBreakfast <- ‘1’ if the hotel offers free breakfast, ‘0’ otherwise
HotelCapacity <- e.g. 242 (‘0’ if not available)
HasSwimmingPool <- ‘1’ if the hotel has a swimming pool, ‘0’ otherwise

hotelpricing = read.csv("Cities42.csv")

View(hotelpricing)

dim(hotelpricing)
## [1] 13232    19

Summarising Data

library(psych)
describe(hotelpricing)
##                      vars     n       mean         sd  median    trimmed
## CityName*               1 13232      18.07      11.72      16      17.29
## Population              2 13232 4416836.87 4258386.00 3046163 4040816.22
## CityRank                3 13232      14.83      13.51       9      13.30
## IsMetroCity             4 13232       0.28       0.45       0       0.23
## IsTouristDestination    5 13232       0.70       0.46       1       0.75
## IsWeekend               6 13232       0.62       0.48       1       0.65
## IsNewYearEve            7 13232       0.12       0.33       0       0.03
## Date*                   8 13232      14.30       2.69      14      14.39
## HotelName*              9 13232     841.19     488.16     827     841.18
## RoomRent               10 13232    5473.99    7333.12    4000    4383.33
## StarRating             11 13232       3.46       0.76       3       3.40
## Airport                12 13232      21.16      22.76      15      16.39
## HotelAddress*          13 13232    1202.53     582.17    1261    1233.25
## HotelPincode           14 13232  397430.26  259837.50  395003  388540.47
## HotelDescription*      15 13224     581.34     363.26     567     575.37
## FreeWifi               16 13232       0.93       0.26       1       1.00
## FreeBreakfast          17 13232       0.65       0.48       1       0.69
## HotelCapacity          18 13232      62.51      76.66      34      46.03
## HasSwimmingPool        19 13232       0.36       0.48       0       0.32
##                             mad      min      max      range  skew
## CityName*                 11.86      1.0       42       41.0  0.48
## Population           3846498.95   8096.0 12442373 12434277.0  0.68
## CityRank                  11.86      0.0       44       44.0  0.69
## IsMetroCity                0.00      0.0        1        1.0  0.96
## IsTouristDestination       0.00      0.0        1        1.0 -0.86
## IsWeekend                  0.00      0.0        1        1.0 -0.51
## IsNewYearEve               0.00      0.0        1        1.0  2.28
## Date*                      2.97      1.0       20       19.0 -0.77
## HotelName*               641.97      1.0     1670     1669.0  0.01
## RoomRent                2653.85    299.0   322500   322201.0 16.75
## StarRating                 0.74      0.0        5        5.0  0.48
## Airport                   11.12      0.2      124      123.8  2.73
## HotelAddress*            668.65      1.0     2108     2107.0 -0.37
## HotelPincode          257975.37 100025.0  7000157  6900132.0  9.99
## HotelDescription*        472.95      1.0     1226     1225.0  0.11
## FreeWifi                   0.00      0.0        1        1.0 -3.25
## FreeBreakfast              0.00      0.0        1        1.0 -0.62
## HotelCapacity             28.17      0.0      600      600.0  2.95
## HasSwimmingPool            0.00      0.0        1        1.0  0.60
##                      kurtosis       se
## CityName*               -0.88     0.10
## Population              -1.08 37019.65
## CityRank                -0.76     0.12
## IsMetroCity             -1.08     0.00
## IsTouristDestination    -1.26     0.00
## IsWeekend               -1.74     0.00
## IsNewYearEve             3.18     0.00
## Date*                    1.92     0.02
## HotelName*              -1.25     4.24
## RoomRent               582.06    63.75
## StarRating               0.25     0.01
## Airport                  7.89     0.20
## HotelAddress*           -0.88     5.06
## HotelPincode           249.76  2258.86
## HotelDescription*       -1.25     3.16
## FreeWifi                 8.57     0.00
## FreeBreakfast           -1.61     0.00
## HotelCapacity           11.39     0.67
## HasSwimmingPool         -1.64     0.00
table(hotelpricing$CityName)
## 
##             Agra        Ahmedabad         Amritsar        Bangalore 
##              432              424              136              656 
##      Bhubaneswar       Chandigarh          Chennai       Darjeeling 
##              120              336              416              136 
##            Delhi          Gangtok              Goa         Guwahati 
##             2048              128              624               48 
##         Haridwar        Hyderabad           Indore           Jaipur 
##               48              536              160              768 
##        Jaisalmer          Jodhpur           Kanpur            Kochi 
##              264              224               16              608 
##          Kolkata          Lucknow          Madurai           Manali 
##              512              128              112              288 
##        Mangalore           Mumbai           Munnar           Mysore 
##              104              712              328              160 
##         Nainital             Ooty        Panchkula             Pune 
##              144              136               64              600 
##             Puri           Rajkot        Rishikesh           Shimla 
##               56              128               88              280 
##         Srinagar            Surat Thiruvanthipuram         Thrissur 
##               40               80              392               32 
##          Udaipur         Varanasi 
##              456              264
table(hotelpricing$IsWeekend)
## 
##    0    1 
## 4991 8241
table(hotelpricing$IsNewYearEve)
## 
##     0     1 
## 11586  1646
table(hotelpricing$FreeWifi)
## 
##     0     1 
##   981 12251
table(hotelpricing$FreeBreakfast)
## 
##    0    1 
## 4643 8589
table(hotelpricing$HasSwimmingPool)
## 
##    0    1 
## 8524 4708
hist(hotelpricing$StarRating, main = "Star Rating Distribution",col="Blue",xlab = "Stars")

hist(hotelpricing$Airport, main = "Distance to the nearest Airport",col = "Green", xlab = "Distance to the nearest Airport")

hist(hotelpricing$HotelCapacity, main = "Capacity of hotels",col = "Sky Blue", xlab = "Capacity of hotels")

Contingency Table

aggregate(hotelpricing$RoomRent, by=list(Weekend = hotelpricing$IsWeekend, Newyear_Serve = hotelpricing$IsNewYearEve), mean)
##   Weekend Newyear_Serve        x
## 1       0             0 5429.473
## 2       1             0 5320.820
## 3       0             1 8829.500
## 4       1             1 6219.655
aggregate(hotelpricing$RoomRent, by=list(Metro_City = hotelpricing$IsMetroCity, Tourist_Place = hotelpricing$IsTouristDestination), mean)
##   Metro_City Tourist_Place        x
## 1          0             0 4006.435
## 2          1             0 4646.136
## 3          0             1 6755.728
## 4          1             1 4706.608
aggregate(hotelpricing$RoomRent, by=list(Free_Wifi = hotelpricing$FreeWifi, Free_Breakfast = hotelpricing$FreeBreakfast, SwimmingPool = hotelpricing$HasSwimmingPool), mean)
##   Free_Wifi Free_Breakfast SwimmingPool        x
## 1         0              0            0 3538.085
## 2         1              0            0 3148.628
## 3         0              1            0 5636.617
## 4         1              1            0 3984.457
## 5         0              0            1 7378.590
## 6         1              0            1 9530.906
## 7         0              1            1 5207.000
## 8         1              1            1 8246.284

Analyzing the effect of various variables on Room Rent

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplotMatrix(formula = ~ RoomRent + IsWeekend + IsNewYearEve, data = hotelpricing, pch = 16)
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

#Correlation matrix of Room Rent with IsWeekend and IsNewYearEve
cor(hotelpricing$RoomRent, hotelpricing[ ,c("IsWeekend","IsNewYearEve")])
##        IsWeekend IsNewYearEve
## [1,] 0.004580134   0.03849123
library(corrgram)
corrgram(hotelpricing[c("RoomRent","IsWeekend","IsNewYearEve")], upper.panel = panel.pie)

aggregate(hotelpricing$RoomRent, by=list(hotelpricing$IsWeekend),mean)
##   Group.1        x
## 1       0 5430.835
## 2       1 5500.129
boxplot(hotelpricing$RoomRent ~ hotelpricing$IsWeekend, ylim = c(0,30000), col = "blue")

corrgram(hotelpricing[c("RoomRent","IsWeekend")], upper.panel = panel.pie, lower.panel = panel.cor)

#T-Test
#Null Hypothesis - There is no difference between the mean Room Rent on weekdays and on weekends.
t.test(hotelpricing$RoomRent ~ hotelpricing$IsWeekend)
## 
##  Welch Two Sample t-test
## 
## data:  hotelpricing$RoomRent by hotelpricing$IsWeekend
## t = -0.51853, df = 9999.4, p-value = 0.6041
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -331.2427  192.6559
## sample estimates:
## mean in group 0 mean in group 1 
##        5430.835        5500.129
#The p-value = 0.6 (which is > 0.05), so we fail to reject the Null Hypothesis.

aggregate(hotelpricing$RoomRent, by=list(hotelpricing$IsNewYearEve),mean)
##   Group.1        x
## 1       0 5367.606
## 2       1 6222.826
boxplot(hotelpricing$RoomRent ~ hotelpricing$IsNewYearEve, col = "grey")

corrgram(hotelpricing[c("RoomRent","IsNewYearEve")], upper.panel = panel.pie, lower.panel = panel.cor)

boxplot(hotelpricing$RoomRent ~ hotelpricing$IsNewYearEve, ylim = c(0,30000), col = "yellow")

#T-test
#Null Hypothesis -> There is no difference between the Room Rent on New Year's Eve and on other days.
t.test(hotelpricing$RoomRent ~ hotelpricing$IsNewYearEve)
## 
##  Welch Two Sample t-test
## 
## data:  hotelpricing$RoomRent by hotelpricing$IsNewYearEve
## t = -4.1793, df = 2065, p-value = 3.046e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1256.5297  -453.9099
## sample estimates:
## mean in group 0 mean in group 1 
##        5367.606        6222.826
#The p-value = 3.046e-05 (which is < 0.05), so we reject the Null Hypothesis.
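
As a sanity check, the 95 percent confidence interval reported above can be approximately reconstructed from the printed group means, t statistic and degrees of freedom alone. This is only a sketch: the inputs are the rounded values from the output, so the result matches the printed interval up to rounding, and the variable names are chosen here for illustration.

```r
# Values copied from the Welch t-test output for IsNewYearEve
mean_other_days <- 5367.606   # mean in group 0
mean_new_year   <- 6222.826   # mean in group 1
t_stat          <- -4.1793
df              <- 2065

# Difference in means and the standard error implied by t = diff / se
mean_diff <- mean_other_days - mean_new_year
std_error <- mean_diff / t_stat

# Two-sided 95% interval: diff -/+ t critical value * standard error
ci <- mean_diff + c(-1, 1) * qt(0.975, df) * std_error
print(round(ci, 1))   # approximately -1256.5 -453.9
```

The interval excludes 0, which is consistent with rejecting the Null Hypothesis at the 5% level.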

#Making a linear model
model1 <- lm(RoomRent ~ IsWeekend + IsNewYearEve, data = hotelpricing)
summary(model1)
## 
## Call:
## lm(formula = RoomRent ~ IsWeekend + IsNewYearEve, data = hotelpricing)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -5874  -3031  -1436    808 317180 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    5430.5      103.7  52.353  < 2e-16 ***
## IsWeekend      -110.4      137.4  -0.803    0.422    
## IsNewYearEve    902.6      201.8   4.472 7.82e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7328 on 13229 degrees of freedom
## Multiple R-squared:  0.00153,    Adjusted R-squared:  0.001379 
## F-statistic: 10.14 on 2 and 13229 DF,  p-value: 3.987e-05
#Effect of External factors on Room Rent
scatterplotMatrix(formula = ~ RoomRent + CityRank + IsMetroCity + IsTouristDestination, data = hotelpricing, pch = 16)

cor(hotelpricing$RoomRent, hotelpricing[ ,c("CityRank","IsMetroCity","IsTouristDestination")])
##        CityRank IsMetroCity IsTouristDestination
## [1,] 0.09398553 -0.06683977             0.122503
corrgram(hotelpricing[c("RoomRent","CityRank","IsMetroCity","IsTouristDestination")], upper.panel = panel.pie)

aggregate(hotelpricing$RoomRent, by=list(hotelpricing$IsMetroCity),mean)
##   Group.1        x
## 1       0 5782.794
## 2       1 4696.073
boxplot(hotelpricing$RoomRent ~ hotelpricing$IsMetroCity, col = "purple")

corrgram(hotelpricing[c("RoomRent","IsMetroCity")], upper.panel = panel.pie,lower.panel = panel.cor)

boxplot(hotelpricing$RoomRent ~ hotelpricing$IsMetroCity, ylim = c(0,15000),col = "green")

#T-Test
#Null Hypothesis - There is no difference between the Room Rent of Metro Cities and other cities
t.test(hotelpricing$RoomRent ~ hotelpricing$IsMetroCity)
## 
##  Welch Two Sample t-test
## 
## data:  hotelpricing$RoomRent by hotelpricing$IsMetroCity
## t = 10.721, df = 13224, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   888.0308 1285.4102
## sample estimates:
## mean in group 0 mean in group 1 
##        5782.794        4696.073
#The p-value is < 2.2e-16 (< 0.05), therefore we reject the Null Hypothesis.

#Room rents based on Tourist Destination
aggregate(hotelpricing$RoomRent, by=list(hotelpricing$IsTouristDestination),mean)
##   Group.1        x
## 1       0 4111.003
## 2       1 6066.024
boxplot(hotelpricing$RoomRent ~ hotelpricing$IsTouristDestination, col = "Pink")

corrgram(hotelpricing[c("RoomRent","IsTouristDestination")], upper.panel = panel.pie, lower.panel = panel.cor)

boxplot(hotelpricing$RoomRent ~ hotelpricing$IsTouristDestination, ylim = c(0,30000), col = "green")

#T-Test
#Null Hypothesis - There is no Difference between the Room Rent of Tourist Destination Cities and other Cities.
t.test(hotelpricing$RoomRent ~ hotelpricing$IsTouristDestination)
## 
##  Welch Two Sample t-test
## 
## data:  hotelpricing$RoomRent by hotelpricing$IsTouristDestination
## t = -19.449, df = 12888, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2152.059 -1757.983
## sample estimates:
## mean in group 0 mean in group 1 
##        4111.003        6066.024
#The p-value is < 2.2e-16 (< 0.05), therefore we reject the Null Hypothesis.

#Linear Model
model2 <- lm(RoomRent ~ CityRank + IsMetroCity + IsTouristDestination, data = hotelpricing)
summary(model2)
## 
## Call:
## lm(formula = RoomRent ~ CityRank + IsMetroCity + IsTouristDestination, 
##     data = hotelpricing)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -6239  -2875  -1285   1052 315988 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           4309.432    140.413  30.691  < 2e-16 ***
## CityRank                 3.627      6.393   0.567     0.57    
## IsMetroCity          -1415.327    186.746  -7.579 3.72e-14 ***
## IsTouristDestination  2170.105    157.659  13.765  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7249 on 13228 degrees of freedom
## Multiple R-squared:  0.0231, Adjusted R-squared:  0.02288 
## F-statistic: 104.3 on 3 and 13228 DF,  p-value: < 2.2e-16
#Effect of internal factors on room rents

scatterplotMatrix(formula = ~ RoomRent + StarRating + Airport + FreeWifi + FreeBreakfast + HotelCapacity + HasSwimmingPool, data = hotelpricing)
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit positive part of the spread
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

#Correlation matrix of Room Rent with Internal Factors
cor(hotelpricing$RoomRent, hotelpricing[,c("StarRating", "Airport", "FreeWifi", "FreeBreakfast",  "HotelCapacity",  "HasSwimmingPool")])
##      StarRating    Airport    FreeWifi FreeBreakfast HotelCapacity
## [1,]  0.3693734 0.04965324 0.003627002   -0.01000637     0.1578733
##      HasSwimmingPool
## [1,]       0.3116577
corrgram(hotelpricing[c("RoomRent", "StarRating", "Airport", "FreeWifi", "FreeBreakfast",  "HotelCapacity",  "HasSwimmingPool")], upper.panel = panel.pie)

aggregate(hotelpricing$RoomRent, by=list(Ratings = hotelpricing$StarRating), mean)
##    Ratings         x
## 1      0.0  7237.125
## 2      1.0   686.625
## 3      2.0  2783.166
## 4      2.5  2520.816
## 5      3.0  3694.811
## 6      3.2 15937.500
## 7      3.3  2841.062
## 8      3.4 23437.500
## 9      3.5  4843.346
## 10     3.6  7769.500
## 11     3.7  6701.958
## 12     3.8  5400.062
## 13     3.9 13062.750
## 14     4.0  6393.105
## 15     4.1 19075.000
## 16     4.3  7423.125
## 17     4.4  5563.500
## 18     4.5  8699.920
## 19     4.7 10125.000
## 20     4.8 46752.812
## 21     5.0 12398.221
boxplot(hotelpricing$RoomRent ~ hotelpricing$StarRating, ylim = c(0,50000), col = "grey")

plot(hotelpricing$RoomRent, hotelpricing$StarRating, xlab = "Room Rents", ylab = "Star Ratings", main = "Plot of ratings vs rent")

#Testing Correlation

cor(hotelpricing$RoomRent , hotelpricing$StarRating)
## [1] 0.3693734
corrgram(hotelpricing[c("RoomRent","StarRating")], upper.panel = panel.pie, lower.panel = panel.cor)

cor.test(hotelpricing$RoomRent , hotelpricing$StarRating)
## 
##  Pearson's product-moment correlation
## 
## data:  hotelpricing$RoomRent and hotelpricing$StarRating
## t = 45.719, df = 13230, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3545660 0.3839956
## sample estimates:
##       cor 
## 0.3693734
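
The t statistic reported by cor.test follows directly from the sample correlation and the degrees of freedom, via t = r * sqrt(df) / sqrt(1 - r^2). A minimal sketch using only the numbers printed above (the variable names are illustrative):

```r
# Values copied from the cor.test output for RoomRent vs StarRating
r  <- 0.3693734   # sample Pearson correlation
df <- 13230       # n - 2

# t statistic for testing the hypothesis that the true correlation is 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
print(round(t_stat, 3))   # 45.719

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)
```

With over 13,000 rows even a moderate correlation yields a very large t value, so the p-value says the correlation is non-zero, not that it is strong.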
#Distance from Airport and Room Rent
plot(hotelpricing$RoomRent, hotelpricing$Airport)

cor(hotelpricing$RoomRent, hotelpricing$Airport)
## [1] 0.04965324
corrgram(hotelpricing[c("RoomRent","Airport")], upper.panel = panel.pie, lower.panel = panel.cor)

cor.test(hotelpricing$RoomRent, hotelpricing$Airport)
## 
##  Pearson's product-moment correlation
## 
## data:  hotelpricing$RoomRent and hotelpricing$Airport
## t = 5.7183, df = 13230, p-value = 1.099e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.03264192 0.06663581
## sample estimates:
##        cor 
## 0.04965324
#Free Wifi and Room Rents
aggregate(hotelpricing$RoomRent, by=list(freewifi = hotelpricing$FreeWifi), mean)
##   freewifi        x
## 1        0 5380.004
## 2        1 5481.518
boxplot(hotelpricing$RoomRent ~ hotelpricing$FreeWifi, ylim = c(0,30000))

corrgram(hotelpricing[c("RoomRent","FreeWifi")], upper.panel = panel.pie, lower.panel = panel.cor)

#Null Hypothesis: There is no difference in the mean Room Rent between hotels with and without free Wifi.
t.test(hotelpricing$RoomRent~ hotelpricing$FreeWifi)
## 
##  Welch Two Sample t-test
## 
## data:  hotelpricing$RoomRent by hotelpricing$FreeWifi
## t = -0.76847, df = 1804.7, p-value = 0.4423
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -360.5977  157.5701
## sample estimates:
## mean in group 0 mean in group 1 
##        5380.004        5481.518
#As the p-value = 0.44 (> 0.05), we fail to reject the Null Hypothesis.


#Free Breakfast and Room Rent
aggregate(hotelpricing$RoomRent, by=list(freebreakfast = hotelpricing$FreeBreakfast), mean)
##   freebreakfast        x
## 1             0 5573.790
## 2             1 5420.044
boxplot(hotelpricing$RoomRent ~ hotelpricing$FreeBreakfast, ylim = c(0,30000))

corrgram(hotelpricing[c("RoomRent","FreeBreakfast")], upper.panel = panel.pie, lower.panel = panel.cor)

#T-Testing
#Null Hypothesis: There is no difference in the means of Room Rent where free Breakfast is available or not

t.test(hotelpricing$RoomRent~ hotelpricing$FreeBreakfast)
## 
##  Welch Two Sample t-test
## 
## data:  hotelpricing$RoomRent by hotelpricing$FreeBreakfast
## t = 0.98095, df = 6212.3, p-value = 0.3267
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -153.5017  460.9935
## sample estimates:
## mean in group 0 mean in group 1 
##        5573.790        5420.044
#As the p-value = 0.33 (> 0.05), we fail to reject the Null Hypothesis.
#Capacity of the hotel and Room Rent
plot(hotelpricing$HotelCapacity, hotelpricing$RoomRent, xlab = "Hotel Capacity", ylab = "Room Rent")

cor(hotelpricing$RoomRent, hotelpricing$HotelCapacity)
## [1] 0.1578733
corrgram(hotelpricing[c("RoomRent","HotelCapacity")], upper.panel = panel.pie, lower.panel = panel.cor)

cor.test(hotelpricing$RoomRent, hotelpricing$HotelCapacity)
## 
##  Pearson's product-moment correlation
## 
## data:  hotelpricing$RoomRent and hotelpricing$HotelCapacity
## t = 18.389, df = 13230, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1412142 0.1744430
## sample estimates:
##       cor 
## 0.1578733
#Since the p-value < 0.05, we reject the Null Hypothesis that the correlation is zero.



#Effect of swimming pool on Room Rent
aggregate(hotelpricing$RoomRent, by=list(HasSwimmingPool = hotelpricing$HasSwimmingPool), mean)
##   HasSwimmingPool        x
## 1               0 3775.566
## 2               1 8549.052
boxplot(hotelpricing$RoomRent ~ hotelpricing$HasSwimmingPool, ylim = c(0,15000), col = "blue")

corrgram(hotelpricing[c("RoomRent","HasSwimmingPool")], upper.panel = panel.pie, lower.panel = panel.cor)

#T-Testing
#Null Hypothesis: There is no difference in the mean Room Rent between hotels with and without a swimming pool.
t.test(hotelpricing$RoomRent~ hotelpricing$HasSwimmingPool)
## 
##  Welch Two Sample t-test
## 
## data:  hotelpricing$RoomRent by hotelpricing$HasSwimmingPool
## t = -29.013, df = 5011.3, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5096.030 -4450.942
## sample estimates:
## mean in group 0 mean in group 1 
##        3775.566        8549.052
#Since the p-value < 0.05, we reject the Null Hypothesis.

#Creating a model for Internal Factors
model3 <- lm(RoomRent ~ StarRating + Airport + FreeWifi + FreeBreakfast + HotelCapacity + HasSwimmingPool, data = hotelpricing)
summary(model3)
## 
## Call:
## lm(formula = RoomRent ~ StarRating + Airport + FreeWifi + FreeBreakfast + 
##     HotelCapacity + HasSwimmingPool, data = hotelpricing)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -10783  -2286   -875    967 310387 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -7455.073    396.215 -18.816   <2e-16 ***
## StarRating       3519.285    111.754  31.491   <2e-16 ***
## Airport            25.627      2.604   9.840   <2e-16 ***
## FreeWifi          227.843    226.177   1.007    0.314    
## FreeBreakfast     -59.313    123.964  -0.478    0.632    
## HotelCapacity     -14.786      1.009 -14.660   <2e-16 ***
## HasSwimmingPool  2714.146    158.681  17.104   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6687 on 13225 degrees of freedom
## Multiple R-squared:  0.1689, Adjusted R-squared:  0.1685 
## F-statistic: 447.9 on 6 and 13225 DF,  p-value: < 2.2e-16
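
The adjusted R-squared in the summary can be checked by hand from the multiple R-squared, the number of rows and the number of predictors. The sketch below uses the rounded values printed above, so it reproduces the reported figure only to the displayed precision.

```r
# Values taken from dim(hotelpricing) and summary(model3)
r_squared <- 0.1689   # Multiple R-squared
n         <- 13232    # number of rows
p         <- 6        # number of predictors in model3

# Adjusted R-squared penalizes R-squared for the number of predictors;
# n - p - 1 equals the residual degrees of freedom (13225)
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(round(adj_r_squared, 4))   # 0.1685
```

Because n is so much larger than p, the penalty is tiny and the two values are almost identical.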
model4 <- lm(RoomRent ~ IsNewYearEve + IsMetroCity + IsTouristDestination + StarRating + Airport + HotelCapacity + HasSwimmingPool, data = hotelpricing)
summary(model4)
## 
## Call:
## lm(formula = RoomRent ~ IsNewYearEve + IsMetroCity + IsTouristDestination + 
##     StarRating + Airport + HotelCapacity + HasSwimmingPool, data = hotelpricing)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -11621  -2342   -706   1039 309463 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -8362.318    351.306 -23.804  < 2e-16 ***
## IsNewYearEve           843.295    174.085   4.844 1.29e-06 ***
## IsMetroCity          -1502.844    137.569 -10.924  < 2e-16 ***
## IsTouristDestination  2074.969    133.499  15.543  < 2e-16 ***
## StarRating            3583.270    110.317  32.482  < 2e-16 ***
## Airport                 11.057      2.699   4.096 4.22e-05 ***
## HotelCapacity          -11.252      1.020 -11.030  < 2e-16 ***
## HasSwimmingPool       2211.800    159.259  13.888  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6609 on 13224 degrees of freedom
## Multiple R-squared:  0.1882, Adjusted R-squared:  0.1878 
## F-statistic:   438 on 7 and 13224 DF,  p-value: < 2.2e-16

After analyzing all the variables, model4 shows that 7 variables have a statistically significant effect (p < 0.001) on hotel room rent, although together they explain only about 19% of its variance (adjusted R-squared = 0.1878):

StarRating, IsTouristDestination, HotelCapacity, HasSwimmingPool, IsNewYearEve, Airport, IsMetroCity
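
The problem definition also asks for a price prediction with dummy values. A minimal sketch follows, using the model4 coefficients exactly as printed in the summary and a made-up hotel. Every value in dummy_hotel is an assumption chosen for illustration, not a row from the dataset.

```r
# Coefficients copied from the summary(model4) output
coefs <- c(
  "(Intercept)"        = -8362.318,
  IsNewYearEve         =   843.295,
  IsMetroCity          = -1502.844,
  IsTouristDestination =  2074.969,
  StarRating           =  3583.270,
  Airport              =    11.057,
  HotelCapacity        =   -11.252,
  HasSwimmingPool      =  2211.800
)

# Hypothetical hotel: 5-star, tourist city (non-metro), New Year's Eve,
# 20 units from the airport, capacity 100, with a swimming pool
dummy_hotel <- c(
  "(Intercept)"        = 1,
  IsNewYearEve         = 1,
  IsMetroCity          = 0,
  IsTouristDestination = 1,
  StarRating           = 5,
  Airport              = 20,
  HotelCapacity        = 100,
  HasSwimmingPool      = 1
)

# Linear prediction: intercept plus the sum of coefficient * value
predicted_rent <- sum(coefs * dummy_hotel[names(coefs)])
print(round(predicted_rent, 2))   # 13780.04
```

With the fitted model still in memory, predict(model4, newdata = ...) on a data frame holding the same seven values should agree up to rounding of the printed coefficients. Given the low R-squared, such a prediction is only a rough estimate.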

End Of Project