Introduction

The hotel industry in India thrives largely on the growth of tourism and travel. As the number of foreign and domestic tourists rises, the hotel sector is bound to grow. Budget hotels are emerging in India to cater to the majority of the population, who seek affordable stays. International companies are also increasingly looking at setting up such hotels. The growth in domestic and foreign tourists has not been matched by an equal increase in the number of rooms, and this imbalance is a latent opportunity for growth.

A hotel has different types of rooms on the basis of room size, location, view, décor, furnishings, amenities, etc. Therefore the front office generally has more than one room rate category depending on the types of rooms.

Overview

We are concerned with the pricing strategy of the hotel industry, based on evidence collected from 42 different cities and described by several parameters. In particular, we try to establish whether IsTouristDestination has any influence on the RoomRent of hotels. The data was collected from 42 cities for 8 different dates. We carry out an empirical study based on this dataset.

Read the dataset into R and check its dimensions (number of rows and columns).

hotel <- read.csv("Cities42.csv")
View(hotel)
dim(hotel)
## [1] 13232    20

Compute descriptive statistics (min, max, median, etc.) for each variable.

library(psych)
describe(hotel)
##                      vars     n       mean         sd    median    trimmed
## X                       1 13232    6616.50    3819.89    6616.5    6616.50
## CityName*               2 13232      18.07      11.72      16.0      17.29
## Population              3 13232 4416836.87 4258386.00 3046163.0 4040816.22
## CityRank                4 13232      14.83      13.51       9.0      13.30
## IsMetroCity             5 13232       0.28       0.45       0.0       0.23
## IsTouristDestination    6 13232       0.70       0.46       1.0       0.75
## IsWeekend               7 13232       0.62       0.48       1.0       0.65
## IsNewYearEve            8 13232       0.12       0.33       0.0       0.03
## Date*                   9 13232      14.26       2.82      14.0      14.39
## HotelName*             10 13232     841.19     488.16     827.0     841.18
## RoomRent               11 13232    5473.99    7333.12    4000.0    4383.33
## StarRating             12 13232       3.46       0.76       3.0       3.40
## Airport                13 13232      21.16      22.76      15.0      16.39
## HotelAddress*          14 13232    1202.53     582.17    1261.0    1233.25
## HotelPincode           15 13232  397430.26  259837.50  395003.0  388540.47
## HotelDescription*      16 13224     581.34     363.26     567.0     575.37
## FreeWifi               17 13232       0.93       0.26       1.0       1.00
## FreeBreakfast          18 13232       0.65       0.48       1.0       0.69
## HotelCapacity          19 13232      62.51      76.66      34.0      46.03
## HasSwimmingPool        20 13232       0.36       0.48       0.0       0.32
##                             mad      min      max      range  skew
## X                       4904.44      1.0    13232    13231.0  0.00
## CityName*                 11.86      1.0       42       41.0  0.48
## Population           3846498.95   8096.0 12442373 12434277.0  0.68
## CityRank                  11.86      0.0       44       44.0  0.69
## IsMetroCity                0.00      0.0        1        1.0  0.96
## IsTouristDestination       0.00      0.0        1        1.0 -0.86
## IsWeekend                  0.00      0.0        1        1.0 -0.51
## IsNewYearEve               0.00      0.0        1        1.0  2.28
## Date*                      2.97      1.0       20       19.0 -1.05
## HotelName*               641.97      1.0     1670     1669.0  0.01
## RoomRent                2653.85    299.0   322500   322201.0 16.75
## StarRating                 0.74      0.0        5        5.0  0.48
## Airport                   11.12      0.2      124      123.8  2.73
## HotelAddress*            668.65      1.0     2108     2107.0 -0.37
## HotelPincode          257975.37 100025.0  7000157  6900132.0  9.99
## HotelDescription*        472.95      1.0     1226     1225.0  0.11
## FreeWifi                   0.00      0.0        1        1.0 -3.25
## FreeBreakfast              0.00      0.0        1        1.0 -0.62
## HotelCapacity             28.17      0.0      600      600.0  2.95
## HasSwimmingPool            0.00      0.0        1        1.0  0.60
##                      kurtosis       se
## X                       -1.20    33.21
## CityName*               -0.88     0.10
## Population              -1.08 37019.65
## CityRank                -0.76     0.12
## IsMetroCity             -1.08     0.00
## IsTouristDestination    -1.26     0.00
## IsWeekend               -1.74     0.00
## IsNewYearEve             3.18     0.00
## Date*                    2.93     0.02
## HotelName*              -1.25     4.24
## RoomRent               582.06    63.75
## StarRating               0.25     0.01
## Airport                  7.89     0.20
## HotelAddress*           -0.88     5.06
## HotelPincode           249.76  2258.86
## HotelDescription*       -1.25     3.16
## FreeWifi                 8.57     0.00
## FreeBreakfast           -1.61     0.00
## HotelCapacity           11.39     0.67
## HasSwimmingPool         -1.64     0.00
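
In the `n` column above, HotelDescription has 13224 values while every other variable has 13232, which suggests 8 missing descriptions. A minimal sketch of how missing values could be counted per column is shown below. The data frame `toy` and its values are made up for illustration and are not taken from Cities42.csv.

```r
# Toy stand-in for the hotel data; all values are hypothetical.
toy <- data.frame(
  RoomRent         = c(4000, 5200, 2999, 12000, 3500),
  StarRating       = c(3, 4, 2.5, 5, 3),
  HotelDescription = c("Near station", NA, "Lake view", NA, "City centre"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))
missing_per_column

# Number of rows without any missing value
complete_rows <- sum(complete.cases(toy))
complete_rows
```

On the real data the same idea would be `colSums(is.na(hotel))`.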

Create one-way contingency tables for the categorical variables in your dataset.

mytable <- with(hotel,table(CityName))
mytable
## CityName
##             Agra        Ahmedabad         Amritsar        Bangalore 
##              432              424              136              656 
##      Bhubaneswar       Chandigarh          Chennai       Darjeeling 
##              120              336              416              136 
##            Delhi          Gangtok              Goa         Guwahati 
##             2048              128              624               48 
##         Haridwar        Hyderabad           Indore           Jaipur 
##               48              536              160              768 
##        Jaisalmer          Jodhpur           Kanpur            Kochi 
##              264              224               16              608 
##          Kolkata          Lucknow          Madurai           Manali 
##              512              128              112              288 
##        Mangalore           Mumbai           Munnar           Mysore 
##              104              712              328              160 
##         Nainital             Ooty        Panchkula             Pune 
##              144              136               64              600 
##             Puri           Rajkot        Rishikesh           Shimla 
##               56              128               88              280 
##         Srinagar            Surat Thiruvanthipuram         Thrissur 
##               40               80              392               32 
##          Udaipur         Varanasi 
##              456              264
mytable2 <- with(hotel,table(FreeBreakfast))
mytable2
## FreeBreakfast
##    0    1 
## 4643 8589
mytable3 <-with(hotel,table(HasSwimmingPool))
mytable3
## HasSwimmingPool
##    0    1 
## 8524 4708
mytable4 <- with(hotel, table(StarRating))
mytable4
## StarRating
##    0    1    2  2.5    3  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9    4  4.1 
##   16    8  440  632 5953    8   16    8 1752    8   24   16   32 2463   24 
##  4.3  4.4  4.5  4.7  4.8    5 
##   16    8  376    8   16 1408
mytable5 <- with(hotel, table(IsMetroCity))
mytable5
## IsMetroCity
##    0    1 
## 9472 3760
mytable6 <- with(hotel, table(IsTouristDestination))
mytable6
## IsTouristDestination
##    0    1 
## 4007 9225
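
Counts are often easier to compare as proportions. The sketch below converts the IsTouristDestination counts printed above into shares with `prop.table()`. The vector name `tourist_counts` is only an illustration.

```r
# Counts copied from the IsTouristDestination table above
tourist_counts <- c("0" = 4007, "1" = 9225)

# Convert counts to proportions of all 13232 rows
tourist_share <- prop.table(tourist_counts)
round(tourist_share, 3)
```

With the data loaded, `prop.table(table(hotel$IsTouristDestination))` should give the same shares, and the share of tourist destinations agrees with the mean of about 0.70 reported by `describe()`.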

Create two-way (and three-way) breakdowns of mean RoomRent by the categorical variables in your dataset.

aggregate(hotel$RoomRent, by=list(weekend = hotel$IsWeekend, newyearseve = hotel$IsNewYearEve), mean)
##   weekend newyearseve        x
## 1       0           0 5429.473
## 2       1           0 5320.820
## 3       0           1 8829.500
## 4       1           1 6219.655
aggregate(hotel$RoomRent, by=list(Metrocity = hotel$IsMetroCity, TouristPlace = hotel$IsTouristDestination), mean)
##   Metrocity TouristPlace        x
## 1         0            0 4006.435
## 2         1            0 4646.136
## 3         0            1 6755.728
## 4         1            1 4706.608
aggregate(hotel$RoomRent, by=list(freewifi = hotel$FreeWifi, freeBreakfast = hotel$FreeBreakfast, swimmingPool = hotel$HasSwimmingPool), mean)
##   freewifi freeBreakfast swimmingPool        x
## 1        0             0            0 3538.085
## 2        1             0            0 3148.628
## 3        0             1            0 5636.617
## 4        1             1            0 3984.457
## 5        0             0            1 7378.590
## 6        1             0            1 9530.906
## 7        0             1            1 5207.000
## 8        1             1            1 8246.284
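
The `aggregate()` calls above report mean room rents per group rather than counts. For a two-way contingency table in the strict sense (counts of rows per combination), `xtabs()` can be used. The sketch below runs on a small made-up data frame `toy`, so the numbers are not from the hotel data.

```r
# Toy data with two binary variables; values are hypothetical.
toy <- data.frame(
  IsMetroCity          = c(0, 0, 1, 1, 0, 1, 0, 0),
  IsTouristDestination = c(1, 1, 0, 1, 0, 1, 1, 0)
)

# Two-way table of counts: rows = IsMetroCity, columns = IsTouristDestination
counts <- xtabs(~ IsMetroCity + IsTouristDestination, data = toy)
counts

# Same table with row and column totals
addmargins(counts)

# Row-wise proportions: share of tourist destinations within each metro group
prop.table(counts, margin = 1)
```

On the real data the call would be `xtabs(~ IsMetroCity + IsTouristDestination, data = hotel)`.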

Draw a boxplot of the variables that belong to your study.

boxplot(hotel$StarRating,
        xlab="Star Rating of the hotel",
        main="Box plot of Star Rating of hotel",
        col = "yellow",
        horizontal = TRUE)

boxplot(hotel$Airport,
        xlab="Distance between Hotel and closest major Airport(in km)",
        main="Box plot of Airport Distance of hotel",
        col = "yellow",
        horizontal = TRUE)

boxplot(hotel$HotelCapacity,
        xlab="Hotel Capacity",
        main="Box plot of Hotel Capacity",
        col = "yellow",
        horizontal = TRUE)

Draw Histograms for your suitable data fields.

hist(hotel$StarRating, main = "Star Rating Distribution", xlab = "Stars")

hist(hotel$Airport, main = "Distribution of distance to the nearest major airport", xlab = "Distance to the nearest major airport (km)")

hist(hotel$HotelCapacity, main = "Distribution of hotel capacity", xlab = "Hotel capacity")

# hist() plots counts by default, so the y-axis is a frequency, not a relative frequency
hist(hotel$RoomRent, 
     main="Analysis of room rents of hotels",
     xlab="Rents of room",ylab="Frequency",
     xlim = c(0,30000),breaks=30,
     col="green")
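
RoomRent is strongly right-skewed (skew 16.75, maximum 322500 against a median of 4000), which is why the histogram above is cut off at 30000. One common alternative is to plot the rent on a log scale. The sketch below uses simulated rents, not the hotel data, and the parameters of `rlnorm()` are arbitrary assumptions chosen only to produce a right-skewed sample.

```r
# Simulated right-skewed "rents"; these are NOT values from Cities42.csv.
set.seed(42)
rent <- rlnorm(1000, meanlog = 8.3, sdlog = 0.8)

# For right-skewed data the mean sits above the median
c(mean = mean(rent), median = median(rent))

# A log10 transform compresses the long right tail
log_rent <- log10(rent)
hist(log_rent,
     breaks = 30,
     main = "Room rent on a log10 scale (simulated data)",
     xlab = "log10(rent)",
     col = "green")
```

With the data loaded, `hist(log10(hotel$RoomRent), breaks = 30)` would show the whole range without truncating the axis.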

hotel$FreeWifi=factor(hotel$FreeWifi, levels=c(0,1), labels=c("No","Yes"))
plot(hotel$FreeWifi,col="yellow",main="Has Wifi?")

hotel$FreeBreakfast=factor(hotel$FreeBreakfast, levels=c(0,1), labels=c("No","Yes"))
plot(hotel$FreeBreakfast,col="blue",main="Has Free Breakfast?")

hotel$HasSwimmingPool=factor(hotel$HasSwimmingPool, levels=c(0,1), labels=c("No","Yes"))
plot(hotel$HasSwimmingPool,col="green",main="Swimming pool?")

Draw suitable plot for your data fields.

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplotMatrix(formula = ~ RoomRent + IsWeekend + IsNewYearEve, data = hotel, pch = 16)
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

Create a correlation matrix.

cor(hotel$RoomRent, hotel[,c("IsWeekend","IsNewYearEve")])
##        IsWeekend IsNewYearEve
## [1,] 0.004580134   0.03849123
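
The call above returns a single row: the correlation of RoomRent with each of the two date indicators. A full correlation matrix across several numeric columns can be obtained by passing a data frame to `cor()`. The sketch below uses a small made-up data frame `toy`.

```r
# Toy numeric data; values are hypothetical.
toy <- data.frame(
  RoomRent     = c(4000, 5200, 2999, 12000, 3500, 8000),
  StarRating   = c(3, 4, 2.5, 5, 3, 4.5),
  IsNewYearEve = c(0, 0, 1, 0, 1, 0)
)

# Pairwise Pearson correlations; rows with missing values are dropped
cor_matrix <- cor(toy, use = "complete.obs")
round(cor_matrix, 2)
```

The result is symmetric with ones on the diagonal, so only the entries on one side of the diagonal need to be read.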

Visualize your correlation matrix using corrgram.

library(corrgram)
corrgram(hotel, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Hotel Pricing Corrgram")

Create a scatter plot matrix for your data set.

scatterplotMatrix(formula = ~ RoomRent + StarRating +Airport+ IsWeekend + IsNewYearEve ,
                  data = hotel , 
                  main="Scatter plot matrix of external factors")
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

scatterplotMatrix(formula = ~ RoomRent + FreeWifi + FreeBreakfast + HotelCapacity + HasSwimmingPool, 
                  data = hotel ,
                  main="Scatter plot matrix of internal factors")
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit positive part of the spread

T-tests

Null Hypothesis: There is no difference between the room rent of hotels in metro cities and in non-metro cities.

t.test(RoomRent~IsMetroCity,data=hotel)
## 
##  Welch Two Sample t-test
## 
## data:  RoomRent by IsMetroCity
## t = 10.721, df = 13224, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   888.0308 1285.4102
## sample estimates:
## mean in group 0 mean in group 1 
##        5782.794        4696.073

Since the p-value is below 0.05, we reject the null hypothesis. The mean room rent of hotels in non-metro cities (about 5783) is higher than that of hotels in metro cities (about 4696).
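
To make the sign of the t statistic easier to read, the Welch statistic can be reproduced by hand: it is the mean of group 0 minus the mean of group 1, divided by the square root of the summed per-group variances over their sample sizes. The sketch below checks this against `t.test()` on simulated data. The group sizes, means and standard deviations are arbitrary assumptions, not estimates from the hotel data.

```r
# Simulated rents for two groups; NOT the hotel data.
set.seed(1)
toy <- data.frame(
  RoomRent    = c(rnorm(40, mean = 5800, sd = 2000),
                  rnorm(25, mean = 4700, sd = 1200)),
  IsMetroCity = rep(c(0, 1), times = c(40, 25))
)

# Welch two-sample t-test (unequal variances is the default in t.test)
fit <- t.test(RoomRent ~ IsMetroCity, data = toy)

# The same statistic by hand: (mean of group 0 - mean of group 1) / standard error
g0 <- toy$RoomRent[toy$IsMetroCity == 0]
g1 <- toy$RoomRent[toy$IsMetroCity == 1]
se <- sqrt(var(g0) / length(g0) + var(g1) / length(g1))
t_manual <- (mean(g0) - mean(g1)) / se

c(t_from_t.test = unname(fit$statistic), t_by_hand = t_manual)
```

Because the difference is taken as group 0 minus group 1, a positive t means the group coded 0 has the higher mean.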

Null Hypothesis: There is no significant difference between the room rent of hotels on New Year’s Eve and on other days.

t.test(RoomRent~IsNewYearEve,data=hotel)
## 
##  Welch Two Sample t-test
## 
## data:  RoomRent by IsNewYearEve
## t = -4.1793, df = 2065, p-value = 3.046e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1256.5297  -453.9099
## sample estimates:
## mean in group 0 mean in group 1 
##        5367.606        6222.826

Since the p-value is below 0.05, we reject the null hypothesis. The mean room rent on New Year’s Eve (about 6223) is higher than on other days (about 5368).

Regression Analysis

In order to test these hypotheses, we propose the following model:

RoomRent = b0 + b1*StarRating + b2*Airport + b3*FreeWifi + b4*FreeBreakfast + b5*HotelCapacity + b6*HasSwimmingPool + b7*IsMetroCity + b8*IsNewYearEve + b9*IsTouristDestination + error

Linearregression  <- lm(RoomRent ~ StarRating + Airport + FreeWifi + FreeBreakfast + HotelCapacity + HasSwimmingPool+IsMetroCity+IsNewYearEve 
+IsTouristDestination, data = hotel)

summary(Linearregression)
## 
## Call:
## lm(formula = RoomRent ~ StarRating + Airport + FreeWifi + FreeBreakfast + 
##     HotelCapacity + HasSwimmingPool + IsMetroCity + IsNewYearEve + 
##     IsTouristDestination, data = hotel)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -11696  -2375   -701   1063 309539 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -8906.418    405.396 -21.970  < 2e-16 ***
## StarRating            3564.570    110.489  32.262  < 2e-16 ***
## Airport                 11.265      2.710   4.157 3.24e-05 ***
## FreeWifiYes            485.597    224.134   2.167   0.0303 *  
## FreeBreakfastYes       182.992    123.296   1.484   0.1378    
## HotelCapacity          -10.990      1.026 -10.714  < 2e-16 ***
## HasSwimmingPoolYes    2227.069    159.327  13.978  < 2e-16 ***
## IsMetroCity          -1548.328    138.527 -11.177  < 2e-16 ***
## IsNewYearEve           844.123    174.046   4.850 1.25e-06 ***
## IsTouristDestination  2113.725    134.336  15.735  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6607 on 13222 degrees of freedom
## Multiple R-squared:  0.1887, Adjusted R-squared:  0.1881 
## F-statistic: 341.7 on 9 and 13222 DF,  p-value: < 2.2e-16
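
The summary reports estimates, standard errors and p-values. Confidence intervals for the coefficients can be obtained with `confint()`. The sketch below fits a small model on simulated data in which, by construction, StarRating has an effect and FreeBreakfast has none. The data, the effect sizes and the object names are assumptions for illustration only.

```r
# Simulated data; NOT the hotel data.
set.seed(7)
n <- 200
toy <- data.frame(
  StarRating    = sample(c(2, 3, 4, 5), n, replace = TRUE),
  FreeBreakfast = rbinom(n, 1, 0.6)
)
# Rent depends on StarRating only; FreeBreakfast has no true effect here
toy$RoomRent <- 1000 + 1500 * toy$StarRating + rnorm(n, sd = 800)

fit <- lm(RoomRent ~ StarRating + FreeBreakfast, data = toy)

# 95% confidence intervals for each coefficient
ci <- confint(fit, level = 0.95)
ci

# TRUE where the interval excludes zero (significant at the 5% level)
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0
excludes_zero
```

Applied to the model above, `confint(Linearregression)` would give the intervals directly. As a rough check by hand, the FreeBreakfastYes estimate of 182.992 plus or minus 1.96 times its standard error of 123.296 gives approximately (-59, 425), which includes zero and matches its p-value of 0.1378.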

Result

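The coefficients of StarRating and IsTouristDestination are positive, so a higher star rating and a tourist-destination location are associated with higher room rents, holding the other variables constant. Among the internal features, a swimming pool is associated with a higher rent (about 2227). The IsNewYearEve coefficient is also positive (about 844), so rents are higher on New Year’s Eve. The model does not include an interaction between IsNewYearEve and IsTouristDestination, so it cannot show whether this increase is larger in tourist destinations.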
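
To make the size and sign of the coefficients concrete, the fitted equation can be evaluated by hand for one hotel. The sketch below copies the estimates from the regression summary and applies them to a hypothetical hotel. The hotel's characteristics (`example_hotel`) are made up for illustration.

```r
# Estimates copied from the summary(Linearregression) output
coefs <- c(
  "(Intercept)"          = -8906.418,
  "StarRating"           =  3564.570,
  "Airport"              =    11.265,
  "FreeWifiYes"          =   485.597,
  "FreeBreakfastYes"     =   182.992,
  "HotelCapacity"        =   -10.990,
  "HasSwimmingPoolYes"   =  2227.069,
  "IsMetroCity"          = -1548.328,
  "IsNewYearEve"         =   844.123,
  "IsTouristDestination" =  2113.725
)

# Hypothetical hotel: 4 stars, 10 km from the airport, capacity 50,
# with wifi, breakfast and a pool, in a metro tourist city, not on New Year's Eve
example_hotel <- c(
  "(Intercept)"          = 1,
  "StarRating"           = 4,
  "Airport"              = 10,
  "FreeWifiYes"          = 1,
  "FreeBreakfastYes"     = 1,
  "HotelCapacity"        = 50,
  "HasSwimmingPoolYes"   = 1,
  "IsMetroCity"          = 1,
  "IsNewYearEve"         = 0,
  "IsTouristDestination" = 1
)

# Predicted rent = sum of coefficient * value over all terms
predicted_rent <- sum(coefs * example_hotel[names(coefs)])
predicted_rent
```

Setting IsNewYearEve to 1 for the same hotel raises the prediction by exactly the IsNewYearEve coefficient, 844.123, because the model is additive.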

Conclusion

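StarRating, HasSwimmingPool, IsMetroCity, IsTouristDestination and IsNewYearEve are strongly significant predictors of RoomRent (p < 0.001). Distance from the airport and HotelCapacity are also statistically significant, although their per-unit coefficients are small (about 11 per km and about -11 per unit of HotelCapacity). FreeWifi is significant only at the 5% level (p = 0.03), and FreeBreakfast is not statistically significant (p = 0.14), so its confidence interval includes zero. Overall, the model explains only about 19% of the variation in RoomRent (R-squared = 0.1887).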