Project Title : Hotel Room Pricing In The Indian Market

Name : Palash Badjatya

E-mail : palash.badjatya01@gmail.com

College : Acropolis Technical Campus

Introduction

This project studies hotel room pricing across different cities in the Indian market. It examines what affects the price of a hotel room, such as whether the city is a tourist destination or whether the date falls on a weekend. Many other factors can also affect the price, such as free breakfast, a swimming pool, or a special occasion (like New Year's Eve).

Overview of the study

The objective of this project is to identify the factors that matter most for the room rent. The dataset consists of 13,232 rows of data from different hotels located in 42 different cities.

Data

The data consist of many variables: the city name, the population of the city, the city rank, whether the city is a tourist destination, whether the city is a metro city, the room rent, the star rating of the hotel, the hotel address and description, whether the date is a weekend or New Year's Eve, the capacity of the hotel, whether free breakfast and Wi-Fi are included, and whether there is a swimming pool for guests.

# Read the data; every later call uses hotels.df$... or data=hotels.df, so attach() is not needed
hotels.df <- read.csv("Cities42.csv")
head(hotels.df)
##   CityName Population CityRank IsMetroCity IsTouristDestination IsWeekend
## 1   Mumbai   12442373        0           1                    1         1
## 2   Mumbai   12442373        0           1                    1         0
## 3   Mumbai   12442373        0           1                    1         1
## 4   Mumbai   12442373        0           1                    1         1
## 5   Mumbai   12442373        0           1                    1         0
## 6   Mumbai   12442373        0           1                    1         1
##   IsNewYearEve        Date      HotelName RoomRent StarRating Airport
## 1            0 Dec 18 2016 Vivanta by Taj    12375          5      21
## 2            0 Dec 21 2016 Vivanta by Taj    10250          5      21
## 3            0 Dec 24 2016 Vivanta by Taj     9900          5      21
## 4            0 Dec 25 2016 Vivanta by Taj    10350          5      21
## 5            0 Dec 28 2016 Vivanta by Taj    12000          5      21
## 6            1 Dec 31 2016 Vivanta by Taj    11475          5      21
##                                   HotelAddress HotelPincode
## 1 90 Cuffe Parade, Colaba, Mumbai, Maharashtra       400005
## 2 91 Cuffe Parade, Colaba, Mumbai, Maharashtra       400006
## 3 92 Cuffe Parade, Colaba, Mumbai, Maharashtra       400007
## 4 93 Cuffe Parade, Colaba, Mumbai, Maharashtra       400008
## 5 94 Cuffe Parade, Colaba, Mumbai, Maharashtra       400009
## 6 95 Cuffe Parade, Colaba, Mumbai, Maharashtra       400010
##                               HotelDescription FreeWifi FreeBreakfast
## 1 Luxury hotel with spa, near Gateway of India        1             0
## 2 Luxury hotel with spa, near Gateway of India        1             0
## 3 Luxury hotel with spa, near Gateway of India        1             0
## 4 Luxury hotel with spa, near Gateway of India        1             0
## 5 Luxury hotel with spa, near Gateway of India        1             0
## 6 Luxury hotel with spa, near Gateway of India        1             0
##   HotelCapacity HasSwimmingPool
## 1           287               1
## 2           287               1
## 3           287               1
## 4           287               1
## 5           287               1
## 6           287               1
dim(hotels.df)
## [1] 13232    19
library(psych)
describe(hotels.df)
##                      vars     n       mean         sd  median    trimmed
## CityName*               1 13232      18.07      11.72      16      17.29
## Population              2 13232 4416836.87 4258386.00 3046163 4040816.22
## CityRank                3 13232      14.83      13.51       9      13.30
## IsMetroCity             4 13232       0.28       0.45       0       0.23
## IsTouristDestination    5 13232       0.70       0.46       1       0.75
## IsWeekend               6 13232       0.62       0.48       1       0.65
## IsNewYearEve            7 13232       0.12       0.33       0       0.03
## Date*                   8 13232      14.30       2.69      14      14.39
## HotelName*              9 13232     841.19     488.16     827     841.18
## RoomRent               10 13232    5473.99    7333.12    4000    4383.33
## StarRating             11 13232       3.46       0.76       3       3.40
## Airport                12 13232      21.16      22.76      15      16.39
## HotelAddress*          13 13232    1202.53     582.17    1261    1233.25
## HotelPincode           14 13232  397430.26  259837.50  395003  388540.47
## HotelDescription*      15 13224     581.34     363.26     567     575.37
## FreeWifi               16 13232       0.93       0.26       1       1.00
## FreeBreakfast          17 13232       0.65       0.48       1       0.69
## HotelCapacity          18 13232      62.51      76.66      34      46.03
## HasSwimmingPool        19 13232       0.36       0.48       0       0.32
##                             mad      min      max      range  skew
## CityName*                 11.86      1.0       42       41.0  0.48
## Population           3846498.95   8096.0 12442373 12434277.0  0.68
## CityRank                  11.86      0.0       44       44.0  0.69
## IsMetroCity                0.00      0.0        1        1.0  0.96
## IsTouristDestination       0.00      0.0        1        1.0 -0.86
## IsWeekend                  0.00      0.0        1        1.0 -0.51
## IsNewYearEve               0.00      0.0        1        1.0  2.28
## Date*                      2.97      1.0       20       19.0 -0.77
## HotelName*               641.97      1.0     1670     1669.0  0.01
## RoomRent                2653.85    299.0   322500   322201.0 16.75
## StarRating                 0.74      0.0        5        5.0  0.48
## Airport                   11.12      0.2      124      123.8  2.73
## HotelAddress*            668.65      1.0     2108     2107.0 -0.37
## HotelPincode          257975.37 100025.0  7000157  6900132.0  9.99
## HotelDescription*        472.95      1.0     1226     1225.0  0.11
## FreeWifi                   0.00      0.0        1        1.0 -3.25
## FreeBreakfast              0.00      0.0        1        1.0 -0.62
## HotelCapacity             28.17      0.0      600      600.0  2.95
## HasSwimmingPool            0.00      0.0        1        1.0  0.60
##                      kurtosis       se
## CityName*               -0.88     0.10
## Population              -1.08 37019.65
## CityRank                -0.76     0.12
## IsMetroCity             -1.08     0.00
## IsTouristDestination    -1.26     0.00
## IsWeekend               -1.74     0.00
## IsNewYearEve             3.18     0.00
## Date*                    1.92     0.02
## HotelName*              -1.25     4.24
## RoomRent               582.06    63.75
## StarRating               0.25     0.01
## Airport                  7.89     0.20
## HotelAddress*           -0.88     5.06
## HotelPincode           249.76  2258.86
## HotelDescription*       -1.25     3.16
## FreeWifi                 8.57     0.00
## FreeBreakfast           -1.61     0.00
## HotelCapacity           11.39     0.67
## HasSwimmingPool         -1.64     0.00
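
In the describe() output, the variables marked with an asterisk (CityName, Date, HotelName, HotelAddress, HotelDescription) are non-numeric, and their statistics are computed on integer codes, so values such as the mean of CityName* carry no real meaning. HotelDescription also shows n = 13224, that is, 8 missing values. The following is a minimal sketch of how the summary could be restricted to numeric columns and how missing values could be counted. The data frame toy.df and its values are made up for illustration and are not taken from Cities42.csv.

```r
# Toy stand-in for hotels.df (illustrative values, not from Cities42.csv)
toy.df <- data.frame(
  CityName         = c("Mumbai", "Mumbai", "Delhi", "Jaipur"),
  RoomRent         = c(12375, 10250, 4000, 2500),
  StarRating       = c(5, 5, 3, 2.5),
  HotelDescription = c("Luxury hotel", NA, "Budget stay", "Heritage hotel"),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for columns that hold numbers
numeric.cols <- sapply(toy.df, is.numeric)

# Summarise only the numeric columns
summary(toy.df[, numeric.cols])

# Count missing values in every column
missing.counts <- colSums(is.na(toy.df))
missing.counts
```

With the real data, the same two steps could be applied to hotels.df in place of toy.df.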

One-way contingency tables

mytable <- with(hotels.df, table(StarRating))
mytable
## StarRating
##    0    1    2  2.5    3  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9    4  4.1 
##   16    8  440  632 5953    8   16    8 1752    8   24   16   32 2463   24 
##  4.3  4.4  4.5  4.7  4.8    5 
##   16    8  376    8   16 1408
mytable1 <- with(hotels.df, table(IsMetroCity))
mytable1
## IsMetroCity
##    0    1 
## 9472 3760
mytable2 <- with(hotels.df, table(FreeBreakfast))
mytable2
## FreeBreakfast
##    0    1 
## 4643 8589
mytable3 <- with(hotels.df, table(CityRank))
mytable3
## CityRank
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
##  712 2048  656  416  536  424  512   80  600  768   32  128   16  136  160 
##   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30 
##  432  448  624  128  264   40  224  336  392   48  160  120  272  104  456 
##   32   33   34   35   36   37   38   39   40   42   43   44 
##   48   56  280   64  136   88  128  136  264  144  328  288
mytable4 <- with(hotels.df, table(IsTouristDestination))
mytable4
## IsTouristDestination
##    0    1 
## 4007 9225
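
Raw counts are easier to compare as proportions. As a small sketch, the IsMetroCity counts printed above can be converted to shares with prop.table(). The counts are copied from the output; the object names are only illustrative.

```r
# Counts copied from the IsMetroCity table above
metro.counts <- c("0" = 9472, "1" = 3760)

# Convert counts to proportions of all rows
metro.share <- prop.table(metro.counts)
round(metro.share, 3)
```

The share for metro cities agrees with the mean of IsMetroCity (0.28) reported by describe(). On the real data, prop.table(table(hotels.df$IsMetroCity)) would give the same result directly.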

Two-way contingency tables

mytable <- xtabs(~ FreeBreakfast+StarRating, data=hotels.df)
mytable
##              StarRating
## FreeBreakfast    0    1    2  2.5    3  3.2  3.3  3.4  3.5  3.6  3.7  3.8
##             0   16    0  216  296 1789    0    8    0  661    8    0    8
##             1    0    8  224  336 4164    8    8    8 1091    0   24    8
##              StarRating
## FreeBreakfast  3.9    4  4.1  4.3  4.4  4.5  4.7  4.8    5
##             0   16  783    0   16    0  224    8    0  594
##             1   16 1680   24    0    8  152    0   16  814
mytable1 <- xtabs(~ IsMetroCity+StarRating, data=hotels.df)
mytable1
##            StarRating
## IsMetroCity    0    1    2  2.5    3  3.2  3.3  3.4  3.5  3.6  3.7  3.8
##           0   16    8  344  456 4336    8   16    8 1312    0   24   16
##           1    0    0   96  176 1617    0    0    0  440    8    0    0
##            StarRating
## IsMetroCity  3.9    4  4.1  4.3  4.4  4.5  4.7  4.8    5
##           0   32 1696   24   16    8  288    8   16  840
##           1    0  767    0    0    0   88    0    0  568
mytable2 <- xtabs(~ IsMetroCity+IsTouristDestination, data=hotels.df)
mytable2
##            IsTouristDestination
## IsMetroCity    0    1
##           0 3352 6120
##           1  655 3105
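
To see whether metro cities are more often tourist destinations, the last table can be turned into row proportions and checked with a chi-squared test of independence. This is only a sketch: the counts are copied from the output above, the object names are illustrative, and the test treats every row of the dataset as an independent observation, which is an assumption because the same hotel appears on several dates.

```r
# Counts copied from the IsMetroCity x IsTouristDestination table above
city.tab <- matrix(c(3352, 655, 6120, 3105), nrow = 2,
                   dimnames = list(IsMetroCity = c("0", "1"),
                                   IsTouristDestination = c("0", "1")))

# Share of non-tourist / tourist rows within each city type (each row sums to 1)
row.share <- prop.table(city.tab, margin = 1)
round(row.share, 3)

# Chi-squared test of independence between the two indicators
chisq.test(city.tab)
```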

Boxplots

boxplot(hotels.df$CityRank  , horizontal =TRUE,main="Rank of the cities",col = "lightblue" )

boxplot(hotels.df$Population  , horizontal =TRUE, main="Population",col = "yellow" )

boxplot(hotels.df$StarRating ~ hotels.df$FreeBreakfast, horizontal=TRUE,
           ylab="breakfast availability", xlab="Star ratings", las=1,
           main="Analysis of star rating and breakfast availability",
           col=c("pink","yellow")
           )

boxplot(hotels.df$RoomRent ~ hotels.df$IsMetroCity, horizontal=TRUE,
           ylab="City(metro=1,other=0)", xlab="Room rent", las=1,
           main="Analysis of type of city and room rent of hotels",
           col=c("red","blue")
           )

Histograms

hist(hotels.df$RoomRent, 
      main="Analysis of room rents of hotels",
      xlab="Rents of room", ylab="Density",
      breaks=30, col="lightblue", freq=FALSE)

hist(hotels.df$StarRating, 
      main="Analysis of star ratings of hotels",
      xlab="Star ratings", ylab="Density",
      breaks=30, col="red", freq=FALSE)

hist(hotels.df$Population, main= "Population" ,xlab="Population" ,col = "peachpuff")

hist(hotels.df$HotelCapacity, main = "Capacity of hotels", xlab = "Hotel Capacity", col = "blue")
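
RoomRent is strongly right-skewed (skew 16.75, median 4000, maximum 322500 in the describe() output), so a histogram on the raw scale squeezes most hotels into the first few bars. One common option is to plot the rents on a log scale. The sketch below uses simulated rents as a stand-in for hotels.df$RoomRent; the simulation parameters are arbitrary assumptions chosen only to produce a skewed sample.

```r
set.seed(1)

# Simulated right-skewed rents; a stand-in for hotels.df$RoomRent
toy.rent <- rlnorm(1000, meanlog = log(4000), sdlog = 0.8)

# Taking log10 spreads out the bulk of the distribution
log.rent <- log10(toy.rent)

hist(log.rent,
     main = "Room rents on a log10 scale (simulated data)",
     xlab = "log10(Room rent)", ylab = "Density",
     breaks = 30, col = "lightblue", freq = FALSE)
```

On the real data the same plot would use log10(hotels.df$RoomRent), which is defined for every row because the minimum rent is 299.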

Scatterplots

library(car) 
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplot(RoomRent~StarRating,     data=hotels.df,
            spread=FALSE, smoother.args=list(lty=2),
            main="Scatter plot of Star Rating vs Room rent",
            ylab="Room Rent",
            xlab="Star Rating")

scatterplotMatrix(formula = ~ RoomRent + IsWeekend + IsNewYearEve +Airport , data = hotels.df, pch = 16)
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

scatterplot(x = hotels.df$Population, y = hotels.df$CityRank,
            main="Population vs City Rank",
            xlab="Population", ylab="City rank")

Correlation Matrix

cor(hotels.df[, c(2,3,4,5,6,7,10,11,18)])
##                         Population      CityRank   IsMetroCity
## Population            1.0000000000 -0.8353204432  0.7712260105
## CityRank             -0.8353204432  1.0000000000 -0.5643937903
## IsMetroCity           0.7712260105 -0.5643937903  1.0000000000
## IsTouristDestination -0.0482029722  0.2807134520  0.1763717063
## IsWeekend             0.0115926802 -0.0072564766  0.0018118005
## IsNewYearEve          0.0007332482 -0.0006326444  0.0006464753
## RoomRent             -0.0887280632  0.0939855292 -0.0668397705
## StarRating            0.1341365933 -0.1333810133  0.0776028661
## HotelCapacity         0.2599830516 -0.2561197059  0.1871502153
##                      IsTouristDestination    IsWeekend  IsNewYearEve
## Population                   -0.048202972  0.011592680  0.0007332482
## CityRank                      0.280713452 -0.007256477 -0.0006326444
## IsMetroCity                   0.176371706  0.001811801  0.0006464753
## IsTouristDestination          1.000000000 -0.019481101 -0.0022663884
## IsWeekend                    -0.019481101  1.000000000  0.2923820508
## IsNewYearEve                 -0.002266388  0.292382051  1.0000000000
## RoomRent                      0.122502963  0.004580134  0.0384912269
## StarRating                   -0.040554998  0.006378436  0.0023608970
## HotelCapacity                -0.094356091  0.006306507  0.0013526790
##                          RoomRent   StarRating HotelCapacity
## Population           -0.088728063  0.134136593   0.259983052
## CityRank              0.093985529 -0.133381013  -0.256119706
## IsMetroCity          -0.066839771  0.077602866   0.187150215
## IsTouristDestination  0.122502963 -0.040554998  -0.094356091
## IsWeekend             0.004580134  0.006378436   0.006306507
## IsNewYearEve          0.038491227  0.002360897   0.001352679
## RoomRent              1.000000000  0.369373425   0.157873308
## StarRating            0.369373425  1.000000000   0.637430337
## HotelCapacity         0.157873308  0.637430337   1.000000000
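
The matrix above selects columns by position, which breaks silently if the column order changes, and Pearson correlation can be dominated by a few extreme rents. As a hedged sketch, the columns could instead be selected by name, and a Spearman rank correlation could be computed alongside the Pearson one. The data frame toy.df and its values are invented for illustration and are not taken from Cities42.csv.

```r
# Toy stand-in for hotels.df (illustrative values, not from Cities42.csv)
toy.df <- data.frame(
  CityName      = c("Mumbai", "Mumbai", "Delhi", "Jaipur", "Goa", "Pune"),
  RoomRent      = c(12375, 10250, 4000, 2500, 6100, 3200),
  StarRating    = c(5, 5, 3, 2.5, 4, 3),
  HotelCapacity = c(287, 287, 40, 20, 120, 34)
)

# Select columns by name rather than by position
cor.vars <- c("RoomRent", "StarRating", "HotelCapacity")

# Pearson correlation (the default used above)
pearson.mat <- cor(toy.df[, cor.vars])

# Spearman rank correlation is less sensitive to extreme values
spearman.mat <- cor(toy.df[, cor.vars], method = "spearman")

round(pearson.mat, 2)
round(spearman.mat, 2)
```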

Corrgrams

library(corrgram)
corrgram(hotels.df, lower.panel = panel.shade, upper.panel = panel.pie,
         text.panel = panel.txt, main = "Corrgram of all variables")

Correlation tests

cor.test(hotels.df$RoomRent, hotels.df$StarRating)
## 
##  Pearson's product-moment correlation
## 
## data:  hotels.df$RoomRent and hotels.df$StarRating
## t = 45.719, df = 13230, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3545660 0.3839956
## sample estimates:
##       cor 
## 0.3693734
cor.test(hotels.df$RoomRent, hotels.df$IsMetroCity)
## 
##  Pearson's product-moment correlation
## 
## data:  hotels.df$RoomRent and hotels.df$IsMetroCity
## t = -7.7053, df = 13230, p-value = 1.399e-14
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08378329 -0.04985761
## sample estimates:
##         cor 
## -0.06683977
cor.test(hotels.df$RoomRent, hotels.df$CityRank)
## 
##  Pearson's product-moment correlation
## 
## data:  hotels.df$RoomRent and hotels.df$CityRank
## t = 10.858, df = 13230, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.07707001 0.11084696
## sample estimates:
##        cor 
## 0.09398553
cor.test(hotels.df$RoomRent, hotels.df$IsNewYearEve)
## 
##  Pearson's product-moment correlation
## 
## data:  hotels.df$RoomRent and hotels.df$IsNewYearEve
## t = 4.4306, df = 13230, p-value = 9.472e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.02146637 0.05549377
## sample estimates:
##        cor 
## 0.03849123
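
All four correlations are statistically significant, but with 13,232 rows even a very weak correlation passes the test, so the size of r matters more than the p-value. The t statistic printed by cor.test() follows from r and the degrees of freedom as t = r * sqrt(df) / sqrt(1 - r^2). The sketch below reproduces it for StarRating from the numbers reported above; the object names are illustrative.

```r
r  <- 0.3693734   # sample correlation of RoomRent and StarRating reported above
df <- 13230       # degrees of freedom reported above (n - 2)

# t statistic for testing that the true correlation is zero
t.stat <- r * sqrt(df) / sqrt(1 - r^2)
t.stat

# Squared correlation: share of variance in RoomRent
# linearly associated with StarRating
r.squared <- r^2
r.squared
```

The result matches the reported t = 45.719. By the same calculation, the IsNewYearEve correlation of 0.038 is significant yet corresponds to a squared correlation of only about 0.0015.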

T test

Null hypothesis: there is no difference in mean room rent between New Year's Eve and other days.

t.test(hotels.df$RoomRent ~ hotels.df$IsNewYearEve)
## 
##  Welch Two Sample t-test
## 
## data:  hotels.df$RoomRent by hotels.df$IsNewYearEve
## t = -4.1793, df = 2065, p-value = 3.046e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1256.5297  -453.9099
## sample estimates:
## mean in group 0 mean in group 1 
##        5367.606        6222.826

The p-value is 3.046e-05 (< 0.05), which is small enough to reject the null hypothesis. Hence there is a significant difference in mean room rent between New Year's Eve and other days: the mean rent is 6222.83 on New Year's Eve versus 5367.61 on other days.
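
The size of the New Year's Eve effect can be worked out from the group means in the t-test output. The following sketch uses only the numbers printed above; the object names are illustrative.

```r
mean.other <- 5367.606   # mean in group 0 (other days), reported above
mean.nye   <- 6222.826   # mean in group 1 (New Year's Eve), reported above

# Absolute difference in mean room rent
diff.rent <- mean.nye - mean.other
diff.rent

# Difference relative to the mean on other days, in percent
pct.increase <- 100 * diff.rent / mean.other
round(pct.increase, 1)

# The midpoint of the reported 95% confidence interval is the
# estimated difference (group 0 minus group 1), hence the negative sign
ci.midpoint <- (-1256.5297 + -453.9099) / 2
ci.midpoint
```

The difference is about 855 in the units of RoomRent, or roughly 16 percent above the mean on other days.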

Null hypothesis: there is no difference in mean room rent between rooms with free Wi-Fi and rooms without it.

t.test(hotels.df$RoomRent ~ hotels.df$FreeWifi)
## 
##  Welch Two Sample t-test
## 
## data:  hotels.df$RoomRent by hotels.df$FreeWifi
## t = -0.76847, df = 1804.7, p-value = 0.4423
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -360.5977  157.5701
## sample estimates:
## mean in group 0 mean in group 1 
##        5380.004        5481.518

The p-value is 0.44 (> 0.05), so we fail to reject the null hypothesis. There is no significant difference in mean room rent between rooms with free Wi-Fi and rooms without it.

Null hypothesis: there is no difference in mean room rent between hotels with and without free breakfast.