Data set: hotel room prices in 42 Indian cities (Cities42.csv).
1. Analysis of hotel price data for hotels in 42 cities (summaries, plots, etc.)
2. Visualization of the data
3. Some t-tests and chi-square tests on the data
4. Correlation between the dependent and independent variables
5. Identify which columns/features affect the price of a hotel room
6. Predict hotel prices for some dummy input values
The data was collected from www.hotels.in in October 2016.
Size: 2523 KB; 13,232 observations of 19 variables.
Attributes:
Notice that the dataset tracks hotel prices on 8 different dates at different hotels across different cities. Please browse the dataset.
Dependent Variable
RoomRent <- Rent for the cheapest room, double occupancy, in Indian Rupees.
Independent Variables
External Factors
Date <- We have hotel room rent data for the following 8 dates for each hotel: {Dec 31, Dec 25, Dec 24, Dec 18, Dec 21, Dec 28, Jan 4, Jan 8}
IsWeekend <- We use ‘0’ to indicate weekdays, ‘1’ to indicate weekend dates (Sat / Sun)
IsNewYearEve <- ‘1’ for Dec 31, ‘0’ otherwise
CityName <- Name of the city where the hotel is located, e.g. Mumbai
Population <- Population of the City in 2011
CityRank <- Rank order of the city by population (e.g. Mumbai = 0, Delhi = 1, and so on)
IsMetroCity <- ‘1’ if CityName is {Mumbai, Delhi, Kolkatta, Chennai}, ‘0’ otherwise
IsTouristDestination <- We use ‘1’ if the city is primarily a tourist destination, ‘0’ otherwise.
Internal Factors
Many hotel features can influence the RoomRent. The dataset captures some of these internal factors, as explained below.
HotelName <- e.g. Park Hyatt Goa Resort and Spa
StarRating <- e.g. 5
Airport <- Distance between Hotel and closest major Airport
HotelAddress <- e.g. Arrossim Beach, Cansaulim, Goa
HotelPincode <- e.g. 403712
HotelDescription <- e.g. 5-star beachfront resort with spa, near Arossim Beach
FreeWifi <- ‘1’ if the hotel offers Free Wifi, ‘0’ otherwise
FreeBreakfast <- ‘1’ if the hotel offers Free Breakfast, ‘0’ otherwise
HotelCapacity <- e.g. 242 (‘0’ if not available)
HasSwimmingPool <- ‘1’ if they have a swimming pool, ‘0’ otherwise
Setup
library(tidyr)
library(dplyr)
library(ggplot2)
library(corrgram)
library(gridExtra)
library(vcd)
library(psych)
library(car)
library(corrplot)
library(coefplot)
Functions
detect_outliers <- function(inp, na.rm=TRUE) {
  # Count values outside the 1.5 * IQR fences (Tukey's rule)
  i.qnt <- quantile(inp, probs=c(.25, .75), na.rm=na.rm)
  i.max <- 1.5 * IQR(inp, na.rm=na.rm)
  otp <- inp
  otp[inp < (i.qnt[1] - i.max)] <- NA
  otp[inp > (i.qnt[2] + i.max)] <- NA
  # Count only values flagged as outliers, not values that were already NA
  sum(is.na(otp) & !is.na(inp))
}
Non_outliers <- function(x, na.rm = TRUE, ...) {
qnt <- quantile(x, probs=c(.25, .75), na.rm = na.rm, ...)
H <- 1.5 * IQR(x, na.rm = na.rm)
y <- x
y[x < (qnt[1] - H)] <- NA
y[x > (qnt[2] + H)] <- NA
y
}
Remove_Outliers <- function(z, na.rm = TRUE) {
  # Return z with values outside the 1.5 * IQR fences replaced by NA
  Non_outliers(z, na.rm = na.rm)
}
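Graph_Boxplot <- function(input, title = "Outliers") {
  # Wrap the vector in its own data frame so the plot does not depend on the global dfrModel
  dfrPlot <- data.frame(value = input)
  Plot <- ggplot(dfrPlot, aes(x = "", y = value)) +
    geom_boxplot(fill = "lightgreen", color = "green") +
    labs(title = title, x = NULL, y = NULL)
  Plot
}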
Dataset
# Adjust this path to the folder that contains Cities42.csv
setwd("D:/Welingkar/My/IL/Project/Hotel Industry/Data")
dfrModel <- read.csv("./Cities42.csv", header=TRUE, stringsAsFactors=FALSE)
intRowCount <- nrow(dfrModel)
head(dfrModel)
## CityName Population CityRank IsMetroCity IsTouristDestination IsWeekend
## 1 Mumbai 12442373 0 1 1 1
## 2 Mumbai 12442373 0 1 1 0
## 3 Mumbai 12442373 0 1 1 1
## 4 Mumbai 12442373 0 1 1 1
## 5 Mumbai 12442373 0 1 1 0
## 6 Mumbai 12442373 0 1 1 1
## IsNewYearEve Date HotelName RoomRent StarRating Airport
## 1 0 Dec 18 2016 Vivanta by Taj 12375 5 21
## 2 0 Dec 21 2016 Vivanta by Taj 10250 5 21
## 3 0 Dec 24 2016 Vivanta by Taj 9900 5 21
## 4 0 Dec 25 2016 Vivanta by Taj 10350 5 21
## 5 0 Dec 28 2016 Vivanta by Taj 12000 5 21
## 6 1 Dec 31 2016 Vivanta by Taj 11475 5 21
## HotelAddress HotelPincode
## 1 90 Cuffe Parade, Colaba, Mumbai, Maharashtra 400005
## 2 91 Cuffe Parade, Colaba, Mumbai, Maharashtra 400006
## 3 92 Cuffe Parade, Colaba, Mumbai, Maharashtra 400007
## 4 93 Cuffe Parade, Colaba, Mumbai, Maharashtra 400008
## 5 94 Cuffe Parade, Colaba, Mumbai, Maharashtra 400009
## 6 95 Cuffe Parade, Colaba, Mumbai, Maharashtra 400010
## HotelDescription FreeWifi FreeBreakfast
## 1 Luxury hotel with spa, near Gateway of India 1 0
## 2 Luxury hotel with spa, near Gateway of India 1 0
## 3 Luxury hotel with spa, near Gateway of India 1 0
## 4 Luxury hotel with spa, near Gateway of India 1 0
## 5 Luxury hotel with spa, near Gateway of India 1 0
## 6 Luxury hotel with spa, near Gateway of India 1 0
## HotelCapacity HasSwimmingPool
## 1 287 1
## 2 287 1
## 3 287 1
## 4 287 1
## 5 287 1
## 6 287 1
Observation 1. There are 13,232 data records in the file (stored in intRowCount).
Because the dataset also contains non-numeric columns (city, date, hotel name, address and description), we drop them, together with the HotelPincode identifier, before the numeric analysis.
Data Cleaning
dfrModel <- select(dfrModel, -c(CityName, Date, HotelName, HotelAddress, HotelDescription, HotelPincode ))
Summary
describe(dfrModel$Population)[,c(2,3,4,5,8,9)]
## n mean sd median min max
## X1 13232 4416837 4258386 3046163 8096 12442373
describe(dfrModel$RoomRent)[,c(2,3,4,5,8,9)]
## n mean sd median min max
## X1 13232 5473.99 7333.12 4000 299 322500
describe(dfrModel$StarRating)[,c(2,3,4,5,8,9)]
## n mean sd median min max
## X1 13232 3.46 0.76 3 0 5
describe(dfrModel$Airport)[,c(2,3,4,5,8,9)]
## n mean sd median min max
## X1 13232 21.16 22.76 15 0.2 124
describe(dfrModel$HotelCapacity)[,c(2,3,4,5,8,9)]
## n mean sd median min max
## X1 13232 62.51 76.66 34 0 600
Observations
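RoomRent is strongly right-skewed: its mean (5,474) is well above its median (4,000), and the maximum is 322,500.
The dependent variable is
Y = Hotel Rent (RoomRent)
The independent variables are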
X1 = Star Rating
X2 = IsTouristDestination
X3 = Airport Distance
X4 = Hotel Capacity
Box Plot
Graph_Boxplot(dfrModel$StarRating)
Graph_Boxplot(dfrModel$Airport)
Graph_Boxplot(dfrModel$HotelCapacity)
Observation
The box plots show a few outliers in these variables.
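The box plots only show outliers visually. A minimal sketch of counting them with the same 1.5 × IQR rule used by `detect_outliers()` and `Non_outliers()`; `count_iqr_outliers` is a hypothetical helper, and the toy vector stands in for a real column such as `dfrModel$HotelCapacity`.

```r
# Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (same rule as Non_outliers()).
# count_iqr_outliers is a hypothetical helper; values that are already NA are not counted.
count_iqr_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  h <- 1.5 * IQR(x, na.rm = TRUE)
  sum(x < q[1] - h | x > q[2] + h, na.rm = TRUE)
}

# Toy data: ten ordinary values and one extreme value
toy <- c(1:10, 100)
n_out <- count_iqr_outliers(toy)
print(n_out)

# On the real data (assumes dfrModel is loaded as above):
# sapply(dfrModel[c("StarRating", "Airport", "HotelCapacity")], count_iqr_outliers)
```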
Tables
TouristDestination <- table(dfrModel$IsTouristDestination)
TouristDestination
##
## 0 1
## 4007 9225
prop.table(TouristDestination)
##
## 0 1
## 0.3028265 0.6971735
Observations
Here, 1 indicates a tourist destination and 0 indicates a non-tourist destination.
About 70% of the records (9,225 of 13,232) come from tourist destinations.
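Objective 3 mentions chi-square tests, which suit pairs of binary columns such as IsTouristDestination. A minimal sketch of a Pearson chi-square test of independence on a 2 × 2 table; the counts below are hypothetical, and on the real data the table would come from something like `table(dfrModel$IsTouristDestination, dfrModel$HasSwimmingPool)`. The statistic is also recomputed by hand to show what the test measures.

```r
# Hypothetical 2x2 contingency table: rows = IsTouristDestination, cols = HasSwimmingPool
tab <- matrix(c(30, 10, 20, 40), nrow = 2,
              dimnames = list(Tourist = c("0", "1"), Pool = c("0", "1")))

# Pearson chi-square test of independence (no Yates continuity correction)
res <- chisq.test(tab, correct = FALSE)
print(res)

# Same statistic by hand: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
chi_sq_manual <- sum((tab - expected)^2 / expected)
print(chi_sq_manual)

# On the real data (assumes dfrModel as above):
# chisq.test(table(dfrModel$IsTouristDestination, dfrModel$HasSwimmingPool))
```

A small p-value would suggest the two binary features are not independent; it says nothing about room rent directly.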
Scatter Plot
plot(y=dfrModel$RoomRent, x=dfrModel$Airport,
col="green",
ylim=c(0, 350000), xlim=c(0, 150),
main="Relationship Between Room Rent and Airport Distance",
ylab="Hotel Rent", xlab="Airport Distance")
scatterplot(dfrModel$Airport, dfrModel$RoomRent, main="Relationship Between Room Rent and Airport Distance", xlab="Airport Distance", ylab="Hotel Rent")
# Jitter the 0/1 x-values so overlapping points become visible
plot(jitter(dfrModel$IsTouristDestination), dfrModel$RoomRent,
col="green",
ylim=c(0, 350000), xlim=c(-0.5, 1.5),
main="Relationship Between Room Rent and Tourist Destination",
ylab="Hotel Rent", xlab="Tourist Destination (0 = No, 1 = Yes)")
plot(y=dfrModel$RoomRent, x=dfrModel$StarRating,
col="blue",
ylim=c(0, 350000), xlim=c(0, 5),
main="Relationship Between Room Rent and Star Rating of Hotel",
ylab="Hotel Rent", xlab="Star Rating")
plot(y=dfrModel$RoomRent, x=dfrModel$HotelCapacity,
col="green",
ylim=c(0, 350000), xlim=c(0, 150),
main="Relationship Between Room Rent and Hotel Capacity",
ylab="Hotel Rent", xlab="Hotel Capacity")
scatterplot(dfrModel$HotelCapacity, dfrModel$RoomRent, main="Relationship Between Room Rent and Hotel Capacity", xlab="Hotel Capacity", ylab="Hotel Rent")
Observations
1. The scatter plots suggest that room rent tends to rise with star rating, while its relationships with airport distance and hotel capacity are weaker and dominated by a few very expensive rooms.
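Objective 3 also calls for t-tests. A minimal sketch of a Welch two-sample t-test comparing room rent between non-tourist (0) and tourist (1) cities; the data frame below is simulated and only stands in for `dfrModel`, and the t statistic is recomputed by hand to show what `t.test()` reports.

```r
# Hypothetical toy data standing in for RoomRent and IsTouristDestination
set.seed(42)
toy <- data.frame(
  IsTouristDestination = rep(c(0, 1), each = 50),
  RoomRent = c(rnorm(50, mean = 4000, sd = 1500), rnorm(50, mean = 5500, sd = 2000))
)

# Welch two-sample t-test (R's default: var.equal = FALSE)
res <- t.test(RoomRent ~ IsTouristDestination, data = toy)
print(res)

# Same t statistic by hand: difference in means divided by the Welch standard error
g0 <- toy$RoomRent[toy$IsTouristDestination == 0]
g1 <- toy$RoomRent[toy$IsTouristDestination == 1]
t_manual <- (mean(g0) - mean(g1)) / sqrt(var(g0) / length(g0) + var(g1) / length(g1))
print(t_manual)

# On the real data (assumes dfrModel as above):
# t.test(RoomRent ~ IsTouristDestination, data = dfrModel)
```

Because RoomRent is heavily right-skewed, the rank-based `wilcox.test()` with the same formula is a reasonable robustness check.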
Correlation Plot
corrplot(corr=cor(dfrModel[ , c("IsTouristDestination", "RoomRent", "StarRating", "Airport", "HotelCapacity")], use="complete.obs"),
method ="ellipse")
Correlation Matrix
cor(dfrModel[, c(1:13)])
## Population CityRank IsMetroCity
## Population 1.0000000000 -0.8353204432 0.7712260105
## CityRank -0.8353204432 1.0000000000 -0.5643937903
## IsMetroCity 0.7712260105 -0.5643937903 1.0000000000
## IsTouristDestination -0.0482029722 0.2807134520 0.1763717063
## IsWeekend 0.0115926802 -0.0072564766 0.0018118005
## IsNewYearEve 0.0007332482 -0.0006326444 0.0006464753
## RoomRent -0.0887280632 0.0939855292 -0.0668397705
## StarRating 0.1341365933 -0.1333810133 0.0776028661
## Airport -0.2597010198 0.5059119892 -0.2073586125
## FreeWifi 0.1129334410 -0.1214309404 0.0868288677
## FreeBreakfast 0.0364278235 -0.0086837497 0.0513856623
## HotelCapacity 0.2599830516 -0.2561197059 0.1871502153
## HasSwimmingPool 0.0262590820 -0.1029737518 0.0214119243
## IsTouristDestination IsWeekend IsNewYearEve
## Population -0.048202972 0.011592680 7.332482e-04
## CityRank 0.280713452 -0.007256477 -6.326444e-04
## IsMetroCity 0.176371706 0.001811801 6.464753e-04
## IsTouristDestination 1.000000000 -0.019481101 -2.266388e-03
## IsWeekend -0.019481101 1.000000000 2.923821e-01
## IsNewYearEve -0.002266388 0.292382051 1.000000e+00
## RoomRent 0.122502963 0.004580134 3.849123e-02
## StarRating -0.040554998 0.006378436 2.360897e-03
## Airport 0.194422049 -0.002724756 4.598872e-04
## FreeWifi -0.061568821 0.002960828 2.787472e-05
## FreeBreakfast -0.071692559 -0.007612777 -2.606416e-03
## HotelCapacity -0.094356091 0.006306507 1.352679e-03
## HasSwimmingPool 0.042156280 0.004500461 1.122308e-03
## RoomRent StarRating Airport FreeWifi
## Population -0.088728063 0.134136593 -0.2597010198 1.129334e-01
## CityRank 0.093985529 -0.133381013 0.5059119892 -1.214309e-01
## IsMetroCity -0.066839771 0.077602866 -0.2073586125 8.682887e-02
## IsTouristDestination 0.122502963 -0.040554998 0.1944220492 -6.156882e-02
## IsWeekend 0.004580134 0.006378436 -0.0027247555 2.960828e-03
## IsNewYearEve 0.038491227 0.002360897 0.0004598872 2.787472e-05
## RoomRent 1.000000000 0.369373425 0.0496532442 3.627002e-03
## StarRating 0.369373425 1.000000000 -0.0609191837 1.800959e-02
## Airport 0.049653244 -0.060919184 1.0000000000 -9.452368e-02
## FreeWifi 0.003627002 0.018009594 -0.0945236768 1.000000e+00
## FreeBreakfast -0.010006370 -0.032892463 0.0242839409 1.582206e-01
## HotelCapacity 0.157873308 0.637430337 -0.1176720722 -8.703612e-03
## HasSwimmingPool 0.311657734 0.618214699 -0.1416665606 -2.407405e-02
## FreeBreakfast HotelCapacity HasSwimmingPool
## Population 0.036427824 0.259983052 0.026259082
## CityRank -0.008683750 -0.256119706 -0.102973752
## IsMetroCity 0.051385662 0.187150215 0.021411924
## IsTouristDestination -0.071692559 -0.094356091 0.042156280
## IsWeekend -0.007612777 0.006306507 0.004500461
## IsNewYearEve -0.002606416 0.001352679 0.001122308
## RoomRent -0.010006370 0.157873308 0.311657734
## StarRating -0.032892463 0.637430337 0.618214699
## Airport 0.024283941 -0.117672072 -0.141666561
## FreeWifi 0.158220597 -0.008703612 -0.024074046
## FreeBreakfast 1.000000000 -0.087165446 -0.061522132
## HotelCapacity -0.087165446 1.000000000 0.509045809
## HasSwimmingPool -0.061522132 0.509045809 1.000000000
Correlation with Room Rent
# Correlation of every column with RoomRent, as a named numeric vector
dfrCorr <- sapply(dfrModel, function(col) cor(dfrModel$RoomRent, as.numeric(col)))
dfrCorr
## Population CityRank IsMetroCity
## -0.088728063 0.093985529 -0.066839771
## IsTouristDestination IsWeekend IsNewYearEve
## 0.122502963 0.004580134 0.038491227
## RoomRent StarRating Airport
## 1.000000000 0.369373425 0.049653244
## FreeWifi FreeBreakfast HotelCapacity
## 0.003627002 -0.010006370 0.157873308
## HasSwimmingPool
## 0.311657734
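The vector above gives correlation strengths but not their significance. A minimal sketch that pairs each predictor's Pearson correlation with RoomRent with the p-value from `cor.test()` and sorts by absolute strength; the simulated data frame is hypothetical and only mimics a few `dfrModel` column names.

```r
# Hypothetical toy data frame standing in for the numeric columns of dfrModel
set.seed(7)
n <- 300
toy <- data.frame(StarRating = sample(1:5, n, replace = TRUE))
toy$HasSwimmingPool <- as.numeric(toy$StarRating >= 4 & runif(n) < 0.8)
toy$RoomRent <- 1500 * toy$StarRating + 1000 * toy$HasSwimmingPool + rnorm(n, sd = 2000)
toy$IsWeekend <- rbinom(n, 1, 0.4)

# Pearson correlation of every predictor with RoomRent, plus its p-value
predictors <- setdiff(names(toy), "RoomRent")
corr_table <- t(sapply(predictors, function(v) {
  ct <- cor.test(toy$RoomRent, toy[[v]])
  c(r = unname(ct$estimate), p_value = ct$p.value)
}))

# Sort by absolute correlation, strongest first
corr_table <- corr_table[order(-abs(corr_table[, "r"])), ]
print(round(corr_table, 4))
```

With 13,232 rows even tiny correlations (such as IsWeekend's 0.0046) can matter for the p-value, so the size of r is the more useful guide.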
Visualize
dfrGraph <- gather(dfrModel, variable, value, -RoomRent)
head(dfrGraph)
## RoomRent variable value
## 1 12375 Population 12442373
## 2 10250 Population 12442373
## 3 9900 Population 12442373
## 4 10350 Population 12442373
## 5 12000 Population 12442373
## 6 11475 Population 12442373
ggplot(dfrGraph) +
geom_jitter(aes(value,RoomRent, colour=variable)) +
geom_smooth(aes(value,RoomRent, colour=variable), method=lm, se=FALSE) +
facet_wrap(~variable, scales="free_x") +
labs(title="Relation Of Price With Other Features")
Regression Analysis
Find the Best Multiple Linear Regression Model for Room Rent
We choose the best linear model with step(), which selects a model by AIC using a stepwise algorithm.
In statistics, stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some prespecified criterion.
The Akaike information criterion (AIC) is a measure of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model relative to each of the other models, so it provides a means for model selection. Lower AIC values indicate a better trade-off between goodness of fit and model complexity.
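The AIC definition can be made concrete as AIC = 2k − 2 log L, where k counts the estimated parameters. A minimal sketch on simulated (hypothetical) data that recomputes `AIC()` from `logLik()` and shows that `step()`, which ranks models with `extractAIC()`, works on a shifted scale: for `lm` fits the two differ by the constant n·log(2π) + n + 2, so the model ranking is the same.

```r
# Hypothetical toy data: y depends on x1 only; x2 is pure noise
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 3 + 2 * x1 + rnorm(n)

fit_small <- lm(y ~ x1)
fit_big <- lm(y ~ x1 + x2)

# AIC = 2k - 2 log(L), where k counts the coefficients plus the residual variance
manual_aic <- function(fit) {
  ll <- logLik(fit)
  2 * attr(ll, "df") - 2 * as.numeric(ll)
}
print(c(small = manual_aic(fit_small), big = manual_aic(fit_big)))

# step() uses extractAIC(), which for lm equals n*log(RSS/n) + 2p and drops constants;
# the offset to AIC() is n*log(2*pi) + n + 2 for every model fitted to the same data
offset <- AIC(fit_small) - extractAIC(fit_small)[2]
print(offset)
```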
stpModel <- step(lm(RoomRent ~ ., data = dfrModel), trace = 0)
stpSummary <- summary(stpModel)
stpSummary
##
## Call:
## lm(formula = RoomRent ~ Population + IsMetroCity + IsTouristDestination +
## IsNewYearEve + StarRating + Airport + FreeWifi + HotelCapacity +
## HasSwimmingPool, data = dfrModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11839 -2385 -691 1045 309532
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.560e+03 4.055e+02 -21.109 < 2e-16 ***
## Population -1.244e-04 2.263e-05 -5.499 3.88e-08 ***
## IsMetroCity -6.369e+02 2.132e+02 -2.988 0.00282 **
## IsTouristDestination 1.918e+03 1.374e+02 13.958 < 2e-16 ***
## IsNewYearEve 8.430e+02 1.739e+02 4.849 1.26e-06 ***
## StarRating 3.598e+03 1.104e+02 32.582 < 2e-16 ***
## Airport 1.001e+01 2.716e+00 3.684 0.00023 ***
## FreeWifi 5.952e+02 2.217e+02 2.685 0.00726 **
## HotelCapacity -1.040e+01 1.029e+00 -10.115 < 2e-16 ***
## HasSwimmingPool 2.147e+03 1.598e+02 13.434 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6600 on 13222 degrees of freedom
## Multiple R-squared: 0.1904, Adjusted R-squared: 0.1899
## F-statistic: 345.5 on 9 and 13222 DF, p-value: < 2.2e-16
Model1
Model1 <- RoomRent ~ Population+CityRank+IsMetroCity+IsTouristDestination+IsWeekend+IsNewYearEve+StarRating+Airport+FreeWifi+FreeBreakfast+HotelCapacity+HasSwimmingPool
fit1 <- lm(Model1, data = dfrModel)
summary(fit1)
##
## Call:
## lm(formula = Model1, data = dfrModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11845 -2356 -690 1030 309689
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.604e+03 4.494e+02 -19.147 < 2e-16 ***
## Population -1.188e-04 3.592e-05 -3.307 0.000945 ***
## CityRank 1.821e+00 1.035e+01 0.176 0.860302
## IsMetroCity -6.640e+02 2.164e+02 -3.068 0.002158 **
## IsTouristDestination 1.925e+03 1.481e+02 13.001 < 2e-16 ***
## IsWeekend -9.076e+01 1.239e+02 -0.733 0.463709
## IsNewYearEve 8.826e+02 1.818e+02 4.855 1.22e-06 ***
## StarRating 3.592e+03 1.108e+02 32.434 < 2e-16 ***
## Airport 9.510e+00 3.171e+00 2.999 0.002709 **
## FreeWifi 5.498e+02 2.242e+02 2.452 0.014214 *
## FreeBreakfast 1.688e+02 1.233e+02 1.369 0.171163
## HotelCapacity -1.028e+01 1.033e+00 -9.945 < 2e-16 ***
## HasSwimmingPool 2.153e+03 1.616e+02 13.327 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6601 on 13219 degrees of freedom
## Multiple R-squared: 0.1906, Adjusted R-squared: 0.1898
## F-statistic: 259.3 on 12 and 13219 DF, p-value: < 2.2e-16
Model Fit
library(leaps)
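# nvmax = 12 lets the search consider all 12 predictors (the default nvmax = 8 caps subsets at 8)
leap1 <- regsubsets(Model1, data = dfrModel, nbest = 1, nvmax = 12)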
# summary(leap1)
plot(leap1, scale="adjr2")
Observations
The best-fitting model by adjusted R-squared excludes FreeBreakfast, CityRank and IsWeekend, the same variables that step() dropped. In the next model, we therefore rerun the regression without these variables.
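`regsubsets()` performs a best-subset search. A minimal base-R sketch of the same idea, fitting every non-empty subset of predictors and keeping the one with the highest adjusted R-squared; `best_subset_adjr2` is a hypothetical helper and the data are simulated. With 12 predictors this means 4,095 fits, which is why `leaps` uses a faster branch-and-bound search.

```r
# Hypothetical toy data: y depends on x1 and x2; x3 and x4 are noise
set.seed(3)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 - 1.5 * toy$x2 + rnorm(n)

# Exhaustive best-subset search: fit every non-empty subset of predictors
# and keep the subset with the highest adjusted R-squared
best_subset_adjr2 <- function(data, response) {
  preds <- setdiff(names(data), response)
  best <- list(adjr2 = -Inf, vars = character(0))
  for (k in seq_along(preds)) {
    for (vars in combn(preds, k, simplify = FALSE)) {
      f <- reformulate(vars, response = response)
      a <- summary(lm(f, data = data))$adj.r.squared
      if (a > best$adjr2) best <- list(adjr2 = a, vars = vars)
    }
  }
  best
}

best <- best_subset_adjr2(toy, "y")
print(best)
```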
Model2
Model2 <- RoomRent ~ StarRating+Population+IsMetroCity+IsTouristDestination+IsNewYearEve+Airport+FreeWifi+HotelCapacity+HasSwimmingPool
fit2 <- lm(Model2, data = dfrModel)
summary(fit2)
##
## Call:
## lm(formula = Model2, data = dfrModel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11839 -2385 -691 1045 309532
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.560e+03 4.055e+02 -21.109 < 2e-16 ***
## StarRating 3.598e+03 1.104e+02 32.582 < 2e-16 ***
## Population -1.244e-04 2.263e-05 -5.499 3.88e-08 ***
## IsMetroCity -6.369e+02 2.132e+02 -2.988 0.00282 **
## IsTouristDestination 1.918e+03 1.374e+02 13.958 < 2e-16 ***
## IsNewYearEve 8.430e+02 1.739e+02 4.849 1.26e-06 ***
## Airport 1.001e+01 2.716e+00 3.684 0.00023 ***
## FreeWifi 5.952e+02 2.217e+02 2.685 0.00726 **
## HotelCapacity -1.040e+01 1.029e+00 -10.115 < 2e-16 ***
## HasSwimmingPool 2.147e+03 1.598e+02 13.434 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6600 on 13222 degrees of freedom
## Multiple R-squared: 0.1904, Adjusted R-squared: 0.1899
## F-statistic: 345.5 on 9 and 13222 DF, p-value: < 2.2e-16
Observations of Regression Analysis
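Null Hypothesis
All slope coefficients are zero, i.e. room rent does not depend on any of the independent variables.
Alternative Hypothesis
At least one slope coefficient is non-zero, i.e. room rent depends on at least one of the independent variables.
The p-value of the overall F-test (< 2.2e-16) is below 0.05, so we reject the null hypothesis at the 5% significance level.
The large F-statistic (345.5 on 9 and 13222 DF) means that the predictors jointly explain a significant share of the variation in room rent; it does not mean that the means of the variables differ. The model still explains only about 19% of that variation (adjusted R-squared 0.1899).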
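The overall F-statistic follows directly from R-squared: F = (R²/p) / ((1 − R²)/(n − p − 1)). With the Model 2 figures (R² = 0.1904, p = 9, 13222 residual DF) this gives about 345.5, matching the output above. A minimal sketch on simulated (hypothetical) data that checks the formula against `summary(lm(...))$fstatistic`:

```r
# Hypothetical toy data: only x1 affects y
set.seed(11)
n <- 150
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 0.5 * toy$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3, data = toy)
s <- summary(fit)

p <- length(coef(fit)) - 1    # number of slope coefficients
df_resid <- fit$df.residual   # n - p - 1
f_manual <- (s$r.squared / p) / ((1 - s$r.squared) / df_resid)
p_manual <- pf(f_manual, p, df_resid, lower.tail = FALSE)

print(c(F = f_manual, p_value = p_manual))
print(s$fstatistic)
```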
Below are the 9 variables that affect the room price, ordered by the absolute value of their t-statistics in Model 2 (most significant first):
StarRating
IsTouristDestination
HasSwimmingPool
HotelCapacity
Population
IsNewYearEve
Airport
IsMetroCity
FreeWifi
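An ordering like this can be produced directly from the coefficient table. A minimal sketch on simulated (hypothetical) data that drops the intercept and sorts predictors by |t value|; on the real model the same lines would start from `coef(summary(fit2))`.

```r
# Hypothetical toy data with one strong, one moderate and one weak predictor
set.seed(5)
n <- 500
toy <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
toy$y <- 3 * toy$a + 1 * toy$b + 0.1 * toy$c + rnorm(n)
fit <- lm(y ~ a + b + c, data = toy)

# Coefficient table without the intercept row, sorted by absolute t value
coefs <- coef(summary(fit))[-1, , drop = FALSE]
ranked <- coefs[order(-abs(coefs[, "t value"])), c("Estimate", "t value", "Pr(>|t|)")]
print(ranked)
```

Sorting by |t| ranks statistical significance, not effect size in rupees; the estimates are on different scales (e.g. Population per person versus StarRating per star).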
Visualize the Beta Coefficients and Their Confidence Intervals from Model 2
coefplot(fit2, intercept= FALSE, outerCI=1.96,coefficients=c("StarRating","Population", "IsMetroCity", "IsTouristDestination", "IsNewYearEve", "Airport", "HotelCapacity", "HasSwimmingPool"))
## Warning: Ignoring unknown aesthetics: xmin, xmax
# The adjusted R-squared for Model 2 is slightly higher than for Model 1
summary(fit1)$adj.r.squared
## [1] 0.1898256
summary(fit2)$adj.r.squared
## [1] 0.1898573
# the AIC for Model 2 is less than Model 1
AIC(fit1)
## [1] 270314.1
AIC(fit2)
## [1] 270310.6
Observations
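1. Model 2 has a slightly higher adjusted R-squared than Model 1 (0.18986 vs 0.18983), so Model 2 is better.
2. Model 2 also has a lower AIC (270310.6 vs 270314.1), so it is preferred on this criterion as well. Both differences are small: Model 2 achieves essentially the same fit with three fewer variables.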
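Because Model 2 is nested inside Model 1, the two can also be compared with a partial F-test that asks whether the three dropped variables jointly add explanatory power; on the real models this is `anova(fit2, fit1)`. A minimal sketch on simulated (hypothetical) data, recomputing the F statistic from the residual sums of squares:

```r
# Hypothetical toy data: only x1 matters, x2 and x3 are noise
set.seed(9)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 * toy$x1 + rnorm(n)

fit_small <- lm(y ~ x1, data = toy)          # restricted model (plays the role of Model 2)
fit_big <- lm(y ~ x1 + x2 + x3, data = toy)  # full model (plays the role of Model 1)

cmp <- anova(fit_small, fit_big)
print(cmp)

# Same F by hand: ((RSS_small - RSS_big) / q) / (RSS_big / df_big)
rss_s <- sum(residuals(fit_small)^2)
rss_b <- sum(residuals(fit_big)^2)
q <- fit_small$df.residual - fit_big$df.residual
f_manual <- ((rss_s - rss_b) / q) / (rss_b / fit_big$df.residual)
print(f_manual)
```

A large p-value in this comparison would support dropping the extra variables, consistent with the AIC and adjusted R-squared results.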
Note: We exclude the date from the regression analysis. Date can be an important factor in time series forecasting, but we do not use it in this regression model.
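Objective 6 asks for predictions with dummy values. A minimal sketch using `predict()` with a prediction interval; the model is fitted on simulated data with only three of the Model 2 predictors, and the two dummy hotels are hypothetical. To use `fit2` itself, the `newdata` frame must contain all nine Model 2 predictors (StarRating, Population, IsMetroCity, IsTouristDestination, IsNewYearEve, Airport, FreeWifi, HotelCapacity, HasSwimmingPool). Given Model 2's low R-squared, its intervals are likely to be wide.

```r
# Hypothetical toy data mimicking a few Model 2 predictors
set.seed(21)
n <- 400
toy <- data.frame(
  StarRating = sample(1:5, n, replace = TRUE),
  IsTouristDestination = rbinom(n, 1, 0.7),
  HasSwimmingPool = rbinom(n, 1, 0.4)
)
toy$RoomRent <- -2000 + 1800 * toy$StarRating + 1500 * toy$IsTouristDestination +
  2000 * toy$HasSwimmingPool + rnorm(n, sd = 1500)
fit <- lm(RoomRent ~ StarRating + IsTouristDestination + HasSwimmingPool, data = toy)

# Dummy hotels to price (hypothetical values)
new_hotels <- data.frame(
  StarRating = c(3, 5),
  IsTouristDestination = c(0, 1),
  HasSwimmingPool = c(0, 1)
)

# Point predictions with 95% prediction intervals (columns fit, lwr, upr)
pred <- predict(fit, newdata = new_hotels, interval = "prediction", level = 0.95)
print(pred)
```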
###########End of the Project#########