# STUDY OF THE FACTORS THAT INFLUENCE THE PRICING STRATEGY OF HOTELS IN INDIA
# KULDEEP KURROLIYA
# ABSTRACT
# The purpose of this project is to analyze the pricing strategy of hotels in the Indian hotel industry. Many factors drive hotel room prices; they fall primarily into two types: external and internal. The objective of this project is to identify the factors that matter the most.
# 
# INTRODUCTION
# The dataset tracks hotel prices on 8 different dates at different hotels across different cities. 
# 
# DATA PREPROCESSING
# DEPENDENT VARIABLE
# DECISION VARIABLE  UNITS   MEANING
# RoomRent           Rupees  Rent for the cheapest room, double occupancy, in Indian Rupees.
# 
# Some hotels have more than one type of double occupancy room. For simplicity, we picked the cheapest room with double occupancy.
# 
# EXTERNAL FACTORS
# Many external factors can potentially influence the Room Rent. The dataset captures some of these external factors, as explained below.
# 
# VARIABLE              UNITS   MEANING
# Date                  Text    Hotel room rent data is available for the following 8 dates for each hotel:
#                               {Dec 31, Dec 25, Dec 24, Dec 18, Dec 21, Dec 28, Jan 4, Jan 8}.
#                               If a hotel is sold out on a given date, its room price for that date is assumed to be the maximum price among the dates for which prices are available.
# IsWeekend             Dummy   '0' for weekdays, '1' for weekend dates (Sat / Sun)
# IsNewYearEve          Dummy   '1' for Dec 31, '0' otherwise
# CityName              Text    Name of the city where the hotel is located, e.g. Mumbai
# Population            Number  Population of the city in 2011
# CityRank              Dummy   Rank order of the city by population (e.g. Mumbai = 0, Delhi = 1, and so on)
# IsMetroCity           Dummy   '1' if CityName is {Mumbai, Delhi, Kolkatta, Chennai}, '0' otherwise
# IsTouristDestination  Dummy   '1' if the city is primarily a tourist destination, '0' otherwise. For example, Goa and Agra are primarily tourist destinations: most people who visit them and stay in their hotels are assumed to be there primarily for tourism.
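# 
# The sold-out rule and the IsNewYearEve dummy described above can be sketched in base R as follows. This is a minimal illustration on made-up data, not the author's preprocessing code: the data frame `rents`, the hotel name, the prices, and the use of NA to mark a sold-out date are all assumptions.

```r
# Toy data for one hypothetical hotel; NA marks a sold-out date (assumed encoding).
rents <- data.frame(
  HotelName = "Hotel A",
  Date      = c("Dec 24", "Dec 25", "Dec 31", "Jan 4"),
  RoomRent  = c(4000, 4500, NA, 3800),
  stringsAsFactors = FALSE
)

# Sold-out rule: within each hotel, replace a missing price with the
# maximum price observed on the dates where a price is available.
rents$RoomRent <- ave(
  rents$RoomRent, rents$HotelName,
  FUN = function(x) {
    x[is.na(x)] <- max(x, na.rm = TRUE)
    x
  }
)

# IsNewYearEve: '1' for Dec 31, '0' otherwise.
rents$IsNewYearEve <- as.integer(rents$Date == "Dec 31")

print(rents)
```

# In this toy case the sold-out Dec 31 price is filled with the highest price available for that hotel.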
# 
# INTERNAL FACTORS
# Many Hotel Features can influence the Room Rent. The dataset captures some of these internal factors, as explained below.
# 
# VARIABLE          UNITS   MEANING
# HotelName         Text    e.g. Park Hyatt Goa Resort and Spa
# StarRating        Number  e.g. 5
# Airport           km      Distance between the hotel and the closest major airport
# HotelAddress      Text    e.g. Arrossim Beach, Cansaulim, Goa
# HotelPincode      Number  e.g. 403712
# HotelDescription  Text    e.g. 5-star beachfront resort with spa, near Arossim Beach
# FreeWifi          Dummy   '1' if the hotel offers free Wi-Fi, '0' otherwise
# FreeBreakfast     Dummy   '1' if the hotel offers free breakfast, '0' otherwise
# HotelCapacity     Number  e.g. 242 (enter '0' if not available)
# HasSwimmingPool   Dummy   '1' if the hotel has a swimming pool, '0' otherwise
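# 
# Because HotelCapacity uses '0' to mean "not available", one possible treatment is to recode those zeros as missing before computing statistics. The sketch below shows that idea on made-up values. Whether the original analysis did this is not stated, so it is an assumption, as are the data frame `hotels` and its contents.

```r
# Hypothetical hotels; a capacity of 0 means "not available" in the data dictionary.
hotels <- data.frame(
  HotelName     = c("Hotel A", "Hotel B", "Hotel C"),
  HotelCapacity = c(242, 0, 80)
)

# Recode the placeholder 0 as NA so it does not drag down averages.
hotels$HotelCapacity[hotels$HotelCapacity == 0] <- NA

# Mean capacity over hotels with a known capacity only.
mean_capacity <- mean(hotels$HotelCapacity, na.rm = TRUE)
print(mean_capacity)
```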
# 
# METHOD
# The dataset was read into R and summarized to obtain the mean, median, and standard deviation of each variable. The problem was formulated as Y = F(x1, x2, x3, ...). The dependent variable (the Y in Y = F(x)) was identified as RoomRent. The three most important independent variables (x1, x2, x3) were taken to be StarRating, HotelCapacity, and IsTouristDestination. Some visualizations are shown below to illustrate the correlation between these variables.
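# 
# The summary step can be sketched as follows. The values are made up, and the base R `cor()` matrix is used here only as a simple stand-in for a corrgram; the data frame `hotels` is an assumption.

```r
# Made-up sample with the dependent variable and the three chosen predictors.
hotels <- data.frame(
  RoomRent             = c(3500, 5200, 12000, 2800, 7600),
  StarRating           = c(3, 4, 5, 2, 4),
  HotelCapacity        = c(40, 120, 242, 25, 150),
  IsTouristDestination = c(0, 1, 1, 0, 0)
)

# Mean, median, and standard deviation of each variable (one column per variable).
stats <- sapply(hotels, function(x) c(mean = mean(x), median = median(x), sd = sd(x)))
print(round(stats, 2))

# Pairwise correlations between the variables.
cors <- cor(hotels)
print(round(cors, 2))
```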
# 
# [Figure: Percentage of hotels having 0-5 star ratings]
# 
# [Figure: Corrgram in R of the independent and dependent variables]
# A linear regression model was then fitted on a training set consisting of 80% of the sample, and predictions were made on a test set containing the remaining 20%.
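# 
# An 80/20 split followed by a linear fit might look like the sketch below. It runs on synthetic data, so the numbers mean nothing for real hotels. The name `training_set` matches the model call in this report, while `test_set`, `train_idx`, the seed, and the generated values are assumptions.

```r
set.seed(123)  # for a reproducible split

# Synthetic stand-in for the hotel data (values are made up).
n <- 200
hotels <- data.frame(
  StarRating           = sample(1:5, n, replace = TRUE),
  HotelCapacity        = sample(20:300, n, replace = TRUE),
  IsTouristDestination = rbinom(n, 1, 0.3)
)
hotels$RoomRent <- 1000 + 1500 * hotels$StarRating +
  2000 * hotels$IsTouristDestination + rnorm(n, sd = 500)

# 80/20 split by random row indices.
train_idx    <- sample(seq_len(n), size = floor(0.8 * n))
training_set <- hotels[train_idx, ]
test_set     <- hotels[-train_idx, ]

# Fit on the training set, then predict on the held-out test set.
model <- lm(RoomRent ~ StarRating + HotelCapacity + IsTouristDestination,
            data = training_set)
pred <- predict(model, newdata = test_set)

# Root mean squared error on the test set.
rmse <- sqrt(mean((test_set$RoomRent - pred)^2))
print(rmse)
```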
# OBSERVATIONS
# To find the best model, the adjusted R-squared value was examined, and the model with the highest adjusted R-squared was chosen as final. The external factors were first considered together as a group, followed by the internal factors. Features with a p-value below 0.05 were taken as statistically significant. The final model result is shown below.
# Call:
# lm(formula = RoomRent ~ IsTouristDestination + HasSwimmingPool + 
#     IsNewYearEve + IsMetroCity + StarRating + HotelCapacity, 
#     data = training_set)
# 
# Residuals:
#    Min     1Q Median     3Q    Max 
# -11995  -2373   -711   1049 308998 
# 
# Coefficients:
#                       Estimate Std. Error t value Pr(>|t|)    
# (Intercept)          -8615.856    409.016 -21.065  < 2e-16 ***
# IsTouristDestination  2269.917    150.651  15.067  < 2e-16 ***
# HasSwimmingPool       2112.919    183.645  11.505  < 2e-16 ***
# IsNewYearEve           702.754    203.505   3.453 0.000556 ***
# IsMetroCity          -1660.269    154.920 -10.717  < 2e-16 ***
# StarRating            3730.666    128.298  29.078  < 2e-16 ***
# HotelCapacity          -11.630      1.175  -9.894  < 2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 6955 on 10764 degrees of freedom
# Multiple R-squared:  0.1795,  Adjusted R-squared:  0.179 
# F-statistic: 392.4 on 6 and 10764 DF,  p-value: < 2.2e-16
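# 
# To show how the fitted coefficients translate into a price, the sketch below applies them to a hypothetical hotel: 5 stars, capacity 242, in a tourist destination, with a swimming pool, not in a metro city, on a date other than New Year's Eve. The coefficients are copied from the output above; the hotel's attribute values are assumptions chosen for illustration.

```r
# Coefficients copied from the regression output.
coefs <- c(
  "(Intercept)"        = -8615.856,
  IsTouristDestination =  2269.917,
  HasSwimmingPool      =  2112.919,
  IsNewYearEve         =   702.754,
  IsMetroCity          = -1660.269,
  StarRating           =  3730.666,
  HotelCapacity        =   -11.630
)

# Hypothetical hotel (the intercept term always takes the value 1).
hotel <- c(
  "(Intercept)"        = 1,
  IsTouristDestination = 1,
  HasSwimmingPool      = 1,
  IsNewYearEve         = 0,
  IsMetroCity          = 0,
  StarRating           = 5,
  HotelCapacity        = 242
)

# The linear prediction is the sum of coefficient times attribute value.
predicted_rent <- sum(coefs * hotel[names(coefs)])
print(predicted_rent)  # about 11606 rupees
```

# Setting IsNewYearEve to 1 for the same hotel raises the prediction by the IsNewYearEve coefficient, 702.754 rupees.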
# 
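# The selection procedure described under OBSERVATIONS (compare candidate models by adjusted R-squared, then keep terms with p-values below 0.05) can be sketched as follows. The data are synthetic, and the candidate formulas, the data frame `d`, and the seed are assumptions; the actual candidates compared in this project are not listed in the report.

```r
set.seed(42)

# Synthetic data: RoomRent depends on StarRating and IsMetroCity, not on FreeWifi.
n <- 300
d <- data.frame(
  StarRating  = sample(1:5, n, replace = TRUE),
  IsMetroCity = rbinom(n, 1, 0.4),
  FreeWifi    = rbinom(n, 1, 0.8)
)
d$RoomRent <- 500 + 2000 * d$StarRating - 1000 * d$IsMetroCity + rnorm(n, sd = 800)

# Hypothetical candidate models.
candidates <- list(
  small = RoomRent ~ StarRating,
  full  = RoomRent ~ StarRating + IsMetroCity + FreeWifi
)

# Adjusted R-squared of each candidate; pick the highest.
adj_r2 <- sapply(candidates, function(f) summary(lm(f, data = d))$adj.r.squared)
best   <- names(which.max(adj_r2))

# Terms of the chosen model with a p-value below 0.05.
coef_table  <- summary(lm(candidates[[best]], data = d))$coefficients
significant <- rownames(coef_table)[coef_table[, "Pr(>|t|)"] < 0.05]

print(adj_r2)
print(significant)
```
# 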
# CONCLUSION
# The most significant factors that determine the price of a room are the location of the hotel (whether it is in a tourist destination or in a metropolitan city), whether the date falls on a special occasion such as New Year's Eve, whether the hotel has a swimming pool, the hotel's star rating, and the total capacity of the hotel.
# 
# REFERENCES
# www.RBloggers.com
# This is the final project report submitted under Prof Sammer Mathur (IIM Lucknow, CMU) as part of his data analytics internship in R.
#