Submit both the .Rmd and .html files for grading. You may remove the instructions and example problem above, but do not remove the YAML metadata block or the first, “setup” code chunk. Address the steps that appear below and answer all the questions. Be sure to address each question with code and comments as needed. You may use either base R functions or ggplot2 for the visualizations.
## Dataset Structure:
## tibble [3,141 × 34] (S3: tbl_df/tbl/data.frame)
## $ ID : num [1:3141] 1 2 3 4 5 6 7 8 9 10 ...
## $ Hotel Name : chr [1:3141] "Hotel Comfort Park - Opposite Sri Ramachandra Medical College Porur" "Regenta Central RS Chennai OMR SIPCOT" "Mer Vue Villa, Kovalam, ECR, Chennai" "Hyatt Regency Chennai" ...
## $ City : chr [1:3141] "Chennai" "Chennai" "Chennai" "Chennai" ...
## $ Number of Ratings : num [1:3141] 152 354 26 1227 139 ...
## $ Distance from Center : num [1:3141] 10.2 22.5 28.4 1.6 0.35 9.4 1.6 1.9 2 4.6 ...
## $ Categorized Dist from Centre: chr [1:3141] "Secluded" "Secluded" "Secluded" "Close to City Centre" ...
## $ Metro : logi [1:3141] TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ Staff : num [1:3141] 9 8.5 9.5 8.2 8.7 9.3 8.6 8.7 8 8.3 ...
## $ Facilities : num [1:3141] 8.1 8.2 9.1 7.8 8.1 8.4 8.3 8 7.2 7.8 ...
## $ Cleanliness : num [1:3141] 8.5 8.5 9.4 8 8.3 9 8.7 8.7 7.4 8.3 ...
## $ Value for Money : num [1:3141] 8.3 8.3 8.8 7.5 8.3 8.4 8.2 8.4 7.3 7.7 ...
## $ Location : num [1:3141] 8.3 8.6 9 8.8 9.2 8.6 8.8 8.5 8.4 7.5 ...
## $ Free Wi-Fi : num [1:3141] 7.5 8.5 8.8 7.9 7.3 7.5 7.9 7.5 6.1 7.3 ...
## $ Comfort : num [1:3141] 8.6 8.6 9.2 8 8.4 8.8 8.7 8.6 7.5 8.3 ...
## $ Overall Rating : num [1:3141] 8.2 8.2 9.1 7.5 8.2 8.7 8.2 8.3 7 7.7 ...
## $ 24-hour front desk : num [1:3141] 1 1 0 1 0 0 1 0 1 1 ...
## $ 24-hour security : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
## $ cctv outside property : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
## $ cctv in common areas : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
## $ room service : num [1:3141] 1 0 0 0 0 0 0 0 1 1 ...
## $ family rooms : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
## $ luggage storage : num [1:3141] 1 0 0 0 0 1 0 0 1 1 ...
## $ non-smoking rooms : num [1:3141] 1 0 1 0 0 1 0 0 1 0 ...
## $ flat-screen tv : num [1:3141] 0 0 1 0 0 1 0 0 1 1 ...
## $ air conditioning : num [1:3141] 1 1 1 1 1 1 1 1 1 1 ...
## $ fan : num [1:3141] 1 0 1 0 0 1 0 0 1 0 ...
## $ shower : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
## $ free toiletries : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
## $ towels : num [1:3141] 1 0 1 0 0 1 0 0 1 0 ...
## $ toilet paper : num [1:3141] 1 0 1 0 0 1 0 0 1 0 ...
## $ daily housekeeping : num [1:3141] 1 1 0 0 0 1 1 0 1 1 ...
## $ ironing service : num [1:3141] 1 0 0 0 0 1 0 0 1 1 ...
## $ laundry : num [1:3141] 0 0 0 0 0 0 0 0 1 1 ...
## $ Average Room Price : num [1:3141] 3150 5280 16200 9499 7675 ...
##
## Dataset Summary:
## ID Hotel Name City Number of Ratings
## Min. : 1 Length:3141 Length:3141 Min. : 1
## 1st Qu.: 786 Class :character Class :character 1st Qu.: 47
## Median :1571 Mode :character Mode :character Median : 114
## Mean :1571 Mean : 268
## 3rd Qu.:2356 3rd Qu.: 283
## Max. :3141 Max. :6180
## Distance from Center Categorized Dist from Centre Metro
## Min. : 0.100 Length:3141 Mode :logical
## 1st Qu.: 2.200 Class :character FALSE:1562
## Median : 4.300 Mode :character TRUE :1579
## Mean : 6.644
## 3rd Qu.: 9.800
## Max. :47.200
## Staff Facilities Cleanliness Value for Money
## Min. : 2.500 Min. : 2.500 Min. : 2.500 Min. : 2.500
## 1st Qu.: 7.600 1st Qu.: 6.900 1st Qu.: 7.200 1st Qu.: 7.000
## Median : 8.200 Median : 7.600 Median : 7.900 Median : 7.600
## Mean : 8.101 Mean : 7.486 Mean : 7.803 Mean : 7.543
## 3rd Qu.: 8.700 3rd Qu.: 8.200 3rd Qu.: 8.600 3rd Qu.: 8.200
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000
## Location Free Wi-Fi Comfort Overall Rating
## Min. : 2.50 Min. : 2.500 Min. : 2.500 Min. : 1.000
## 1st Qu.: 7.60 1st Qu.: 6.300 1st Qu.: 7.200 1st Qu.: 6.800
## Median : 8.20 Median : 7.500 Median : 7.900 Median : 7.500
## Mean : 8.09 Mean : 7.181 Mean : 7.806 Mean : 7.385
## 3rd Qu.: 8.70 3rd Qu.: 8.300 3rd Qu.: 8.500 3rd Qu.: 8.100
## Max. :10.00 Max. :10.000 Max. :10.000 Max. :10.000
## 24-hour front desk 24-hour security cctv outside property cctv in common areas
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.8835 Mean :0.5336 Mean :0.5062 Mean :0.5635
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## room service family rooms luggage storage non-smoking rooms
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000
## Median :1.0000 Median :0.0000 Median :1.0000 Median :1.0000
## Mean :0.5989 Mean :0.4709 Mean :0.5514 Mean :0.7641
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## flat-screen tv air conditioning fan shower
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :1.0000 Median :0.0000 Median :1.0000
## Mean :0.4957 Mean :0.9319 Mean :0.4788 Mean :0.5466
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## free toiletries towels toilet paper daily housekeeping
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000
## Median :1.0000 Median :1.0000 Median :0.0000 Median :1.0000
## Mean :0.5002 Mean :0.5043 Mean :0.4862 Mean :0.8134
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## ironing service laundry Average Room Price
## Min. :0.0000 Min. :0.0000 Min. : 142
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 1700
## Median :1.0000 Median :1.0000 Median : 2657
## Mean :0.5422 Mean :0.5896 Mean : 3514
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.: 4275
## Max. :1.0000 Max. :1.0000 Max. :35700
#### 1. Descriptive Statistics ####
Do an Exploratory Data Analysis (EDA) and provide appropriate summary statistics and data visualizations to understand the various types of hotels. You are to consider such things as the existence of features (which hotels have which features and which do not), the distribution of ratings, and average room price. Investigate (without using hypothesis testing) whether there is any difference across cities, regions, metros, or non-metros. The overall goal is to understand the Overall Rating according to various types of hotels with different facilities. Also note that not all hotels have the same number of ratings given to them. Give a description of what you have learned from this EDA.
## Total hotels: 3141
## Variables: 34
## Price Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 142 1700 2657 3514 4275 35700
##
## Rating Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 6.800 7.500 7.385 8.100 10.000
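Answer: Room prices range from 142 to 35,700 rupees. The average room price is 3,514 rupees and the median is 2,657 rupees; a mean this far above the median points to a right-skewed price distribution. The average overall rating is about 7.39 out of 10 and the median is 7.5. Half of all hotels are rated between 6.8 and 8.1, so ratings are fairly consistent across the industry.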
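Because hotels differ widely in their number of ratings (from 1 to 6,180), a simple average treats a hotel with one review the same as a hotel with thousands. Below is a minimal sketch of a ratings-weighted average. The toy data frame and its column names `rating` and `n_ratings` are made up for illustration and are not taken from the dataset.

```r
# Hypothetical toy data: three hotels with very different review counts.
toy <- data.frame(
  rating    = c(9.0, 7.5, 6.0),
  n_ratings = c(10, 200, 1000)
)

# Unweighted mean: every hotel counts equally.
simple_mean <- mean(toy$rating)

# Weighted mean: hotels with more reviews contribute more.
weighted_mean <- weighted.mean(toy$rating, w = toy$n_ratings)

cat("Simple mean:  ", round(simple_mean, 2), "\n")
cat("Weighted mean:", round(weighted_mean, 2), "\n")
```

On the real data the same idea would use the `Overall Rating` and `Number of Ratings` columns. Whether weighting is appropriate depends on whether the question is about the typical hotel or the typical guest review.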
#### EDA analysis ####
Basic plots:
##
## --- Metro vs Non-Metro Comparison ---
## # A tibble: 2 × 4
## Metro Count Avg_Price Avg_Rating
## <lgl> <int> <dbl> <dbl>
## 1 FALSE 1562 3403. 7.66
## 2 TRUE 1579 3624. 7.12
Answer: Metro hotels have a higher average price (3,624 rupees) than non-metro hotels (3,403 rupees), a difference of about 220 rupees.
Answer: The distribution of room prices shows that most prices fall
in roughly the 2,000-4,000 rupee range, with a long right tail, which
tells us that there are some extremely expensive hotels. The rating
histogram is left-skewed: many hotels have a decent rating of around
7-8, with a thinner tail of poorly rated hotels. Non-Metro hotels offer
better value, since they are cheaper AND have better ratings (7.66 vs
7.12). Metro hotels charge more but receive lower guest ratings. One
possible explanation, which this data cannot confirm, is that Non-Metro
hotels need to compensate for not being in a metro, hence the higher
ratings.
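A group comparison like the Metro vs Non-Metro table can be produced in base R with `aggregate()`. The sketch below uses a small invented data frame; the column names `Price` and `Rating` and all values are placeholders, not the real data.

```r
# Hypothetical toy data: two metro and three non-metro hotels.
toy <- data.frame(
  Metro  = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  Price  = c(4000, 3000, 2500, 3500, 3000),
  Rating = c(7.0, 7.4, 7.8, 7.6, 7.7)
)

# Mean price and mean rating within each Metro group.
by_metro <- aggregate(cbind(Price, Rating) ~ Metro, data = toy, FUN = mean)

# Add the group sizes; table() and aggregate() both order FALSE before TRUE.
by_metro$Count <- as.vector(table(toy$Metro))

print(by_metro)
```

Reporting the group sizes next to the means makes it easier to judge how much weight each average deserves.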
Top cities analysis
##
## - Top 10 Cities by Hotel Count -
## # A tibble: 10 × 2
## City Count
## <chr> <int>
## 1 Bangalore 421
## 2 Delhi 392
## 3 Mumbai 350
## 4 Chennai 276
## 5 Jaipur 257
## 6 Udaipur 143
## 7 Cochin 142
## 8 Kolkata 140
## 9 Pune 125
## 10 Amristar 114
Answer: The top 3 cities are Bangalore, Delhi, and Mumbai, which together hold about 37% of all hotels (1,163 of 3,141). The top 5 cities account for about 54% of all hotels (1,696 of 3,141).
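The concentration of hotels can be checked directly from the counts in the table above. The sketch below hard-codes the five largest counts and the total of 3,141 hotels from the output; the variable names are illustrative.

```r
# Hotel counts for the five largest cities, copied from the table above.
city_counts <- c(Bangalore = 421, Delhi = 392, Mumbai = 350,
                 Chennai = 276, Jaipur = 257)
total_hotels <- 3141

# Cumulative share of all hotels, in percent.
cum_share <- cumsum(city_counts) / total_hotels * 100

print(round(cum_share, 1))
```

The entry for Mumbai gives the top-3 share and the entry for Jaipur gives the top-5 share.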
##
## --- Common Hotel Features ---
## air conditioning : 2927 hotels ( 93.2 %) have this feature
## 24-hour front desk : 2775 hotels ( 88.3 %) have this feature
## free toiletries : 1571 hotels ( 50 %) have this feature
## daily housekeeping : 2555 hotels ( 81.3 %) have this feature
Answer: 93% of hotels have air conditioning, which makes it a nearly universal standard. A 24-hour front desk is available in 88% of hotels, which is an important service to have. Daily housekeeping is also very common, at 81% of hotels. Free toiletries are less common and are offered by 50% of hotels.
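Since the feature columns are coded 0/1, the share of hotels with a feature is just the column mean, and the link to ratings can be explored by comparing mean ratings with and without the feature. The sketch below uses an invented four-hotel data frame; only the column names follow the dataset.

```r
# Hypothetical toy data with two 0/1 feature columns and a rating.
toy <- data.frame(
  `air conditioning` = c(1, 1, 1, 0),
  `free toiletries`  = c(1, 0, 1, 0),
  `Overall Rating`   = c(8.0, 7.0, 9.0, 6.5),
  check.names = FALSE
)
features <- c("air conditioning", "free toiletries")

# Percentage of hotels that have each feature.
pct_with <- colMeans(toy[features]) * 100

# Difference in mean rating: hotels with the feature minus hotels without.
rating_gap <- sapply(features, function(f) {
  means <- tapply(toy[["Overall Rating"]], toy[[f]], mean)
  unname(means["1"] - means["0"])
})

print(pct_with)
print(rating_gap)
```

A gap computed this way is purely descriptive; it does not adjust for price, city, or other features.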
Summary: Based on the EDA of the hotel data set, several key patterns emerge about the Indian hotel industry. The data covers 3,141 hotels with room prices ranging widely from 142 rupees to 35,700 rupees, though most hotels fall in the affordable to mid-range category, with an average price of 3,514 rupees and a median of 2,657 rupees.

Interestingly, while metro hotels tend to be slightly more expensive at 3,624 rupees compared to non-metro hotels at 3,403 rupees, they actually receive lower average ratings (7.12 vs 7.66 out of 10). This might be because travelers find better value in non-metro areas. I also think that since rent and land values are usually high in metro areas, the rooms might be smaller. Metro areas are also noisier, which might lower the comfort score. It might also be that non-metro hotels need to compensate for the lack of a metro to attract tourists, hence the higher ratings. These explanations are guesses, since the data set does not measure room size or noise.

The hotel distribution is concentrated in major cities like Bangalore, Delhi, and Mumbai, which together account for over a third of all hotels. When looking at hotel features, air conditioning is almost always present: 93% of hotels have it, followed by 24-hour front desk service at 88% and daily housekeeping at 81%. However, only half of the hotels provide free toiletries, making this a differentiator rather than part of the standard package.

Overall, the data suggests that a higher price does not guarantee higher guest satisfaction, and non-metro hotels appear to offer better overall value to travelers.
### Correlation analysis ###
##
##
## === CORRELATION ANALYSIS ===
## Correlation Matrix:
## Average Room Price Overall Rating Location Cleanliness
## Average Room Price 1.000 0.252 0.284 0.307
## Overall Rating 0.252 1.000 0.735 0.933
## Location 0.284 0.735 1.000 0.706
## Cleanliness 0.307 0.933 0.706 1.000
## Comfort 0.317 0.949 0.739 0.957
## Value for Money 0.105 0.935 0.703 0.910
## Staff 0.213 0.929 0.725 0.902
## Facilities 0.293 0.953 0.736 0.952
## Comfort Value for Money Staff Facilities
## Average Room Price 0.317 0.105 0.213 0.293
## Overall Rating 0.949 0.935 0.929 0.953
## Location 0.739 0.703 0.725 0.736
## Cleanliness 0.957 0.910 0.902 0.952
## Comfort 1.000 0.920 0.905 0.971
## Value for Money 0.920 1.000 0.916 0.935
## Staff 0.905 0.916 1.000 0.916
## Facilities 0.971 0.935 0.916 1.000
The HeatMap
Answer: Looking at the correlation results, some interesting patterns
emerge. Room price does not have a strong connection with any of the
ratings. The strongest are comfort (0.32) and cleanliness (0.31),
followed by facilities (0.29) and location (0.28), and all of these are
weak. A positive link between location and price makes sense because
good locations cost more, but location is not a better predictor of
price than the other ratings. The odd thing is that value for money has
almost no relationship with price (0.11), which means expensive hotels
are not necessarily seen as better value.

What really stands out is how all the rating categories are basically
saying the same thing. The correlations between overall rating and
comfort, facilities, and value for money are all above 0.93, which is
very high. If you know a hotel's comfort score, you can already guess
its overall rating quite well. The same holds for cleanliness and staff
ratings (both about 0.93). So for explaining overall rating you could
use almost any of the specific ratings, with location (0.74) being the
least tightly linked. The low correlation between price and value for
money is surprising, and it made me wonder whether paying more actually
gets you better value.
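To put the price correlations in perspective, they can be ranked and squared, since for a single predictor the squared correlation is the share of price variance it explains linearly. The sketch below copies the `Average Room Price` column of the correlation matrix above; the variable names are illustrative.

```r
# Correlations with Average Room Price, copied from the matrix above.
price_cor <- c(
  Comfort           = 0.317,
  Cleanliness       = 0.307,
  Facilities        = 0.293,
  Location          = 0.284,
  `Overall Rating`  = 0.252,
  Staff             = 0.213,
  `Value for Money` = 0.105
)

# Rank from strongest to weakest.
ranked <- sort(price_cor, decreasing = TRUE)

# Squared correlation in percent: variance in price explained by each
# rating on its own in a simple linear fit.
r_squared_pct <- round(ranked^2 * 100, 1)

print(data.frame(r = ranked, r_squared_pct = r_squared_pct))
```

Even the strongest single rating accounts for only about a tenth of the variation in price.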
## `geom_smooth()` using formula = 'y ~ x'
Answer: After reviewing both the correlation values and the scatter
plots, several patterns became clearer. The numerical correlations
showed that room price has only weak relationships with the rating
variables, and the scatter plots supported this result. The Location vs
Room Price plot showed a very weak trend: the points were fairly close
to the line at lower location ratings but became widely scattered as
the rating increased. This suggests that location does not reliably
explain differences in price. The Distance from Center vs Room Price
plot showed even less structure, with most hotels clustered close to
the center and very few points further out. The trend line was nearly
flat, indicating that distance from the center is not a meaningful
predictor of room price in this dataset.

In contrast, the relationships between the rating variables were much
stronger. The scatter plot for Comfort and Cleanliness showed a clear,
almost linear pattern, meaning these two ratings tend to rise and fall
together. This matches the very high correlations observed among most
of the rating categories, including their strong connections to the
Overall Rating. These results suggest that the individual ratings
largely reflect the same general perception of hotel quality. Another
notable pattern was that Value for Money had a strong positive
relationship with Overall Rating, even though it showed almost no
relationship with price. This indicates that guests judge value based
more on the overall experience than on how expensive the hotel is.

Overall, the analysis shows that the rating variables are highly
consistent with one another, while room price is only weakly related to
location or distance. The scatter plots helped confirm which
correlations reflect a real pattern and which ones could be misleading
when looking at the numbers alone.
#### 2. CONTINGENCY TABLE ####
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(hotel_data$`air conditioning`, hotel_data$`24-hour front desk`)
## X-squared = 145.02, df = 1, p-value < 2.2e-16
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(hotel_data$`air conditioning`, hotel_data$Metro_Status)
## X-squared = 14.709, df = 1, p-value = 0.0001255
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(hotel_data$`24-hour front desk`, hotel_data$Metro_Status)
## X-squared = 11.501, df = 1, p-value = 0.0006957
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(hotel_data$`daily housekeeping`, hotel_data$Metro_Status)
## X-squared = 0.3108, df = 1, p-value = 0.5772
Answer: The chi-square tests show that most of the hotel features tested are not independent of each other or of metro status. For AC and 24-hour front desk, the p-value is extremely small (reported as < 2.2e-16; the exact value is about 2.12e-33), which means there is a very strong association between them. Given that 93% of hotels have AC and 88% have a 24-hour front desk, this is consistent with hotels that have AC almost always also having a 24-hour front desk. For AC versus metro status, the p-value is 0.000125, which is still significant, so AC availability differs between metro and non-metro hotels. The same holds for 24-hour front desk and metro status, with a p-value of 0.000695. The chi-square test only establishes that an association exists; which group has the higher rate has to be read from the row proportions of the contingency table, which are not shown in the output above. Daily housekeeping is different. Its p-value is 0.577, which is much higher than 0.05, so there is no evidence of a relationship between metro status and whether hotels offer daily housekeeping. Metro and non-metro hotels appear to provide this service at similar rates. In short, AC and 24-hour front desk availability are associated with metro status, while daily housekeeping is offered at similar rates everywhere.
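Two details of the chi-square results can be illustrated with a short sketch. The first part recovers the exact p-value from the reported statistic (X-squared = 145.02, df = 1), because R prints anything below machine precision as `< 2.2e-16`. The second part shows how row proportions reveal the direction of an association; its 2x2 counts are hypothetical and are not the real cross-tabulation.

```r
# Exact upper-tail p-value for the reported AC vs front desk statistic.
p_exact <- pchisq(145.02, df = 1, lower.tail = FALSE)
print(p_exact)

# Hypothetical counts (NOT from the dataset): rows are metro status,
# columns are whether the hotel has air conditioning.
tab <- matrix(
  c(20, 180,
     8, 192),
  nrow = 2, byrow = TRUE,
  dimnames = list(Metro = c("FALSE", "TRUE"), AC = c("0", "1"))
)

# Row proportions: share of hotels with and without AC within each group.
row_share <- prop.table(tab, margin = 1)
print(round(row_share, 3))

# The test says whether the shares differ; row_share says in which direction.
result <- chisq.test(tab)
print(result$p.value)
```

With the real data, `tab` would be built with `table(hotel_data$Metro_Status, hotel_data$`air conditioning`)`, mirroring the calls shown in the test output above.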
### Inferential statistics ###
## Comparing Bangalore and Delhi
##
## --- Room Price Comparison ---
##
## Welch Two Sample t-test
##
## data: city1_data$`Average Room Price` and city2_data$`Average Room Price`
## t = 1.4674, df = 801.39, p-value = 0.1426
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -104.4343 723.0355
## sample estimates:
## mean of x mean of y
## 3663.04 3353.74
##
## --- Overall Rating Comparison ---
##
## Welch Two Sample t-test
##
## data: city1_data$`Overall Rating` and city2_data$`Overall Rating`
## t = 0.35228, df = 810.61, p-value = 0.7247
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1109360 0.1594648
## sample estimates:
## mean of x mean of y
## 7.337530 7.313265
##
## --- 95% Confidence Intervals ---
## Bangalore Price CI: 3349.534 3976.547
## Delhi Price CI: 3082.829 3624.651
## Price Difference CI: -104.4343 723.0355
Answer: Comparing Bangalore and Delhi hotels shows some interesting results. For room prices, Bangalore hotels average 3,663 rupees while Delhi hotels average 3,354 rupees, so Bangalore is about 309 rupees more expensive on average. However, the t-test gives a p-value of 0.1426, which is higher than 0.05, meaning this price difference isn't statistically significant; it could be due to random chance in our sample. The 95% confidence interval for the price difference goes from -104 to 723 rupees. Because it includes zero, we can't be sure there's a real price difference between the two cities. For overall ratings, both cities are very similar: Bangalore averages 7.34 and Delhi 7.31 out of 10. The p-value here is 0.7247, which is much higher than 0.05, so there is no evidence of a difference in ratings between the two cities. Even though Bangalore looks more expensive on paper, we can't say for sure that it actually is more expensive than Delhi, and the two cities' guest satisfaction levels are statistically indistinguishable in this sample.
HYPOTHESIS TEST CONCLUSION
## No statistically significant difference in average room prices between Bangalore and Delhi (p = 0.1426 )
## No statistically significant difference in overall ratings between Bangalore and Delhi (p = 0.7247 )
Answer: The hypothesis tests comparing Bangalore and Delhi hotels show no statistically significant differences between the two cities. For room prices, the p-value of 0.1426 is higher than the 0.05 significance level, meaning we fail to reject the null hypothesis that both cities have the same average room price. Even though Bangalore appears more expensive on paper (3,663 vs 3,354 rupees), a difference of this size could plausibly occur by random chance. Similarly, for overall ratings, the p-value of 0.7247 means the data give no evidence of a difference in guest satisfaction between Bangalore and Delhi. A large p-value does not prove that the two means are equal, but both cities average around 7.3 out of 10 and the 95% confidence interval for the difference (-0.11 to 0.16) is narrow, so any true difference is small. The confidence intervals support these conclusions: the price difference interval includes zero, so we can't be sure which city is truly more expensive. From a statistical perspective, this sample does not distinguish Bangalore and Delhi hotels in either pricing or guest ratings.
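The link between the t statistic, the p-value, and the confidence interval can be verified from the numbers in the Welch test output for room price. The sketch below hard-codes the reported means, t statistic, and degrees of freedom; the variable names are illustrative.

```r
# Values copied from the Welch t-test output for room price.
mean_bangalore <- 3663.04
mean_delhi     <- 3353.74
t_stat         <- 1.4674
df_welch       <- 801.39

# Difference in means and the standard error implied by t = diff / se.
diff_means <- mean_bangalore - mean_delhi
se_diff    <- diff_means / t_stat

# 95% confidence interval: diff +/- critical t value * standard error.
ci <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se_diff

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df_welch)

print(round(ci, 1))
print(round(p_value, 4))
```

The interval contains zero exactly when the two-sided p-value exceeds 0.05, which is why the two ways of reading the result agree.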
### ANOVA ###
## === ANOVA ANALYSIS ===
## Two-Way ANOVA: Price ~ Metro * Distance Category
## Df Sum Sq Mean Sq F value Pr(>F)
## Metro_Status 1 3.823e+07 38234341 4.242 0.0395 *
## Distance_Cat 4 8.377e+07 20941388 2.323 0.0544 .
## Metro_Status:Distance_Cat 4 2.174e+08 54344337 6.029 7.87e-05 ***
## Residuals 3131 2.822e+10 9013760
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## --- Interaction Effect ---
## Significant interaction effect (p = 1e-04 )
## The effect of distance on price differs between metro and non-metro hotels.
##
## --- Main Effects ---
## Metro status significantly affects room prices (p = 0.0395 )
## Distance from city center does not significantly affect room prices (p = 0.0544 )
Answer: The two-way ANOVA results show some interesting findings about
what affects hotel prices. Metro status has a significant main effect
on room prices with a p-value of 0.0395, which is just below the 0.05
threshold. Together with the group means (3,624 vs 3,403 rupees), this
suggests metro hotels are more expensive on average. Distance from city
center by itself doesn't quite reach significance, with a p-value of
0.0544 that is right on the edge. This suggests that distance alone
might not be a strong predictor of price across all hotels.

However, the most important finding is the highly significant
interaction effect between metro status and distance category
(p = 0.0000787). This means that how distance affects price is
different for metro hotels versus non-metro hotels. Because the
interaction is significant, the main effects should be interpreted with
caution: you can't just look at distance or metro status alone, you
have to consider them together. The effect of being far from the city
center might be much stronger in metro areas than in non-metro areas,
or vice versa; the ANOVA table alone does not say which, so the mean
price for each combination of metro status and distance category would
need to be compared.
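Each F value in the ANOVA table is the ratio of that term's mean square to the residual mean square, and the p-value is the upper tail of the F distribution. The sketch below recomputes them from the table above; the variable names are illustrative.

```r
# Mean squares and degrees of freedom copied from the ANOVA table.
mean_sq <- c(Metro_Status = 38234341,
             Distance_Cat = 20941388,
             Interaction  = 54344337)
df_term  <- c(Metro_Status = 1, Distance_Cat = 4, Interaction = 4)
ms_resid <- 9013760
df_resid <- 3131

# F statistic: term mean square divided by residual mean square.
f_value <- mean_sq / ms_resid

# p-value: probability of an F at least this large under the null.
p_value <- pf(f_value, df1 = df_term, df2 = df_resid, lower.tail = FALSE)

print(round(f_value, 3))
print(signif(p_value, 3))
```

Note that `aov()` reports sequential sums of squares, so with unequal group sizes the main-effect rows can change if the terms are entered in a different order.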
## Full Model Summary:
##
## Call:
## lm(formula = `Average Room Price` ~ `Overall Rating` + Location +
## `Value for Money` + Comfort, data = regression_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6428.3 -1367.4 -482.5 651.5 27939.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3405.01 452.29 -7.528 6.67e-14 ***
## `Overall Rating` 746.88 137.62 5.427 6.16e-08 ***
## Location 548.85 75.65 7.256 5.02e-13 ***
## `Value for Money` -3933.01 123.15 -31.937 < 2e-16 ***
## Comfort 3411.39 141.49 24.111 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2437 on 3136 degrees of freedom
## Multiple R-squared: 0.3477, Adjusted R-squared: 0.3469
## F-statistic: 417.9 on 4 and 3136 DF, p-value: < 2.2e-16
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
##
## Final Model after Stepwise Selection:
##
## Call:
## lm(formula = `Average Room Price` ~ `Overall Rating` + Location +
## `Value for Money` + Comfort, data = regression_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6428.3 -1367.4 -482.5 651.5 27939.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3405.01 452.29 -7.528 6.67e-14 ***
## `Overall Rating` 746.88 137.62 5.427 6.16e-08 ***
## Location 548.85 75.65 7.256 5.02e-13 ***
## `Value for Money` -3933.01 123.15 -31.937 < 2e-16 ***
## Comfort 3411.39 141.49 24.111 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2437 on 3136 degrees of freedom
## Multiple R-squared: 0.3477, Adjusted R-squared: 0.3469
## F-statistic: 417.9 on 4 and 3136 DF, p-value: < 2.2e-16
Answer: The regression results reveal some surprising and counterintuitive relationships. The full model explains about 34.8% of the variation in hotel prices (R-squared = 0.3477), which is more than the weak pairwise correlations with price (all below 0.32) would suggest. All variables are statistically significant, but not all coefficient signs are intuitive. Location shows the expected positive relationship: holding the other ratings fixed, each 1-point increase in location rating is associated with a price about 549 rupees higher, which is in line with our correlation findings. However, Value for Money has a strongly negative coefficient (-3,933), meaning that among hotels with the same overall, location, and comfort ratings, those with higher value ratings have lower prices. This fits how guests perceive value: for a given level of quality, the more affordable hotel is the better deal. Comfort has a large positive coefficient (+3,411), showing that more comfortable hotels command premium prices. The large coefficients with opposite signs should be read with caution. The correlation analysis showed that Overall Rating, Comfort, and Value for Money are correlated with each other at 0.92 or higher, and with predictors this collinear the individual coefficients are partial effects that can be large and unstable even though the ratings move together. The stepwise selection kept all variables, indicating that each adds some information once the others are in the model. The model has moderate explanatory power, with about 65% of price variation left unexplained, and the coefficients describe associations rather than causal effects.
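One way to quantify how strongly the four predictors overlap is the variance inflation factor (VIF). For standardized predictors, the VIFs are the diagonal of the inverse of the predictor correlation matrix. The sketch below builds that matrix from the rounded values in the correlation output earlier in this report, so the results are approximate; the variable names are illustrative.

```r
# Correlations among the four regression predictors, copied (rounded)
# from the correlation matrix reported earlier.
vars <- c("Overall Rating", "Location", "Value for Money", "Comfort")
R <- matrix(c(
  1.000, 0.735, 0.935, 0.949,
  0.735, 1.000, 0.703, 0.739,
  0.935, 0.703, 1.000, 0.920,
  0.949, 0.739, 0.920, 1.000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry of R^-1.
vif <- diag(solve(R))
names(vif) <- vars

print(round(vif, 1))
```

A common rule of thumb treats a VIF above 5 or 10 as a sign of problematic multicollinearity, which is one plausible reason for the large coefficients with opposite signs.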