Submit both the .Rmd and .html files for grading. You may remove the instructions and example problem above, but do not remove the YAML metadata block or the first, “setup” code chunk. Address the steps that appear below and answer all the questions. Be sure to address each question with code and comments as needed. You may use either base R functions or ggplot2 for the visualizations.


Take home exam

##  Dataset Structure:
## tibble [3,141 × 34] (S3: tbl_df/tbl/data.frame)
##  $ ID                          : num [1:3141] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Hotel Name                  : chr [1:3141] "Hotel Comfort Park - Opposite Sri Ramachandra Medical College Porur" "Regenta Central RS Chennai OMR SIPCOT" "Mer Vue Villa, Kovalam, ECR, Chennai" "Hyatt Regency Chennai" ...
##  $ City                        : chr [1:3141] "Chennai" "Chennai" "Chennai" "Chennai" ...
##  $ Number of Ratings           : num [1:3141] 152 354 26 1227 139 ...
##  $ Distance from Center        : num [1:3141] 10.2 22.5 28.4 1.6 0.35 9.4 1.6 1.9 2 4.6 ...
##  $ Categorized Dist from Centre: chr [1:3141] "Secluded" "Secluded" "Secluded" "Close to City Centre" ...
##  $ Metro                       : logi [1:3141] TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ Staff                       : num [1:3141] 9 8.5 9.5 8.2 8.7 9.3 8.6 8.7 8 8.3 ...
##  $ Facilities                  : num [1:3141] 8.1 8.2 9.1 7.8 8.1 8.4 8.3 8 7.2 7.8 ...
##  $ Cleanliness                 : num [1:3141] 8.5 8.5 9.4 8 8.3 9 8.7 8.7 7.4 8.3 ...
##  $ Value for Money             : num [1:3141] 8.3 8.3 8.8 7.5 8.3 8.4 8.2 8.4 7.3 7.7 ...
##  $ Location                    : num [1:3141] 8.3 8.6 9 8.8 9.2 8.6 8.8 8.5 8.4 7.5 ...
##  $ Free Wi-Fi                  : num [1:3141] 7.5 8.5 8.8 7.9 7.3 7.5 7.9 7.5 6.1 7.3 ...
##  $ Comfort                     : num [1:3141] 8.6 8.6 9.2 8 8.4 8.8 8.7 8.6 7.5 8.3 ...
##  $ Overall Rating              : num [1:3141] 8.2 8.2 9.1 7.5 8.2 8.7 8.2 8.3 7 7.7 ...
##  $ 24-hour front desk          : num [1:3141] 1 1 0 1 0 0 1 0 1 1 ...
##  $ 24-hour security            : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
##  $ cctv outside property       : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
##  $ cctv in common areas        : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
##  $ room service                : num [1:3141] 1 0 0 0 0 0 0 0 1 1 ...
##  $ family rooms                : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
##  $ luggage storage             : num [1:3141] 1 0 0 0 0 1 0 0 1 1 ...
##  $ non-smoking rooms           : num [1:3141] 1 0 1 0 0 1 0 0 1 0 ...
##  $ flat-screen tv              : num [1:3141] 0 0 1 0 0 1 0 0 1 1 ...
##  $ air conditioning            : num [1:3141] 1 1 1 1 1 1 1 1 1 1 ...
##  $ fan                         : num [1:3141] 1 0 1 0 0 1 0 0 1 0 ...
##  $ shower                      : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
##  $ free toiletries             : num [1:3141] 1 0 1 0 0 1 0 0 1 1 ...
##  $ towels                      : num [1:3141] 1 0 1 0 0 1 0 0 1 0 ...
##  $ toilet paper                : num [1:3141] 1 0 1 0 0 1 0 0 1 0 ...
##  $ daily housekeeping          : num [1:3141] 1 1 0 0 0 1 1 0 1 1 ...
##  $ ironing service             : num [1:3141] 1 0 0 0 0 1 0 0 1 1 ...
##  $ laundry                     : num [1:3141] 0 0 0 0 0 0 0 0 1 1 ...
##  $ Average Room Price          : num [1:3141] 3150 5280 16200 9499 7675 ...
## 
##  Dataset Summary:
##        ID        Hotel Name            City           Number of Ratings
##  Min.   :   1   Length:3141        Length:3141        Min.   :   1     
##  1st Qu.: 786   Class :character   Class :character   1st Qu.:  47     
##  Median :1571   Mode  :character   Mode  :character   Median : 114     
##  Mean   :1571                                         Mean   : 268     
##  3rd Qu.:2356                                         3rd Qu.: 283     
##  Max.   :3141                                         Max.   :6180     
##  Distance from Center Categorized Dist from Centre   Metro        
##  Min.   : 0.100       Length:3141                  Mode :logical  
##  1st Qu.: 2.200       Class :character             FALSE:1562     
##  Median : 4.300       Mode  :character             TRUE :1579     
##  Mean   : 6.644                                                   
##  3rd Qu.: 9.800                                                   
##  Max.   :47.200                                                   
##      Staff          Facilities      Cleanliness     Value for Money 
##  Min.   : 2.500   Min.   : 2.500   Min.   : 2.500   Min.   : 2.500  
##  1st Qu.: 7.600   1st Qu.: 6.900   1st Qu.: 7.200   1st Qu.: 7.000  
##  Median : 8.200   Median : 7.600   Median : 7.900   Median : 7.600  
##  Mean   : 8.101   Mean   : 7.486   Mean   : 7.803   Mean   : 7.543  
##  3rd Qu.: 8.700   3rd Qu.: 8.200   3rd Qu.: 8.600   3rd Qu.: 8.200  
##  Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000  
##     Location       Free Wi-Fi        Comfort       Overall Rating  
##  Min.   : 2.50   Min.   : 2.500   Min.   : 2.500   Min.   : 1.000  
##  1st Qu.: 7.60   1st Qu.: 6.300   1st Qu.: 7.200   1st Qu.: 6.800  
##  Median : 8.20   Median : 7.500   Median : 7.900   Median : 7.500  
##  Mean   : 8.09   Mean   : 7.181   Mean   : 7.806   Mean   : 7.385  
##  3rd Qu.: 8.70   3rd Qu.: 8.300   3rd Qu.: 8.500   3rd Qu.: 8.100  
##  Max.   :10.00   Max.   :10.000   Max.   :10.000   Max.   :10.000  
##  24-hour front desk 24-hour security cctv outside property cctv in common areas
##  Min.   :0.0000     Min.   :0.0000   Min.   :0.0000        Min.   :0.0000      
##  1st Qu.:1.0000     1st Qu.:0.0000   1st Qu.:0.0000        1st Qu.:0.0000      
##  Median :1.0000     Median :1.0000   Median :1.0000        Median :1.0000      
##  Mean   :0.8835     Mean   :0.5336   Mean   :0.5062        Mean   :0.5635      
##  3rd Qu.:1.0000     3rd Qu.:1.0000   3rd Qu.:1.0000        3rd Qu.:1.0000      
##  Max.   :1.0000     Max.   :1.0000   Max.   :1.0000        Max.   :1.0000      
##   room service     family rooms    luggage storage  non-smoking rooms
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.0000   
##  Median :1.0000   Median :0.0000   Median :1.0000   Median :1.0000   
##  Mean   :0.5989   Mean   :0.4709   Mean   :0.5514   Mean   :0.7641   
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   
##  flat-screen tv   air conditioning      fan             shower      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :1.0000   Median :0.0000   Median :1.0000  
##  Mean   :0.4957   Mean   :0.9319   Mean   :0.4788   Mean   :0.5466  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  free toiletries      towels        toilet paper    daily housekeeping
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000    
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.0000    
##  Median :1.0000   Median :1.0000   Median :0.0000   Median :1.0000    
##  Mean   :0.5002   Mean   :0.5043   Mean   :0.4862   Mean   :0.8134    
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000    
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000    
##  ironing service     laundry       Average Room Price
##  Min.   :0.0000   Min.   :0.0000   Min.   :  142     
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.: 1700     
##  Median :1.0000   Median :1.0000   Median : 2657     
##  Mean   :0.5422   Mean   :0.5896   Mean   : 3514     
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.: 4275     
##  Max.   :1.0000   Max.   :1.0000   Max.   :35700

EXAM

#### 1. Descriptive Statistics ####

Do an Exploratory Data Analysis (EDA) and provide appropriate summary statistics and data visualizations to understand the various types of hotels. You are to consider such things as existence of the features (which hotels have which feature and which hotels do not), distribution of ratings, and average room price. Investigate (without using hypothesis testing) if there is any difference across cities, regions, metros, or non-metros. The overall goal is to understand the Overall Rating according to various types of hotels with different facilities. Also note that not all hotels have the exact same number of ratings given to them. Give a description of what you have learned from this EDA.

## Total hotels: 3141
## Variables: 34
## Price Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     142    1700    2657    3514    4275   35700
## 
## Rating Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   6.800   7.500   7.385   8.100  10.000

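Answer: Room prices range from 142 to 35,700 rupees. The mean price is 3,514 rupees and the median is 2,657 rupees; the mean sits well above the median, which points to a right-skewed price distribution. The mean rating is about 7.39 out of 10 and the median is 7.5. Half of all hotels are rated between 6.8 and 8.1, so ratings are fairly consistent across the industry.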
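The exam prompt notes that hotels differ in how many ratings they received. A minimal sketch of how a ratings-count-weighted mean could be compared with the simple mean is shown here. The `toy` data frame and its column names are hypothetical and only illustrate the idea; they are not taken from the exam data.

```r
# Hypothetical toy data: three hotels with very different review counts.
toy <- data.frame(
  rating    = c(9.5, 7.0, 8.0),
  n_ratings = c(2, 500, 100)
)

# The simple mean treats every hotel equally, however few reviews it has.
simple_mean <- mean(toy$rating)

# The weighted mean gives more influence to hotels with more ratings.
weighted_mean <- weighted.mean(toy$rating, w = toy$n_ratings)

cat("Simple mean:  ", round(simple_mean, 2), "\n")    # 8.17
cat("Weighted mean:", round(weighted_mean, 2), "\n")  # 7.17
```

On the real data the same call would use the Overall Rating column as the values and the Number of Ratings column as the weights. A large gap between the two means would suggest that hotels with few reviews are pulling the simple average.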

#### EDA analysis ####

Basic plots:

## 
## --- Metro versus Non-Metro Comparison ---
## # A tibble: 2 × 4
##   Metro Count Avg_Price Avg_Rating
##   <lgl> <int>     <dbl>      <dbl>
## 1 FALSE  1562     3403.       7.66
## 2 TRUE   1579     3624.       7.12

Answer: Metro hotels have a higher average price (3,624 rupees) than non-metro hotels (3,403 rupees), but a lower average rating (7.12 vs 7.66).

Answer: Room prices are concentrated in roughly the 2,000-4,000 rupee range, with a long right tail that reflects a small number of very expensive hotels. The rating histogram is left-skewed: many hotels have a decent rating of around 7-8, with a tail of poorly rated hotels. Non-metro hotels appear to offer better value, since they are cheaper on average and also have higher average ratings, while metro hotels charge more but receive lower guest satisfaction scores. One possible explanation, which this data cannot confirm, is that non-metro hotels have to compensate for the lack of a metro, hence the higher ratings.
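Because room prices are right-skewed, group means can be pulled up by a few expensive hotels, so it may help to report medians next to means. The following is a minimal base R sketch of such a grouped summary. The `toy` data frame and its short column names are made up for illustration and do not come from the exam data.

```r
# Hypothetical toy data: five hotels, two in metro cities.
toy <- data.frame(
  Metro  = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  Price  = c(4000, 3000, 2500, 3500, 3000),
  Rating = c(7.0, 7.4, 7.8, 7.6, 8.0)
)

# Mean and median of price and rating within each Metro group.
means   <- aggregate(cbind(Price, Rating) ~ Metro, data = toy, FUN = mean)
medians <- aggregate(cbind(Price, Rating) ~ Metro, data = toy, FUN = median)

print(means)
print(medians)
```

If the medians tell the same story as the means (metro hotels pricier but lower rated), the comparison is less likely to be an artifact of a few outliers.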

Top cities analysis

## 
## - Top 10 Cities by Hotel Count -
## # A tibble: 10 × 2
##    City      Count
##    <chr>     <int>
##  1 Bangalore   421
##  2 Delhi       392
##  3 Mumbai      350
##  4 Chennai     276
##  5 Jaipur      257
##  6 Udaipur     143
##  7 Cochin      142
##  8 Kolkata     140
##  9 Pune        125
## 10 Amristar    114

Answer: The top 3 cities by hotel count are Bangalore (421), Delhi (392), and Mumbai (350). The top 5 cities together account for 1,696 of the 3,141 hotels, about 54% of the dataset.
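The city shares can be checked directly from the counts printed in the table above. This short sketch uses only those counts and the total of 3,141 hotels; the variable names are illustrative.

```r
# Hotel counts for the five largest cities, copied from the table above.
city_counts <- c(Bangalore = 421, Delhi = 392, Mumbai = 350,
                 Chennai = 276, Jaipur = 257)
total_hotels <- 3141

# Cumulative share of all hotels as cities are added in rank order.
cum_share <- cumsum(city_counts) / total_hotels * 100
print(round(cum_share, 1))
# Mumbai (top 3 combined): 37.0; Jaipur (top 5 combined): 54.0
```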

## 
## --- Common Hotel Features ---
## air conditioning : 2927 hotels ( 93.2 %) have this feature 
## 24-hour front desk : 2775 hotels ( 88.3 %) have this feature 
## free toiletries : 1571 hotels ( 50 %) have this feature 
## daily housekeeping : 2555 hotels ( 81.3 %) have this feature

Answer: Air conditioning is present in 93% of hotels and appears to be a nearly universal standard. A 24-hour front desk is available in 88% of hotels, and daily housekeeping in 81%. Free toiletries are less common: only 50% of hotels provide them.
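Since every feature column is coded 0/1, the count and percentage for all features can be produced in one step rather than feature by feature. The sketch below assumes a data frame that holds only such indicator columns; the `toy` values are invented for illustration.

```r
# Hypothetical 0/1 feature columns for four hotels.
toy <- data.frame(
  `air conditioning` = c(1, 1, 1, 0),
  `free toiletries`  = c(1, 0, 1, 0),
  `laundry`          = c(0, 0, 1, 0),
  check.names = FALSE   # keep the spaces in the column names
)

# For a 0/1 column, the sum is the count and the mean is the proportion.
prevalence <- data.frame(
  feature = names(toy),
  count   = colSums(toy),
  percent = round(colMeans(toy) * 100, 1),
  row.names = NULL
)

# Most common features first.
prevalence <- prevalence[order(-prevalence$percent), ]
print(prevalence)
```

Applied to all of the amenity columns, a table like this would cover the features that the four-line summary above leaves out.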

Summary: Based on the EDA of the hotel data set, several key patterns emerge about the Indian hotel industry. The data covers 3,141 hotels with room prices ranging widely from 142 rupees to 35,700 rupees, though most hotels fall in the affordable to mid-range category, with a mean price of 3,514 rupees and a median of 2,657 rupees.

Interestingly, while metro hotels tend to be slightly more expensive (3,624 rupees compared to 3,403 rupees for non-metro hotels), they receive lower average ratings (7.12 vs 7.66 out of 10). The data does not explain why, but a few explanations are plausible. Travelers may find better value in non-metro areas. Rent and land values are usually higher in metro areas, so rooms might be smaller. Metro areas are also noisier, which might lower the comfort score. Finally, non-metro hotels may need to compensate for the lack of a metro to attract tourists, hence the higher ratings.

The hotel distribution is concentrated in major cities such as Bangalore, Delhi, and Mumbai, which together account for over a third of all hotels. Among hotel features, air conditioning is almost always present (93% of hotels), followed by 24-hour front desk service at 88% and daily housekeeping at 81%. However, only half of the hotels provide free toiletries, making this a differentiator rather than part of the standard package. Overall, the group comparison suggests that a higher price does not necessarily go with higher guest satisfaction, and non-metro hotels appear to offer better overall value to travelers.

### Correlation analysis ###

## 
## 
## === CORRELATION ANALYSIS ===
## Correlation Matrix:
##                    Average Room Price Overall Rating Location Cleanliness
## Average Room Price              1.000          0.252    0.284       0.307
## Overall Rating                  0.252          1.000    0.735       0.933
## Location                        0.284          0.735    1.000       0.706
## Cleanliness                     0.307          0.933    0.706       1.000
## Comfort                         0.317          0.949    0.739       0.957
## Value for Money                 0.105          0.935    0.703       0.910
## Staff                           0.213          0.929    0.725       0.902
## Facilities                      0.293          0.953    0.736       0.952
##                    Comfort Value for Money Staff Facilities
## Average Room Price   0.317           0.105 0.213      0.293
## Overall Rating       0.949           0.935 0.929      0.953
## Location             0.739           0.703 0.725      0.736
## Cleanliness          0.957           0.910 0.902      0.952
## Comfort              1.000           0.920 0.905      0.971
## Value for Money      0.920           1.000 0.916      0.935
## Staff                0.905           0.916 1.000      0.916
## Facilities           0.971           0.935 0.916      1.000

The HeatMap

Answer: The correlation results show two clear patterns. First, room price is only weakly related to any of the ratings. The strongest correlations with price are Comfort (0.317) and Cleanliness (0.307), followed by Facilities (0.293) and Location (0.284), and even these are only around 0.3. A positive link between location and price makes sense because good locations cost more, but the relationship is weak. Value for Money has almost no relationship with price (0.105), which means expensive hotels are not necessarily seen as better value.

Second, the rating categories largely measure the same thing. The correlations between Overall Rating and Comfort, Facilities, and Value for Money are all above 0.93, and Cleanliness and Staff are close behind at 0.933 and 0.929. In practice, knowing a hotel’s comfort score tells you most of what you need to know about its overall rating, and almost any of the specific ratings could stand in for Overall Rating. The low correlation between price and Value for Money is the surprising result; it raises the question of whether paying more actually buys better value.
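To make the ranking of price correlations explicit, the first row of the correlation matrix above can be sorted. This sketch copies the values from that output; squaring them gives the share of price variance each rating would explain on its own in a simple linear fit.

```r
# Correlations with Average Room Price, copied from the matrix above.
price_cor <- c(`Overall Rating` = 0.252, Location = 0.284,
               Cleanliness = 0.307, Comfort = 0.317,
               `Value for Money` = 0.105, Staff = 0.213,
               Facilities = 0.293)

# Rank from strongest to weakest.
ranked <- sort(price_cor, decreasing = TRUE)
print(ranked)

# r squared: proportion of price variance explained by each rating alone.
print(round(ranked^2, 3))
```

Even the strongest single rating, Comfort, accounts for only about 10% of the variance in price.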

## `geom_smooth()` using formula = 'y ~ x'

Answer: After reviewing both the correlation values and the scatter plots, several patterns became more apparent. The numerical correlations showed that room price has only weak relationships with the rating variables, and the scatter plots supported this result. The Location vs Room Price plot showed a very weak trend: the points were fairly close to the line at lower location ratings but became widely scattered as the rating increased. This suggests that location does not reliably explain differences in price. The Distance from Center vs Room Price plot showed even less structure, with most hotels clustered close to the center and very few points further out. The trend line was nearly flat, indicating that distance from the center is not a meaningful predictor of room price in this dataset.

In contrast, the relationships between the rating variables were much stronger. The scatter plot for Comfort and Cleanliness showed a clear, almost linear pattern, meaning these two ratings tend to rise and fall together. This matches the very high correlations observed among most of the rating categories, including their strong connections to the Overall Rating. These results suggest that the individual ratings largely reflect the same general perception of hotel quality.

Another notable pattern was that Value for Money had a strong positive relationship with Overall Rating, even though it showed almost no relationship with price. This indicates that guests judge value more by the overall experience than by how expensive the hotel is.

Overall, the analysis shows that the rating variables are highly consistent with one another, while room price is only weakly related to location or distance. The scatter plots helped confirm that the weak correlations reflect genuinely scattered data and the strong ones reflect genuinely linear relationships.

#### 2. CONTINGENCY TABLE ####

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table(hotel_data$`air conditioning`, hotel_data$`24-hour front desk`)
## X-squared = 145.02, df = 1, p-value < 2.2e-16
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table(hotel_data$`air conditioning`, hotel_data$Metro_Status)
## X-squared = 14.709, df = 1, p-value = 0.0001255
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table(hotel_data$`24-hour front desk`, hotel_data$Metro_Status)
## X-squared = 11.501, df = 1, p-value = 0.0006957
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table(hotel_data$`daily housekeeping`, hotel_data$Metro_Status)
## X-squared = 0.3108, df = 1, p-value = 0.5772

Answer: The chi-square tests show that three of the four feature pairs tested are not independent. For AC and 24-hour front desk, the p-value is extremely small (p < 2.2e-16 in the output; about 2.12e-33 when computed exactly), which indicates a very strong association between them. For AC versus metro status, the p-value is 0.000125, which is still significant at the 0.05 level, so AC availability differs between metro and non-metro hotels. The same holds for 24-hour front desk and metro status, with a p-value of 0.000695. A chi-square test only shows that an association exists; the direction (for example, whether metro hotels are the ones more likely to have AC) has to be read from the proportions in the contingency table.

Daily housekeeping is different. The p-value is 0.577, well above 0.05, so there is no evidence of a relationship between metro status and whether hotels offer daily housekeeping; metro and non-metro hotels provide this service at similar rates. In short, the availability of AC and a 24-hour front desk depends on metro status, while daily housekeeping appears to be equally available everywhere.
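Two details of the chi-square results can be illustrated with a short sketch. The first line recovers an exact p-value from the reported statistic (X-squared = 145.02, df = 1). The rest uses a made-up 2x2 table, not the exam data, to show how row proportions reveal the direction of an association that the test alone does not give.

```r
# Exact p-value implied by the reported statistic for AC vs 24-hour front desk.
p_exact <- pchisq(145.02, df = 1, lower.tail = FALSE)
print(p_exact)

# Hypothetical 2x2 counts, for illustration only.
tab <- matrix(c(60, 40,
                30, 70),
              nrow = 2, byrow = TRUE,
              dimnames = list(Metro = c("FALSE", "TRUE"),
                              AC    = c("0", "1")))

# chisq.test applies Yates' continuity correction to 2x2 tables by default.
test <- chisq.test(tab)
print(test$p.value)

# Share of hotels with and without AC within each Metro group.
# Comparing the rows shows which group has the feature more often.
row_props <- prop.table(tab, margin = 1)
print(round(row_props, 2))
```

In this toy table the TRUE row has the larger share of AC, so the association runs in the "metro hotels more often have AC" direction. The same `prop.table()` call on the real tables would confirm or contradict the direction for each feature.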

### Inferential statistics ###

## Comparing Bangalore and Delhi
## 
## --- Room Price Comparison ---
## 
##  Welch Two Sample t-test
## 
## data:  city1_data$`Average Room Price` and city2_data$`Average Room Price`
## t = 1.4674, df = 801.39, p-value = 0.1426
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -104.4343  723.0355
## sample estimates:
## mean of x mean of y 
##   3663.04   3353.74
## 
## --- Overall Rating Comparison ---
## 
##  Welch Two Sample t-test
## 
## data:  city1_data$`Overall Rating` and city2_data$`Overall Rating`
## t = 0.35228, df = 810.61, p-value = 0.7247
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1109360  0.1594648
## sample estimates:
## mean of x mean of y 
##  7.337530  7.313265
## 
## --- 95% Confidence Intervals ---
## Bangalore Price CI: 3349.534 3976.547
## Delhi Price CI: 3082.829 3624.651
## Price Difference CI: -104.4343 723.0355

Answer: For room prices, Bangalore hotels average 3,663 rupees while Delhi hotels average 3,354 rupees, so Bangalore is about 309 rupees more expensive on average in this sample. However, the Welch t-test gives a p-value of 0.1426, which is higher than 0.05, so this price difference is not statistically significant; it could be due to random sampling variation. The 95% confidence interval for the price difference runs from -104 to 723 rupees. Because it includes zero, we cannot be sure there is a real price difference between the two cities. For overall ratings, both cities are very similar: Bangalore averages 7.34 and Delhi 7.31 out of 10. The p-value here is 0.7247, well above 0.05, so there is no evidence of a difference in ratings. In short, even though Bangalore looks more expensive on paper, the data does not establish that it is more expensive than Delhi, and no difference in guest satisfaction between the two cities was detected.
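The confidence interval for the price difference can be reconstructed from the other numbers in the t-test output, which shows how the pieces fit together. This sketch uses only the reported means, t statistic, and degrees of freedom; small discrepancies from the printed interval are expected because the inputs are rounded.

```r
# Values copied from the Welch t-test output above.
mean_bangalore <- 3663.04
mean_delhi     <- 3353.74
t_stat         <- 1.4674
df             <- 801.39

diff_means <- mean_bangalore - mean_delhi  # observed difference in means
se_diff    <- diff_means / t_stat          # standard error, since t = diff / SE
margin     <- qt(0.975, df) * se_diff      # half-width of the 95% interval

ci <- c(lower = diff_means - margin, upper = diff_means + margin)
print(round(ci, 1))
```

The result agrees with the reported interval of -104.43 to 723.04 to within rounding. The margin of roughly 414 rupees is larger than the observed difference of 309 rupees, which is why the interval includes zero.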

HYPOTHESIS TEST CONCLUSION

## No statistically significant difference in average room prices between Bangalore and Delhi (p = 0.1426 )
## No statistically significant difference in overall ratings between Bangalore and Delhi (p = 0.7247 )

Answer: The hypothesis tests comparing Bangalore and Delhi hotels find no statistically significant differences between the two cities. For room prices, the p-value of 0.1426 is above the 0.05 significance level, so we cannot reject the null hypothesis that both cities have the same average room price. Even though Bangalore appears more expensive on paper (3,663 vs 3,354 rupees), a difference of this size could plausibly occur by chance. For overall ratings, the p-value of 0.7247 likewise gives no evidence of a difference. A large p-value does not prove that the two cities are identical, but the confidence intervals show how large a difference remains plausible. The rating interval (-0.11 to 0.16 points) is narrow, so any true difference in ratings is small. The price interval (-104 to 723 rupees) is wide and includes zero, so we cannot be sure which city is more expensive, and a difference of several hundred rupees cannot be ruled out. From a statistical perspective, this sample does not distinguish Bangalore and Delhi hotels on either pricing or guest ratings.

### ANOVA ###

## === ANOVA ANALYSIS ===
## Two-Way ANOVA: Price ~ Metro * Distance Category
##                             Df    Sum Sq  Mean Sq F value   Pr(>F)    
## Metro_Status                 1 3.823e+07 38234341   4.242   0.0395 *  
## Distance_Cat                 4 8.377e+07 20941388   2.323   0.0544 .  
## Metro_Status:Distance_Cat    4 2.174e+08 54344337   6.029 7.87e-05 ***
## Residuals                 3131 2.822e+10  9013760                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## --- Interaction Effect ---
## Significant interaction effect (p = 1e-04 )
## The effect of distance on price differs between metro and non-metro hotels.
## 
## --- Main Effects ---
## Metro status significantly affects room prices (p = 0.0395 )
## Distance from city center does not significantly affect room prices (p = 0.0544 )

Answer: The two-way ANOVA shows which factors are associated with hotel prices. Metro status has a significant main effect on room price (p = 0.0395), which is just below the 0.05 threshold; consistent with the EDA, metro hotels average 3,624 rupees against 3,403 rupees for non-metro hotels. Distance category by itself does not reach significance (p = 0.0544), although it is right on the edge, so distance alone is not a strong predictor of price across all hotels. The most important finding is the highly significant interaction between metro status and distance category (p = 0.0000787). This means the relationship between distance and price differs between metro and non-metro hotels. Because of this interaction, the main effects should be interpreted with caution: neither distance nor metro status can be considered alone, since the effect of one depends on the level of the other. The ANOVA table does not say in which group the distance effect is stronger; that requires comparing the mean price in each combination of metro status and distance category.
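To see what a significant interaction looks like and how cell means help interpret it, here is a minimal sketch on simulated data. Everything in it is hypothetical (the `toy` data frame, the three distance levels, and the built-in effect sizes); it is not the exam data and is only meant to show the workflow.

```r
set.seed(1)  # reproducible toy data, NOT the exam data

n <- 200
toy <- data.frame(
  metro = factor(sample(c("Metro", "Non-Metro"), n, replace = TRUE)),
  dist  = factor(sample(c("Close", "Mid", "Far"), n, replace = TRUE),
                 levels = c("Close", "Mid", "Far"))
)

# Build in an interaction: price drops with distance only for metro hotels.
drop_per_level <- ifelse(toy$metro == "Metro", 800, 0)
toy$price <- 4000 - drop_per_level * (as.integer(toy$dist) - 1) +
  rnorm(n, sd = 500)

# Two-way ANOVA with interaction, same structure as Price ~ Metro * Distance.
fit <- aov(price ~ metro * dist, data = toy)
print(summary(fit))

# Mean price in each metro-by-distance cell shows the shape of the interaction.
cell_means <- tapply(toy$price, list(toy$metro, toy$dist), mean)
print(round(cell_means))

# Optional plot of the same means; non-parallel lines indicate an interaction.
# interaction.plot(toy$dist, toy$metro, toy$price)
```

In the toy output the Metro row falls steadily from Close to Far while the Non-Metro row stays roughly flat. A table of cell means for the real data would show in the same way where the distance effect is concentrated. Note that `aov()` uses sequential sums of squares, so with unbalanced groups the main-effect tests depend on the order of the terms.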

## Full Model Summary:
## 
## Call:
## lm(formula = `Average Room Price` ~ `Overall Rating` + Location + 
##     `Value for Money` + Comfort, data = regression_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6428.3 -1367.4  -482.5   651.5 27939.3 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -3405.01     452.29  -7.528 6.67e-14 ***
## `Overall Rating`    746.88     137.62   5.427 6.16e-08 ***
## Location            548.85      75.65   7.256 5.02e-13 ***
## `Value for Money` -3933.01     123.15 -31.937  < 2e-16 ***
## Comfort            3411.39     141.49  24.111  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2437 on 3136 degrees of freedom
## Multiple R-squared:  0.3477, Adjusted R-squared:  0.3469 
## F-statistic: 417.9 on 4 and 3136 DF,  p-value: < 2.2e-16
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## Final Model after Stepwise Selection:
## 
## Call:
## lm(formula = `Average Room Price` ~ `Overall Rating` + Location + 
##     `Value for Money` + Comfort, data = regression_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6428.3 -1367.4  -482.5   651.5 27939.3 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -3405.01     452.29  -7.528 6.67e-14 ***
## `Overall Rating`    746.88     137.62   5.427 6.16e-08 ***
## Location            548.85      75.65   7.256 5.02e-13 ***
## `Value for Money` -3933.01     123.15 -31.937  < 2e-16 ***
## Comfort            3411.39     141.49  24.111  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2437 on 3136 degrees of freedom
## Multiple R-squared:  0.3477, Adjusted R-squared:  0.3469 
## F-statistic: 417.9 on 4 and 3136 DF,  p-value: < 2.2e-16

Answer: The regression results reveal some surprising relationships. The full model explains about 34.8% of the variation in hotel prices (R-squared = 0.3477), which is more than the weak pairwise correlations suggested, since no single rating had a correlation with price above 0.32. All variables are statistically significant, but not all coefficient signs are intuitive. Location shows the expected positive relationship: holding the other ratings fixed, each 1-point increase in location rating is associated with a price about 549 rupees higher. Comfort has a large positive coefficient (+3,411), suggesting that comfortable hotels command premium prices. Value for Money, however, has a strongly negative coefficient (-3,933), even though its simple correlation with price is slightly positive (0.105). Among hotels with the same overall, location, and comfort ratings, those rated as better value are the cheaper ones, which matches how guests perceive value: similar quality at a lower price.

The large coefficients of opposite sign on Comfort and Value for Money should be read with caution. These predictors are highly correlated with each other and with Overall Rating (0.92 to 0.95 in the correlation matrix), so the model has little independent information for separating their effects, and individual coefficients can be inflated or change sign under this kind of multicollinearity. Stepwise selection kept all four variables, which indicates that each improves the fit, but not that each coefficient has a clean interpretation. The model has moderate explanatory power, and the sign pattern suggests that price is driven by the differences between ratings rather than by any one rating alone.
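One way to quantify how strongly the four predictors overlap is the variance inflation factor (VIF). For standardized predictors, each VIF is the corresponding diagonal entry of the inverse of the predictor correlation matrix, so it can be sketched directly from the correlations printed earlier. The values below are copied from that output and rounded to three decimals, so the results are approximate.

```r
# Predictors used in the regression, and their pairwise correlations
# copied from the correlation matrix shown earlier.
vars <- c("Overall Rating", "Location", "Value for Money", "Comfort")
R <- matrix(c(
  1.000, 0.735, 0.935, 0.949,
  0.735, 1.000, 0.703, 0.739,
  0.935, 0.703, 1.000, 0.920,
  0.949, 0.739, 0.920, 1.000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# VIF for each predictor = diagonal of the inverse correlation matrix.
vif <- setNames(diag(solve(R)), vars)
print(round(vif, 1))
```

Because Overall Rating and Comfort correlate at 0.949, their VIFs must be at least 1 / (1 - 0.949^2), which is about 10, a level commonly treated as a sign of serious multicollinearity. That would explain why the coefficients on Comfort and Value for Money are large and opposite in sign.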