Initial setup and data set configuration.
Load the data file into the variable hotel_data.
Data set - Hotels: This data comes from an open hotel booking demand data set covering hotels such as City Hotel and Resort Hotel.
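
The loading call itself is not shown in this report. A minimal sketch, assuming the data is stored as a CSV file: the file used here is a two-row stand-in with made-up values, written to a temporary path purely so the example runs, and in practice csv_path would point at the real booking file.

```r
# Hypothetical stand-in file: two made-up rows using three of the real column names.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("hotel,lead_time,adr",
             "City Hotel,10,95.5",
             "Resort Hotel,45,120.0"), csv_path)

# stringsAsFactors = FALSE keeps text columns as character,
# matching the "Class :character" entries in the summary output.
hotel_data <- read.csv(csv_path, stringsAsFactors = FALSE)
str(hotel_data)
```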

summary(hotel_data)
##     hotel            is_canceled       lead_time   arrival_date_year
##  Length:119390      Min.   :0.0000   Min.   :  0   Min.   :2015     
##  Class :character   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016     
##  Mode  :character   Median :0.0000   Median : 69   Median :2016     
##                     Mean   :0.3704   Mean   :104   Mean   :2016     
##                     3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017     
##                     Max.   :1.0000   Max.   :737   Max.   :2017     
##                                                                     
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:119390      Min.   : 1.00            Min.   : 1.0             
##  Class :character   1st Qu.:16.00            1st Qu.: 8.0             
##  Mode  :character   Median :28.00            Median :16.0             
##                     Mean   :27.17            Mean   :15.8             
##                     3rd Qu.:38.00            3rd Qu.:23.0             
##                     Max.   :53.00            Max.   :31.0             
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults      
##  Min.   : 0.0000         Min.   : 0.0         Min.   : 0.000  
##  1st Qu.: 0.0000         1st Qu.: 1.0         1st Qu.: 2.000  
##  Median : 1.0000         Median : 2.0         Median : 2.000  
##  Mean   : 0.9276         Mean   : 2.5         Mean   : 1.856  
##  3rd Qu.: 2.0000         3rd Qu.: 3.0         3rd Qu.: 2.000  
##  Max.   :19.0000         Max.   :50.0         Max.   :55.000  
##                                                               
##     children           babies              meal             country         
##  Min.   : 0.0000   Min.   : 0.000000   Length:119390      Length:119390     
##  1st Qu.: 0.0000   1st Qu.: 0.000000   Class :character   Class :character  
##  Median : 0.0000   Median : 0.000000   Mode  :character   Mode  :character  
##  Mean   : 0.1039   Mean   : 0.007949                                        
##  3rd Qu.: 0.0000   3rd Qu.: 0.000000                                        
##  Max.   :10.0000   Max.   :10.000000                                        
##  NA's   :4                                                                  
##  market_segment     distribution_channel is_repeated_guest
##  Length:119390      Length:119390        Min.   :0.00000  
##  Class :character   Class :character     1st Qu.:0.00000  
##  Mode  :character   Mode  :character     Median :0.00000  
##                                          Mean   :0.03191  
##                                          3rd Qu.:0.00000  
##                                          Max.   :1.00000  
##                                                           
##  previous_cancellations previous_bookings_not_canceled reserved_room_type
##  Min.   : 0.00000       Min.   : 0.0000                Length:119390     
##  1st Qu.: 0.00000       1st Qu.: 0.0000                Class :character  
##  Median : 0.00000       Median : 0.0000                Mode  :character  
##  Mean   : 0.08712       Mean   : 0.1371                                  
##  3rd Qu.: 0.00000       3rd Qu.: 0.0000                                  
##  Max.   :26.00000       Max.   :72.0000                                  
##                                                                          
##  assigned_room_type booking_changes   deposit_type          agent          
##  Length:119390      Min.   : 0.0000   Length:119390      Length:119390     
##  Class :character   1st Qu.: 0.0000   Class :character   Class :character  
##  Mode  :character   Median : 0.0000   Mode  :character   Mode  :character  
##                     Mean   : 0.2211                                        
##                     3rd Qu.: 0.0000                                        
##                     Max.   :21.0000                                        
##                                                                            
##    company          days_in_waiting_list customer_type           adr         
##  Length:119390      Min.   :  0.000      Length:119390      Min.   :  -6.38  
##  Class :character   1st Qu.:  0.000      Class :character   1st Qu.:  69.29  
##  Mode  :character   Median :  0.000      Mode  :character   Median :  94.58  
##                     Mean   :  2.321                         Mean   : 101.83  
##                     3rd Qu.:  0.000                         3rd Qu.: 126.00  
##                     Max.   :391.000                         Max.   :5400.00  
##                                                                              
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:119390     
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06252             Mean   :0.5714                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :8.00000             Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:119390          
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 
#head(hotel_data)
Question: Select a continuous (or ordered integer) column of data that seems most “valuable” given the context of your data (Hotel dataset).
# Sort the hotel data by adr (Average Daily Rate), highest first.
hotel_data <- hotel_data[order(hotel_data$adr, decreasing = TRUE), ]
# Display the top 10 rows for the columns hotel and adr.
head(hotel_data[, c("hotel", "adr")], 10)
##               hotel     adr
## 48516    City Hotel 5400.00
## 111404   City Hotel  510.00
## 15084  Resort Hotel  508.00
## 103913   City Hotel  451.50
## 13143  Resort Hotel  450.00
## 13392  Resort Hotel  437.00
## 39156  Resort Hotel  426.25
## 39569  Resort Hotel  402.00
## 39119  Resort Hotel  397.38
## 13324  Resort Hotel  392.00

In the hotel dataset, the column “adr” (Average Daily Rate, a numeric field) is one of the most valuable continuous variables. ADR is defined as the sum of all lodging transactions divided by the total number of staying nights, i.e. the average amount paid for a room per night. It is a key metric for observing the financial performance of a hotel, and hotel stakeholders such as managers and investors use it to evaluate revenue generation and to set pricing strategies for future business. ADR can therefore be considered a valuable continuous variable for analysis of this hotel dataset.
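
As a worked illustration of the definition above, the following sketch computes ADR for a single booking. The nightly amounts are made up for the example and are not taken from the dataset.

```r
# Made-up figures for one hypothetical booking (not taken from the dataset).
lodging_total <- c(120, 120, 150)   # amount charged for each of three nights
nights        <- length(lodging_total)

# ADR = sum of lodging transactions / number of staying nights
adr_example <- sum(lodging_total) / nights
adr_example  # 130
```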

Question: Select a categorical column of data (explanatory variable) that you expect might influence the response variable.

In my hotel dataset, the categorical column reserved_room_type can be taken as an explanatory variable that might influence the response variable adr (Average Daily Rate).
Null Hypothesis: The average daily rate does not vary significantly across different reserved room types.

ANOVA test
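# Determine the unique categories of room type.
# Note: unique() returns only 10 labels (hence Df = 9 below); assigning them to a column
# recycles them down the 119390 rows, so this column is not each booking's own room type.
hotel_data$List_category_room_type <- unique(hotel_data$reserved_room_type)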


# Perform the ANOVA test on a linear model fitted with lm.
anova_result_table <- anova(lm(adr ~ List_category_room_type, data = hotel_data))

# Display a summary of the ANOVA table (column-wise statistics;
# print(anova_result_table) would show the table itself).
summary(anova_result_table)
##        Df             Sum Sq             Mean Sq          F value       
##  Min.   :     9   Min.   :     2021   Min.   : 224.6   Min.   :0.08794  
##  1st Qu.: 29852   1st Qu.: 76226891   1st Qu.: 807.0   1st Qu.:0.08794  
##  Median : 59695   Median :152451760   Median :1389.3   Median :0.08794  
##  Mean   : 59695   Mean   :152451760   Mean   :1389.3   Mean   :0.08794  
##  3rd Qu.: 89537   3rd Qu.:228676629   3rd Qu.:1971.7   3rd Qu.:0.08794  
##  Max.   :119380   Max.   :304901498   Max.   :2554.0   Max.   :0.08794  
##                                                        NA's   :1        
##      Pr(>F)      
##  Min.   :0.9998  
##  1st Qu.:0.9998  
##  Median :0.9998  
##  Mean   :0.9998  
##  3rd Qu.:0.9998  
##  Max.   :0.9998  
##  NA's   :1
#Perform AOV test
aov_result <- aov(adr ~ hotel_data$List_category_room_type, data = hotel_data)
summary_aov <- summary.aov(aov_result)
print(summary_aov)
##                                        Df    Sum Sq Mean Sq F value Pr(>F)
## hotel_data$List_category_room_type      9      2021   224.6   0.088      1
## Residuals                          119380 304901498  2554.0
#Box plot for adr and reserved_room_type
boxplot(hotel_data$adr ~ hotel_data$reserved_room_type, ylim = c(0, 350), col = "light gray", ylab = "ADR", xlab = "Reserved Room Type")

Interpretation
1. Determine the unique categories of room type.
2. Perform the ANOVA test on a model fitted with lm, using the columns adr and List_category_room_type (derived from reserved_room_type).
3. Report: F(9, 119380) = 0.08794 (very low) and p-value = 0.9998 (well above 0.05).
Conclusion: The p-value is approximately 1 (> 0.05), so this test provides no evidence against the null hypothesis and we fail to reject it. However, the grouping column used in the test holds the 10 unique room-type labels recycled down the rows rather than each booking's own room type, so the result does not measure the effect of reserved_room_type. The boxplot, which is drawn from reserved_room_type itself, suggests that ADR does differ between room types.
Therefore, the assumption “reserved_room_type has an impact on ADR” is neither confirmed nor ruled out by this test; the ANOVA needs to be rerun with reserved_room_type as the grouping factor before a final conclusion is drawn.
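
The F value near zero follows from how the grouping column was built: assigning unique(reserved_room_type) to a data frame column recycles the short vector of labels down the rows. The sketch below shows that effect, and the form a one-way ANOVA on the actual room type could take. It uses a small made-up data frame (toy, with a hypothetical recycled column), and no results for the real data are implied.

```r
# Toy data (made up): 6 bookings, 3 room types with clearly different rates.
toy <- data.frame(
  reserved_room_type = c("A", "A", "D", "D", "G", "G"),
  adr                = c(80, 84, 120, 124, 200, 204)
)

# unique() returns one label per category (3 values here). Assigning it to a
# column recycles A, D, G down the rows, so it no longer matches each booking.
toy$recycled <- unique(toy$reserved_room_type)
toy$recycled == toy$reserved_room_type  # TRUE FALSE FALSE FALSE FALSE TRUE

# One-way ANOVA on the actual grouping column, converted to a factor.
fit <- aov(adr ~ factor(reserved_room_type), data = toy)
summary(fit)
```

On the real data the equivalent call would be aov(adr ~ factor(reserved_room_type), data = hotel_data); its output is not shown here because it has not been run as part of this report.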

Question: Find a single continuous (or ordered integer, non-binary) column of data that might influence the response variable. Make sure the relationship between this variable and the response is roughly linear.

In my hotel dataset, the continuous column “lead_time” might influence the response variable adr (Average Daily Rate). Lead time is the number of days that elapsed between booking and arrival. It is reasonable to assume that there might be a linear relationship between lead time and ADR. For example, customers who book well in advance might get better rates than those who book closer to their arrival date.
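
The linearity assumption can be checked before fitting, with a correlation coefficient and a scatter plot. The sketch below shows one way to do this on simulated values (toy data with a seed, not the hotel dataset); on the real data the same calls would use hotel_data$lead_time and hotel_data$adr.

```r
set.seed(1)  # toy data only; made-up values, not the hotel dataset
lead_time <- sample(0:365, 200, replace = TRUE)
adr       <- 105 - 0.03 * lead_time + rnorm(200, sd = 20)

# Pearson correlation: sign and strength of the linear association
r <- cor(lead_time, adr)

# Scatter plot with the least-squares line overlaid
plot(lead_time, adr, xlab = "Lead time (days)", ylab = "ADR")
abline(lm(adr ~ lead_time), col = "red")
```

A cloud of points with no visible curve around the red line is consistent with a roughly linear relationship; a clear bend would suggest transforming one of the variables first.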

Linear regression model using the column lead_time as the predictor and ADR as the response variable:
# lm: lm is used to fit linear models, including multivariate ones. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance. (Reference: RStudio help library)
# Fit the linear regression model
lm_model_lead_time_adr <- lm(adr ~ lead_time, data = hotel_data)


# Summary of the model
summary(lm_model_lead_time_adr)
## 
## Call:
## lm(formula = adr ~ lead_time, data = hotel_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -105.5  -31.4   -7.2   23.9 5296.1 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 104.933697   0.203692  515.16   <2e-16 ***
## lead_time    -0.029829   0.001366  -21.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.44 on 119388 degrees of freedom
## Multiple R-squared:  0.003979,   Adjusted R-squared:  0.00397 
## F-statistic: 476.9 on 1 and 119388 DF,  p-value: < 2.2e-16
Interpret the coefficients:

The slope coefficient is the slope of the regression line; the intercept is the value at which the line crosses the y-axis.
Y variable = dependent variable = ADR
X variable = independent variable = Lead Time
✔ The intercept is 104.933697: the estimated ADR when lead time is zero, i.e. a booking made on the day of arrival.
This is the lowest lead time in the data (Min. 0 in the summary above), so the intercept describes the edge of the observed range rather than a typical booking.
✔ The coefficient for lead time is -0.029829. It is the estimated change in the average daily rate for a one-day increase in lead time.
In our case, it suggests that for each additional day of lead time, the average daily rate decreases by about $0.03 ($0.029829).
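
To make the coefficients concrete, the fitted line can be evaluated by hand at a few lead times. The sketch below copies the two estimates from the summary output; the chosen lead times are only illustrative.

```r
# Coefficients copied from the summary output above
intercept <- 104.933697
slope     <- -0.029829

# Fitted ADR for a few illustrative lead times (in days)
lead_days  <- c(0, 30, 100, 365)
fitted_adr <- intercept + slope * lead_days
round(fitted_adr, 2)  # 104.93 104.04 101.95 94.05
```

Note that the Multiple R-squared of 0.003979 means lead time accounts for only about 0.4% of the variance in ADR, so although the slope is statistically significant, individual predictions from lead time alone are imprecise.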

Recommendations:
  1. According to the linear regression model above, ADR falls slightly as lead time grows. To optimize revenue, the hotels could implement dynamic pricing that offers better rates to customers who book their room well in advance.
  2. The linear regression model can also feed into forecasting of future demand, and this information can be used to formulate pricing strategies so that hotel revenue can be optimized.

Thank you!