HOTEL BOOKINGS DEMAND ANALYSIS

DATA ANALYST: KAPPO STEPHEN

INTRODUCTION

The hotel booking data set provides detailed information about hotel reservations from a variety of guests, spanning across two hotel types: City Hotels and Resort Hotels. This data set contains rich insights into guest behavior, booking patterns, and operational metrics. Key variables include:

Booking Lead Time (the number of days between booking and arrival),

Cancellation Rates,

Length of Stay,

Special Requests made by guests,

Guest Demographics (e.g., country of origin, group size),

Booking Channels, and

Seasonality (booking trends by time of year).

The purpose of this analysis is to uncover valuable trends and patterns, such as how booking behavior varies between different hotel types, what factors influence cancellations, and how external factors like seasonality and special requests impact overall hotel performance. Understanding these dynamics is essential for enhancing customer satisfaction, optimizing revenue, and improving hotel management strategies.

OBJECTIVE

  1. What time of year experiences the highest and lowest demand for bookings?

    (Objective: Analyze seasonal trends in hotel bookings.)

  2. Which type of hotel (city vs resort) tends to have a higher occupancy rate during peak seasons?

    (Objective: Compare demand between city hotels and resort hotels.)

  3. Does the length of stay impact the average daily rate (ADR) for bookings?

    (Objective: Investigate how the length of stay affects pricing.)

  4. What is the average lead time for bookings, and how does it vary by hotel type?

    (Objective: Analyze how far in advance people book different types of hotels.)

  5. How do cancellation rates vary by season and hotel type?

    (Objective: Identify any trends in booking cancellations and potential causes.)

DATA PREPARATION

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# LOAD THE DATASET
hotel_bookings <- read.csv("hotel_bookings.csv")

#STRUCTURE OF THE DATASET
str(hotel_bookings)
## 'data.frame':    119390 obs. of  32 variables:
##  $ hotel                         : chr  "Resort Hotel" "Resort Hotel" "Resort Hotel" "Resort Hotel" ...
##  $ is_canceled                   : int  0 0 0 0 0 0 0 0 1 1 ...
##  $ lead_time                     : int  342 737 7 13 14 14 0 9 85 75 ...
##  $ arrival_date_year             : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ arrival_date_month            : chr  "July" "July" "July" "July" ...
##  $ arrival_date_week_number      : int  27 27 27 27 27 27 27 27 27 27 ...
##  $ arrival_date_day_of_month     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ stays_in_weekend_nights       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ stays_in_week_nights          : int  0 0 1 1 2 2 2 2 3 3 ...
##  $ adults                        : int  2 2 1 1 2 2 2 2 2 2 ...
##  $ children                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "BB" "BB" "BB" "BB" ...
##  $ country                       : chr  "PRT" "PRT" "GBR" "GBR" ...
##  $ market_segment                : chr  "Direct" "Direct" "Direct" "Corporate" ...
##  $ distribution_channel          : chr  "Direct" "Direct" "Direct" "Corporate" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "C" "C" "A" "A" ...
##  $ assigned_room_type            : chr  "C" "C" "C" "A" ...
##  $ booking_changes               : int  3 4 0 0 0 0 0 0 0 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
##  $ agent                         : chr  "NULL" "NULL" "NULL" "304" ...
##  $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient" "Transient" "Transient" ...
##  $ adr                           : num  0 0 75 75 98 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  0 0 0 0 1 1 0 1 1 0 ...
##  $ reservation_status            : chr  "Check-Out" "Check-Out" "Check-Out" "Check-Out" ...
##  $ reservation_status_date       : chr  "2015-07-01" "2015-07-01" "2015-07-02" "2015-07-02" ...
#SUMMARY OF THE DATASET
summary(hotel_bookings)
##     hotel            is_canceled       lead_time   arrival_date_year
##  Length:119390      Min.   :0.0000   Min.   :  0   Min.   :2015     
##  Class :character   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016     
##  Mode  :character   Median :0.0000   Median : 69   Median :2016     
##                     Mean   :0.3704   Mean   :104   Mean   :2016     
##                     3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017     
##                     Max.   :1.0000   Max.   :737   Max.   :2017     
##                                                                     
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:119390      Min.   : 1.00            Min.   : 1.0             
##  Class :character   1st Qu.:16.00            1st Qu.: 8.0             
##  Mode  :character   Median :28.00            Median :16.0             
##                     Mean   :27.17            Mean   :15.8             
##                     3rd Qu.:38.00            3rd Qu.:23.0             
##                     Max.   :53.00            Max.   :31.0             
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults      
##  Min.   : 0.0000         Min.   : 0.0         Min.   : 0.000  
##  1st Qu.: 0.0000         1st Qu.: 1.0         1st Qu.: 2.000  
##  Median : 1.0000         Median : 2.0         Median : 2.000  
##  Mean   : 0.9276         Mean   : 2.5         Mean   : 1.856  
##  3rd Qu.: 2.0000         3rd Qu.: 3.0         3rd Qu.: 2.000  
##  Max.   :19.0000         Max.   :50.0         Max.   :55.000  
##                                                               
##     children           babies              meal             country         
##  Min.   : 0.0000   Min.   : 0.000000   Length:119390      Length:119390     
##  1st Qu.: 0.0000   1st Qu.: 0.000000   Class :character   Class :character  
##  Median : 0.0000   Median : 0.000000   Mode  :character   Mode  :character  
##  Mean   : 0.1039   Mean   : 0.007949                                        
##  3rd Qu.: 0.0000   3rd Qu.: 0.000000                                        
##  Max.   :10.0000   Max.   :10.000000                                        
##  NA's   :4                                                                  
##  market_segment     distribution_channel is_repeated_guest
##  Length:119390      Length:119390        Min.   :0.00000  
##  Class :character   Class :character     1st Qu.:0.00000  
##  Mode  :character   Mode  :character     Median :0.00000  
##                                          Mean   :0.03191  
##                                          3rd Qu.:0.00000  
##                                          Max.   :1.00000  
##                                                           
##  previous_cancellations previous_bookings_not_canceled reserved_room_type
##  Min.   : 0.00000       Min.   : 0.0000                Length:119390     
##  1st Qu.: 0.00000       1st Qu.: 0.0000                Class :character  
##  Median : 0.00000       Median : 0.0000                Mode  :character  
##  Mean   : 0.08712       Mean   : 0.1371                                  
##  3rd Qu.: 0.00000       3rd Qu.: 0.0000                                  
##  Max.   :26.00000       Max.   :72.0000                                  
##                                                                          
##  assigned_room_type booking_changes   deposit_type          agent          
##  Length:119390      Min.   : 0.0000   Length:119390      Length:119390     
##  Class :character   1st Qu.: 0.0000   Class :character   Class :character  
##  Mode  :character   Median : 0.0000   Mode  :character   Mode  :character  
##                     Mean   : 0.2211                                        
##                     3rd Qu.: 0.0000                                        
##                     Max.   :21.0000                                        
##                                                                            
##    company          days_in_waiting_list customer_type           adr         
##  Length:119390      Min.   :  0.000      Length:119390      Min.   :  -6.38  
##  Class :character   1st Qu.:  0.000      Class :character   1st Qu.:  69.29  
##  Mode  :character   Median :  0.000      Mode  :character   Median :  94.58  
##                     Mean   :  2.321                         Mean   : 101.83  
##                     3rd Qu.:  0.000                         3rd Qu.: 126.00  
##                     Max.   :391.000                         Max.   :5400.00  
##                                                                              
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:119390     
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06252             Mean   :0.5714                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :8.00000             Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:119390          
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 
  • FROM THE SUMMARY ABOVE : ADR (Average Daily Rate is having a negative min value of (-6.38)
  • However, it doesn’t make sense for an average daily rate to be negative in a hotel booking context, as this would imply the customer is being paid to stay at the hotel.

View Top 6 Rows From The Data Set

head(hotel_bookings)
##          hotel is_canceled lead_time arrival_date_year arrival_date_month
## 1 Resort Hotel           0       342              2015               July
## 2 Resort Hotel           0       737              2015               July
## 3 Resort Hotel           0         7              2015               July
## 4 Resort Hotel           0        13              2015               July
## 5 Resort Hotel           0        14              2015               July
## 6 Resort Hotel           0        14              2015               July
##   arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
## 1                       27                         1                       0
## 2                       27                         1                       0
## 3                       27                         1                       0
## 4                       27                         1                       0
## 5                       27                         1                       0
## 6                       27                         1                       0
##   stays_in_week_nights adults children babies meal country market_segment
## 1                    0      2        0      0   BB     PRT         Direct
## 2                    0      2        0      0   BB     PRT         Direct
## 3                    1      1        0      0   BB     GBR         Direct
## 4                    1      1        0      0   BB     GBR      Corporate
## 5                    2      2        0      0   BB     GBR      Online TA
## 6                    2      2        0      0   BB     GBR      Online TA
##   distribution_channel is_repeated_guest previous_cancellations
## 1               Direct                 0                      0
## 2               Direct                 0                      0
## 3               Direct                 0                      0
## 4            Corporate                 0                      0
## 5                TA/TO                 0                      0
## 6                TA/TO                 0                      0
##   previous_bookings_not_canceled reserved_room_type assigned_room_type
## 1                              0                  C                  C
## 2                              0                  C                  C
## 3                              0                  A                  C
## 4                              0                  A                  A
## 5                              0                  A                  A
## 6                              0                  A                  A
##   booking_changes deposit_type agent company days_in_waiting_list customer_type
## 1               3   No Deposit  NULL    NULL                    0     Transient
## 2               4   No Deposit  NULL    NULL                    0     Transient
## 3               0   No Deposit  NULL    NULL                    0     Transient
## 4               0   No Deposit   304    NULL                    0     Transient
## 5               0   No Deposit   240    NULL                    0     Transient
## 6               0   No Deposit   240    NULL                    0     Transient
##   adr required_car_parking_spaces total_of_special_requests reservation_status
## 1   0                           0                         0          Check-Out
## 2   0                           0                         0          Check-Out
## 3  75                           0                         0          Check-Out
## 4  75                           0                         0          Check-Out
## 5  98                           0                         1          Check-Out
## 6  98                           0                         1          Check-Out
##   reservation_status_date
## 1              2015-07-01
## 2              2015-07-01
## 3              2015-07-02
## 4              2015-07-02
## 5              2015-07-03
## 6              2015-07-03

Checking for Null Values

Checking for the total number of null values in each of the columns

colSums(is.na(hotel_bookings))
##                          hotel                    is_canceled 
##                              0                              0 
##                      lead_time              arrival_date_year 
##                              0                              0 
##             arrival_date_month       arrival_date_week_number 
##                              0                              0 
##      arrival_date_day_of_month        stays_in_weekend_nights 
##                              0                              0 
##           stays_in_week_nights                         adults 
##                              0                              0 
##                       children                         babies 
##                              4                              0 
##                           meal                        country 
##                              0                              0 
##                 market_segment           distribution_channel 
##                              0                              0 
##              is_repeated_guest         previous_cancellations 
##                              0                              0 
## previous_bookings_not_canceled             reserved_room_type 
##                              0                              0 
##             assigned_room_type                booking_changes 
##                              0                              0 
##                   deposit_type                          agent 
##                              0                              0 
##                        company           days_in_waiting_list 
##                              0                              0 
##                  customer_type                            adr 
##                              0                              0 
##    required_car_parking_spaces      total_of_special_requests 
##                              0                              0 
##             reservation_status        reservation_status_date 
##                              0                              0

Replacing NA’s with the Mean Values of Children

hotel_bookings$children[is.na(hotel_bookings$children)] = round(mean(hotel_bookings$children, na.rm = TRUE), 0)

sum(is.na(hotel_bookings$children)) # TO CHECK OR COUNT THE NUMBER OF MISSING VALUES IN CHILDREN COLUMN JUST TO BE SURE OUR CODE ABOVE WORKED.
## [1] 0

Replacing Null values in Agent & Company Column with Zero

NB : the agent and company values are just IDs replacing with 0 won’t create any problem.

hotel_bookings$agent[hotel_bookings$agent == "NULL"] <- 0


hotel_bookings$company[hotel_bookings$company == "NULL"] <- 0

Replacing Null values with Maximum repeating Country

table(hotel_bookings$country) # From the outcome of this code, it shows that country with the code "PRT" i.e PORTUGAL appears the most with a maximum number of 48,590
## 
##   ABW   AGO   AIA   ALB   AND   ARE   ARG   ARM   ASM   ATA   ATF   AUS   AUT 
##     2   362     1    12     7    51   214     8     1     2     1   426  1263 
##   AZE   BDI   BEL   BEN   BFA   BGD   BGR   BHR   BHS   BIH   BLR   BOL   BRA 
##    17     1  2342     3     1    12    75     5     1    13    26    10  2224 
##   BRB   BWA   CAF   CHE   CHL   CHN   CIV   CMR    CN   COL   COM   CPV   CRI 
##     4     1     5  1730    65   999     6    10  1279    71     2    24    19 
##   CUB   CYM   CYP   CZE   DEU   DJI   DMA   DNK   DOM   DZA   ECU   EGY   ESP 
##     8     1    51   171  7287     1     1   435    14   103    27    32  8568 
##   EST   ETH   FIN   FJI   FRA   FRO   GAB   GBR   GEO   GGY   GHA   GIB   GLP 
##    83     3   447     1 10415     5     4 12129    22     3     4    18     2 
##   GNB   GRC   GTM   GUY   HKG   HND   HRV   HUN   IDN   IMN   IND   IRL   IRN 
##     9   128     4     1    29     1   100   230    35     2   152  3375    83 
##   IRQ   ISL   ISR   ITA   JAM   JEY   JOR   JPN   KAZ   KEN   KHM   KIR   KNA 
##    14    57   669  3766     6     8    21   197    19     6     2     1     2 
##   KOR   KWT   LAO   LBN   LBY   LCA   LIE   LKA   LTU   LUX   LVA   MAC   MAR 
##   133    16     2    31     8     1     3     7    81   287    55    16   259 
##   MCO   MDG   MDV   MEX   MKD   MLI   MLT   MMR   MNE   MOZ   MRT   MUS   MWI 
##     4     1    12    85    10     1    18     1     5    67     1     7     2 
##   MYS   MYT   NAM   NCL   NGA   NIC   NLD   NOR   NPL  NULL   NZL   OMN   PAK 
##    28     2     1     1    34     1  2104   607     1   488    74    18    14 
##   PAN   PER   PHL   PLW   POL   PRI   PRT   PRY   PYF   QAT   ROU   RUS   RWA 
##     9    29    40     1   919    12 48590     4     1    15   500   632     2 
##   SAU   SDN   SEN   SGP   SLE   SLV   SMR   SRB   STP   SUR   SVK   SVN   SWE 
##    48     1    11    39     1     2     1   101     2     5    65    57  1024 
##   SYC   SYR   TGO   THA   TJK   TMP   TUN   TUR   TWN   TZA   UGA   UKR   UMI 
##     2     3     2    59     9     3    39   248    51     5     2    68     1 
##   URY   USA   UZB   VEN   VGB   VNM   ZAF   ZMB   ZWE 
##    32  2097     4    26     1     8    80     2     4
hotel_bookings$country[hotel_bookings$country == "NULL"] <- "PRT"
hotel_bookings <- hotel_bookings[hotel_bookings$adr >= 0, ] 
# This method works by subsetting the dataset to exclude rows where the condition (adr >= 0) is false.

Adding a new Variable Called “Occupancy” and also “Length_of_stay”

hotel_bookings <- mutate(hotel_bookings, 
                         Occupancy = adults + children + babies,
                         length_of_stay = stays_in_week_nights + stays_in_weekend_nights)


str(hotel_bookings)
## 'data.frame':    119389 obs. of  34 variables:
##  $ hotel                         : chr  "Resort Hotel" "Resort Hotel" "Resort Hotel" "Resort Hotel" ...
##  $ is_canceled                   : int  0 0 0 0 0 0 0 0 1 1 ...
##  $ lead_time                     : int  342 737 7 13 14 14 0 9 85 75 ...
##  $ arrival_date_year             : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ arrival_date_month            : chr  "July" "July" "July" "July" ...
##  $ arrival_date_week_number      : int  27 27 27 27 27 27 27 27 27 27 ...
##  $ arrival_date_day_of_month     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ stays_in_weekend_nights       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ stays_in_week_nights          : int  0 0 1 1 2 2 2 2 3 3 ...
##  $ adults                        : int  2 2 1 1 2 2 2 2 2 2 ...
##  $ children                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "BB" "BB" "BB" "BB" ...
##  $ country                       : chr  "PRT" "PRT" "GBR" "GBR" ...
##  $ market_segment                : chr  "Direct" "Direct" "Direct" "Corporate" ...
##  $ distribution_channel          : chr  "Direct" "Direct" "Direct" "Corporate" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "C" "C" "A" "A" ...
##  $ assigned_room_type            : chr  "C" "C" "C" "A" ...
##  $ booking_changes               : int  3 4 0 0 0 0 0 0 0 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
##  $ agent                         : chr  "0" "0" "0" "304" ...
##  $ company                       : chr  "0" "0" "0" "0" ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient" "Transient" "Transient" ...
##  $ adr                           : num  0 0 75 75 98 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  0 0 0 0 1 1 0 1 1 0 ...
##  $ reservation_status            : chr  "Check-Out" "Check-Out" "Check-Out" "Check-Out" ...
##  $ reservation_status_date       : chr  "2015-07-01" "2015-07-01" "2015-07-02" "2015-07-02" ...
##  $ Occupancy                     : num  2 2 1 1 2 2 2 2 2 2 ...
##  $ length_of_stay                : int  0 0 1 1 2 2 2 2 3 3 ...
# The mutate function in R (from the dplyr package) adds a new variable to a dataset. In this case, you're creating a new variable total_guests, which is the sum of the adults, children, and babies columns. Once the mutation is applied, str(hotel_data) will display the structure of the updated dataset.
table(hotel_bookings$Occupancy)
## 
##     0     1     2     3     4     5     6    10    12    20    26    27    40 
##   180 22581 82050 10495  3929   137     1     2     2     2     5     2     1 
##    50    55 
##     1     1
hotel_bookings <- filter(hotel_bookings, is_canceled > 0, Occupancy > 0)



#For each booking, there should be at-least 1 guest, here guests are of three categories: Adults, Children, Babies We have combined all three and added a new column called “total_guests”.

#But in data we have around 180 rows which have 0 guests. So we are removing those rows directly, as booking without guests doesn’t make sense.

EXPLORATORY DATA ANALYSIS

1. Time of year that experiences the highest and lowest demand for bookings

INSIGHT: The data shows that the highest demand for bookings occurs in July and August, while the lowest demand is in January and December.

month_levels <- c('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')

hotel_bookings$arrival_date_month <- factor(hotel_bookings$arrival_date_month, levels = month_levels, ordered = T )

months_count <- table(hotel_bookings$arrival_date_month)

barplot(months_count,
        main = "Seasonal Trend in Hotel Booking",
        xlab = "Months",
        ylab = "Record Count",
        col = "skyblue")

ggplot(data = hotel_bookings, mapping = aes(x = arrival_date_month)) +
  geom_bar(fill = "brown") +
  labs(title = "Seasonal Trend in Hotel Booking", x = "Months", y = "Record Count of Months") +
  theme_minimal()

2. Which type of hotel (city vs resort) tends to have a higher occupancy rate during peak seasons

INSIGHT: City Hotels tend to have a higher occupancy rate during peak seasons, with 69.01% compared to Resort Hotels at 30.73%.

peak_season_data <- filter(hotel_bookings, arrival_date_month %in% c("July", "August"))

occupancy_rates <- peak_season_data %>% 
  group_by(hotel) %>% 
  summarise(total_bookings = n(),
            occupancy_rate = (total_bookings / 10000) * 100)

occupancy_rates
## # A tibble: 2 × 3
##   hotel        total_bookings occupancy_rate
##   <chr>                 <int>          <dbl>
## 1 City Hotel             6901           69.0
## 2 Resort Hotel           3073           30.7
barplot(height = occupancy_rates$occupancy_rate, 
        names.arg = occupancy_rates$hotel,
        main = "Occupancy Rates by Hotel Type during Peak Season",
        xlab = "Hotel Type",
        ylab = "Occupancy Rate (%)",
        col = c("maroon", "darkblue"),
        ylim = c(0, 100))

3. Length of stay vs Average Daily Rate (ADR) for bookings

INSIGHT: The data shows that ADR tends to fluctuate with the length of stay, with some interesting patterns:

  • Short stays (1-2 nights) have lower ADRs

  • There’s a peak in ADR for 5-6 night stays

  • Longer stays (7+ nights) show a slight decrease in ADR

This suggests that hotels might be pricing their rooms to encourage certain lengths of stay.

cor(hotel_bookings$adr, hotel_bookings$length_of_stay)
## [1] 0.09188117
# Calculate Average Daily Rate for each stay

adr_by_stay <- hotel_bookings %>% 
  group_by(length_of_stay) %>% 
  summarise(avg_adr = mean(adr, na.rm = T))

plot(x = adr_by_stay$length_of_stay, y = adr_by_stay$avg_adr,
     main = "Average Daily Rate vs Length of Stay",
     xlab = "Length of Stay",
     ylab = "Average Daily Rate",
     xlim = c(0, 10),
     lty = 1, lwd = 2, pch = 20, col = 'darkblue')
abline(h = mean(adr_by_stay$avg_adr), col = "maroon")
lines(adr_by_stay$length_of_stay, adr_by_stay$avg_adr, col = "black")

4. Average lead time for bookings, and how it vary by hotel type

INSIGHT: The chart below indicates that guests tend to book City Hotels further in advance than Resort Hotels.

lead_time_by_hotel <- hotel_bookings %>% 
  group_by(hotel) %>% 
  summarise(avg_lead_time_by_hotel = mean(lead_time, na.rm = T))

barplot(height = lead_time_by_hotel$avg_lead_time_by_hotel, 
        names.arg = lead_time_by_hotel$hotel,
        main = "Average Lead Time by Hotel Type",
        xlab = "Hotel Type",
        ylab = "Average Lead Time",
        ylim = c(0, 200),
        col = c("darkred", "dodgerblue"))

5. How do cancellation rates vary by season and hotel type?

INSIGHT: With respect to the chart below

  1. City Hotels consistently have higher cancellation rates compared to Resort Hotels across all months.

  2. There are seasonal variations in cancellation rates for both hotel types.

  3. The highest cancellation rates for both hotel types seem to occur in the summer months (July and August).

    Potential causes for these trends:

  1. City Hotels might have more business travelers who are more likely to have changes in their plans.

  2. The higher cancellation rates in summer could be due to changes in vacation plans or overbooking during peak season.

  3. Resort Hotels might have lower cancellation rates due to stricter cancellation policies or because they are often booked for leisure travel, which might be less prone to last-minute changes

cancellation_rates <- hotel_bookings %>% 
  group_by(arrival_date_month, hotel) %>% 
  summarise(cancellation_rate = (sum(is_canceled) / 10000) * 100) 
## `summarise()` has grouped output by 'arrival_date_month'. You can override
## using the `.groups` argument.
barplot(height = cancellation_rates$cancellation_rate,
        names.arg = cancellation_rates$arrival_date_month,
        main = "Cancellation Rate by Season and Hotel Type",
        xlab = "Month",
        ylab = "Cancellation Rate (%)",
        ylim = c(0, 50),
        col = c("yellow4", "tomato"),
        legend.text = c("City Hotel", "Resort Hotel")
)