Scenario

In this analysis of hotel bookings, the company wants to run a series of promotional advertisements. The analysis looks at bookings across distribution channels and customer preferences so that, based on the insights, the company can target the right customers with its advertising.

Loading the necessary libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(dplyr)

Loading the data set and assigning it to a data frame

hotel_bookings_df <- read_csv("hotel_bookings.csv")
## Rows: 119390 Columns: 32
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (13): hotel, arrival_date_month, meal, country, market_segment, distrib...
## dbl  (18): is_canceled, lead_time, arrival_date_year, arrival_date_week_numb...
## date  (1): reservation_status_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Preview

head(hotel_bookings_df)
## # A tibble: 6 × 32
##   hotel        is_canceled lead_time arrival_date_year arrival_date_month
##   <chr>              <dbl>     <dbl>             <dbl> <chr>             
## 1 Resort Hotel           0       342              2015 July              
## 2 Resort Hotel           0       737              2015 July              
## 3 Resort Hotel           0         7              2015 July              
## 4 Resort Hotel           0        13              2015 July              
## 5 Resort Hotel           0        14              2015 July              
## 6 Resort Hotel           0        14              2015 July              
## # ℹ 27 more variables: arrival_date_week_number <dbl>,
## #   arrival_date_day_of_month <dbl>, stays_in_weekend_nights <dbl>,
## #   stays_in_week_nights <dbl>, adults <dbl>, children <dbl>, babies <dbl>,
## #   meal <chr>, country <chr>, market_segment <chr>,
## #   distribution_channel <chr>, is_repeated_guest <dbl>,
## #   previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
## #   reserved_room_type <chr>, assigned_room_type <chr>, …
colnames(hotel_bookings_df)
##  [1] "hotel"                          "is_canceled"                   
##  [3] "lead_time"                      "arrival_date_year"             
##  [5] "arrival_date_month"             "arrival_date_week_number"      
##  [7] "arrival_date_day_of_month"      "stays_in_weekend_nights"       
##  [9] "stays_in_week_nights"           "adults"                        
## [11] "children"                       "babies"                        
## [13] "meal"                           "country"                       
## [15] "market_segment"                 "distribution_channel"          
## [17] "is_repeated_guest"              "previous_cancellations"        
## [19] "previous_bookings_not_canceled" "reserved_room_type"            
## [21] "assigned_room_type"             "booking_changes"               
## [23] "deposit_type"                   "agent"                         
## [25] "company"                        "days_in_waiting_list"          
## [27] "customer_type"                  "adr"                           
## [29] "required_car_parking_spaces"    "total_of_special_requests"     
## [31] "reservation_status"             "reservation_status_date"

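Data Cleaning

Removing rows with missing values from the data. The result below is printed but not assigned, so hotel_bookings_df itself still holds all 119,390 rows; this is why the later plots still warn about 4 removed rows.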

na.omit(hotel_bookings_df)
## # A tibble: 119,386 × 32
##    hotel        is_canceled lead_time arrival_date_year arrival_date_month
##    <chr>              <dbl>     <dbl>             <dbl> <chr>             
##  1 Resort Hotel           0       342              2015 July              
##  2 Resort Hotel           0       737              2015 July              
##  3 Resort Hotel           0         7              2015 July              
##  4 Resort Hotel           0        13              2015 July              
##  5 Resort Hotel           0        14              2015 July              
##  6 Resort Hotel           0        14              2015 July              
##  7 Resort Hotel           0         0              2015 July              
##  8 Resort Hotel           0         9              2015 July              
##  9 Resort Hotel           1        85              2015 July              
## 10 Resort Hotel           1        75              2015 July              
## # ℹ 119,376 more rows
## # ℹ 27 more variables: arrival_date_week_number <dbl>,
## #   arrival_date_day_of_month <dbl>, stays_in_weekend_nights <dbl>,
## #   stays_in_week_nights <dbl>, adults <dbl>, children <dbl>, babies <dbl>,
## #   meal <chr>, country <chr>, market_segment <chr>,
## #   distribution_channel <chr>, is_repeated_guest <dbl>,
## #   previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>, …
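
To keep the cleaned rows for later steps, the result of na.omit() has to be assigned to a name. The following is a minimal sketch under stated assumptions: toy_bookings and toy_clean are hypothetical names, and the toy values stand in for hotel_bookings_df rather than reproducing it.

```r
# Hypothetical toy data standing in for hotel_bookings_df
toy_bookings <- data.frame(
  lead_time = c(342, 737, 7, 13),
  children  = c(0, NA, 2, 0)
)

# Count the missing values in each column before dropping anything
na_counts <- colSums(is.na(toy_bookings))
print(na_counts)

# Assign the result so the rows without missing values are kept
toy_clean <- na.omit(toy_bookings)
print(nrow(toy_clean))
```

Checking the per-column counts first shows which column the missing values come from before any rows are dropped.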

The code chunk below uses ‘geom_point’ to make a scatter plot comparing lead_time with the number of children. The x-axis shows how far in advance a booking was made, so the bookings furthest to the right were made the most in advance. The y-axis shows how many children are in the party.

ggplot(data = hotel_bookings_df) +
  geom_point(mapping = aes(x = lead_time, y = children, color = children))
## Warning: Removed 4 rows containing missing values (`geom_point()`).
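
Because many points overlap in a scatter plot like this one, a grouped summary can help check what the plot suggests. The sketch below is only an illustration: toy_bookings and lead_by_children are hypothetical names, and the toy values are made up rather than taken from the data set.

```r
library(dplyr)

# Hypothetical toy data with the two columns used in the scatter plot
toy_bookings <- data.frame(
  lead_time = c(342, 737, 7, 13, 14, 85),
  children  = c(0, 0, 0, 1, 2, 2)
)

# Summarise lead time for each number of children
lead_by_children <- toy_bookings %>%
  group_by(children) %>%
  summarise(
    bookings         = n(),
    median_lead_time = median(lead_time),
    max_lead_time    = max(lead_time)
  )

print(lead_by_children)
```

On the real data, the same pipeline would presumably be run on hotel_bookings_df after handling the rows where children is missing.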

The company wants to increase weekend bookings, an important source of revenue for the hotel. To target a new marketing campaign, it wants to know which group of guests books the most weekend nights.

ggplot(data = hotel_bookings_df) +
  geom_point(mapping = aes(x = stays_in_weekend_nights, y = children, color = children))
## Warning: Removed 4 rows containing missing values (`geom_point()`).
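
One way to put numbers on which group books the most weekend nights is to total stays_in_weekend_nights per group. This is a hedged sketch, not the report's method: has_children, toy_bookings, and weekend_by_group are hypothetical names, and the toy values are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data with the two columns used in the scatter plot
toy_bookings <- data.frame(
  stays_in_weekend_nights = c(0, 2, 2, 1, 4, 0),
  children                = c(0, 0, 1, 0, 2, 0)
)

# Split parties into with/without children and total their weekend nights
weekend_by_group <- toy_bookings %>%
  mutate(has_children = children > 0) %>%
  group_by(has_children) %>%
  summarise(
    bookings             = n(),
    total_weekend_nights = sum(stays_in_weekend_nights),
    mean_weekend_nights  = mean(stays_in_weekend_nights)
  )

print(weekend_by_group)
```

Reporting both the total and the mean separates a group that is simply large from one that books more weekend nights per booking.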

The code chunk below uses ‘geom_bar’ to show how many bookings come through each distribution channel. It creates a bar chart with ‘distribution_channel’ on the x-axis and ‘count’ on the y-axis, covering the corporate, direct, GDS, TA/TO, and undefined channels. Each bar is split into color-coded sections by ‘deposit_type’, and a legend on the right side of the visualization explains what each color represents.

ggplot(data = hotel_bookings_df) +
  geom_bar(mapping = aes(x = distribution_channel, fill = deposit_type)) +
  labs(title = "Comparison of distribution_channels by deposit_type for hotel bookings")
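
The counts that geom_bar draws can also be tabulated directly, which makes the largest group easy to read off. The following is a minimal sketch, assuming toy data: toy_bookings and channel_counts are hypothetical names, and the channel and deposit values are illustrative stand-ins.

```r
library(dplyr)

# Hypothetical toy data with the two columns used in the bar chart
toy_bookings <- data.frame(
  distribution_channel = c("TA/TO", "TA/TO", "Direct", "TA/TO", "Corporate", "Direct"),
  deposit_type         = c("No Deposit", "Non Refund", "No Deposit",
                           "No Deposit", "No Deposit", "No Deposit")
)

# Count bookings for each channel and deposit type, largest first
channel_counts <- toy_bookings %>%
  count(distribution_channel, deposit_type, sort = TRUE, name = "bookings")

print(channel_counts)
```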

This bar chart is similar to the previous one, except that the color-coded sections of each bar show ‘market_segment’ instead.

ggplot(data = hotel_bookings_df) +
  geom_bar(mapping = aes(x = distribution_channel, fill = market_segment)) +
  labs(title = "Comparison of distribution_channels by market_segment for hotel bookings")

The code chunk below uses ‘facet_wrap’ to create separate charts so the differences by deposit_type and market_segment are easier to see. It draws one bar chart for each combination of market_segment and the three deposit types (no deposit, non-refundable, and refundable) that occurs in the data. The ‘theme’ function is added at the end to rotate the x-axis text by 45 degrees, which makes the labels easier to read.

ggplot(data = hotel_bookings_df) +
  geom_bar(mapping = aes(x = distribution_channel, fill = market_segment)) +
  facet_wrap(~market_segment~deposit_type) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(title = "Differences on market_segment and deposit_type for hotel bookings")
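
Faceting by deposit_type alone gives a simpler view with one panel per deposit type. The sketch below illustrates that variant under stated assumptions: toy_bookings and deposit_panels are hypothetical names, the toy values are invented, and hjust = 1 is an addition not used in the report that aligns the rotated labels under their tick marks.

```r
library(ggplot2)

# Hypothetical toy data with the columns used for faceting
toy_bookings <- data.frame(
  distribution_channel = c("TA/TO", "TA/TO", "Direct", "Corporate", "Direct", "TA/TO"),
  deposit_type         = c("No Deposit", "Non Refund", "No Deposit",
                           "No Deposit", "Refundable", "No Deposit")
)

# One panel per deposit type, with rotated and right-aligned x-axis labels
deposit_panels <- ggplot(data = toy_bookings) +
  geom_bar(mapping = aes(x = distribution_channel)) +
  facet_wrap(~deposit_type) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(deposit_panels)
```

With fewer panels, each chart is larger, at the cost of no longer separating the market segments.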

Conclusion

The analysis yields the following insights:

Plot 1 (scatter plot): many of the bookings made far in advance come from parties with 0 children.

Plot 2 (scatter plot): weekend-night stays are booked mostly by parties with 0 children.

Plot 3 (bar chart): bookings with no deposit in the TA/TO distribution channel form the largest group.

Plot 4 (bar chart): across distribution channels and market segments, the largest group is Online TA bookings in the TA/TO channel.

Plot 5 (bar chart with facet_wrap): across the combinations of deposit_type and market_segment, the TA/TO distribution channel again has the highest counts.

The visualizations above will help stakeholders gain insights into hotel booking strategy across these scenarios.

                                         ***End of the Report***