Step 1: Import data

The data in this example is originally from the article Hotel Booking Demand Datasets (https://www.sciencedirect.com/science/article/pii/S2352340918315191), written by Nuno Antonio, Ana Almeida, and Luis Nunes for Data in Brief, Volume 22, February 2019.

The data was downloaded and cleaned by Thomas Mock and Antoine Bichat for #TidyTuesday during the week of February 11th, 2020 (https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-11/readme.md).

More about the dataset: https://www.kaggle.com/jessemostipak/hotel-booking-demand

Reading the file ‘hotel_bookings.csv’ into a data frame:

hotel_bookings <- read.csv("hotel_bookings.csv")
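It can help to confirm that an import worked before moving on. The following is a minimal sketch that uses a tiny temporary CSV as a stand-in for hotel_bookings.csv; the toy rows and the names toy_path and toy_bookings are illustrative assumptions, not part of the dataset.

```r
# Write a tiny stand-in CSV to a temporary file (illustrative data only).
toy_path <- tempfile(fileext = ".csv")
writeLines(c("hotel,is_canceled,lead_time",
             "Resort Hotel,0,342",
             "City Hotel,1,7"),
           toy_path)

# Read it the same way hotel_bookings.csv is read above.
toy_bookings <- read.csv(toy_path)

# Check the number of rows and columns, then the type of each column.
dim(toy_bookings)
str(toy_bookings)
```

Running the same two checks on hotel_bookings shows whether all 32 columns arrived and which ones were read as text rather than numbers.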

Step 2: View Data

Getting initial impressions of the dataset:

head(hotel_bookings)
##          hotel is_canceled lead_time arrival_date_year arrival_date_month
## 1 Resort Hotel           0       342              2015               July
## 2 Resort Hotel           0       737              2015               July
## 3 Resort Hotel           0         7              2015               July
## 4 Resort Hotel           0        13              2015               July
## 5 Resort Hotel           0        14              2015               July
## 6 Resort Hotel           0        14              2015               July
##   arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
## 1                       27                         1                       0
## 2                       27                         1                       0
## 3                       27                         1                       0
## 4                       27                         1                       0
## 5                       27                         1                       0
## 6                       27                         1                       0
##   stays_in_week_nights adults children babies meal country market_segment
## 1                    0      2        0      0   BB     PRT         Direct
## 2                    0      2        0      0   BB     PRT         Direct
## 3                    1      1        0      0   BB     GBR         Direct
## 4                    1      1        0      0   BB     GBR      Corporate
## 5                    2      2        0      0   BB     GBR      Online TA
## 6                    2      2        0      0   BB     GBR      Online TA
##   distribution_channel is_repeated_guest previous_cancellations
## 1               Direct                 0                      0
## 2               Direct                 0                      0
## 3               Direct                 0                      0
## 4            Corporate                 0                      0
## 5                TA/TO                 0                      0
## 6                TA/TO                 0                      0
##   previous_bookings_not_canceled reserved_room_type assigned_room_type
## 1                              0                  C                  C
## 2                              0                  C                  C
## 3                              0                  A                  C
## 4                              0                  A                  A
## 5                              0                  A                  A
## 6                              0                  A                  A
##   booking_changes deposit_type agent company days_in_waiting_list customer_type
## 1               3   No Deposit  NULL    NULL                    0     Transient
## 2               4   No Deposit  NULL    NULL                    0     Transient
## 3               0   No Deposit  NULL    NULL                    0     Transient
## 4               0   No Deposit   304    NULL                    0     Transient
## 5               0   No Deposit   240    NULL                    0     Transient
## 6               0   No Deposit   240    NULL                    0     Transient
##   adr required_car_parking_spaces total_of_special_requests reservation_status
## 1   0                           0                         0          Check-Out
## 2   0                           0                         0          Check-Out
## 3  75                           0                         0          Check-Out
## 4  75                           0                         0          Check-Out
## 5  98                           0                         1          Check-Out
## 6  98                           0                         1          Check-Out
##   reservation_status_date
## 1              2015-07-01
## 2              2015-07-01
## 3              2015-07-02
## 4              2015-07-02
## 5              2015-07-03
## 6              2015-07-03
colnames(hotel_bookings)
##  [1] "hotel"                          "is_canceled"                   
##  [3] "lead_time"                      "arrival_date_year"             
##  [5] "arrival_date_month"             "arrival_date_week_number"      
##  [7] "arrival_date_day_of_month"      "stays_in_weekend_nights"       
##  [9] "stays_in_week_nights"           "adults"                        
## [11] "children"                       "babies"                        
## [13] "meal"                           "country"                       
## [15] "market_segment"                 "distribution_channel"          
## [17] "is_repeated_guest"              "previous_cancellations"        
## [19] "previous_bookings_not_canceled" "reserved_room_type"            
## [21] "assigned_room_type"             "booking_changes"               
## [23] "deposit_type"                   "agent"                         
## [25] "company"                        "days_in_waiting_list"          
## [27] "customer_type"                  "adr"                           
## [29] "required_car_parking_spaces"    "total_of_special_requests"     
## [31] "reservation_status"             "reservation_status_date"
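
The head() output above shows the literal text NULL in the agent and company columns. One possible way to handle this, sketched here on a toy file rather than the real data, is the na.strings argument of read.csv(), which turns a chosen string into a missing value on import. The file contents and the names raw and cleaned are illustrative assumptions.

```r
# Tiny stand-in CSV that mimics the "NULL" entries seen above.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("agent,company,adr",
             "NULL,NULL,0",
             "304,NULL,75"),
           toy_path)

# Default import: "NULL" is kept as text, so agent is not numeric.
raw <- read.csv(toy_path)
class(raw$agent)

# Treat the literal string "NULL" as a missing value instead.
cleaned <- read.csv(toy_path, na.strings = "NULL")
class(cleaned$agent)

# Count the missing values that result in the company column.
sum(is.na(cleaned$company))
```

Whether this is appropriate depends on the analysis; the charts below do not use agent or company, so the original import works for them as it is.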

Step 3: Loading the ‘tidyverse’ and ‘ggplot2’ packages

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)

Step 4: Filtering the dataset

  1. Using the filter() function to create a data frame containing only city hotel bookings from the "Online TA" (online travel agent) market segment:
onlineta_city_hotels <- filter(hotel_bookings, 
                           (hotel=="City Hotel" & 
                             market_segment=="Online TA"))

Viewing the new data frame onlineta_city_hotels:

View(onlineta_city_hotels)
  2. Creating an equivalent data frame, onlineta_city_hotels_v2, using the pipe operator:
onlineta_city_hotels_v2 <- hotel_bookings %>%
  filter(hotel=="City Hotel") %>%
  filter(market_segment=="Online TA")

Viewing the onlineta_city_hotels_v2 data frame:

View(onlineta_city_hotels_v2)
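
The two approaches above are expected to select the same rows. The sketch below checks this on a small made-up data frame; the toy rows and the names toy, v1, and v2 are illustrative and not taken from hotel_bookings.csv.

```r
library(dplyr)

# Illustrative toy data with the two columns used for filtering.
toy <- data.frame(
  hotel = c("City Hotel", "City Hotel", "Resort Hotel"),
  market_segment = c("Online TA", "Direct", "Online TA")
)

# Approach 1: a single filter() call with both conditions.
v1 <- filter(toy, hotel == "City Hotel" & market_segment == "Online TA")

# Approach 2: two piped filter() calls, one condition each.
v2 <- toy %>%
  filter(hotel == "City Hotel") %>%
  filter(market_segment == "Online TA")

# Compare the two results and count the rows kept.
all.equal(v1, v2)
nrow(v1)
```

The same comparison, all.equal(onlineta_city_hotels, onlineta_city_hotels_v2), can be run on the real data frames as a sanity check.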

Finding the years of the earliest and latest hotel bookings and saving them for use in the chart subtitles:

min(hotel_bookings$arrival_date_year)
## [1] 2015
max(hotel_bookings$arrival_date_year)
## [1] 2017
mindate <- min(hotel_bookings$arrival_date_year)
maxdate <- max(hotel_bookings$arrival_date_year)
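
Since every chart below repeats the same subtitle, one option is to build that string once and reuse it. The sketch below shows the idea with range(), which returns the minimum and maximum together; the years vector stands in for hotel_bookings$arrival_date_year, and the names first_year, last_year, and chart_subtitle are hypothetical.

```r
# Illustrative vector standing in for hotel_bookings$arrival_date_year.
years <- c(2015, 2016, 2017, 2016, 2015)

# range() returns the minimum and maximum as a two-element vector.
year_range <- range(years)
first_year <- year_range[1]
last_year <- year_range[2]

# Build the subtitle once so it can be passed to labs() in each chart.
chart_subtitle <- paste0("Data from: ", first_year, " to ", last_year)
chart_subtitle
## [1] "Data from: 2015 to 2017"
```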

Step 5: Visual Analysis of the Hotel Bookings dataset

Analyzing Distribution Channels

(1) Initial chart

ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = distribution_channel))+
  labs(title = "Bookings vs Distribution channel", subtitle =paste0("Data from: ", mindate, " to ", maxdate))

(2) Hotel Bookings with respect to other Factors

Checking whether the number of bookings for each distribution channel differs depending on whether a deposit was made or which market segment the booking belongs to.

ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = distribution_channel, fill=deposit_type))+
  labs(title="Deposit type with respect to No. of bookings", subtitle =paste0("Data from: ", mindate, " to ", maxdate))
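
A stacked bar chart like this one is a picture of a two-way count table. As a rough sketch of what geom_bar() counts here, the base R table() function can produce the same numbers; the toy rows below are made up for illustration and do not come from the dataset.

```r
# Illustrative toy data with the two columns used in the chart.
toy <- data.frame(
  distribution_channel = c("TA/TO", "TA/TO", "Direct", "Corporate", "TA/TO"),
  deposit_type = c("No Deposit", "Non Refund", "No Deposit",
                   "No Deposit", "No Deposit")
)

# Cross-tabulate: rows are channels, columns are deposit types.
# Each cell is the height of one coloured segment in the stacked bar.
counts <- table(toy$distribution_channel, toy$deposit_type)
counts

# Row totals are the full bar heights, as in the initial chart.
rowSums(counts)
```

Applying table() to hotel_bookings$distribution_channel and hotel_bookings$deposit_type gives exact numbers for segments that are too thin to read off the chart.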

(3) Market Segment with respect to No. of bookings

ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = distribution_channel, fill=market_segment))+
  labs(title = "Market Segment with respect to No. of bookings", subtitle =paste0("Data from: ", mindate, " to ", maxdate))

(4) Visualizing Charts with each feature

Creating separate charts for each deposit type and each market segment to make the differences easier to see.

ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = distribution_channel)) +
  facet_wrap(~deposit_type)+
  labs(title = "Plot for each deposit type", subtitle =paste0("Data from: ", mindate, " to ", maxdate))

(5) Creating a plot for each Market segment

ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = distribution_channel)) +
  facet_wrap(~market_segment)+
  labs(title = "Plot for each market segment", subtitle =paste0("Data from: ", mindate, " to ", maxdate))

(6) Create a single plot with both deposit type and market segment and explore the differences

ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = distribution_channel)) +
  facet_wrap(~deposit_type + market_segment)+
  labs(title = "Plot with both deposit type and market segment", subtitle =paste0("Data from: ", mindate, " to ", maxdate))
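
With two faceting variables, a wrapped layout can become crowded. One alternative worth trying, sketched here on made-up data, is facet_grid(), which places one variable on the rows and the other on the columns. The toy rows, the object name p, and the rotated axis labels are illustrative choices, not part of the original analysis.

```r
library(ggplot2)

# Illustrative toy data with the three columns used in the chart.
toy <- data.frame(
  distribution_channel = c("TA/TO", "Direct", "TA/TO", "Corporate"),
  deposit_type = c("No Deposit", "No Deposit", "Non Refund", "No Deposit"),
  market_segment = c("Online TA", "Direct", "Groups", "Corporate")
)

# facet_grid() puts deposit types on rows and market segments on columns.
# Rotating the x-axis labels keeps channel names from overlapping.
p <- ggplot(data = toy) +
  geom_bar(mapping = aes(x = distribution_channel)) +
  facet_grid(deposit_type ~ market_segment) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p
```

Unlike facet_wrap(), facet_grid() also draws empty panels for combinations with no bookings, which makes gaps in the data visible.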

Analyzing Hotel Types

(1) Comparison of market segments by hotel type for hotel bookings

ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = market_segment)) +
  facet_wrap(~hotel) +
  labs(title="Comparison of market segments by hotel type for hotel bookings", 
       subtitle =paste0("Data from: ", mindate, " to ", maxdate),
       x="Market Segment",
       y="Number of Bookings")

(2) Market Segment with respect to Types of hotels

Creating a bar chart showing each hotel type, with a different color for each market segment:

ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = hotel, fill = market_segment))+
  labs(title = "Market Segment with respect to Types of hotels", 
       subtitle =paste0("Data from: ", mindate, " to ", maxdate))
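
If the two hotel types have very different booking totals, raw counts can hide differences in the mix of market segments. A possible complement, sketched on made-up rows, is to convert the counts into within-hotel shares using prop.table(). The toy data and the names counts and shares are illustrative.

```r
# Illustrative toy data with the two columns used in the chart.
toy <- data.frame(
  hotel = c("City Hotel", "City Hotel", "City Hotel", "Resort Hotel"),
  market_segment = c("Online TA", "Online TA", "Direct", "Direct")
)

# Count bookings for each hotel type and market segment.
counts <- table(toy$hotel, toy$market_segment)

# margin = 1 divides each row by its total, so every row sums to 1.
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

Each row of shares then describes one hotel type on its own scale, so the segment mix can be compared directly across hotel types.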

(3) Market Segment with respect to Types of hotels in grid View

Using the facet_wrap() function to create a separate plot for each market segment:

ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = hotel)) +
  facet_wrap(~market_segment)+
  labs(title="Each Market Segment with respect to Types of hotels",
       subtitle =paste0("Data from: ", mindate, " to ", maxdate))

Visualizing the filtered dataset from Step 4

Creating a scatter plot from the filtered data; either onlineta_city_hotels or onlineta_city_hotels_v2 can be used:

ggplot(data = onlineta_city_hotels) +
  geom_point(mapping = aes(x = lead_time, y = children))+
  labs(title="Online City Hotels", 
       subtitle =paste0("Data from: ", mindate, " to ", maxdate))
## Warning: Removed 1 rows containing missing values (geom_point).
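
The warning above means that one row had a missing value in a plotted column and was dropped from the chart. A small sketch of how such rows could be located and removed before plotting follows; the toy data, including the assumption that the missing value sits in children, is illustrative only.

```r
# Illustrative toy data with one missing value in children.
toy <- data.frame(
  lead_time = c(7, 13, 14, 2),
  children = c(0, 1, NA, 0)
)

# Count missing values per column to see which column causes the warning.
colSums(is.na(toy))

# Keep only rows where both plotted columns are present.
toy_complete <- toy[complete.cases(toy[, c("lead_time", "children")]), ]
nrow(toy_complete)
```

Running colSums(is.na(...)) on the lead_time and children columns of onlineta_city_hotels shows which one actually holds the missing value.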