Packages Required

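library(ggplot2)    # plotting (also attached by tidyverse; listed explicitly)
library(tidyverse)  # attaches the core tidyverse packages, including ggplot2 and dplyr
library(dplyr)      # data manipulation (also attached by tidyverse; listed explicitly)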

Data Loading

hotels <- read.csv(file = "/Users/pulkitchauhan/Documents/UC/Data Wrangling/Midterm/hotels.csv")

Visualization

is_canceled

The only data preparation step for this visualization was converting is_canceled to a factor variable, because the plot uses it as a categorical fill rather than as a number.

# Create a booking_status column from is_canceled as a factor, so the plots
# can compare bookings and cancellations
hotels <- hotels %>%
  mutate(booking_status = as.factor(is_canceled))

# Which Hotel Type has more bookings and cancellations?
ggplot(data = hotels, aes(x = hotel, fill = booking_status)) +
  geom_bar(position = "dodge") +  # side-by-side bars, one per booking status
  scale_y_continuous(name = "Bookings and Cancellations", labels = scales::comma) +
  xlab("Hotel Type") +
  ggtitle("Which Hotel Type has more bookings and cancellations?") +
  labs(fill = "Booking Status")
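
Because booking_status is built with as.factor() from a 0/1 column, the legend shows the raw levels 0 and 1. One possible refinement, sketched here on a small made-up vector rather than on the real data, is to attach readable labels when creating the factor. The label text ("Not canceled", "Canceled") is my own choice and does not come from the data set.

```r
# Toy stand-in for the is_canceled column (hypothetical values, not the real data)
is_canceled <- c(0, 1, 0, 0, 1)

# factor() with explicit levels and labels maps 0/1 to readable legend text
booking_status <- factor(is_canceled,
                         levels = c(0, 1),
                         labels = c("Not canceled", "Canceled"))

levels(booking_status)  # the two labels, in the order given above
table(booking_status)   # how many values fall under each label
```

Using this in place of as.factor(is_canceled) inside mutate() should leave the bars unchanged and only alter the legend entries.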

The data that I used is the hotel bookings data set. It contains booking information for a city hotel and a resort hotel, including when the booking was made, the length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other fields.

The visualization that I created is a side-by-side bar plot that depicts the total number of bookings and cancellations for each type of hotel (city and resort). To depict the cancellations, I’ve used the column is_canceled, a categorical variable with a value of 1 if the booking was canceled and 0 otherwise. As the plot shows, City Hotel has more bookings and more cancellations than Resort Hotel.
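
The bar heights can also be checked numerically. The sketch below counts rows per hotel type and booking status and adds the share within each hotel type. It runs on a small hypothetical data frame (toy_hotels) with made-up values, so the numbers it prints say nothing about the real data; the names status_counts and share are also my own.

```r
library(dplyr)

# Hypothetical toy rows standing in for hotels.csv (values are made up)
toy_hotels <- data.frame(
  hotel = c("City Hotel", "City Hotel", "City Hotel",
            "Resort Hotel", "Resort Hotel", "Resort Hotel"),
  is_canceled = c(0, 1, 1, 0, 0, 1)
)

status_counts <- toy_hotels %>%
  mutate(booking_status = as.factor(is_canceled)) %>%
  count(hotel, booking_status) %>%   # one row per hotel/status pair, count in n
  group_by(hotel) %>%
  mutate(share = n / sum(n)) %>%     # fraction of that hotel's rows in each status
  ungroup()

print(status_counts)
```

Replacing toy_hotels with hotels should give the counts behind each bar in the plot. The share column is useful because raw counts alone do not show which hotel type cancels a larger fraction of its bookings.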