Data Introduction

This article walks through exploratory coding in R using the dplyr and ggplot2 packages for an Analytics Programming class. The dataset used for this assignment (https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset) describes 36,275 hotel bookings, with details about each reservation such as nights stayed, the meal plan selected, whether a parking space was requested, and the number of children and adults. The data was originally collected to help identify which factors influence whether someone honors a reservation that they’ve made.

Data Visualizations

Children’s Effect on Meal Plan

The first question I wanted to ask was how having kids affects the choice of meal plan. To investigate this, I created two separate bar charts showing meal plan preference: the first for bookings with children, and the second for bookings without children.
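As a rough sketch of how this comparison could be built, the code below flags each booking as with or without children and then draws one bar chart per group. The column names no_of_children and type_of_meal_plan are my assumption about the Kaggle file, the small data frame is made-up stand-in data so the example runs on its own, and facet_wrap() is used here as a shortcut for the two separate charts.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in data; the column names are assumed to match the Kaggle file.
bookings <- data.frame(
  no_of_children = c(0, 0, 2, 1, 0, 0, 1, 0),
  type_of_meal_plan = c("Meal Plan 1", "Not Selected", "Meal Plan 1",
                        "Meal Plan 2", "Meal Plan 1", "Meal Plan 2",
                        "Meal Plan 1", "Not Selected")
)

# Label each booking by whether any children are on the reservation.
meal_data <- bookings %>%
  mutate(has_children = if_else(no_of_children > 0,
                                "With children", "Without children"))

# One panel per group; free y scales because the groups differ a lot in size.
meal_plot <- ggplot(meal_data, aes(x = type_of_meal_plan)) +
  geom_bar() +
  facet_wrap(~ has_children, scales = "free_y") +
  labs(x = "Meal plan", y = "Number of bookings")

print(meal_plot)
```

The free y scales matter because bookings without children far outnumber those with children, so a shared axis would flatten the smaller panel.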

For bookings both with and without kids, Meal Plan 1 looks far more popular than any other meal plan. Interestingly, out of over 36,000 individual hotel bookings, not a single booking without kids selected Meal Plan 3, which makes me very curious what this plan includes. Not selecting a meal plan also appears to be proportionally more common for bookings without children than for those with children.

Lead Time Impact on Price

Lead time is a variable in this dataset that gives the number of days between the reservation being made and the day of check-in. There are 7 different room types in this dataset; however, for legibility I will only use data from the 3 most popular room types: Room Type 1, Room Type 4, and Room Type 6.
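The filtering step described above could look something like the sketch below, which keeps only the three room types and plots price against lead time with a straight trend line per room. The column names (room_type_reserved, lead_time, avg_price_per_room) and the "Room_Type 1" style labels are my assumption about the Kaggle file, the data frame is made-up stand-in data, and the scatter plot with a linear trend is just one reasonable chart choice.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in data; column names and labels are assumed, not confirmed.
bookings <- data.frame(
  room_type_reserved = c("Room_Type 1", "Room_Type 1", "Room_Type 1",
                         "Room_Type 4", "Room_Type 4", "Room_Type 4",
                         "Room_Type 6", "Room_Type 6", "Room_Type 6",
                         "Room_Type 2"),
  lead_time = c(5, 60, 200, 10, 90, 250, 3, 45, 180, 30),
  avg_price_per_room = c(95, 96, 94, 130, 120, 105, 210, 170, 150, 88)
)

# Keep only the three room types discussed in this section.
top_rooms <- c("Room_Type 1", "Room_Type 4", "Room_Type 6")
lead_data <- bookings %>%
  filter(room_type_reserved %in% top_rooms)

# Scatter of price against lead time, with a linear trend line per room type.
lead_plot <- ggplot(lead_data, aes(x = lead_time, y = avg_price_per_room)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ room_type_reserved) +
  labs(x = "Lead time (days)", y = "Average price per room")

print(lead_plot)
```

On the real data, count(bookings, room_type_reserved, sort = TRUE) is a quick way to check which room types are the most popular before picking the three to keep.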

The average room price rises with the room type number, with Room 1 being the cheapest and Room 6 the most expensive. For Room 1 there doesn’t seem to be any correlation between lead time and room price; for the more expensive rooms, however, there does seem to be a discount or some sort of lower rate for booking further in advance. There is also a lot of variability in prices for Room 6 when booking with under 100 days of lead time.

Parking Space Requests Impact on Room Price (Part 1)

I wanted to investigate the impact that requesting a parking space has on the average price of a room. Intuitively, a parking space should slightly increase the price of a room, so this was my expectation for this chart.

My hypothesis held for Rooms 1, 2, 4, and 6. For each of these rooms, the price goes up slightly when a parking space is requested compared to when it isn’t. Interestingly, for Rooms 3, 5, and 7 this is not the case. Room 3 was only booked a small number of times (7 to be exact) and none of these bookings came with a parking space request, so there is no value to compare. Room 5 sees a decrease in price when a parking space is requested, and Room 7 shows a much larger decrease. This is most likely a dirty data problem: there are bookings where the price for a room is 0, which drags down the averages.
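To see how much a zero-price row can move an average, here is a minimal sketch that computes the mean price per room type and parking request twice: once with every row, and once skipping rows where the price is 0. The column names (room_type_reserved, required_car_parking_space, avg_price_per_room) are my assumption about the Kaggle file and the numbers are made up. Whether zero-price bookings should really be excluded is a judgment call, since they could be complimentary stays and not errors.

```r
library(dplyr)

# Made-up stand-in data; one Room_Type 7 booking with parking has a price of 0.
bookings <- data.frame(
  room_type_reserved = c(rep("Room_Type 1", 4), rep("Room_Type 7", 4)),
  required_car_parking_space = c(0, 0, 1, 1, 0, 0, 1, 1),
  avg_price_per_room = c(90, 100, 100, 110, 180, 200, 0, 210)
)

# Mean price per group, with and without the zero-price rows.
price_summary <- bookings %>%
  group_by(room_type_reserved, required_car_parking_space) %>%
  summarise(
    n_bookings = n(),
    n_zero_price = sum(avg_price_per_room == 0),
    mean_price_all = mean(avg_price_per_room),
    mean_price_nonzero = mean(avg_price_per_room[avg_price_per_room > 0]),
    .groups = "drop"
  )

print(price_summary)
```

In this toy data, the single zero-price row halves the Room_Type 7 average for bookings with parking (105 with the row, 210 without it), which is enough to flip the comparison against the no-parking average of 190.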

Parking Space Requests Impact on Room Price (Part 2)

Out of curiosity, I took the same question from the previous section and set it up as a different visualization to compare the legibility of the two graphs. This is the same data, but shown as dodged bar charts, with color indicating whether a parking space was requested.
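A dodged bar chart like the one described here can be sketched as follows: average the price per room type and parking request, then place the two bars for each room side by side with position = "dodge". The column names are again my assumption about the Kaggle file, and the data frame is made-up stand-in data.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in data; column names are assumed to match the Kaggle file.
bookings <- data.frame(
  room_type_reserved = c(rep("Room_Type 1", 4), rep("Room_Type 4", 4)),
  required_car_parking_space = c(0, 0, 1, 1, 0, 0, 1, 1),
  avg_price_per_room = c(90, 100, 100, 104, 120, 130, 130, 140)
)

# Average price per room type, split by whether parking was requested.
parking_avg <- bookings %>%
  mutate(parking = if_else(required_car_parking_space == 1,
                           "Requested", "Not requested")) %>%
  group_by(room_type_reserved, parking) %>%
  summarise(mean_price = mean(avg_price_per_room), .groups = "drop")

# position = "dodge" puts the two bars for each room type next to each other.
dodge_plot <- ggplot(parking_avg,
                     aes(x = room_type_reserved, y = mean_price, fill = parking)) +
  geom_col(position = "dodge") +
  labs(x = "Room type", y = "Average price per room", fill = "Parking space")

print(dodge_plot)
```

geom_col() is used in place of geom_bar() because the bar heights are already computed averages, not row counts.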

Part of comparing these two visualizations comes down, I think, to personal preference. In my opinion, the dodged bar charts reduce the cognitive load needed to see the small uptick in price for Rooms 1, 2, 4, and 6. They also highlight the oddities that are Room Types 3, 5, and 7. Personally, I find it fascinating how much the choice of chart can affect a reader’s understanding and interpretation of the data, and this visualization is an attempt to highlight how differently the same data reads when portrayed in two different ways.

Special Requests for Guests with Kids and Guests Without

The last visualization I created with this data looks at a statistic that I found quite entertaining: the number of special requests made. It is hilarious to me that the hotels in this data were tracking how many requests guests were making, and I wanted to investigate whether the number of requests was affected by having kids or not. Enjoy.
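For anyone who wants to try something similar, here is a minimal sketch of the summary behind a chart like this: the average number of special requests per room type, split by whether the booking includes children. The column names (no_of_children, no_of_special_requests, room_type_reserved) are my assumption about the Kaggle file, the data is made up, and the simple with/without split is only one possible way to group the bookings.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in data; column names are assumed to match the Kaggle file.
bookings <- data.frame(
  room_type_reserved = c(rep("Room_Type 1", 4), rep("Room_Type 6", 4)),
  no_of_children = c(0, 0, 2, 2, 0, 0, 1, 1),
  no_of_special_requests = c(0, 1, 1, 2, 1, 1, 2, 1)
)

# Average requests per room type, with and without children on the booking.
requests_avg <- bookings %>%
  mutate(has_children = if_else(no_of_children > 0,
                                "With children", "Without children")) %>%
  group_by(room_type_reserved, has_children) %>%
  summarise(
    mean_requests = mean(no_of_special_requests),
    n_bookings = n(),
    .groups = "drop"
  )

requests_plot <- ggplot(requests_avg,
                        aes(x = room_type_reserved, y = mean_requests,
                            fill = has_children)) +
  geom_col(position = "dodge") +
  labs(x = "Room type", y = "Average special requests", fill = NULL)

print(requests_plot)
```

Keeping n_bookings in the summary makes it easy to spot groups whose average rests on only a handful of bookings.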

My first observation is gratifying: the average number of requests doesn’t get much higher than 1.5 for any of these bars. Once again, Room Type 3 is an outlier, as only 7 bookings were made for this room. In every room type, the lowest average request rate was from guests without kids. I chose to interpret this from the perspective of hotel staff: if you see guests walk in with kids, this data suggests they will be asking you for more special accommodations. I am also keeping whoever stayed in a hotel with 9 or 10 kids in my thoughts. That was very brave. Thank you for taking the time to read this, and I hope you have a wonderful rest of your day/night. :)