Hotel Bookings Analysis

Introduction

This data set was obtained from Kaggle. It covers 2015 to 2017 and describes hotel booking demand during that period. The data set specifies the type of hotel, the date of booking, the date of arrival, the lead time, and much more. https://www.kaggle.com/jessemostipak/hotel-booking-demand

Simple Donut Chart

This donut chart shows the two types of hotels in this dataframe. Between 2015 and 2017, city hotels were booked about twice as often as resort hotels. One possible reason is that city hotels are generally less expensive than resorts. City hotels are also more common and more convenient for travelers who are not vacationing.


import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/ProgramData/Anaconda3/Library/plugins/platforms'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings("ignore")

path = "U:/"

filename = path + "hotel_bookings.csv"

df = pd.read_csv(filename, nrows=5)
df = pd.read_csv(filename, usecols = ['hotel', 'is_canceled', 'arrival_date_month','lead_time','arrival_date_day_of_month'])

x = df.groupby(['arrival_date_month']).agg({'arrival_date_month':['count'], 'lead_time':['sum', 'mean']}).reset_index()
x.columns = ['Month', 'Count', 'TotalLeadTime', 'AverLead']
x = x.sort_values('Count', ascending=False)

labels = df['hotel'].value_counts().index.tolist()
sizes = df['hotel'].value_counts().tolist()
colors = ["lightgreen","pink"]
plt.pie(sizes,labels=labels,colors=colors,autopct='%1.1f%%',startangle=90, textprops={'fontsize': 14})

hole = plt.Circle((0,0), 0.3, fc='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

plt.title('Types of Hotels Booked (2015 - 2017) ', fontsize=14)

plt.show()
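
The "twice as often" reading of the donut chart can be checked numerically. The sketch below is only an illustration: it uses a tiny made-up dataframe (the `bookings` name and its six rows are hypothetical) in place of the real CSV, and computes each hotel type's share of bookings plus the city-to-resort ratio.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'hotel' column of hotel_bookings.csv
bookings = pd.DataFrame({"hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 2})

# normalize=True returns each category's share of all rows instead of raw counts
shares = bookings["hotel"].value_counts(normalize=True)

# Ratio of city bookings to resort bookings
ratio = shares["City Hotel"] / shares["Resort Hotel"]

print(shares.round(3))
print(f"City-to-resort booking ratio: {ratio:.2f}")
```

Running the same two lines on `df['hotel']` would give the actual shares that the donut chart displays as percentages.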

Standard Bar Charts

There are two standard bar charts: one that includes all twelve months and one that shows only the seven most-booked months. Both are ordered from the most-booked month to the least. It is unsurprising that August has the most hotel bookings, because it is the peak of summer and likely the time when most people go on vacation. It was surprising to see that January was the least-booked month. My hypothesis was that November would be the least popular month for hotel bookings, but this was wrong. It was also very surprising to see that April and October had more bookings than June! Each chart also includes a horizontal line marking the average number of bookings.


def pick_colors_according_to_mean_count(this_data): 
    colors=[]
    avg = this_data.Count.mean()
    for each in this_data.Count:
        if each > avg*1.01:
            colors.append('lightcoral')
        elif each < avg*0.99:
            colors.append('green')
        else: 
            colors.append('black')
    return colors        
    
import matplotlib.patches as mpatches

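# x is sorted by Count but keeps its original index labels (months in alphabetical order).
# .loc slices by label, so each slice runs from the row labeled `bottom` to the row
# labeled `top` in the sorted order, not from position `bottom` to position `top`.
bottom1 = 1   # label of August, the most-booked month
top1 = 4      # label of January, the least-booked month, so d1 holds all the months
d1 = x.loc[bottom1:top1]
my_colors1 = pick_colors_according_to_mean_count(d1)

bottom2 = 1   # label of August
top2 = 11     # label of September, so d2 stops at the seventh most-booked month
d2 = x.loc[bottom2:top2]
my_colors2 = pick_colors_according_to_mean_count(d2)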

Above = mpatches.Patch(color='lightcoral', label='Above Average')
At = mpatches.Patch(color='black', label='Within 1% of the Average')
Below = mpatches.Patch(color='green', label='Below Average')

fig = plt.figure(figsize=(18, 16))
fig.suptitle('Frequency of Hotel Bookings by Month', 
             fontsize=18, fontweight='bold')

ax1 = fig.add_subplot(2, 1, 1)
ax1.bar(d1.Month, d1.Count, label='Count', color=my_colors1)
#ax1.Legend(fontsize=14)

ax1.legend(handles=[Above, At, Below], fontsize=14)
plt.axhline(d1.Count.mean(), color='black', linestyle='dashed')
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.set_title('Top Months in Descending Order', size=20)
ax1.text(top1+5, d1.Count.mean()+5, 'Mean = ' + str(d1.Count.mean()), rotation=0, fontsize=14 )


ax2 = fig.add_subplot(2, 1, 2)
ax2.bar(d2.Month, d2.Count, label='Count', color=my_colors2)
#ax1.Legend(fontsize=14)

ax2.legend(handles=[Above, At, Below], fontsize=14)
# Draw the mean of the months shown in this subplot (d2), matching the 'Mean = ' label below
ax2.axhline(d2.Count.mean(), color='black', linestyle='solid')
ax2.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax2.set_title('Top 7 Months', size=20)
ax2.text(top2-8, d2.Count.mean()+5, 'Mean = ' + str(d2.Count.mean()), fontsize=14 )

fig.subplots_adjust(hspace = 0.35)

plt.show()
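
The slices in this section rely on `.loc`, which selects by index label rather than by position. Since `x` was sorted by `Count` without resetting its index, a label slice can return more rows than the numbers suggest. The sketch below illustrates the difference on a small made-up table (the counts are hypothetical, not the real data set values) and shows `head`, which selects by position, as one possible alternative.

```python
import pandas as pd

# Hypothetical counts; the index labels 0..3 follow alphabetical month order
counts = pd.DataFrame({
    "Month": ["April", "August", "December", "January"],
    "Count": [30, 50, 10, 5],
})

# Sorting keeps the original labels, so the index now reads 1, 0, 2, 3
ranked = counts.sort_values("Count", ascending=False)

# Label-based: from the row labeled 1 through the row labeled 2, in sorted order
by_label = ranked.loc[1:2]

# Position-based: the first two rows of the ranking
by_position = ranked.head(2)

print(by_label["Month"].tolist())
print(by_position["Month"].tolist())
```

With a position-based selection, asking for the top seven months does not depend on which labels the months happened to receive before sorting.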

Horizontal Bar Chart

This horizontal bar chart shows the months in descending order from the most popular to the least. Since the bar chart is horizontal, the months are listed on the y-axis and the total count of hotel bookings is listed on the x-axis. This chart differs from the previous one in that each bar has its own label at the end, which shows the exact number of hotel bookings.


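# .loc slices by index label: label 1 is August (most booked) and label 4 is January
# (least booked), so this slice of the sorted frame keeps every month
bottom3 = 1
top3 = 4
d3 = x.loc[bottom3:top3]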
d3 = d3.sort_values('Count', ascending=True)
d3.reset_index(inplace=True, drop=True)
my_colors3 = pick_colors_according_to_mean_count(d3)

Above = mpatches.Patch(color='lightcoral', label='Above Average')
At = mpatches.Patch(color='black', label='Within 1% of the Average')
Below = mpatches.Patch(color='green', label='Below Average')


fig =plt.figure(figsize=(18,22))
ax1 = fig.add_subplot(1, 1, 1)
ax1.barh(d3.Month, d3.Count, color=my_colors3)

# Label each bar with its exact booking count (labels are always drawn in black)
for row_counter, value_at_row_counter in enumerate(d3.Count):
    ax1.text(value_at_row_counter+2, row_counter, str(value_at_row_counter), color='black',
            size=12, fontweight='bold', ha='left', va='center', backgroundcolor='white')
plt.xlim(0, d3.Count.max()*1.1)

ax1.legend(loc='lower right', handles=[Above, At, Below], fontsize=14)
plt.axvline(d3.Count.mean(), color='black', linestyle = 'dashed')
ax1.text(d3.Count.mean()+2, 0, 'Mean = ' + str(d3.Count.mean()), rotation=0, fontsize=14)

ax1.set_title('Most Popular Months', size=20)
ax1.set_xlabel('Count of Hotel Bookings', fontsize=16)
ax1.set_ylabel('Month', fontsize=16)
plt.xticks(fontsize=14)

plt.yticks(fontsize=14)

plt.show()
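
As an alternative to placing each value label by hand with `ax1.text`, Matplotlib 3.4 and later provide `Axes.bar_label`, which annotates every bar in a container at once. The sketch below is a minimal illustration, assuming that Matplotlib version is available; the three months and their counts are made up and are not taken from the data set.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical values, listed from least to most booked
months = ["June", "July", "August"]
counts = [110, 127, 139]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(months, counts, color="lightcoral")

# bar_label writes each bar's value just past its end; padding is in points
value_labels = ax.bar_label(bars, padding=3, fontsize=12, fontweight="bold")

ax.set_xlim(0, max(counts) * 1.1)  # leave room for the labels
ax.set_xlabel("Count of Hotel Bookings")
ax.set_ylabel("Month")

label_texts = [label.get_text() for label in value_labels]
print(label_texts)
plt.close(fig)
```

The manual loop gives finer control over each label, while `bar_label` keeps the code shorter when every bar is labeled the same way.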

Dual Axis Bar Chart

The dual axis bar chart lists the months along the x-axis, ordered from the most-booked month to the least. The count of bookings is represented by the gray bars and the average lead time by the green bars. Lead time is the number of days between the date a booking is made and the check-in date. For example, if a hotel is reserved 25 days before check-in, the lead time for that booking is 25. The lead time was averaged within each month to produce the green bars. The y-axis on the left of the chart represents the total bookings, and the y-axis on the right represents the average lead time.


def autolabel(these_bars, this_ax, place_of_decimals, symbol):
    for each_bar in these_bars:
        height = each_bar.get_height()
        this_ax.text(each_bar.get_x()+each_bar.get_width()/2, height*1.01, symbol+format(height, place_of_decimals),
                     fontsize=11, color='black', ha='center', va='bottom')
                     
fig = plt.figure(figsize=(30, 12))
ax1 = fig.add_subplot(1, 1, 1)
ax2 = ax1.twinx()
bar_width = 0.4

x_pos = np.arange(12)
count_bars = ax1.bar(x_pos-(0.5*bar_width), d1.Count, bar_width, color='gray', edgecolor='black', label='Count of Bookings')
aver_lead_bars = ax2.bar(x_pos+(0.5*bar_width), d1.AverLead, bar_width, color='green', edgecolor='black', label='Average Lead Time')

ax1.set_xlabel('Month', fontsize=18)
ax1.set_ylabel('Count of Bookings', fontsize=18, labelpad=20)
ax2.set_ylabel('Average Lead', fontsize=18, rotation=270, labelpad=20)
ax1.tick_params(axis='y', labelsize=14)
ax2.tick_params(axis='y', labelsize=14)

plt.title('Hotel Bookings and Average Lead Time Analysis\n Most Busy Months to the Least', fontsize=18)
ax1.set_xticks(x_pos)

ax1.set_xticklabels(d1.Month, fontsize=14)

count_color, count_label = ax1.get_legend_handles_labels()
lead_color, lead_label   = ax2.get_legend_handles_labels()
legend = ax1.legend(count_color + lead_color, count_label + lead_label, loc='upper left', frameon=True, ncol=1, shadow=True,
                   borderpad=1, fontsize=14)
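# Leave headroom above the tallest bar on each axis for the value labels and the legend
ax1.set_ylim(0, d1.Count.max()*1.50)

# Scale the right-hand axis from the lead-time values it displays
ax2.set_ylim(0, d1.AverLead.max()*1.50)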

autolabel(count_bars, ax1, '.0f', '')
autolabel(aver_lead_bars, ax2, '.2f', '')

plt.show()
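
The data set already supplies `lead_time`, but the definition can be illustrated with a small calculation. In the sketch below, the `booking_date` and `arrival_date` columns and their values are hypothetical and are used only to show how a lead time of 25 days arises.

```python
import pandas as pd

# Hypothetical bookings: one made 25 days ahead, one made on the day of arrival
booking = pd.DataFrame({
    "booking_date": pd.to_datetime(["2016-06-06", "2016-07-01"]),
    "arrival_date": pd.to_datetime(["2016-07-01", "2016-07-01"]),
})

# Lead time is the whole number of days between booking and check-in
booking["lead_time"] = (booking["arrival_date"] - booking["booking_date"]).dt.days

# Averaging these values is what each green bar does for one month
average_lead = booking["lead_time"].mean()

print(booking)
print(f"Average lead time: {average_lead:.1f} days")
```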

Line Plot of Lead Time by Type of Hotel

This visualization is a line plot with one line for the city hotels and another for the resort hotels. The x-axis represents the months and the y-axis represents the total lead time. The lead time was summed within each month to show which months have the most bookings made far in advance. The line plot shows that city and resort hotels are booked at a similar rate. The plot also shows that January is the month with the least total lead time, which is surprising because I assumed people would book New Year's vacations well in advance. This hypothesis was most likely wrong because the frequency of bookings in January is already very low.


lead_df = df.groupby(['arrival_date_month','hotel'])['lead_time'].sum().reset_index(name='TotalLead')
lead_df

import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

fig = plt.figure(figsize = (18,10))
ax = fig.add_subplot(1, 1, 1)

my_colors = {'City Hotel':'darkblue',
             'Resort Hotel':'coral'}


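# Group by the column name itself (not a one-element list) so that each key is a plain
# string such as 'City Hotel' and can be looked up in my_colors
for key, grp in lead_df.groupby('hotel'):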
    grp.plot(ax=ax, kind='line', x='arrival_date_month', y ='TotalLead', color=my_colors[key], label=key, marker='8')

plt.title('Lead Time by Month and by Hotel Type', fontsize=18)
ax.set_xlabel('Month', fontsize=18)
ax.set_ylabel('Total Lead Days (Millions)', fontsize=18, labelpad=20) 
ax.tick_params(axis='x', labelsize=14, rotation=0)
ax.tick_params(axis='y', labelsize=14, rotation=0)

ax.set_xticks(np.arange(12))

ax.yaxis.set_major_formatter( FuncFormatter( lambda x, pos:('%1.1fM')%(x*1e-6)))

plt.show()
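
Because `groupby` sorts its keys, the months in `lead_df` come out in alphabetical order, so the x-axis of the line plot is likely alphabetical rather than chronological. One possible way to get calendar order is an ordered categorical, sketched below on made-up totals (the `lead` frame and its numbers are hypothetical).

```python
import pandas as pd

# Calendar order, written out explicitly so it does not depend on locale settings
month_order = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

# Hypothetical totals, listed in the alphabetical order groupby would produce
lead = pd.DataFrame({
    "arrival_date_month": ["April", "August", "January"],
    "TotalLead": [200, 300, 100],
})

# An ordered categorical makes sort_values follow the calendar instead of the alphabet
lead["arrival_date_month"] = pd.Categorical(
    lead["arrival_date_month"], categories=month_order, ordered=True
)
lead = lead.sort_values("arrival_date_month").reset_index(drop=True)

print(lead)
```

Applying the same conversion to `lead_df` before plotting would place January first and December last on the x-axis.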
