Introduction

We have been tasked with completing an analysis on hotel reservation data contained in a relational database housed in Microsoft SQL Server Management System, also known as SSMS. Using SQL specific to SSMS, Transact-SQL (T-SQL), we will clean and analyze the data to answer the following questions:

The tables we will analyze within the database are dbo.[2018], dbo.[2019], and dbo.[2020], which contain the hotel reservation data from years 2018, 2019, and 2020, respectively. Included with these tables is the dbo.market_segments table, which is a short table with columns listing the different customer types within the reservation data. We can be certain that the answers can be gleaned from our data, so we will begin with cleaning the data to be sure that we begin our analysis at the highest level of data integrity.

Cleaning the Data

Remove Null and Irrelevant Data

It is important to remove or replace any and all null values in the tables we will analyze because when comparing an unknown value to any other value, the result is always unknown and not included in the results. These null values will therefore require lines of code that will IS NULL or IS NOT NULL operators in each query to isolate or exclude these values, which slows the performance of each query. Therefore, we will work to remove NULL values where there are found.

First, we will remove rows containing null agent ID numbers for agents that cannot be identified. This will prevent us from obtaining skewed query results when analyzing agent-specific data within the tables. We will use alias ‘x’ in our code.

DELETE x FROM dbo.[2018] x
WHERE x.agent IS NULL

DELETE x FROM dbo.[2019] x
WHERE x.agent IS NULL

DELETE x FROM dbo.[2020] x
WHERE x.agent IS NULL

Next, we will remove the rows containing null values for children, as we will need to determine any trends for reservations that include children without any empty or misleading data.

DELETE x from dbo.[2018] x
WHERE x.children IS NULL

DELETE x from dbo.[2019] x
WHERE x.children IS NULL

DELETE x from dbo.[2020] x
WHERE x.children IS NULL

Lastly, We will not be using the Market Segments table now or in the future, so we will be deleting the corresponding table altogether. In the fixed server role of the sysadmin, we have the required permissions to complete this action. As none of our queries include references to this table, they will not be affected by its deletion

DROP TABLE dbo.market_segments

We will now create a new table by uniting the 2018, 2019, and 2020 tables. This provides an combined table of all applicable years so that all queries can pull from the data altogether, as opposed to lengthy code for each individual table.

SELECT * INTO hotels FROM dbo.[2018]
UNION
SELECT * FROM dbo.[2019]
UNION
SELECT * FROM dbo.[2020];

From here, we will continue the cleaning process on our new table titled ‘hotels.’

First, we will remove the ‘company’ column, as there is no relevant data there since the companies are not equal to guests in our scenario. All value are NULL and the company is therefore inconsequential.

ALTER TABLE hotels
DROP COLUMN company

Add Relevant Columns

Then, we will add a new column titled ‘total_nights’ to our hotels table to show the total nights guests stayed at our hotels.

ALTER TABLE hotels
ADD total_nights AS (stays_in_week_nights + stays_in_weekend_nights) PERSISTED;

We will now add another column titled ‘revenue’ that calculates its namesake as the total daily revenue, or the product of the total nights guests stayed and the average daily room rate.

ALTER TABLE hotels
ADD revenue AS ((stays_in_week_nights + stays_in_weekend_nights)*adr) PERSISTED;

Fortify Data Uniformity

One of the attributes of our data is that the abbreviations used for countries are a uniform length of only 3 characters and are NOT null. Two find the values that do not follow this rule, we will use this query to search them out and identify them.

SELECT [country]
FROM dbo.hotels
WHERE LEN(country) <> 3 OR country IS NULL
GROUP BY country
ORDER BY country
2 records
country
CN
NULL

The results of this query reveals that there are two values in the ‘country’ column that are not 3 characters in length (‘CN’ for Canada) or are NULL. As there is complete data for the country of Canada, we will simply update the values from ‘CN’ to ‘CAN’ for the sake of data quality. As for the NULL country values, we have no way of filling in this data with the correct values. Therefore, the corresponding rows must be removed from the table to maintain the integrity of the data. To ensure accuracy of the query, an OR statement is added where the rows in which the length of characters is longer than 3 for the country in question, this applying directly to the 4 characters in ‘NULL’. When we re-run the original query, there are no results as desired.

UPDATE hotels
SET country = ('CAN')
WHERE country = 'CN';

DELETE FROM hotels
WHERE country IS NULL OR LEN(country) > 3;

SELECT [country]
FROM dbo.hotels
WHERE LEN(country) <> 3 OR country IS NULL
GROUP BY country
ORDER BY country

Analyzing the Data

Should we increase our parking lot size to accommodate a growing number of guests?

To determine whether guest parking should be expanded, we will use the following query to look into whether parking in either hotel has increased significantly over the 3 year period. We will name our columns ‘Total Spaces Required’, ‘Year’, and ‘Hotel’ to identify each of the corresponding values.

The results confirm that we peaked in 2019 for the most parking required in both hotels before we saw a sharp decrease of approximately 57% overall in 2020. Separately, there was a ~232% increase in required parking at the Resort Hotel and a ~534% increase in required parking at the City Hotel from 2018 to 2019. Inversely, there was a ~54% decrease in required parking at the Resort Hotel and a ~64% decrease in required parking at the City Hotel from 2018 to 2019.

SELECT SUM(required_car_parking_spaces) AS 'Total Spaces Required', arrival_date_year AS 'Year', hotel AS 'Hotel'
FROM hotels
GROUP BY arrival_date_year, hotel
ORDER BY arrival_date_year;
6 records
Total Spaces Required Year Hotel
171 2018 City Hotel
813 2018 Resort Hotel
2703 2019 Resort Hotel
1084 2019 City Hotel
383 2020 City Hotel
1246 2020 Resort Hotel

While this can likely be attributed to the quarantine period that coincided with the COVID-19 pandemic, we still cannot ignore the even higher increase of nearly 285% increase overall in parking spaces required from 2018 to 2019.

From this we can surmise that an additional parking lot would be beneficial to the for both hotel chains given the dramatic increase of at least double the previous years parking from 2018 to 2019. It is best to expand this parking (as the budget allows) for locations in more populous areas that see more travelers passing through.

Are we seeing annual growth in our revenue?

Here, we will query the total revenue by hotel by year to determine whether we are seeing the necessary growth in revenue on an annual basis. For organization, we will group them by year and hotel, and we will name our columns ‘Total Revenue’, ‘Hotel’, and ‘Year’. for each respective collection of values. For ease of reading, the ‘Revenue’ values will be formatted as currency.

SELECT FORMAT(ROUND(SUM(revenue), 2), 'C') AS 'Total Revenue', hotel AS 'Hotel', arrival_date_year AS 'Year'
FROM hotel_temp
GROUP BY arrival_date_year, hotel
ORDER BY hotel, arrival_date_year;
6 records
Total Revenue Hotel Year
$1,580,559.92 City Hotel 2018
$10,023,820.00 City Hotel 2019
$7,631,721.47 City Hotel 2020
$2,670,572.08 Resort Hotel 2018
$8,234,325.40 Resort Hotel 2019
$5,788,578.41 Resort Hotel 2020

We can observe from the query results that we did achieve a substantial increase in revenue from 2018 to 2019 in both hotel chains. There was a slight decrease in revenue from 2019 to 2020, but the 2020 revenue was never decreased lower than the initial value for the revenue data- the 2018 revenue. Economic downturns and the widespread quarantine closures of the 2020 COVID-19 pandemic are the most likely factors that would explain why the 2020 revenues did not exceed the previous years’ revenues.

From these results, we confirm that we have been achieving growth in our annual revenue as desired.