We have been tasked with completing an analysis on hotel reservation data contained in a relational database housed in Microsoft SQL Server Management System, also known as SSMS. Using SQL specific to SSMS, Transact-SQL (T-SQL), we will clean and analyze the data to answer the following questions:
The tables we will analyze within the database are dbo.[2018], dbo.[2019], and dbo.[2020], which contain the hotel reservation data from years 2018, 2019, and 2020, respectively. Included with these tables is the dbo.market_segments table, which is a short table with columns listing the different customer types within the reservation data. We can be certain that the answers can be gleaned from our data, so we will begin with cleaning the data to be sure that we begin our analysis at the highest level of data integrity.
It is important to remove or replace any and all null values in the tables we will analyze because when comparing an unknown value to any other value, the result is always unknown and not included in the results. These null values will therefore require lines of code that will IS NULL or IS NOT NULL operators in each query to isolate or exclude these values, which slows the performance of each query. Therefore, we will work to remove NULL values where there are found.
First, we will remove rows containing null agent ID numbers for agents that cannot be identified. This will prevent us from obtaining skewed query results when analyzing agent-specific data within the tables. We will use alias ‘x’ in our code.
DELETE x FROM dbo.[2018] x
WHERE x.agent IS NULL
DELETE x FROM dbo.[2019] x
WHERE x.agent IS NULL
DELETE x FROM dbo.[2020] x
WHERE x.agent IS NULL
Next, we will remove the rows containing null values for children, as we will need to determine any trends for reservations that include children without any empty or misleading data.
DELETE x from dbo.[2018] x
WHERE x.children IS NULL
DELETE x from dbo.[2019] x
WHERE x.children IS NULL
DELETE x from dbo.[2020] x
WHERE x.children IS NULL
Lastly, We will not be using the Market Segments table now or in the future, so we will be deleting the corresponding table altogether. In the fixed server role of the sysadmin, we have the required permissions to complete this action. As none of our queries include references to this table, they will not be affected by its deletion
DROP TABLE dbo.market_segments
We will now create a new table by uniting the 2018, 2019, and 2020 tables. This provides an combined table of all applicable years so that all queries can pull from the data altogether, as opposed to lengthy code for each individual table.
SELECT * INTO hotels FROM dbo.[2018]
UNION
SELECT * FROM dbo.[2019]
UNION
SELECT * FROM dbo.[2020];
From here, we will continue the cleaning process on our new table titled ‘hotels.’
First, we will remove the ‘company’ column, as there is no relevant data there since the companies are not equal to guests in our scenario. All value are NULL and the company is therefore inconsequential.
ALTER TABLE hotels
DROP COLUMN company
Then, we will add a new column titled ‘total_nights’ to our hotels table to show the total nights guests stayed at our hotels.
ALTER TABLE hotels
ADD total_nights AS (stays_in_week_nights + stays_in_weekend_nights) PERSISTED;
We will now add another column titled ‘revenue’ that calculates its namesake as the total daily revenue, or the product of the total nights guests stayed and the average daily room rate.
ALTER TABLE hotels
ADD revenue AS ((stays_in_week_nights + stays_in_weekend_nights)*adr) PERSISTED;
One of the attributes of our data is that the abbreviations used for countries are a uniform length of only 3 characters and are NOT null. Two find the values that do not follow this rule, we will use this query to search them out and identify them.
SELECT [country]
FROM dbo.hotels
WHERE LEN(country) <> 3 OR country IS NULL
GROUP BY country
ORDER BY country
| country |
|---|
| CN |
| NULL |
The results of this query reveals that there are two values in the ‘country’ column that are not 3 characters in length (‘CN’ for Canada) or are NULL. As there is complete data for the country of Canada, we will simply update the values from ‘CN’ to ‘CAN’ for the sake of data quality. As for the NULL country values, we have no way of filling in this data with the correct values. Therefore, the corresponding rows must be removed from the table to maintain the integrity of the data. To ensure accuracy of the query, an OR statement is added where the rows in which the length of characters is longer than 3 for the country in question, this applying directly to the 4 characters in ‘NULL’. When we re-run the original query, there are no results as desired.
UPDATE hotels
SET country = ('CAN')
WHERE country = 'CN';
DELETE FROM hotels
WHERE country IS NULL OR LEN(country) > 3;
SELECT [country]
FROM dbo.hotels
WHERE LEN(country) <> 3 OR country IS NULL
GROUP BY country
ORDER BY country
To determine whether guest parking should be expanded, we will use the following query to look into whether parking in either hotel has increased significantly over the 3 year period. We will name our columns ‘Total Spaces Required’, ‘Year’, and ‘Hotel’ to identify each of the corresponding values.
The results confirm that we peaked in 2019 for the most parking required in both hotels before we saw a sharp decrease of approximately 57% overall in 2020. Separately, there was a ~232% increase in required parking at the Resort Hotel and a ~534% increase in required parking at the City Hotel from 2018 to 2019. Inversely, there was a ~54% decrease in required parking at the Resort Hotel and a ~64% decrease in required parking at the City Hotel from 2018 to 2019.
SELECT SUM(required_car_parking_spaces) AS 'Total Spaces Required', arrival_date_year AS 'Year', hotel AS 'Hotel'
FROM hotels
GROUP BY arrival_date_year, hotel
ORDER BY arrival_date_year;
| Total Spaces Required | Year | Hotel |
|---|---|---|
| 171 | 2018 | City Hotel |
| 813 | 2018 | Resort Hotel |
| 2703 | 2019 | Resort Hotel |
| 1084 | 2019 | City Hotel |
| 383 | 2020 | City Hotel |
| 1246 | 2020 | Resort Hotel |
While this can likely be attributed to the quarantine period that coincided with the COVID-19 pandemic, we still cannot ignore the even higher increase of nearly 285% increase overall in parking spaces required from 2018 to 2019.
Here, we will query the total revenue by hotel by year to determine whether we are seeing the necessary growth in revenue on an annual basis. For organization, we will group them by year and hotel, and we will name our columns ‘Total Revenue’, ‘Hotel’, and ‘Year’. for each respective collection of values. For ease of reading, the ‘Revenue’ values will be formatted as currency.
SELECT FORMAT(ROUND(SUM(revenue), 2), 'C') AS 'Total Revenue', hotel AS 'Hotel', arrival_date_year AS 'Year'
FROM hotel_temp
GROUP BY arrival_date_year, hotel
ORDER BY hotel, arrival_date_year;
| Total Revenue | Hotel | Year |
|---|---|---|
| $1,580,559.92 | City Hotel | 2018 |
| $10,023,820.00 | City Hotel | 2019 |
| $7,631,721.47 | City Hotel | 2020 |
| $2,670,572.08 | Resort Hotel | 2018 |
| $8,234,325.40 | Resort Hotel | 2019 |
| $5,788,578.41 | Resort Hotel | 2020 |
We can observe from the query results that we did achieve a substantial increase in revenue from 2018 to 2019 in both hotel chains. There was a slight decrease in revenue from 2019 to 2020, but the 2020 revenue was never decreased lower than the initial value for the revenue data- the 2018 revenue. Economic downturns and the widespread quarantine closures of the 2020 COVID-19 pandemic are the most likely factors that would explain why the 2020 revenues did not exceed the previous years’ revenues.
As we all know, it is very common for adults to travel with their children and infants, whether on a family vacation or other important travel plans.
According to the NYU School of Professional Studies, travel intent for families in 2019 was 70% and was slightly higher at 79% in 2018.
Therefore, a vital trend to look into would be the total guests who have children and infants staying with them at our hotels. This will open opportunities for our hotels to offer more family-friendly options that will accommodate all guests with little ones along for the trip.
SELECT DISTINCT [arrival_date_year] AS Year,
COUNT(CASE WHEN children > 0 THEN 1 END) AS 'Children',
COUNT(CASE WHEN babies > 0 THEN 1 END) AS 'Infants',
COUNT(CASE WHEN adults > 0 THEN 1 END) AS 'Adults'
FROM hotels
GROUP BY arrival_date_year
ORDER BY arrival_date_year;
| Year | Children | Infants | Adults |
|---|---|---|---|
| 2018 | 714 | 169 | 10915 |
| 2019 | 4434 | 533 | 47022 |
| 2020 | 3256 | 228 | 27778 |
From the year 2018 to 2019, there was a 521% increase (or 5.2X increase) in child guests and a 215% increase (or 2.1X increase) in infant guests. The number of child and infant guests decreased in the year 2020, yet still remained 356% (3.5X) higher in child guests and 35% higher in infant guests, by comparison.
Our agents are a crucial element of our success, as it is their hard work that creates the reservations that generates the very revenue our company depends on to maintain solvency and succeeds among our competitors. Here, we will confirm the overall top performing travel agents, from which we can confirm that the highest individual sales were achieved overall. Isolating this data can help us determine the highest performing agents and which hotel chain they are performing the highest.
SELECT TOP 10 agent as Agent, MAX(revenue) as 'Top Sales', arrival_date_year AS Year, hotel AS 'Hotel'
FROM hotels
GROUP BY arrival_date_year, agent, hotel
ORDER BY [Top Sales] DESC;
| Agent | Top Sales | Year | Hotel |
|---|---|---|---|
| 314 | 6300.00 | 2020 | Resort Hotel |
| 14 | 6148.00 | 2019 | City Hotel |
| 12 | 5400.00 | 2019 | City Hotel |
| 9 | 4967.70 | 2019 | City Hotel |
| 8 | 4868.01 | 2020 | City Hotel |
| 250 | 4452.00 | 2020 | Resort Hotel |
| 9 | 4200.00 | 2020 | City Hotel |
| 240 | 4088.00 | 2020 | Resort Hotel |
| 240 | 4015.06 | 2019 | Resort Hotel |
| 241 | 3890.06 | 2020 | Resort Hotel |