Synopsis

1. Problem Statement: This guest analysis aims to understand the types of guests who stay at hotels in Portugal, their preferences, and their booking cancellation habits. It will tell Hotel Del Luna what to offer guests at each touchpoint, whether a place (such as a marketing channel) or a moment (such as a stage of the guest’s journey), and give insight into guests’ expectations and needs. The results should ultimately improve guest satisfaction, loyalty, and hotel operations. They will also help identify what the hotel needs to deliver to its guests and, in turn, determine the optimal level of investment.


2. Analysis Overview: To answer these questions, two hotel demand datasets, one for a resort hotel and one for a city hotel, are used to perform an exploratory and a predictive analysis. The exploratory analysis mines the data to identify patterns related to hotel guests in Portugal and their preferences. A decision tree is the predictive technique used to discover the types of customers who are likely to cancel their reservations. The main variables of focus are bookings, guest classifications, country, hotel services, and cancellation records.


3. Insights: The exploratory and predictive analyses help understand the clientele, their preferences and their booking cancellation patterns:

  • Couples make more reservations than Singles, Families, and the unclassified group (bookings where the number of adults, children and babies was not specified) in both hotels. A couple can be two guests of the same sex or of different sexes.

    • City Hotel: About 65% of the guests are couples.
    • Resort Hotel: Approximately 70% of the guests are couples.
  • More adults (66.25%) and children (58.44%) stay at the City Hotel than at the Resort Hotel, while the Resort Hotel receives more babies (58.69%) than the City Hotel (41.31%).

  • 39% of City Hotel guests and 44% of Resort Hotel guests come from Portugal. A large share of the guests in both hotels comes from other European countries.

    • Top five countries where the City Hotel guests come from: Portugal, France, Germany, United Kingdom, and Spain
    • Top five countries where the Resort Hotel guests come from: Portugal, United Kingdom, Spain, Ireland, and France
  • In each hotel, Bed & Breakfast is the most popular meal choice and Full board (breakfast, lunch, and dinner) is the least.

  • In both hotels, room type A is the most frequently reserved room.

  • Most reservations in both hotels are Transient, meaning the booking is not part of a group or contract and is not associated with another transient booking.

  • Guests prefer to stay at the City Hotel on weekdays (Monday through Friday), while at the Resort Hotel they prefer stays that cover both weekdays and the weekend (Saturday and Sunday).

  • The City Hotel has the most reservations in August and May, while the Resort Hotel has the most in August and July. January is the month with the fewest reservations.

  • The majority of guests do not request a parking space or make special requests.

  • At the City Hotel, 58.27% of guests do not cancel their bookings, compared with 72.24% at the Resort Hotel.

  • The majority of guests are new clients.

  • A large share of the guests in both hotels have canceled fewer bookings in the past than they have kept.

  • Guest Country, Room Assignation, Required Car Parking Spaces, Cancellation Habit, Week Stay Preference, Guest Classification, and Total of Special Requests are the variables most strongly associated with cancellation status for the Resort Hotel.

    • When the prediction model forecasts that a Resort Hotel reservation will not be cancelled, that forecast is correct 81.58% of the time.
  • For the City Hotel, Guest Country, Total of Special Requests, Booking Preference, Room Assignation, Cancellation Habit, Is Repeated Guest, and Guest Classification influence whether a reservation is cancelled.

    • When the prediction model forecasts that a City Hotel guest will not cancel, the reservation is indeed not cancelled 77.31% of the time.

Tools/Package used

The software and packages used for the analysis are:

  1. Tableau Prep Builder: Used to clean the datasets
  2. Tableau: Used to make graphs for the exploratory analysis
  3. SAS Enterprise Miner: Used to build the models for the predictive analysis
  4. library(knitr): Used to knit the document and include graphics
  5. library(DT): Provides datatable(), used to display data on screen in a scrollable format
# Tableau Prep Builder
# Tableau Desktop
# SAS Enterprise Miner
library(knitr)  # knitting and include_graphics()
library(DT)     # datatable()

Data Preparation

This section prepares the data for the analysis.

Imported Data

1. The two hotel demand datasets were retrieved from the ScienceDirect website: https://www.sciencedirect.com/science/article/pii/S2352340918315191. One covers a resort hotel and the other a city hotel. Both datasets have the same structure, with 31 variables describing 40,060 resort hotel records and 79,330 city hotel records. Each observation corresponds to a hotel booking. Both datasets contain bookings with arrival dates between 1 July 2015 and 31 August 2017, including bookings that were cancelled and bookings that were not. Because these are real hotel data, all elements that could identify the hotel or a customer were deleted. For privacy reasons, some fields were coded instead of showing their designation: ReservedRoomType, Company, and AssignedRoomType. A third, supplementary dataset was added to these two.

# Both hotel tables (City and Resort) have the same number of variables.

Hotel_Booking_Metadata <- read.csv("Hotel Booking_Metadata.csv")
datatable(head(Hotel_Booking_Metadata,10))

2. The supplementary dataset maps 3-letter country abbreviations to their country names and was retrieved from the IBAN website: https://www.iban.com/country-codes. It has 2 variables and 251 observations, and it makes it easier to identify country names in the two hotel datasets.

Country_Metadata <- read.csv("Country_Metadata.csv")
datatable(head(Country_Metadata))

Data Cleaning

Initial Cleaning Steps

The following steps were performed to clean the three datasets and combine them into a single dataset.

  • The first step was to remove all the fields that are not of interest from both hotel datasets. Twelve variables were deleted: Agent, ArrivalDateDayOfMonth, ArrivalDateWeekNumber, ArrivalDateYear, BookingChanges, Company, DaysInWaitingList, DepositType, DistributionChannel, LeadTime, MarketSegment, and ReservationStatusDate

  • Spaces were inserted into the variable names, and some variables were renamed so that each name is clear and easy to read. The table below shows the variables that were renamed:

Renamed_Fields <- read.csv("Rename_Fields.csv")
datatable(head(Renamed_Fields))
  • The missing values found in the Children and Country columns of the City Hotel dataset were replaced by 0. For Children, it was assumed that no children were listed on the reservation.

  • In the Country column, some records had NULL as the country, so this value was changed to Not Specified.

  • Some variable values were changed from acronyms to their full names. For example, the Meal Type column contains four types of meal (BB, FB, HB, SC), which were replaced by their actual names; BB, for instance, became Bed & Breakfast.

  • For the country code dataset, missing values were excluded because they stem from a structural issue in the dataset. The variables were renamed so that they match the country column name in both hotel datasets: the ISO ALPHA-3 code was renamed to ISO Country and the Country or area name to Country. In the Country variable, some country names were shortened, and for some others the parenthetical “(the)” was removed from the name.

  • The two hotel datasets were combined into a single dataset with a Union operation. An additional column, named Table Names, was automatically created by Tableau Prep to differentiate between the City and Resort hotels. This column was renamed to Hotel Type.

  • After the two hotel datasets were combined, the country code dataset was joined to the combined data. A left join was used, meaning that all rows from the combined dataset were kept, together with their corresponding matches from the country code table. Both ISO Country variables were deleted from the joined dataset, and the Country column was renamed to Guest Country. This join puts the country name in the dataset instead of the ISO code, which makes the data easier to understand.

  • The final dataset was named Hotel Bookings. It has 19 variables and 119,390 records.
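
The cleaning itself was done in Tableau Prep Builder. Purely as an illustration, the same steps (fill missing values, expand acronyms, union, left join) could be sketched in base R as follows. The toy rows and the simplified column names are assumptions made for this sketch, not the actual prep flow; the meal labels follow the names used elsewhere in this report.

```r
# Toy stand-ins for the two hotel tables (values are made up).
city <- data.frame(
  Children = c(0, NA, 2),
  Country  = c("PRT", "NULL", "FRA"),
  Meal     = c("BB", "SC", "HB"),
  stringsAsFactors = FALSE
)
resort <- data.frame(
  Children = c(1, 0),
  Country  = c("GBR", "PRT"),
  Meal     = c("FB", "BB"),
  stringsAsFactors = FALSE
)

# Union: stack both tables and record which hotel each row came from.
city$Hotel_Type   <- "City Hotel"
resort$Hotel_Type <- "Resort Hotel"
bookings <- rbind(city, resort)

# Missing children counts are assumed to mean "no children".
bookings$Children[is.na(bookings$Children)] <- 0

# Spell out the meal acronyms.
meal_names <- c(BB = "Bed & Breakfast", HB = "Half board",
                FB = "Full board", SC = "No meal package")
bookings$Meal <- unname(meal_names[bookings$Meal])

# Left join on the ISO code: every booking is kept, matched or not.
countries <- data.frame(
  Country       = c("PRT", "FRA", "GBR"),
  Guest_Country = c("Portugal", "France", "United Kingdom"),
  stringsAsFactors = FALSE
)
bookings$row_id <- seq_len(nrow(bookings))  # merge() reorders rows
bookings <- merge(bookings, countries, by = "Country", all.x = TRUE)
bookings <- bookings[order(bookings$row_id), ]

# Codes without a match (here the literal "NULL") become "Not Specified".
bookings$Guest_Country[is.na(bookings$Guest_Country)] <- "Not Specified"
bookings$Country <- NULL                    # drop the ISO code column
```

Using `all.x = TRUE` is what makes this a left join: bookings whose country code has no match are kept and only their country name is left empty, which is then labelled Not Specified.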

Additional changes made during the modeling phase:

  • Several new features were created from the existing variables to support the analysis:
    • Booking: the count of individual reservations

    • Room Assignation: This variable describes whether the guest was assigned the same room type that was reserved. It is generated from the Assigned Room Type and the Reserved Room Type.

    • Cancellation Habit: This variable describes whether the customer has canceled more bookings in the past than the number of bookings the guest did not cancel. It is created from these variables: Previous Bookings Not Canceled and Previous Cancellations

    • Week Stay Preference: This variable defines whether the guest prefers booking on weekdays, on the weekend, or on both. It is calculated from the Number of Weekend Stays and the Number of Weekday Stays

  • After these variables were created, some of the source variables were deleted to avoid redundancy: Number of Weekend Stays, Number of Weekday Stays, Assigned Room Type, Previous Bookings Not Canceled, Previous Cancellations
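
As a rough illustration of how these derived variables could be computed, here is a base-R sketch on made-up rows. The column names and the category labels (for example "Same Room" or "Weekday and Weekend") are assumptions, since the report does not list the exact labels that were used.

```r
# Made-up bookings with the source variables named in the list above.
bookings <- data.frame(
  Reserved_Room_Type             = c("A", "A", "D", "E"),
  Assigned_Room_Type             = c("A", "C", "D", "E"),
  Previous_Cancellations         = c(0, 2, 0, 1),
  Previous_Bookings_Not_Canceled = c(0, 1, 3, 1),
  Weekend_Nights                 = c(0, 2, 1, 0),
  Weekday_Nights                 = c(3, 0, 4, 0),
  stringsAsFactors = FALSE
)

# Booking: one per reservation, so sums of this column count reservations.
bookings$Booking <- 1

# Room Assignation: was the assigned room type the one that was reserved?
bookings$Room_Assignation <- ifelse(
  bookings$Assigned_Room_Type == bookings$Reserved_Room_Type,
  "Same Room", "Different Room"
)

# Cancellation Habit: more past cancellations than past kept bookings?
bookings$Cancellation_Habit <- ifelse(
  bookings$Previous_Cancellations > bookings$Previous_Bookings_Not_Canceled,
  "More Cancellations", "Fewer or Equal Cancellations"
)

# Week Stay Preference: weekdays only, weekend only, or both.
bookings$Week_Stay_Preference <- with(bookings, ifelse(
  Weekday_Nights > 0 & Weekend_Nights > 0, "Weekday and Weekend",
  ifelse(Weekday_Nights > 0, "Weekday",
         ifelse(Weekend_Nights > 0, "Weekend", "Not Specified"))
))
```

How ties (equal numbers of cancelled and kept bookings) and stays of zero nights are labelled is a choice made for this sketch only.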

Predictive Analysis:

  • The combined dataset was split by hotel type so that a separate predictive analysis could be run for each hotel instead of a single combined prediction.

  • The target variable for the decision tree analysis is the cancellation status, because the purpose of this analysis is to identify the categories of clients who are likely to cancel a reservation.

  • Some variables were rejected because relevant features had already been created from them; including them in the predictive analysis would implicitly repeat the same information: Number of Adults, Number of Children, Number of Babies, Reservation Status, Bookings
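
The modeling was set up in SAS Enterprise Miner. The base-R sketch below only illustrates the same preparation, rejecting redundant inputs and splitting by hotel, on a few made-up rows with assumed column names.

```r
# Made-up rows standing in for the combined Hotel Bookings table.
hotel_bookings <- data.frame(
  Hotel_Type          = c("City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"),
  Cancellation_Status = c("Canceled", "Not Canceled", "Not Canceled", "Not Canceled"),
  Guest_Country       = c("Portugal", "Spain", "France", "Ireland"),
  Number_of_Adults    = c(2, 2, 1, 2),
  Reservation_Status  = c("Canceled", "Check-Out", "Check-Out", "Check-Out"),
  stringsAsFactors = FALSE
)

# Rejected inputs: their information is already carried by derived features.
rejected   <- c("Number_of_Adults", "Reservation_Status")
model_data <- hotel_bookings[, !(names(hotel_bookings) %in% rejected)]

# One modeling table per hotel, each keeping Cancellation_Status as the target.
by_hotel    <- split(model_data, model_data$Hotel_Type)
city_data   <- by_hotel[["City Hotel"]]
resort_data <- by_hotel[["Resort Hotel"]]
```

Each of the two resulting tables would then be passed to its own decision tree.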


Cleaned Data Preview

The combined dataset named “Hotel Bookings” is displayed below:

# Read the cleaned data
Hotel_booking<- read.csv("Hotel_Bookings.csv")
datatable(head(Hotel_booking))

Exploratory Analysis

This analysis uses graphs to address the business problem.

Hotel Guest Types

Figure 1 shows the guest classifications and their respective proportions in each hotel. Couples make more reservations than Families, Singles, and the unclassified group in both hotels. A couple can be two guests of the same sex or of different sexes. The gender of the customers is unknown, so it cannot be said whether a couple are romantic partners or friends. What is certain is that guests in both hotels tend to travel with a companion. The City and Resort hotels attract more couples than any other group, and Hotel Del Luna can use this to improve its services and advertising in order to attract more couples. Moreover, the hotel chain can perform additional analyses to find out why the other groups make fewer bookings.

library(knitr)

# The graph shows the percent of Total Count of Bookings and Guest Classification by Hotel Type. Color shows details about Guest Classification. Size shows count of Bookings. The marks are labeled by percent of Total Count of Bookings and Guest Classification. The data is filtered on Cancellation Status, which keeps Not Canceled.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic1.jpg")
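
The percentages in Figure 1 were produced in Tableau. For readers who want to reproduce that kind of within-hotel share in R, a minimal sketch on made-up bookings follows; the column names and the counts are assumptions.

```r
# Made-up bookings (already restricted to bookings that were not canceled).
bookings <- data.frame(
  Hotel_Type = c(rep("City Hotel", 4), rep("Resort Hotel", 4)),
  Guest_Classification = c("Couple", "Couple", "Single", "Family",
                           "Couple", "Couple", "Couple", "Family"),
  stringsAsFactors = FALSE
)

# Cross-tabulate hotel by guest classification.
counts <- table(bookings$Hotel_Type, bookings$Guest_Classification)

# margin = 1 makes each hotel's row sum to 100%.
shares <- round(100 * prop.table(counts, margin = 1), 2)
print(shares)
```

Computing the share within each hotel, rather than across the whole table, is what allows the two hotels to be compared even though they have very different booking volumes.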

The next graph (Fig. 2) displays the demographics of the guests in both Portugal hotels. The City Hotel welcomes more adults and children than the Resort Hotel, while the Resort Hotel receives more babies. This demographic breakdown complements the analysis above with more insight into who the guests are. Based on this profile, Hotel Del Luna can identify how to satisfy its clientele.

# Adults in each Hotel: % of Total Number of Adults and Hotel Type.  Color shows details about Hotel Type.  Size shows sum of Number of Adults.  The marks are labeled by % of Total Number of Adults and Hotel Type. The data is filtered on Cancellation Status, which keeps Canceled and Not Canceled. The view is filtered on Hotel Type, which keeps City Hotel and Resort Hotel.

# Children in each hotel: % of Total Number of Children and Hotel Type.  Color shows details about Hotel Type.  Size shows sum of Number of Children.  The marks are labeled by % of Total Number of Children and Hotel Type. The data is filtered on Cancellation Status, which keeps Canceled and Not Canceled. The view is filtered on Hotel Type, which keeps City Hotel and Resort Hotel.

# Babies in each hotel: % of Total Number of Babies and Hotel Type.  Color shows details about Hotel Type.  Size shows sum of Number of Babies.  The marks are labeled by % of Total Number of Babies and Hotel Type. The data is filtered on Cancellation Status, which keeps Canceled and Not Canceled. The view is filtered on Hotel Type, which keeps City Hotel and Resort Hotel.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic2.jpg")

Hotel Guest Country of Origins

The heat graphs below reveal the guests’ countries of origin. The City Hotel received most of its guests from these five countries: Portugal, France, Germany, United Kingdom, and Spain (Fig. 3).

library(knitr)

# Figure 3: Guest Country and count of Bookings.  Color shows count of Bookings.  Size shows count of Bookings.  The marks are labeled by Guest Country and count of Bookings. The data is filtered on Cancellation Status and Hotel Type. The Cancellation Status filter keeps Canceled and Not Canceled. The Hotel Type filter keeps City Hotel and Resort Hotel.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic3.jpg")

By contrast, the top five countries of origin for Resort Hotel guests are Portugal, United Kingdom, Spain, Ireland, and France (Fig. 4). In both hotels, Portugal is the largest single country of origin, which makes sense because both hotels are located in Portugal.

library(knitr)

# Figure 4: It is showing the Guest Country and count of Bookings.  Color shows count of Bookings.  Size shows count of Bookings.  The marks are labeled by Guest Country and count of Bookings. The data is filtered on Cancellation Status and Hotel Type. The Cancellation Status filter keeps Canceled and Not Canceled. The Hotel Type filter keeps Resort Hotel.



include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic4.jpg")

Knowing where guests come from can help Hotel Del Luna understand how to attract them to its future hotel in Portugal.

Hotel Guest Preferences

Figure 5 presents the meal preferences of guests in each hotel. In each hotel, Bed & Breakfast is the most popular meal choice and Full board (breakfast, lunch, and dinner) is the least. The second most popular choice is Half board (breakfast and one other meal, usually dinner) for the Resort Hotel and “No meal package” for the City Hotel. Bed & Breakfast is usually the standard meal package that most hotels include in the room rate. In addition, eating breakfast in the hotel is simpler than going out to a restaurant, which might be why most guests choose Bed & Breakfast. Food is an important contributor to a hotel’s positioning within its markets and an important driver of room revenue as well as asset value.

library(knitr)

# Figure 5: Count of Bookings for each Meal Type  broken down by Hotel Type.  Color shows details about Hotel Type.  The marks are labeled by % of Total Count of Meal Type1. The data is filtered on Cancellation Status, which keeps Canceled and Not Canceled.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic5.jpg")

Figure 6 shows the room preferences of guests in each hotel. Room type A is the most frequently reserved in each hotel. Its appeal might be due to its price, its view, or its features. No information was given about the room characteristics, so an additional analysis could evaluate why guests choose this room type and the others. Many room features, such as the walls, door, bed, roof, toilet, shower, and pillows, can bring comfort to guests and contribute to high guest satisfaction.

library(knitr)

# Figure 6: Count of Bookings for each Reserved Room Type broken down by Hotel Type.  Color shows details about Hotel Type.  The marks are labeled by % of Total Count of Reserved Room Type. The data is filtered on Cancellation Status, which keeps Canceled and Not Canceled.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic6.jpg")

Figure 7 indicates that most reservations in both hotels are of the Transient booking type, meaning the booking is not part of a group or contract and is not associated with another transient booking. Guests who book this way make the reservation on their own, usually directly with the hotel. They might fall into three categories: walk-ins, who arrive at the hotel and are quoted the room rate at the front desk; online reservations, made directly through the hotel website or application; and phone reservations, made through a call center. These transient guests can be the most profitable ones for the City and Resort hotels in Portugal. Hotel Del Luna can use this to decide whether it needs to partner with third-party sites to increase its bookings.

library(knitr)

# Figure 7: Count of Bookings for each Booking Preference broken down by Hotel Type.  Color shows details about Hotel Type.  The marks are labeled by % of Total Count of Booking Preference. The data is filtered on Cancellation Status, which keeps Canceled and Not Canceled.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic7.jpg")

Figure 8 shows that guests prefer to book the City Hotel for weekdays (Monday through Friday) and the Resort Hotel for stays covering both weekdays and the weekend (Saturday and Sunday). In both hotels, fewer people come for the weekend only. Guests might arrive on a weekday and stay over the weekend, or stay on weekdays only. These guests can belong to the business or the tourism category. Business trips are more likely to happen on weekdays, and people visiting a country or another city probably also arrive on a weekday, since the weekend lasts just two days. This analysis can help Hotel Del Luna set its pricing accordingly.

library(knitr)

# Figure 8: Sum of Bookings for each Week Stay Preference broken down by Hotel Type.  Color shows details about Hotel Type.  The marks are labeled by % of Total Count of Week Stay Preference. The data is filtered on Cancellation Status, which keeps Canceled and Not Canceled.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic8.jpg")

The City Hotel has the most reservations in August and May, while the Resort Hotel has the most in August and July. January is the month with the fewest reservations (Fig. 9). May through August covers late spring and summer, which might be why most guests book during that time: these months are an ideal time to visit Portugal because of the sunny weather and lively atmosphere. This pattern can help Hotel Del Luna come up with simple, practical strategies to attract more guests to its property even during the low season, for example by implementing seasonal or progressive pricing, introducing holiday packages, or organizing events.

library(knitr)

# Figure 9: Count of Bookings for each Arrival Month broken down by Hotel Type.  Color shows Rank of Count of Bookings.  The marks are labeled by Rank of Count of Bookings.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic9.jpg")
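
The monthly ranking in Figure 9 was built in Tableau. A rough base-R equivalent for a single hotel is sketched below, assuming a vector of arrival month names; the values are made up and cover only a few months.

```r
# Made-up arrival months for one hotel.
arrival_month <- c("August", "May", "August", "January", "July",
                   "August", "May", "July", "August", "May")

# Keep calendar order instead of alphabetical order.
arrival_month <- factor(arrival_month, levels = month.name)

# Months absent from the toy data get a count of zero.
bookings_per_month <- table(arrival_month)

# Rank 1 = busiest month; tied months share the smallest rank.
month_rank <- rank(-as.vector(bookings_per_month), ties.method = "min")
names(month_rank) <- names(bookings_per_month)
print(month_rank)
```

Running the same steps once per hotel type would give one ranking for the City Hotel and one for the Resort Hotel.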

Figure 10 shows that most guests do not request a parking space. Portugal offers a wide variety of transportation services, such as trains, buses, trams, metros, taxis, and planes, which are safe, clean, mostly reliable, and affordable. The country is also among the most expensive in Europe for cars. Therefore, guests coming from another city or country are more likely to use public transportation, which is cheaper. Depending on the location of the hotels, public transportation might also be the more practical option. Hotel Del Luna can choose to limit the number of parking spaces when planning the construction of its potential hotel in Portugal.

library(knitr)

#Figure 10: Count of Bookings for each Required Car Parking Spaces broken down by Hotel Type.  Color shows details about Hotel Type.  The marks are labeled by % of Total Count of Bookings. The data is filtered on Cancellation Status, which keeps Canceled and Not Canceled.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic10.jpg")

The graph below (Fig. 11) indicates that guests do not make many special requests. Special requests can include a twin bed, a high floor, accessible accommodation for guests with disabilities, or any other guest need. Guest requests are also an important part of hotel management, and fulfilling them as fully as possible will be beneficial for Hotel Del Luna.

library(knitr)

# Figure 11: Count of Bookings for each Total Of Special Requests broken down by Hotel Type.  Color shows details about Hotel Type.  The marks are labeled by % of Total Count of Bookings. The data is filtered on Cancellation Status, which keeps Canceled and Not Canceled.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic11.jpg")

Guest Loyalty

Figure 12 shows whether reservations are cancelled by the customers. At the City Hotel, 58.27% of guests do not cancel their bookings, compared with 72.24% at the Resort Hotel. The no-show rate is low in both hotels. Cancellations account for 40.57% at the City Hotel, compared with 27.04% at the Resort Hotel.

library(knitr)

# Figure 12: Count of Bookings for each Reservation Status broken down by Hotel Type.  Color shows details about Hotel Type.  The marks are labeled by % of Total Count of Reservation Status.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic12.jpg")
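
The no-show share is not stated in the text, but it can be derived from the two percentages reported for each hotel. The short calculation below assumes that not canceled, canceled, and no-show are the only three reservation statuses.

```r
# Percentages reported in the text for Figure 12.
not_canceled <- c(City = 58.27, Resort = 72.24)
canceled     <- c(City = 40.57, Resort = 27.04)

# Assumption: whatever remains of each hotel's bookings are no-shows.
no_show <- round(100 - not_canceled - canceled, 2)
print(no_show)
```

Under that assumption, no-shows make up about 1.16% of City Hotel bookings and 0.72% of Resort Hotel bookings, which is consistent with the low no-show rate noted above.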

Figure 13 shows that the majority of guests are new clients; most of them do not come back to the same hotel. Only a small portion of guests in both hotels are repeat customers.

library(knitr)

# Figure 13: Count of Bookings for each Hotel Type.  Color shows details about Is Repeated Guest.  The marks are labeled by % of Total Count of Is Repeated Guest. The data is filtered on Cancellation Status, which keeps Canceled and Not Canceled.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic13.jpg")

Figure 14 illustrates that a large share of the guests in both hotels have canceled fewer bookings in the past than they have kept.

library(knitr)

# Figure 14: Sum of Bookings for each Cancellation Habit broken down by Hotel Type.  Color shows details about Hotel Type.  The marks are labeled by % of Total Count of Cancellation Habit.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic14.jpg")

One of the key goals for most hotels is to build guest loyalty, which in turn raises the number of repeat guests. Loyal guests are most likely satisfied guests; they will help promote the hotel to their friends and family and raise Hotel Del Luna’s brand visibility and reputation.

Predictive Analysis - Decision Tree

Cancellation Predictions

This part of the analysis identifies which types of hotel guests cancel their reservations and which do not. A decision tree was chosen because it shows how different combinations of factors lead to different outcomes, namely whether or not the hotel booking is cancelled.

Resort Hotel

Variable importance identifies the variables that most strongly influence the target variable, here the cancellation status; that is, the variables that most influence whether a booking is cancelled. Guest Country, Room Assignation, Required Car Parking Spaces, Cancellation Habit, Week Stay Preference, Guest Classification, and Total of Special Requests are the variables most strongly associated with cancellation status (Fig. 15).

# The result is from the decision tree analysis using the Optimal tree model. The assessment measure is Decision, which uses the misclassification rate to judge the model fit. If the misclassification rate is low, the model is a good fit.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic15.jpg")

When the prediction model forecasts that a Resort Hotel reservation will not be cancelled, that forecast is correct 81.58% of the time (Fig. 16). This indicates how accurate the prediction model is.

# The result shows how accurate the prediction model is. 

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic16.jpg")
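
To make the two measures used in this section concrete, the sketch below computes them from a confusion matrix in base R. The counts are hypothetical and are not the report's actual results; they only show how the share of correct "Not Canceled" forecasts and the misclassification rate are obtained.

```r
# Hypothetical validation results for 100 bookings (not the report's data).
actual    <- c(rep("Not Canceled", 70), rep("Canceled", 30))
predicted <- c(rep("Not Canceled", 60), rep("Canceled", 10),   # actually not canceled
               rep("Not Canceled", 15), rep("Canceled", 15))   # actually canceled

confusion <- table(Actual = actual, Predicted = predicted)

# Share of "Not Canceled" forecasts that turn out to be right.
share_correct_not_canceled <-
  confusion["Not Canceled", "Not Canceled"] / sum(confusion[, "Not Canceled"])

# Misclassification rate: share of all bookings that were forecast wrongly.
misclassification <- 1 - sum(diag(confusion)) / sum(confusion)
```

With these made-up counts, 60 of the 75 "Not Canceled" forecasts are right (80%), and 25 of the 100 bookings are misclassified (25%). The two figures answer different questions, so a model can score well on one and less well on the other.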

Explaining one branch of the Decision Tree result:

The red path in the figure below (Fig. 17) shows that if the guests are from one of these countries: United Kingdom, United States, Spain, Ireland, France, Romania, Norway, Oman, Argentina, Belgium, Germany, China, Greece, Switzerland, Sweden, Poland, Brazil, Finland, Australia, Denmark, Netherlands, India, Slovenia, Ukraine, Italy, Latvia, Chile, Estonia, Austria, Turkey, Angola, Czechia, Lithuania, Hungary, Not Specified, Colombia, Croatia, New Zealand, Japan, Slovakia, and they do not usually cancel their reservations after making one, there is an 84.21% chance that they will not cancel their resort hotel booking.

# The result is the optimal tree after running the decision tree node. 

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic17.jpg")

City Hotel

The important variables that influence the cancellation status are Guest Country, Total of Special Requests, Booking Preference, Room Assignation, Cancellation Habit, Is Repeated Guest, and Guest Classification (Fig. 18).

# The result is from the decision tree analysis using the Optimal tree model. The assessment measure is Decision, which uses the misclassification rate to judge the model fit. If the misclassification rate is low, the model is a good fit.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic18.jpg")

When the prediction model forecasts that a City Hotel guest will not cancel, the reservation is indeed not cancelled 77.31% of the time (Fig. 19).

# The result shows how accurate the prediction model is.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic19.jpg")

Explaining one branch of the Decision Tree result:

The red path in the figure below (Fig. 20) shows that if the guest is from one of these countries: Italy, France, United Kingdom, Spain, Brazil, Germany, Mexico, United States, Netherlands, Poland, Sweden, Belgium, Morocco, China, Hungary, Russia, Switzerland, Luxembourg, Egypt, Ukraine, Norway, Ireland, Romania, Israel, Croatia, Australia, Singapore, Japan, Argentina, Iran, Austria, Denmark, Algeria, Lithuania, Turkey, Czechia, Greece, Slovakia, Cyprus, Finland, Estonia, Iceland, Bulgaria, New Zealand, Korea, Serbia, Venezuela, India, Mozambique, Latvia, Slovenia, Lebanon, Ecuador, Thailand, Peru, Belarus, Kazakhstan, Costa Rica, and the booking is made as a Transient-Party, Contract, or Group booking, there is a 96.41% chance that they will not cancel their city hotel booking.

# The result is the optimal tree after running the decision tree node.

include_graphics("C:/Users/ndryb/Desktop/Capstone/data/Screenshot/Pic20.jpg")


Limitations

  • Additional analyses could be performed to understand the kinds of problems hotel customers experience and to help hotel management respond to guest reviews, in order to enhance the customer experience and increase the hotel’s ratings. Hotel Del Luna wants to better understand the needs of its prospective clientele in order to satisfy them and attract more customers, and listening to the voice of the customer is a good path to that goal. A text analysis of customer reviews for both the city and resort hotels could deepen the guest analysis.

  • One attribute that was difficult to analyze was the room type. The room type was coded for privacy purposes, which limited the assessment of this variable. An analysis of room type prices, room designs, or locations could give more insight into why most guests are attracted to one room type rather than the others.

  • For the predictive analysis, cluster analysis could be another option for predicting whether a reservation will be cancelled. It could be compared with the decision tree to show which technique gives the better-fitting model. Moreover, the Decision Tree model has various parameters that could be changed to see how the accuracy improves. In addition, SAS Enterprise Miner offers features such as Variable Selection and Partial Least Squares, located under the Explore and Model process groups, that could be used to strengthen Decision Tree models.

  • The hotel booking datasets can be explored in many other directions beyond those covered here.