The first essential piece to any marketing strategy is a clear understanding of the target customer. If hotels (or any organizations) fail to do this, their marketing efforts will essentially serve as a sunk cost.
However, when strategized correctly, marketing efforts can significantly boost ROI. In order to attract the appropriate audiences, hotels must analyze their typical consumer’s behavior. In addition, they must understand their direct competition to build a sustainable competitive advantage. In our analysis, our goal is to assist our client, a resort hotel in South Carolina, to boost marketing ROI with a 5:1 ratio.
We plan to assist our client in boosting their ROI 5:1 and marketing effectively against nearby city hotels. In this analysis, we’ll compare data around average daily rates, number of special requests, average stays, busiest weeks, number of children during stay, and weekday frequencies to derive insights for resort hotels against city hotels. With our insights, we’ll be able to build and automate targeted email campaigns and ads in addition to competitive pricing. We’ll use our insights to tailor the content within the campaigns and push them heavily during statistically weaker sales periods.
We will analyze the frequency of stays during the week to determine which day (on average) is the most popular. Additionally, we will identify which parameters correlate with
If we can derive which days of the week are statistically least popular for resort hotels, we can relay this information to our client and help them tailor their marketing material accordingly. For instance, if we find that Monday is the least popular day of the week to stay at a resort hotel, we will advise our client to slightly reduce the price of Monday bookings to incentivize Monday sales.
If we can derive average daily rates for resort hotels vs. city hotels, we can help our client build competitive pricing. For example, if we find that the average stay for a resort hotel is 105.00 per night, we might advise our client to offer pricing between 95.00-97.00 during slower periods.
Similarly, if we can derive which weeks or months of the year are least popular for resort hotels, we can help our client prepare incentives for weaker sales periods. For example, if a resort hotel typically has less stays during the summer months, we can offer a discounted stay to customers in the winter months if they book a trip during the summer.
Lastly, if we analyze the number of children who stay at a resort hotel vs. a city hotel and find that there’s a significant difference, we will advise our client to tailor their marketing strategy accordingly. If we find that children stay much more frequently at resort hotels vs. city hotels, then we will advise our client to include photos of children in their marketing material.
Overall, we’ll help our client:
Determine competitive pricing
Identify key parameters for target ads on Facebook, Instagram, Twitter (age, time of ad, day of ad, etc.)
Reap the benefits of strategic marketing —> ideally 5:1 ROI through these efforts
We have loaded the packages upfront.
library(tidyverse)
library(glue)
library(Hmisc)
Package tidyverse is loaded as it includes many core packages such as ggplot2, dplyr, and tidyr. These are important for decision analysis as we explore the data set, clean and organize the file, and make any structural changes needed.
Package glue is loaded to make it easier to paste as it serves as an alternative.
Package Hmisc is loaded to show summary unique statistics on the variables in our set not able to obtain through standard Str() and Summary () functions.
We will be using the Hotel Bookings Data. The data for our project was obtained from an open hotel booking demand dataset from Antonio, Almeida and Nunes, 2019
This dataset describes two sets of information related to Resort Hotels and City Hotels. Both Resort Hotels and City Hotels share the same structure and each observation represents a specific hotel booking. This dataset was collected to aid in research and education in revenue management and data mining. The timeline for this data is between July 01, 2015 and August 31, 2017.
An explanation of the variables in the set is as follows:
hotel: The type of hotel the observation relates to (Resort or City).
is_canceled: Was the booking canceled or not? 1 = canceled, 0 = Not.
lead_time: number of days between the entered date and arrival date.
arrival_date_year: Year of arrival.
arrival_date_month: Month of arrival.
arrival_date_week_number: Week of the year the arrival occured.
arrival_date_day_of_month: Day of the arrival.
stays_in_weekend_nights: Number of weekend nights the guest stayed.
stays_in_week_nights: Number of week nights the guest stayed.
adults: Number of adults in the reservation.
children: Number of children in the reservation.
babies: Number of babies in the reservation.
meal: Undefined/SC – no meal package; BB – Bed & Breakfast; HB – Half board (breakfast and one other meal – usually dinner); FB – Full board (breakfast, lunch and dinner).
country: County of origin.
market_segment: TA = travel agents, TO = tour operators.
distribution_channel: How they booked. TA = travel agent and TO = tour operator.
is_repeated_guest: Is the guest a repeat customer? 1 = Yes and 0 = No.
previous_cancellations: Number of prior bookings canceled before the current booking.
previous_bookings_not_canceled: Number of previous bookings not cancelled by the customer before the current booking.
reserved_room_type: Room type reserved.
assigned_room_type: Room type assigned. Can differ from the reserved type.
booking_changes: Number of changes/amendments made to the booking until check in or cnacellation.
deposit_type: No Deposit = no deposit was made; Non Refund = a deposit was made in the value of the total stay cost; Refundable = a deposit was made with a value under the total cost of stay.
agent: ID of the travel agency where the booking was made.
company: ID of the company that made the booking.
days_in_waiting_list: How long the customer had to wait to get their reservation.
customer_type: Contract = when the booking has a contract associated to it; Group = when the booking is associated with a group; Transient = when the booking is not part of a group or contract, and is not associated to other transient bookings; Transient-party = when the booking is transient, but is associated to at least other transient booking.
adr: Average daily rate of the booking.
required_car_parking_spaces: How many parking spaces are needed for the booking.
total_of_special_requests: Number of special requests made.
reservation_status: Last status of the reservation.
reservation_status_date: Date of the last reservation status.
We read the data in and determine any missing values through the below:
hotel <- read.csv("C:/Users/bellmatj/OneDrive - University of Cincinnati/Data Wrangling/Project/hotels.csv")
colSums(is.na(hotel))
## hotel is_canceled
## 0 0
## lead_time arrival_date_year
## 0 0
## arrival_date_month arrival_date_week_number
## 0 0
## arrival_date_day_of_month stays_in_weekend_nights
## 0 0
## stays_in_week_nights adults
## 0 0
## children babies
## 4 0
## meal country
## 0 0
## market_segment distribution_channel
## 0 0
## is_repeated_guest previous_cancellations
## 0 0
## previous_bookings_not_canceled reserved_room_type
## 0 0
## assigned_room_type booking_changes
## 0 0
## deposit_type agent
## 0 0
## company days_in_waiting_list
## 0 0
## customer_type adr
## 0 0
## required_car_parking_spaces total_of_special_requests
## 0 0
## reservation_status reservation_status_date
## 0 0
There are 119,390 observations in this table across 32 variables:
nrow(hotel)
## [1] 119390
ncol(hotel)
## [1] 32
During this step we replaced the NA values in the children column with 0, similar to how it is structured in the column babies. Overall, the dataset was very clean and needed minimal change.
hotel$children[is.na(hotel$children)] = 0
The data is clean and we have displayed the first 25 observations in the set.
head(hotel, 25)
## hotel is_canceled lead_time arrival_date_year arrival_date_month
## 1 Resort Hotel 0 342 2015 July
## 2 Resort Hotel 0 737 2015 July
## 3 Resort Hotel 0 7 2015 July
## 4 Resort Hotel 0 13 2015 July
## 5 Resort Hotel 0 14 2015 July
## 6 Resort Hotel 0 14 2015 July
## 7 Resort Hotel 0 0 2015 July
## 8 Resort Hotel 0 9 2015 July
## 9 Resort Hotel 1 85 2015 July
## 10 Resort Hotel 1 75 2015 July
## 11 Resort Hotel 1 23 2015 July
## 12 Resort Hotel 0 35 2015 July
## 13 Resort Hotel 0 68 2015 July
## 14 Resort Hotel 0 18 2015 July
## 15 Resort Hotel 0 37 2015 July
## 16 Resort Hotel 0 68 2015 July
## 17 Resort Hotel 0 37 2015 July
## 18 Resort Hotel 0 12 2015 July
## 19 Resort Hotel 0 0 2015 July
## 20 Resort Hotel 0 7 2015 July
## 21 Resort Hotel 0 37 2015 July
## 22 Resort Hotel 0 72 2015 July
## 23 Resort Hotel 0 72 2015 July
## 24 Resort Hotel 0 72 2015 July
## 25 Resort Hotel 0 127 2015 July
## arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
## 1 27 1 0
## 2 27 1 0
## 3 27 1 0
## 4 27 1 0
## 5 27 1 0
## 6 27 1 0
## 7 27 1 0
## 8 27 1 0
## 9 27 1 0
## 10 27 1 0
## 11 27 1 0
## 12 27 1 0
## 13 27 1 0
## 14 27 1 0
## 15 27 1 0
## 16 27 1 0
## 17 27 1 0
## 18 27 1 0
## 19 27 1 0
## 20 27 1 0
## 21 27 1 1
## 22 27 1 2
## 23 27 1 2
## 24 27 1 2
## 25 27 1 2
## stays_in_week_nights adults children babies meal country market_segment
## 1 0 2 0 0 BB PRT Direct
## 2 0 2 0 0 BB PRT Direct
## 3 1 1 0 0 BB GBR Direct
## 4 1 1 0 0 BB GBR Corporate
## 5 2 2 0 0 BB GBR Online TA
## 6 2 2 0 0 BB GBR Online TA
## 7 2 2 0 0 BB PRT Direct
## 8 2 2 0 0 FB PRT Direct
## 9 3 2 0 0 BB PRT Online TA
## 10 3 2 0 0 HB PRT Offline TA/TO
## 11 4 2 0 0 BB PRT Online TA
## 12 4 2 0 0 HB PRT Online TA
## 13 4 2 0 0 BB USA Online TA
## 14 4 2 1 0 HB ESP Online TA
## 15 4 2 0 0 BB PRT Online TA
## 16 4 2 0 0 BB IRL Online TA
## 17 4 2 0 0 BB PRT Offline TA/TO
## 18 1 2 0 0 BB IRL Online TA
## 19 1 2 0 0 BB FRA Corporate
## 20 4 2 0 0 BB GBR Direct
## 21 4 1 0 0 BB GBR Online TA
## 22 4 2 0 0 BB PRT Direct
## 23 4 2 0 0 BB PRT Direct
## 24 4 2 0 0 BB PRT Direct
## 25 5 2 0 0 HB GBR Offline TA/TO
## distribution_channel is_repeated_guest previous_cancellations
## 1 Direct 0 0
## 2 Direct 0 0
## 3 Direct 0 0
## 4 Corporate 0 0
## 5 TA/TO 0 0
## 6 TA/TO 0 0
## 7 Direct 0 0
## 8 Direct 0 0
## 9 TA/TO 0 0
## 10 TA/TO 0 0
## 11 TA/TO 0 0
## 12 TA/TO 0 0
## 13 TA/TO 0 0
## 14 TA/TO 0 0
## 15 TA/TO 0 0
## 16 TA/TO 0 0
## 17 TA/TO 0 0
## 18 TA/TO 0 0
## 19 Corporate 0 0
## 20 Direct 0 0
## 21 TA/TO 0 0
## 22 Direct 0 0
## 23 Direct 0 0
## 24 Direct 0 0
## 25 TA/TO 0 0
## previous_bookings_not_canceled reserved_room_type assigned_room_type
## 1 0 C C
## 2 0 C C
## 3 0 A C
## 4 0 A A
## 5 0 A A
## 6 0 A A
## 7 0 C C
## 8 0 C C
## 9 0 A A
## 10 0 D D
## 11 0 E E
## 12 0 D D
## 13 0 D E
## 14 0 G G
## 15 0 E E
## 16 0 D E
## 17 0 E E
## 18 0 A E
## 19 0 A G
## 20 0 G G
## 21 0 F F
## 22 0 A A
## 23 0 A A
## 24 0 D D
## 25 0 D I
## booking_changes deposit_type agent company days_in_waiting_list
## 1 3 No Deposit NULL NULL 0
## 2 4 No Deposit NULL NULL 0
## 3 0 No Deposit NULL NULL 0
## 4 0 No Deposit 304 NULL 0
## 5 0 No Deposit 240 NULL 0
## 6 0 No Deposit 240 NULL 0
## 7 0 No Deposit NULL NULL 0
## 8 0 No Deposit 303 NULL 0
## 9 0 No Deposit 240 NULL 0
## 10 0 No Deposit 15 NULL 0
## 11 0 No Deposit 240 NULL 0
## 12 0 No Deposit 240 NULL 0
## 13 0 No Deposit 240 NULL 0
## 14 1 No Deposit 241 NULL 0
## 15 0 No Deposit 241 NULL 0
## 16 0 No Deposit 240 NULL 0
## 17 0 No Deposit 8 NULL 0
## 18 0 No Deposit 240 NULL 0
## 19 0 No Deposit NULL 110 0
## 20 0 No Deposit 250 NULL 0
## 21 0 No Deposit 241 NULL 0
## 22 1 No Deposit 250 NULL 0
## 23 1 No Deposit 250 NULL 0
## 24 1 No Deposit 250 NULL 0
## 25 0 No Deposit 115 NULL 0
## customer_type adr required_car_parking_spaces total_of_special_requests
## 1 Transient 0.00 0 0
## 2 Transient 0.00 0 0
## 3 Transient 75.00 0 0
## 4 Transient 75.00 0 0
## 5 Transient 98.00 0 1
## 6 Transient 98.00 0 1
## 7 Transient 107.00 0 0
## 8 Transient 103.00 0 1
## 9 Transient 82.00 0 1
## 10 Transient 105.50 0 0
## 11 Transient 123.00 0 0
## 12 Transient 145.00 0 0
## 13 Transient 97.00 0 3
## 14 Transient 154.77 0 1
## 15 Transient 94.71 0 0
## 16 Transient 97.00 0 3
## 17 Contract 97.50 0 0
## 18 Transient 88.20 0 0
## 19 Transient 107.42 0 0
## 20 Transient 153.00 0 1
## 21 Transient 97.29 0 1
## 22 Transient 84.67 0 1
## 23 Transient 84.67 0 1
## 24 Transient 99.67 0 1
## 25 Contract 94.95 0 1
## reservation_status reservation_status_date
## 1 Check-Out 2015-07-01
## 2 Check-Out 2015-07-01
## 3 Check-Out 2015-07-02
## 4 Check-Out 2015-07-02
## 5 Check-Out 2015-07-03
## 6 Check-Out 2015-07-03
## 7 Check-Out 2015-07-03
## 8 Check-Out 2015-07-03
## 9 Canceled 2015-05-06
## 10 Canceled 2015-04-22
## 11 Canceled 2015-06-23
## 12 Check-Out 2015-07-05
## 13 Check-Out 2015-07-05
## 14 Check-Out 2015-07-05
## 15 Check-Out 2015-07-05
## 16 Check-Out 2015-07-05
## 17 Check-Out 2015-07-05
## 18 Check-Out 2015-07-02
## 19 Check-Out 2015-07-02
## 20 Check-Out 2015-07-05
## 21 Check-Out 2015-07-06
## 22 Check-Out 2015-07-07
## 23 Check-Out 2015-07-07
## 24 Check-Out 2015-07-07
## 25 Check-Out 2015-07-01
describe(hotel)
## hotel
##
## 32 Variables 119390 Observations
## --------------------------------------------------------------------------------
## hotel
## n missing distinct
## 119390 0 2
##
## Value City Hotel Resort Hotel
## Frequency 79330 40060
## Proportion 0.664 0.336
## --------------------------------------------------------------------------------
## is_canceled
## n missing distinct Info Sum Mean Gmd
## 119390 0 2 0.7 44224 0.3704 0.4664
##
## --------------------------------------------------------------------------------
## lead_time
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 479 1 104 112.5 0 3
## .25 .50 .75 .90 .95
## 18 69 160 265 320
##
## lowest : 0 1 2 3 4, highest: 622 626 629 709 737
## --------------------------------------------------------------------------------
## arrival_date_year
## n missing distinct Info Mean Gmd
## 119390 0 3 0.847 2016 0.7499
##
## Value 2015 2016 2017
## Frequency 21996 56707 40687
## Proportion 0.184 0.475 0.341
## --------------------------------------------------------------------------------
## arrival_date_month
## n missing distinct
## 119390 0 12
##
## lowest : April August December February January
## highest: March May November October September
##
## Value April August December February January July
## Frequency 11089 13877 6780 8068 5929 12661
## Proportion 0.093 0.116 0.057 0.068 0.050 0.106
##
## Value June March May November October September
## Frequency 10939 9794 11791 6794 11160 10508
## Proportion 0.092 0.082 0.099 0.057 0.093 0.088
## --------------------------------------------------------------------------------
## arrival_date_week_number
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 53 1 27.17 15.68 5 8
## .25 .50 .75 .90 .95
## 16 28 38 46 49
##
## lowest : 1 2 3 4 5, highest: 49 50 51 52 53
## --------------------------------------------------------------------------------
## arrival_date_day_of_month
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 31 0.999 15.8 10.13 2 4
## .25 .50 .75 .90 .95
## 8 16 23 28 30
##
## lowest : 1 2 3 4 5, highest: 27 28 29 30 31
## --------------------------------------------------------------------------------
## stays_in_weekend_nights
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 17 0.879 0.9276 1.026 0 0
## .25 .50 .75 .90 .95
## 0 1 2 2 2
##
## lowest : 0 1 2 3 4, highest: 13 14 16 18 19
##
## Value 0 1 2 3 4 5 6 7 8 9 10
## Frequency 51998 30626 33308 1259 1855 79 153 19 60 11 7
## Proportion 0.436 0.257 0.279 0.011 0.016 0.001 0.001 0.000 0.001 0.000 0.000
##
## Value 12 13 14 16 18 19
## Frequency 5 3 2 3 1 1
## Proportion 0.000 0.000 0.000 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## stays_in_week_nights
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 35 0.953 2.5 1.865 0 1
## .25 .50 .75 .90 .95
## 1 2 3 5 5
##
## lowest : 0 1 2 3 4, highest: 35 40 41 42 50
## --------------------------------------------------------------------------------
## adults
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 14 0.569 1.856 0.4287 1 1
## .25 .50 .75 .90 .95
## 2 2 2 2 3
##
## lowest : 0 1 2 3 4, highest: 26 27 40 50 55
##
## Value 0 1 2 3 4 5 6 10 20 26 27
## Frequency 403 23027 89680 6202 62 2 1 1 2 5 2
## Proportion 0.003 0.193 0.751 0.052 0.001 0.000 0.000 0.000 0.000 0.000 0.000
##
## Value 40 50 55
## Frequency 1 1 1
## Proportion 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## children
## n missing distinct Info Mean Gmd
## 119390 0 5 0.201 0.1039 0.1955
##
## lowest : 0 1 2 3 10, highest: 0 1 2 3 10
##
## Value 0 1 2 3 10
## Frequency 110800 4861 3652 76 1
## Proportion 0.928 0.041 0.031 0.001 0.000
## --------------------------------------------------------------------------------
## babies
## n missing distinct Info Mean Gmd
## 119390 0 5 0.023 0.007949 0.01578
##
## lowest : 0 1 2 9 10, highest: 0 1 2 9 10
##
## Value 0 1 2 9 10
## Frequency 118473 900 15 1 1
## Proportion 0.992 0.008 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## meal
## n missing distinct
## 119390 0 5
##
## lowest : BB FB HB SC Undefined
## highest: BB FB HB SC Undefined
##
## Value BB FB HB SC Undefined
## Frequency 92310 798 14463 10650 1169
## Proportion 0.773 0.007 0.121 0.089 0.010
## --------------------------------------------------------------------------------
## country
## n missing distinct
## 119390 0 178
##
## lowest : ABW AGO AIA ALB AND, highest: VGB VNM ZAF ZMB ZWE
## --------------------------------------------------------------------------------
## market_segment
## n missing distinct
## 119390 0 8
##
## lowest : Aviation Complementary Corporate Direct Groups
## highest: Direct Groups Offline TA/TO Online TA Undefined
##
## Value Aviation Complementary Corporate Direct
## Frequency 237 743 5295 12606
## Proportion 0.002 0.006 0.044 0.106
##
## Value Groups Offline TA/TO Online TA Undefined
## Frequency 19811 24219 56477 2
## Proportion 0.166 0.203 0.473 0.000
## --------------------------------------------------------------------------------
## distribution_channel
## n missing distinct
## 119390 0 5
##
## lowest : Corporate Direct GDS TA/TO Undefined
## highest: Corporate Direct GDS TA/TO Undefined
##
## Value Corporate Direct GDS TA/TO Undefined
## Frequency 6677 14645 193 97870 5
## Proportion 0.056 0.123 0.002 0.820 0.000
## --------------------------------------------------------------------------------
## is_repeated_guest
## n missing distinct Info Sum Mean Gmd
## 119390 0 2 0.093 3810 0.03191 0.06179
##
## --------------------------------------------------------------------------------
## previous_cancellations
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 15 0.154 0.08712 0.1682 0 0
## .25 .50 .75 .90 .95
## 0 0 0 0 1
##
## lowest : 0 1 2 3 4, highest: 19 21 24 25 26
##
## Value 0 1 2 3 4 5 6 11 13
## Frequency 112906 6051 116 65 31 19 22 35 12
## Proportion 0.946 0.051 0.001 0.001 0.000 0.000 0.000 0.000 0.000
##
## Value 14 19 21 24 25 26
## Frequency 14 19 1 48 25 26
## Proportion 0.000 0.000 0.000 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## previous_bookings_not_canceled
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 73 0.088 0.1371 0.2708 0 0
## .25 .50 .75 .90 .95
## 0 0 0 0 0
##
## lowest : 0 1 2 3 4, highest: 68 69 70 71 72
## --------------------------------------------------------------------------------
## reserved_room_type
## n missing distinct
## 119390 0 10
##
## lowest : A B C D E, highest: F G H L P
##
## Value A B C D E F G H L P
## Frequency 85994 1118 932 19201 6535 2897 2094 601 6 12
## Proportion 0.720 0.009 0.008 0.161 0.055 0.024 0.018 0.005 0.000 0.000
## --------------------------------------------------------------------------------
## assigned_room_type
## n missing distinct
## 119390 0 12
##
## lowest : A B C D E, highest: H I K L P
##
## Value A B C D E F G H I K L
## Frequency 74053 2163 2375 25322 7806 3751 2553 712 363 279 1
## Proportion 0.620 0.018 0.020 0.212 0.065 0.031 0.021 0.006 0.003 0.002 0.000
##
## Value P
## Frequency 12
## Proportion 0.000
## --------------------------------------------------------------------------------
## booking_changes
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 21 0.388 0.2211 0.3919 0 0
## .25 .50 .75 .90 .95
## 0 0 0 1 1
##
## lowest : 0 1 2 3 4, highest: 16 17 18 20 21
## --------------------------------------------------------------------------------
## deposit_type
## n missing distinct
## 119390 0 3
##
## Value No Deposit Non Refund Refundable
## Frequency 104641 14587 162
## Proportion 0.876 0.122 0.001
## --------------------------------------------------------------------------------
## agent
## n missing distinct
## 119390 0 334
##
## lowest : 1 10 103 104 105 , highest: 95 96 98 99 NULL
## --------------------------------------------------------------------------------
## company
## n missing distinct
## 119390 0 353
##
## lowest : 10 100 101 102 103 , highest: 93 94 96 99 NULL
## --------------------------------------------------------------------------------
## days_in_waiting_list
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 128 0.09 2.321 4.559 0 0
## .25 .50 .75 .90 .95
## 0 0 0 0 0
##
## lowest : 0 1 2 3 4, highest: 236 259 330 379 391
## --------------------------------------------------------------------------------
## customer_type
## n missing distinct
## 119390 0 4
##
## Value Contract Group Transient Transient-Party
## Frequency 4076 577 89613 25124
## Proportion 0.034 0.005 0.751 0.210
## --------------------------------------------------------------------------------
## adr
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 8879 1 101.8 51.91 38.40 50.00
## .25 .50 .75 .90 .95
## 69.29 94.58 126.00 164.00 193.50
##
## lowest : -6.38 0.00 0.26 0.50 1.00
## highest: 450.00 451.50 508.00 510.00 5400.00
##
## Value 0 50 100 150 200 250 300 350 400 450 500
## Frequency 2437 35085 50975 21915 6234 2156 463 99 19 4 2
## Proportion 0.020 0.294 0.427 0.184 0.052 0.018 0.004 0.001 0.000 0.000 0.000
##
## Value 5400
## Frequency 1
## Proportion 0.000
##
## For the frequency table, variable is rounded to the nearest 50
## --------------------------------------------------------------------------------
## required_car_parking_spaces
## n missing distinct Info Mean Gmd
## 119390 0 5 0.175 0.06252 0.1173
##
## lowest : 0 1 2 3 8, highest: 0 1 2 3 8
##
## Value 0 1 2 3 8
## Frequency 111974 7383 28 3 2
## Proportion 0.938 0.062 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## total_of_special_requests
## n missing distinct Info Mean Gmd
## 119390 0 6 0.773 0.5714 0.7684
##
## lowest : 0 1 2 3 4, highest: 1 2 3 4 5
##
## Value 0 1 2 3 4 5
## Frequency 70318 33226 12969 2497 340 40
## Proportion 0.589 0.278 0.109 0.021 0.003 0.000
## --------------------------------------------------------------------------------
## reservation_status
## n missing distinct
## 119390 0 3
##
## Value Canceled Check-Out No-Show
## Frequency 43017 75166 1207
## Proportion 0.36 0.63 0.01
## --------------------------------------------------------------------------------
## reservation_status_date
## n missing distinct
## 119390 0 926
##
## lowest : 2014-10-17 2014-11-18 2015-01-01 2015-01-02 2015-01-18
## highest: 2017-09-08 2017-09-09 2017-09-10 2017-09-12 2017-09-14
## --------------------------------------------------------------------------------
There is a correlation and it is strongly negative. This should let hotel executives know that there are weeks within the year that are NOT attractive to their guests and will be periods of less revenue.
cor(hotel$arrival_date_week_number, hotel$arrival_date_year)
## [1] -0.5405613
Staying in a city hotel is much more popular and occurs more often then resorts. Most likley because travel for business is more frequent than leisure (a resort)
ggplot(data = hotel, aes(x = hotel)) +
geom_bar()
THe graph lets us know that there are more frequent hotel stayins in both city hotels and resort hotels between the 10th and 23rd. Beginning of month and end of month travel is less likely, although still occurs.
ggplot(data = hotel, aes(x = hotel, y = arrival_date_day_of_month)) +
geom_boxplot()
Yes there are! We found that groups are LESS likely to cancel than regular reservations. Hotels should attract groups as often as possible to boost revenue.
group_vs_cancel <- lm((is_canceled) ~ customer_type + previous_cancellations, data = hotel)
summary(group_vs_cancel)
##
## Call:
## lm(formula = (is_canceled) ~ customer_type + previous_cancellations,
## data = hotel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2442 -0.4027 -0.2471 0.5973 0.9001
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.295021 0.007454 39.581 < 2e-16 ***
## customer_typeGroup -0.195124 0.021143 -9.229 < 2e-16 ***
## customer_typeTransient 0.107638 0.007616 14.133 < 2e-16 ***
## customer_typeTransient-Party -0.047966 0.008028 -5.975 2.31e-09 ***
## previous_cancellations 0.064737 0.001630 39.712 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4753 on 119385 degrees of freedom
## Multiple R-squared: 0.03141, Adjusted R-squared: 0.03138
## F-statistic: 967.9 on 4 and 119385 DF, p-value: < 2.2e-16
There are many correlations between room types and cancellation. A regression model tells us that room type P will cause the most cancellations, whereas room E will not.
reservation_vs_cancel <- lm((is_canceled) ~ reserved_room_type, data = hotel)
summary(reservation_vs_cancel)
##
## Call:
## lm(formula = (is_canceled) ~ reserved_room_type, data = hotel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4077 -0.3911 -0.3178 0.6089 0.7071
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.391074 0.001642 238.114 < 2e-16 ***
## reserved_room_typeB -0.061915 0.014497 -4.271 1.95e-05 ***
## reserved_room_typeC -0.060602 0.015861 -3.821 0.000133 ***
## reserved_room_typeD -0.073278 0.003844 -19.062 < 2e-16 ***
## reserved_room_typeE -0.098189 0.006180 -15.888 < 2e-16 ***
## reserved_room_typeF -0.087311 0.009098 -9.597 < 2e-16 ***
## reserved_room_typeG -0.026699 0.010652 -2.506 0.012197 *
## reserved_room_typeH 0.016580 0.019714 0.841 0.400341
## reserved_room_typeL -0.057740 0.196629 -0.294 0.769025
## reserved_room_typeP 0.608926 0.139043 4.379 1.19e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4816 on 119380 degrees of freedom
## Multiple R-squared: 0.005426, Adjusted R-squared: 0.005351
## F-statistic: 72.37 on 9 and 119380 DF, p-value: < 2.2e-16
Problem: How can we help our client, a resort hotel, better tailor their marketing platform?
The key variables that we utilized in our analysis were Hotel Type, Arrival Date, Customer Type, Previous Cancellations, and Reserved Room Type. Through a correlation model, two gg-plots, and two linear regressions, we were able to successfully uncover key insights to help shape our client’s marketing strategy as a resort hotel.
Although no strong positive correlations exist between the week of the year and booking, there are a few weeks that illustrate a negative correlation—these would be weeks to offer incentives.
City hotels receive much more frequent, consistent, and steady business than resort hotels.
In both city and resort hotels, most stays occur between the 10th and 23rd of the month. It’s less popular to stay at hotels at the beginning or end of the month.
Groups are least likely to cancel their reservations.
Room type P often cancels whereas room type E is least likely to cancel a reservation.
With these insights, we will advise our client to:
Offer discounted rates at beginning and end of the month.
Offer incentive to receive a discounted rate for customer’s next stay if they book during an unpopular week of the year.
Push email campaigns and online ads heavily during the beginning and end of the month.
Market heavily towards groups.
Enforce more frequent confirmation messages for individuals and reservations with room type P.
Perform a competition analysis on what city hotels offer: what can resort hotels offer than city hotels cannot? Amenities, customer service, all inclusive packages? What are city hotels offering to attract frequent and steady business?
The analysis could be improved with a more comprehensive set of data—no missing data, more entries, etc. Errors may have produced inaccurate data through techniques, method of collection, and study parameters. One could also increase the volume of correlational models and regressions ran within the analysis to provide a broader perspective of the data. Adding a variety of visualizations in addition to bar charts and plots could also be advantageous to the client as it’s easily readable.
Yes, to the best of our ability.
Yes, to the best of our ability.
Yes, we believe the knowledge applied to our strategy well positioned us for a successful report.
Yes, no errors in running.