The first essential piece to any marketing strategy is a clear understanding of the target customer. If hotels (or any organizations) fail to do this, their marketing efforts will essentially serve as a sunk cost.
However, when strategized correctly, marketing efforts can significantly boost ROI. In order to attract the appropriate audiences, hotels must analyze their typical consumer’s behavior. In addition, they must understand their direct competition to build a sustainable competitive advantage. In our analysis, our goal is to assist our client, a resort hotel in South Carolina, to boost marketing ROI with a 5:1 ratio.
We plan to assist our client in boosting their ROI 5:1 and marketing effectively against nearby city hotels. In this analysis, we’ll compare data around average daily rates, number of special requests, average stays, busiest weeks, number of children during stay, and weekday frequencies to derive insights for resort hotels against city hotels. With our insights, we’ll be able to build and automate targeted email campaigns and ads in addition to competitive pricing. We’ll use our insights to tailor the content within the campaigns and push them heavily during statistically weaker sales periods.
We will analyze the frequency of stays during the week to determine which day (on average) is the most popular. Additionally, we will identify which parameters correlate with
If we can derive which days of the week are statistically least popular for resort hotels, we can relay this information to our client and help them tailor their marketing material accordingly. For instance, if we find that Monday is the least popular day of the week to stay at a resort hotel, we will advise our client to slightly reduce the price of Monday bookings to incentivize Monday sales.
If we can derive average daily rates for resort hotels vs. city hotels, we can help our client build competitive pricing. For example, if we find that the average stay for a resort hotel is 105.00 per night, we might advise our client to offer pricing between 95.00-97.00 during slower periods.
Similarly, if we can derive which weeks or months of the year are least popular for resort hotels, we can help our client prepare incentives for weaker sales periods. For example, if a resort hotel typically has less stays during the summer months, we can offer a discounted stay to customers in the winter months if they book a trip during the summer.
Lastly, if we analyze the number of children who stay at a resort hotel vs. a city hotel and find that there’s a significant difference, we will advise our client to tailor their marketing strategy accordingly. If we find that children stay much more frequently at resort hotels vs. city hotels, then we will advise our client to include photos of children in their marketing material.
Overall, we’ll help our client:
Determine competitive pricing
Identify key parameters for target ads on Facebook, Instagram, Twitter (age, time of ad, day of ad, etc.)
Reap the benefits of strategic marketing —> ideally 5:1 ROI through these efforts
We have loaded the packages upfront.
library(tidyverse)
library(glue)
library(Hmisc)
Package tidyverse is loaded as it includes many core packages such as ggplot2, dplyr, and tidyr. These are important for decision analysis as we explore the data set, clean and organize the file, and make any structural changes needed.
Package glue is loaded to make it easier to paste as it serves as an alternative.
Package Hmisc is loaded to show summary unique statistics on the variables in our set not able to obtain through standard Str() and Summary () functions.
We will be using the Hotel Bookings Data. The data for our project was obtained from an open hotel booking demand dataset from Antonio, Almeida and Nunes, 2019
This dataset describes two sets of information related to Resort Hotels and City Hotels. Both Resort Hotels and City Hotels share the same structure and each observation represents a specific hotel booking. This dataset was collected to aid in research and education in revenue management and data mining. The timeline for this data is between July 01, 2015 and August 31, 2017.
An explanation of the variables in the set is as follows:
hotel: The type of hotel the observation relates to (Resort or City).
is_canceled: Was the booking canceled or not? 1 = canceled, 0 = Not.
lead_time: number of days between the entered date and arrival date.
arrival_date_year: Year of arrival.
arrival_date_month: Month of arrival.
arrival_date_week_number: Week of the year the arrival occured.
arrival_date_day_of_month: Day of the arrival.
stays_in_weekend_nights: Number of weekend nights the guest stayed.
stays_in_week_nights: Number of week nights the guest stayed.
adults: Number of adults in the reservation.
children: Number of children in the reservation.
babies: Number of babies in the reservation.
meal: Undefined/SC – no meal package; BB – Bed & Breakfast; HB – Half board (breakfast and one other meal – usually dinner); FB – Full board (breakfast, lunch and dinner).
country: County of origin.
market_segment: TA = travel agents, TO = tour operators.
distribution_channel: How they booked. TA = travel agent and TO = tour operator.
is_repeated_guest: Is the guest a repeat customer? 1 = Yes and 0 = No.
previous_cancellations: Number of prior bookings canceled before the current booking.
previous_bookings_not_canceled: Number of previous bookings not cancelled by the customer before the current booking.
reserved_room_type: Room type reserved.
assigned_room_type: Room type assigned. Can differ from the reserved type.
booking_changes: Number of changes/amendments made to the booking until check in or cnacellation.
deposit_type: No Deposit = no deposit was made; Non Refund = a deposit was made in the value of the total stay cost; Refundable = a deposit was made with a value under the total cost of stay.
agent: ID of the travel agency where the booking was made.
company: ID of the company that made the booking.
days_in_waiting_list: How long the customer had to wait to get their reservation.
customer_type: Contract = when the booking has a contract associated to it; Group = when the booking is associated with a group; Transient = when the booking is not part of a group or contract, and is not associated to other transient bookings; Transient-party = when the booking is transient, but is associated to at least other transient booking.
adr: Average daily rate of the booking.
required_car_parking_spaces: How many parking spaces are needed for the booking.
total_of_special_requests: Number of special requests made.
reservation_status: Last status of the reservation.
reservation_status_date: Date of the last reservation status.
We read the data in and determine any missing values through the below:
hotel <- read.csv("C:/Users/bellmatj/OneDrive - University of Cincinnati/Data Wrangling/Project/hotels.csv")
colSums(is.na(hotel))
## hotel is_canceled
## 0 0
## lead_time arrival_date_year
## 0 0
## arrival_date_month arrival_date_week_number
## 0 0
## arrival_date_day_of_month stays_in_weekend_nights
## 0 0
## stays_in_week_nights adults
## 0 0
## children babies
## 4 0
## meal country
## 0 0
## market_segment distribution_channel
## 0 0
## is_repeated_guest previous_cancellations
## 0 0
## previous_bookings_not_canceled reserved_room_type
## 0 0
## assigned_room_type booking_changes
## 0 0
## deposit_type agent
## 0 0
## company days_in_waiting_list
## 0 0
## customer_type adr
## 0 0
## required_car_parking_spaces total_of_special_requests
## 0 0
## reservation_status reservation_status_date
## 0 0
There are 119,390 observations in this table across 32 variables:
nrow(hotel)
## [1] 119390
ncol(hotel)
## [1] 32
During this step we replaced the NA values in the children column with 0, similar to how it is structured in the column babies. Overall, the dataset was very clean and needed minimal change.
hotel$children[is.na(hotel$children)] = 0
The data is clean and we have displayed the first 25 observations in the set.
head(hotel, 25)
## hotel is_canceled lead_time arrival_date_year arrival_date_month
## 1 Resort Hotel 0 342 2015 July
## 2 Resort Hotel 0 737 2015 July
## 3 Resort Hotel 0 7 2015 July
## 4 Resort Hotel 0 13 2015 July
## 5 Resort Hotel 0 14 2015 July
## 6 Resort Hotel 0 14 2015 July
## 7 Resort Hotel 0 0 2015 July
## 8 Resort Hotel 0 9 2015 July
## 9 Resort Hotel 1 85 2015 July
## 10 Resort Hotel 1 75 2015 July
## 11 Resort Hotel 1 23 2015 July
## 12 Resort Hotel 0 35 2015 July
## 13 Resort Hotel 0 68 2015 July
## 14 Resort Hotel 0 18 2015 July
## 15 Resort Hotel 0 37 2015 July
## 16 Resort Hotel 0 68 2015 July
## 17 Resort Hotel 0 37 2015 July
## 18 Resort Hotel 0 12 2015 July
## 19 Resort Hotel 0 0 2015 July
## 20 Resort Hotel 0 7 2015 July
## 21 Resort Hotel 0 37 2015 July
## 22 Resort Hotel 0 72 2015 July
## 23 Resort Hotel 0 72 2015 July
## 24 Resort Hotel 0 72 2015 July
## 25 Resort Hotel 0 127 2015 July
## arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
## 1 27 1 0
## 2 27 1 0
## 3 27 1 0
## 4 27 1 0
## 5 27 1 0
## 6 27 1 0
## 7 27 1 0
## 8 27 1 0
## 9 27 1 0
## 10 27 1 0
## 11 27 1 0
## 12 27 1 0
## 13 27 1 0
## 14 27 1 0
## 15 27 1 0
## 16 27 1 0
## 17 27 1 0
## 18 27 1 0
## 19 27 1 0
## 20 27 1 0
## 21 27 1 1
## 22 27 1 2
## 23 27 1 2
## 24 27 1 2
## 25 27 1 2
## stays_in_week_nights adults children babies meal country market_segment
## 1 0 2 0 0 BB PRT Direct
## 2 0 2 0 0 BB PRT Direct
## 3 1 1 0 0 BB GBR Direct
## 4 1 1 0 0 BB GBR Corporate
## 5 2 2 0 0 BB GBR Online TA
## 6 2 2 0 0 BB GBR Online TA
## 7 2 2 0 0 BB PRT Direct
## 8 2 2 0 0 FB PRT Direct
## 9 3 2 0 0 BB PRT Online TA
## 10 3 2 0 0 HB PRT Offline TA/TO
## 11 4 2 0 0 BB PRT Online TA
## 12 4 2 0 0 HB PRT Online TA
## 13 4 2 0 0 BB USA Online TA
## 14 4 2 1 0 HB ESP Online TA
## 15 4 2 0 0 BB PRT Online TA
## 16 4 2 0 0 BB IRL Online TA
## 17 4 2 0 0 BB PRT Offline TA/TO
## 18 1 2 0 0 BB IRL Online TA
## 19 1 2 0 0 BB FRA Corporate
## 20 4 2 0 0 BB GBR Direct
## 21 4 1 0 0 BB GBR Online TA
## 22 4 2 0 0 BB PRT Direct
## 23 4 2 0 0 BB PRT Direct
## 24 4 2 0 0 BB PRT Direct
## 25 5 2 0 0 HB GBR Offline TA/TO
## distribution_channel is_repeated_guest previous_cancellations
## 1 Direct 0 0
## 2 Direct 0 0
## 3 Direct 0 0
## 4 Corporate 0 0
## 5 TA/TO 0 0
## 6 TA/TO 0 0
## 7 Direct 0 0
## 8 Direct 0 0
## 9 TA/TO 0 0
## 10 TA/TO 0 0
## 11 TA/TO 0 0
## 12 TA/TO 0 0
## 13 TA/TO 0 0
## 14 TA/TO 0 0
## 15 TA/TO 0 0
## 16 TA/TO 0 0
## 17 TA/TO 0 0
## 18 TA/TO 0 0
## 19 Corporate 0 0
## 20 Direct 0 0
## 21 TA/TO 0 0
## 22 Direct 0 0
## 23 Direct 0 0
## 24 Direct 0 0
## 25 TA/TO 0 0
## previous_bookings_not_canceled reserved_room_type assigned_room_type
## 1 0 C C
## 2 0 C C
## 3 0 A C
## 4 0 A A
## 5 0 A A
## 6 0 A A
## 7 0 C C
## 8 0 C C
## 9 0 A A
## 10 0 D D
## 11 0 E E
## 12 0 D D
## 13 0 D E
## 14 0 G G
## 15 0 E E
## 16 0 D E
## 17 0 E E
## 18 0 A E
## 19 0 A G
## 20 0 G G
## 21 0 F F
## 22 0 A A
## 23 0 A A
## 24 0 D D
## 25 0 D I
## booking_changes deposit_type agent company days_in_waiting_list
## 1 3 No Deposit NULL NULL 0
## 2 4 No Deposit NULL NULL 0
## 3 0 No Deposit NULL NULL 0
## 4 0 No Deposit 304 NULL 0
## 5 0 No Deposit 240 NULL 0
## 6 0 No Deposit 240 NULL 0
## 7 0 No Deposit NULL NULL 0
## 8 0 No Deposit 303 NULL 0
## 9 0 No Deposit 240 NULL 0
## 10 0 No Deposit 15 NULL 0
## 11 0 No Deposit 240 NULL 0
## 12 0 No Deposit 240 NULL 0
## 13 0 No Deposit 240 NULL 0
## 14 1 No Deposit 241 NULL 0
## 15 0 No Deposit 241 NULL 0
## 16 0 No Deposit 240 NULL 0
## 17 0 No Deposit 8 NULL 0
## 18 0 No Deposit 240 NULL 0
## 19 0 No Deposit NULL 110 0
## 20 0 No Deposit 250 NULL 0
## 21 0 No Deposit 241 NULL 0
## 22 1 No Deposit 250 NULL 0
## 23 1 No Deposit 250 NULL 0
## 24 1 No Deposit 250 NULL 0
## 25 0 No Deposit 115 NULL 0
## customer_type adr required_car_parking_spaces total_of_special_requests
## 1 Transient 0.00 0 0
## 2 Transient 0.00 0 0
## 3 Transient 75.00 0 0
## 4 Transient 75.00 0 0
## 5 Transient 98.00 0 1
## 6 Transient 98.00 0 1
## 7 Transient 107.00 0 0
## 8 Transient 103.00 0 1
## 9 Transient 82.00 0 1
## 10 Transient 105.50 0 0
## 11 Transient 123.00 0 0
## 12 Transient 145.00 0 0
## 13 Transient 97.00 0 3
## 14 Transient 154.77 0 1
## 15 Transient 94.71 0 0
## 16 Transient 97.00 0 3
## 17 Contract 97.50 0 0
## 18 Transient 88.20 0 0
## 19 Transient 107.42 0 0
## 20 Transient 153.00 0 1
## 21 Transient 97.29 0 1
## 22 Transient 84.67 0 1
## 23 Transient 84.67 0 1
## 24 Transient 99.67 0 1
## 25 Contract 94.95 0 1
## reservation_status reservation_status_date
## 1 Check-Out 2015-07-01
## 2 Check-Out 2015-07-01
## 3 Check-Out 2015-07-02
## 4 Check-Out 2015-07-02
## 5 Check-Out 2015-07-03
## 6 Check-Out 2015-07-03
## 7 Check-Out 2015-07-03
## 8 Check-Out 2015-07-03
## 9 Canceled 2015-05-06
## 10 Canceled 2015-04-22
## 11 Canceled 2015-06-23
## 12 Check-Out 2015-07-05
## 13 Check-Out 2015-07-05
## 14 Check-Out 2015-07-05
## 15 Check-Out 2015-07-05
## 16 Check-Out 2015-07-05
## 17 Check-Out 2015-07-05
## 18 Check-Out 2015-07-02
## 19 Check-Out 2015-07-02
## 20 Check-Out 2015-07-05
## 21 Check-Out 2015-07-06
## 22 Check-Out 2015-07-07
## 23 Check-Out 2015-07-07
## 24 Check-Out 2015-07-07
## 25 Check-Out 2015-07-01
describe(hotel)
## hotel
##
## 32 Variables 119390 Observations
## --------------------------------------------------------------------------------
## hotel
## n missing distinct
## 119390 0 2
##
## Value City Hotel Resort Hotel
## Frequency 79330 40060
## Proportion 0.664 0.336
## --------------------------------------------------------------------------------
## is_canceled
## n missing distinct Info Sum Mean Gmd
## 119390 0 2 0.7 44224 0.3704 0.4664
##
## --------------------------------------------------------------------------------
## lead_time
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 479 1 104 112.5 0 3
## .25 .50 .75 .90 .95
## 18 69 160 265 320
##
## lowest : 0 1 2 3 4, highest: 622 626 629 709 737
## --------------------------------------------------------------------------------
## arrival_date_year
## n missing distinct Info Mean Gmd
## 119390 0 3 0.847 2016 0.7499
##
## Value 2015 2016 2017
## Frequency 21996 56707 40687
## Proportion 0.184 0.475 0.341
## --------------------------------------------------------------------------------
## arrival_date_month
## n missing distinct
## 119390 0 12
##
## lowest : April August December February January
## highest: March May November October September
##
## Value April August December February January July
## Frequency 11089 13877 6780 8068 5929 12661
## Proportion 0.093 0.116 0.057 0.068 0.050 0.106
##
## Value June March May November October September
## Frequency 10939 9794 11791 6794 11160 10508
## Proportion 0.092 0.082 0.099 0.057 0.093 0.088
## --------------------------------------------------------------------------------
## arrival_date_week_number
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 53 1 27.17 15.68 5 8
## .25 .50 .75 .90 .95
## 16 28 38 46 49
##
## lowest : 1 2 3 4 5, highest: 49 50 51 52 53
## --------------------------------------------------------------------------------
## arrival_date_day_of_month
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 31 0.999 15.8 10.13 2 4
## .25 .50 .75 .90 .95
## 8 16 23 28 30
##
## lowest : 1 2 3 4 5, highest: 27 28 29 30 31
## --------------------------------------------------------------------------------
## stays_in_weekend_nights
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 17 0.879 0.9276 1.026 0 0
## .25 .50 .75 .90 .95
## 0 1 2 2 2
##
## lowest : 0 1 2 3 4, highest: 13 14 16 18 19
##
## Value 0 1 2 3 4 5 6 7 8 9 10
## Frequency 51998 30626 33308 1259 1855 79 153 19 60 11 7
## Proportion 0.436 0.257 0.279 0.011 0.016 0.001 0.001 0.000 0.001 0.000 0.000
##
## Value 12 13 14 16 18 19
## Frequency 5 3 2 3 1 1
## Proportion 0.000 0.000 0.000 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## stays_in_week_nights
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 35 0.953 2.5 1.865 0 1
## .25 .50 .75 .90 .95
## 1 2 3 5 5
##
## lowest : 0 1 2 3 4, highest: 35 40 41 42 50
## --------------------------------------------------------------------------------
## adults
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 14 0.569 1.856 0.4287 1 1
## .25 .50 .75 .90 .95
## 2 2 2 2 3
##
## lowest : 0 1 2 3 4, highest: 26 27 40 50 55
##
## Value 0 1 2 3 4 5 6 10 20 26 27
## Frequency 403 23027 89680 6202 62 2 1 1 2 5 2
## Proportion 0.003 0.193 0.751 0.052 0.001 0.000 0.000 0.000 0.000 0.000 0.000
##
## Value 40 50 55
## Frequency 1 1 1
## Proportion 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## children
## n missing distinct Info Mean Gmd
## 119390 0 5 0.201 0.1039 0.1955
##
## lowest : 0 1 2 3 10, highest: 0 1 2 3 10
##
## Value 0 1 2 3 10
## Frequency 110800 4861 3652 76 1
## Proportion 0.928 0.041 0.031 0.001 0.000
## --------------------------------------------------------------------------------
## babies
## n missing distinct Info Mean Gmd
## 119390 0 5 0.023 0.007949 0.01578
##
## lowest : 0 1 2 9 10, highest: 0 1 2 9 10
##
## Value 0 1 2 9 10
## Frequency 118473 900 15 1 1
## Proportion 0.992 0.008 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## meal
## n missing distinct
## 119390 0 5
##
## lowest : BB FB HB SC Undefined
## highest: BB FB HB SC Undefined
##
## Value BB FB HB SC Undefined
## Frequency 92310 798 14463 10650 1169
## Proportion 0.773 0.007 0.121 0.089 0.010
## --------------------------------------------------------------------------------
## country
## n missing distinct
## 119390 0 178
##
## lowest : ABW AGO AIA ALB AND, highest: VGB VNM ZAF ZMB ZWE
## --------------------------------------------------------------------------------
## market_segment
## n missing distinct
## 119390 0 8
##
## lowest : Aviation Complementary Corporate Direct Groups
## highest: Direct Groups Offline TA/TO Online TA Undefined
##
## Value Aviation Complementary Corporate Direct
## Frequency 237 743 5295 12606
## Proportion 0.002 0.006 0.044 0.106
##
## Value Groups Offline TA/TO Online TA Undefined
## Frequency 19811 24219 56477 2
## Proportion 0.166 0.203 0.473 0.000
## --------------------------------------------------------------------------------
## distribution_channel
## n missing distinct
## 119390 0 5
##
## lowest : Corporate Direct GDS TA/TO Undefined
## highest: Corporate Direct GDS TA/TO Undefined
##
## Value Corporate Direct GDS TA/TO Undefined
## Frequency 6677 14645 193 97870 5
## Proportion 0.056 0.123 0.002 0.820 0.000
## --------------------------------------------------------------------------------
## is_repeated_guest
## n missing distinct Info Sum Mean Gmd
## 119390 0 2 0.093 3810 0.03191 0.06179
##
## --------------------------------------------------------------------------------
## previous_cancellations
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 15 0.154 0.08712 0.1682 0 0
## .25 .50 .75 .90 .95
## 0 0 0 0 1
##
## lowest : 0 1 2 3 4, highest: 19 21 24 25 26
##
## Value 0 1 2 3 4 5 6 11 13
## Frequency 112906 6051 116 65 31 19 22 35 12
## Proportion 0.946 0.051 0.001 0.001 0.000 0.000 0.000 0.000 0.000
##
## Value 14 19 21 24 25 26
## Frequency 14 19 1 48 25 26
## Proportion 0.000 0.000 0.000 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## previous_bookings_not_canceled
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 73 0.088 0.1371 0.2708 0 0
## .25 .50 .75 .90 .95
## 0 0 0 0 0
##
## lowest : 0 1 2 3 4, highest: 68 69 70 71 72
## --------------------------------------------------------------------------------
## reserved_room_type
## n missing distinct
## 119390 0 10
##
## lowest : A B C D E, highest: F G H L P
##
## Value A B C D E F G H L P
## Frequency 85994 1118 932 19201 6535 2897 2094 601 6 12
## Proportion 0.720 0.009 0.008 0.161 0.055 0.024 0.018 0.005 0.000 0.000
## --------------------------------------------------------------------------------
## assigned_room_type
## n missing distinct
## 119390 0 12
##
## lowest : A B C D E, highest: H I K L P
##
## Value A B C D E F G H I K L
## Frequency 74053 2163 2375 25322 7806 3751 2553 712 363 279 1
## Proportion 0.620 0.018 0.020 0.212 0.065 0.031 0.021 0.006 0.003 0.002 0.000
##
## Value P
## Frequency 12
## Proportion 0.000
## --------------------------------------------------------------------------------
## booking_changes
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 21 0.388 0.2211 0.3919 0 0
## .25 .50 .75 .90 .95
## 0 0 0 1 1
##
## lowest : 0 1 2 3 4, highest: 16 17 18 20 21
## --------------------------------------------------------------------------------
## deposit_type
## n missing distinct
## 119390 0 3
##
## Value No Deposit Non Refund Refundable
## Frequency 104641 14587 162
## Proportion 0.876 0.122 0.001
## --------------------------------------------------------------------------------
## agent
## n missing distinct
## 119390 0 334
##
## lowest : 1 10 103 104 105 , highest: 95 96 98 99 NULL
## --------------------------------------------------------------------------------
## company
## n missing distinct
## 119390 0 353
##
## lowest : 10 100 101 102 103 , highest: 93 94 96 99 NULL
## --------------------------------------------------------------------------------
## days_in_waiting_list
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 128 0.09 2.321 4.559 0 0
## .25 .50 .75 .90 .95
## 0 0 0 0 0
##
## lowest : 0 1 2 3 4, highest: 236 259 330 379 391
## --------------------------------------------------------------------------------
## customer_type
## n missing distinct
## 119390 0 4
##
## Value Contract Group Transient Transient-Party
## Frequency 4076 577 89613 25124
## Proportion 0.034 0.005 0.751 0.210
## --------------------------------------------------------------------------------
## adr
## n missing distinct Info Mean Gmd .05 .10
## 119390 0 8879 1 101.8 51.91 38.40 50.00
## .25 .50 .75 .90 .95
## 69.29 94.58 126.00 164.00 193.50
##
## lowest : -6.38 0.00 0.26 0.50 1.00
## highest: 450.00 451.50 508.00 510.00 5400.00
##
## Value 0 50 100 150 200 250 300 350 400 450 500
## Frequency 2437 35085 50975 21915 6234 2156 463 99 19 4 2
## Proportion 0.020 0.294 0.427 0.184 0.052 0.018 0.004 0.001 0.000 0.000 0.000
##
## Value 5400
## Frequency 1
## Proportion 0.000
##
## For the frequency table, variable is rounded to the nearest 50
## --------------------------------------------------------------------------------
## required_car_parking_spaces
## n missing distinct Info Mean Gmd
## 119390 0 5 0.175 0.06252 0.1173
##
## lowest : 0 1 2 3 8, highest: 0 1 2 3 8
##
## Value 0 1 2 3 8
## Frequency 111974 7383 28 3 2
## Proportion 0.938 0.062 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## total_of_special_requests
## n missing distinct Info Mean Gmd
## 119390 0 6 0.773 0.5714 0.7684
##
## lowest : 0 1 2 3 4, highest: 1 2 3 4 5
##
## Value 0 1 2 3 4 5
## Frequency 70318 33226 12969 2497 340 40
## Proportion 0.589 0.278 0.109 0.021 0.003 0.000
## --------------------------------------------------------------------------------
## reservation_status
## n missing distinct
## 119390 0 3
##
## Value Canceled Check-Out No-Show
## Frequency 43017 75166 1207
## Proportion 0.36 0.63 0.01
## --------------------------------------------------------------------------------
## reservation_status_date
## n missing distinct
## 119390 0 926
##
## lowest : 2014-10-17 2014-11-18 2015-01-01 2015-01-02 2015-01-18
## highest: 2017-09-08 2017-09-09 2017-09-10 2017-09-12 2017-09-14
## --------------------------------------------------------------------------------
Questions we plan to answer:
Are there different times of the year when it’s more popular to stay at a resort vs. a city hotel?
Are there different days of the week that are more popular for a resort hotel vs. city hotel?
Is there a disparity between children staying more at resort hotels vs. city hotels?
Is there a difference in # of special requests?
Are there any correlations between group type and cancellations? This way, we can market more heavily towards groups that don’t typically cancel.
Strategy:
We will “slice and dice” the data. We will divide the data into two different categories: city vs. resort hotels. We will perform analysis on all necessary individual variables in regard to these two categories. Through this, we’ll be able to easily draw comparative solutions to the questions above.
Do you plan to slice and dice the data in different ways, create new variables, or join separate data frames to create new summary information?
We will not need to create new variables or join separate data. Our primary goal is splitting the data by resort vs. city hotels and analyzing specific variables. Ex: most popular day for stays in city hotels vs. resort hotels.
How could you summarize your data to answer key questions?
Bar charts will be the best way to visualize our insights. We plan to compare frequencies between resort hotels vs. city hotels, and bar charts are the most efficient way to do this. If we run a linear regression of group type against cancellations, we’ll likely create a scatter plot to help illustrate the regression.
Although we don’t foresee a need to add another variable, we aren’t certain how to do this if it was needed. Even if it’s not needed in this analysis, it could certainly be useful in the future.
We plan to utilize a linear regression for reservation type against cancellations. From this, we’ll be able to see if any positive correlations exist around any specific type of reservations. Ex: do groups tend to have a higher cancellation rate?
Yes, to the best of our ability.
Yes, to the best of our ability.
Yes, we believe the current knowledge we are applying to our strategy wells position us for a successful report at completion.
Yes, no errors in running.