Introduction

1.1 Provide an introduction that explains the problem statement you are addressing. Why should I be interested in this?

The first essential piece of any marketing strategy is a clear understanding of the target customer. If hotels (or any other organizations) lack this understanding, their marketing spend is unlikely to pay off.

However, when planned correctly, marketing efforts can significantly boost ROI. To attract the appropriate audiences, hotels must analyze their typical consumer’s behavior. In addition, they must understand their direct competition to build a sustainable competitive advantage. Our goal in this analysis is to help our client, a resort hotel in South Carolina, reach a marketing ROI of 5:1.

1.2 Provide a short explanation of how you plan to address this problem statement (the data used and the methodology employed).

We plan to help our client reach a 5:1 marketing ROI and market effectively against nearby city hotels. In this analysis, we’ll compare data on average daily rates, number of special requests, average length of stay, busiest weeks, number of children per stay, and weekday frequencies to derive insights for resort hotels relative to city hotels. With these insights, we’ll be able to build and automate targeted email campaigns and ads, and set competitive pricing. We’ll use the insights to tailor the content of the campaigns and push them heavily during statistically weaker sales periods.

1.3 Discuss your current proposed approach/analytic technique you think will address (fully or partially) this problem.

We will analyze the frequency of stays during the week to determine which day (on average) is the most popular. Additionally, we will identify which parameters correlate with

1.4 Explain how your analysis will help the consumer of your analysis.

If we can derive which days of the week are statistically least popular for resort hotels, we can relay this information to our client and help them tailor their marketing material accordingly. For instance, if we find that Monday is the least popular day of the week to stay at a resort hotel, we will advise our client to slightly reduce the price of Monday bookings to incentivize Monday sales.
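
The data set has no weekday column, so a weekday would have to be derived from the three arrival date columns. The sketch below shows one possible way to do that in base R. It uses a small made-up data frame (`bookings`) in place of the real data, and it counts arrivals per weekday, which is an assumption on our part: arrival day is only a proxy for how popular a given night is.

```r
# Toy stand-in for the hotel data; these four rows are made up for illustration.
bookings <- data.frame(
  hotel = c("Resort Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"),
  arrival_date_year = c(2015, 2015, 2015, 2015),
  arrival_date_month = c("July", "July", "July", "July"),
  arrival_date_day_of_month = c(1, 1, 6, 3),
  stringsAsFactors = FALSE
)

# Build a Date from the three arrival columns. Matching against month.name
# avoids relying on the system locale to parse month names.
bookings$arrival_date <- as.Date(ISOdate(
  bookings$arrival_date_year,
  match(bookings$arrival_date_month, month.name),
  bookings$arrival_date_day_of_month
))

# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday.
day_labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
bookings$arrival_weekday <- factor(
  day_labels[as.POSIXlt(bookings$arrival_date)$wday + 1],
  levels = day_labels
)

# Count resort arrivals per weekday.
resort <- bookings[bookings$hotel == "Resort Hotel", ]
weekday_counts <- table(resort$arrival_weekday)
print(weekday_counts)
```

On the real data, the weekday with the smallest count would be the candidate for a targeted discount.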

If we can derive average daily rates for resort hotels vs. city hotels, we can help our client build competitive pricing. For example, if we find that the average daily rate for a resort hotel is 105.00 per night, we might advise our client to offer pricing between 95.00 and 97.00 during slower periods.
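
As a rough sketch of the rate comparison, the mean of `adr` can be computed per hotel type with `tapply()`. The data frame below is made up for illustration, and dropping canceled bookings before averaging is our assumption rather than a decision already made in this report.

```r
# Toy stand-in for the hotel data; values are made up for illustration.
bookings <- data.frame(
  hotel = c("Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  is_canceled = c(0, 0, 0, 1),
  adr = c(100, 110, 90, 130),
  stringsAsFactors = FALSE
)

# Assumption: canceled bookings are excluded so they do not skew the rate.
kept <- bookings[bookings$is_canceled == 0, ]

# Mean average daily rate for each hotel type.
mean_adr <- tapply(kept$adr, kept$hotel, mean)
print(mean_adr)
```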

Similarly, if we can derive which weeks or months of the year are least popular for resort hotels, we can help our client prepare incentives for weaker sales periods. For example, if a resort hotel typically has fewer stays during the summer months, we can offer a discounted stay to customers in the winter months if they book a trip during the summer.
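
One way to find the slowest month, sketched here on made-up rows, is to tabulate `arrival_date_month` for resort bookings in calendar order and pick the smallest non-zero count. The names `month_counts` and `slowest_month` are illustrative only.

```r
# Toy stand-in for the hotel data; these rows are made up for illustration.
bookings <- data.frame(
  hotel = c(rep("Resort Hotel", 6), "City Hotel"),
  arrival_date_month = c("July", "July", "August", "August", "August",
                         "January", "January"),
  stringsAsFactors = FALSE
)

resort <- bookings[bookings$hotel == "Resort Hotel", ]

# Order months by the calendar instead of alphabetically.
resort$arrival_date_month <- factor(resort$arrival_date_month,
                                    levels = month.name)
month_counts <- table(resort$arrival_date_month)

# Ignore months that do not appear at all in this toy sample.
observed <- month_counts[month_counts > 0]
slowest_month <- names(observed)[which.min(observed)]
print(slowest_month)
```

The same pattern would apply to `arrival_date_week_number` for a week-level view.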

Lastly, if we analyze the number of children who stay at a resort hotel vs. a city hotel and find that there’s a significant difference, we will advise our client to tailor their marketing strategy accordingly. If we find that children stay much more frequently at resort hotels vs. city hotels, then we will advise our client to include photos of children in their marketing material.
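
A minimal sketch of the children comparison, again on made-up rows: the share of bookings that include at least one child is computed per hotel type. Treating "children > 0" as the measure of interest is our assumption.

```r
# Toy stand-in for the hotel data; values are made up for illustration.
bookings <- data.frame(
  hotel = c(rep("Resort Hotel", 4), rep("City Hotel", 4)),
  children = c(0, 2, 1, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Flag bookings with at least one child, then average the flag per hotel
# type to get the share of bookings that include children.
has_children <- bookings$children > 0
children_share <- tapply(has_children, bookings$hotel, mean)
print(children_share)
```

On the full data set, a test such as `prop.test()` could be used to check whether the difference between the two shares is statistically significant.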

Overall, we’ll help our client:

  1. Determine competitive pricing

  2. Identify key parameters for target ads on Facebook, Instagram, Twitter (age, time of ad, day of ad, etc.)

  3. Reap the benefits of strategic marketing: ideally a 5:1 ROI through these efforts

Packages Required

2.1 All packages used are loaded upfront so the reader knows which are required to replicate the analysis.

We have loaded the packages upfront.

2.2 Messages and warnings resulting from loading the package are suppressed.

library(tidyverse)
library(glue)
library(Hmisc)

2.3 Explanation is provided regarding the purpose of each package (there are over 10,000 packages, don’t assume that I know why you loaded each package).

Package tidyverse is loaded as it includes many core packages such as ggplot2, dplyr, and tidyr. These are important for decision analysis as we explore the data set, clean and organize the file, and make any structural changes needed.

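Package glue is loaded as an alternative to paste() that makes it easier to build strings with embedded R expressions.

Package Hmisc is loaded for its describe() function, which reports summary statistics on the variables in our set (such as distinct counts and frequency tables) that are not available through the standard str() and summary() functions.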

Data Preparation

3.1 Original source where the data was obtained is cited and, if possible, hyperlinked.

We will be using the Hotel Bookings Data. The data for our project was obtained from an open hotel booking demand dataset published by Antonio, Almeida and Nunes (2019).

3.2 Source data is thoroughly explained (i.e. what was the original purpose of the data, when was it collected, how many variables did the original have, explain any peculiarities of the source data such as how missing values are recorded, or how data was imputed, etc.).

This dataset describes two sets of information related to Resort Hotels and City Hotels. Both Resort Hotels and City Hotels share the same structure and each observation represents a specific hotel booking. This dataset was collected to aid in research and education in revenue management and data mining. The timeline for this data is between July 01, 2015 and August 31, 2017.

An explanation of the variables in the set is as follows:

  • hotel: The type of hotel the observation relates to (Resort or City).

  • is_canceled: Was the booking canceled or not? 1 = canceled, 0 = Not.

  • lead_time: Number of days between the date the booking was entered and the arrival date.

  • arrival_date_year: Year of arrival.

  • arrival_date_month: Month of arrival.

  • arrival_date_week_number: Week of the year in which the arrival occurred.

  • arrival_date_day_of_month: Day of the arrival.

  • stays_in_weekend_nights: Number of weekend nights the guest stayed.

  • stays_in_week_nights: Number of week nights the guest stayed.

  • adults: Number of adults in the reservation.

  • children: Number of children in the reservation.

  • babies: Number of babies in the reservation.

  • meal: Undefined/SC – no meal package; BB – Bed & Breakfast; HB – Half board (breakfast and one other meal – usually dinner); FB – Full board (breakfast, lunch and dinner).

  • country: Country of origin.

  • market_segment: Market segment designation. TA = travel agents, TO = tour operators.

  • distribution_channel: Booking distribution channel. TA = travel agents, TO = tour operators.

  • is_repeated_guest: Is the guest a repeat customer? 1 = Yes and 0 = No.

  • previous_cancellations: Number of previous bookings canceled by the customer before the current booking.

  • previous_bookings_not_canceled: Number of previous bookings not canceled by the customer before the current booking.

  • reserved_room_type: Room type reserved.

  • assigned_room_type: Room type assigned. Can differ from the reserved type.

  • booking_changes: Number of changes/amendments made to the booking until check-in or cancellation.

  • deposit_type: No Deposit = no deposit was made; Non Refund = a deposit was made in the value of the total stay cost; Refundable = a deposit was made with a value under the total cost of stay.

  • agent: ID of the travel agency where the booking was made.

  • company: ID of the company that made the booking.

  • days_in_waiting_list: Number of days the booking was on the waiting list before it was confirmed to the customer.

  • customer_type: Contract = when the booking has a contract associated with it; Group = when the booking is associated with a group; Transient = when the booking is not part of a group or contract, and is not associated with other transient bookings; Transient-party = when the booking is transient, but is associated with at least one other transient booking.

  • adr: Average daily rate of the booking.

  • required_car_parking_spaces: How many parking spaces are needed for the booking.

  • total_of_special_requests: Number of special requests made.

  • reservation_status: Last status of the reservation.

  • reservation_status_date: Date of the last reservation status.

We read the data in and determine any missing values through the below:

hotel <- read.csv("C:/Users/bellmatj/OneDrive - University of Cincinnati/Data Wrangling/Project/hotels.csv")

colSums(is.na(hotel))
##                          hotel                    is_canceled 
##                              0                              0 
##                      lead_time              arrival_date_year 
##                              0                              0 
##             arrival_date_month       arrival_date_week_number 
##                              0                              0 
##      arrival_date_day_of_month        stays_in_weekend_nights 
##                              0                              0 
##           stays_in_week_nights                         adults 
##                              0                              0 
##                       children                         babies 
##                              4                              0 
##                           meal                        country 
##                              0                              0 
##                 market_segment           distribution_channel 
##                              0                              0 
##              is_repeated_guest         previous_cancellations 
##                              0                              0 
## previous_bookings_not_canceled             reserved_room_type 
##                              0                              0 
##             assigned_room_type                booking_changes 
##                              0                              0 
##                   deposit_type                          agent 
##                              0                              0 
##                        company           days_in_waiting_list 
##                              0                              0 
##                  customer_type                            adr 
##                              0                              0 
##    required_car_parking_spaces      total_of_special_requests 
##                              0                              0 
##             reservation_status        reservation_status_date 
##                              0                              0
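
Note that is.na() only catches true NA values. The agent and company columns record missing IDs as the text "NULL", so they show 0 above. The sketch below, on a made-up data frame, shows one possible way to count such placeholder strings per column; the name `null_counts` is illustrative only.

```r
# Toy stand-in for the hotel data; values are made up for illustration.
bookings <- data.frame(
  agent = c("NULL", "304", "240", "NULL"),
  company = c("NULL", "NULL", "110", "NULL"),
  children = c(0, NA, 1, 0),
  stringsAsFactors = FALSE
)

# Count cells that hold the literal string "NULL" in each column.
# na.rm = TRUE skips genuine NA values, which is.na() already reports.
null_counts <- sapply(bookings, function(col) sum(col == "NULL", na.rm = TRUE))
print(null_counts)
```

If these placeholders should be treated as missing, one option is to pass `na.strings = c("NA", "NULL")` to read.csv() when importing the file.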

There are 119,390 observations in this table across 32 variables:

nrow(hotel)
## [1] 119390
ncol(hotel)
## [1] 32

3.3 Data importing and cleaning steps are explained in the text (tell me why you are doing the data cleaning activities that you perform) and follow a logical process.

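In this step we replaced the four NA values in the children column with 0, which matches how the babies column records bookings without babies. We did this so that counts and averages involving children are not affected by missing values. Overall, the dataset was very clean and needed minimal change.

# Replace missing child counts with 0
hotel$children[is.na(hotel$children)] <- 0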

3.4 Once your data is clean, show what the final data set looks like. However, do not print off a data frame with 200+ rows; show me the data in the most condensed form possible.

The data is clean and we have displayed the first 25 observations in the set.

head(hotel, 25)
##           hotel is_canceled lead_time arrival_date_year arrival_date_month
## 1  Resort Hotel           0       342              2015               July
## 2  Resort Hotel           0       737              2015               July
## 3  Resort Hotel           0         7              2015               July
## 4  Resort Hotel           0        13              2015               July
## 5  Resort Hotel           0        14              2015               July
## 6  Resort Hotel           0        14              2015               July
## 7  Resort Hotel           0         0              2015               July
## 8  Resort Hotel           0         9              2015               July
## 9  Resort Hotel           1        85              2015               July
## 10 Resort Hotel           1        75              2015               July
## 11 Resort Hotel           1        23              2015               July
## 12 Resort Hotel           0        35              2015               July
## 13 Resort Hotel           0        68              2015               July
## 14 Resort Hotel           0        18              2015               July
## 15 Resort Hotel           0        37              2015               July
## 16 Resort Hotel           0        68              2015               July
## 17 Resort Hotel           0        37              2015               July
## 18 Resort Hotel           0        12              2015               July
## 19 Resort Hotel           0         0              2015               July
## 20 Resort Hotel           0         7              2015               July
## 21 Resort Hotel           0        37              2015               July
## 22 Resort Hotel           0        72              2015               July
## 23 Resort Hotel           0        72              2015               July
## 24 Resort Hotel           0        72              2015               July
## 25 Resort Hotel           0       127              2015               July
##    arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
## 1                        27                         1                       0
## 2                        27                         1                       0
## 3                        27                         1                       0
## 4                        27                         1                       0
## 5                        27                         1                       0
## 6                        27                         1                       0
## 7                        27                         1                       0
## 8                        27                         1                       0
## 9                        27                         1                       0
## 10                       27                         1                       0
## 11                       27                         1                       0
## 12                       27                         1                       0
## 13                       27                         1                       0
## 14                       27                         1                       0
## 15                       27                         1                       0
## 16                       27                         1                       0
## 17                       27                         1                       0
## 18                       27                         1                       0
## 19                       27                         1                       0
## 20                       27                         1                       0
## 21                       27                         1                       1
## 22                       27                         1                       2
## 23                       27                         1                       2
## 24                       27                         1                       2
## 25                       27                         1                       2
##    stays_in_week_nights adults children babies meal country market_segment
## 1                     0      2        0      0   BB     PRT         Direct
## 2                     0      2        0      0   BB     PRT         Direct
## 3                     1      1        0      0   BB     GBR         Direct
## 4                     1      1        0      0   BB     GBR      Corporate
## 5                     2      2        0      0   BB     GBR      Online TA
## 6                     2      2        0      0   BB     GBR      Online TA
## 7                     2      2        0      0   BB     PRT         Direct
## 8                     2      2        0      0   FB     PRT         Direct
## 9                     3      2        0      0   BB     PRT      Online TA
## 10                    3      2        0      0   HB     PRT  Offline TA/TO
## 11                    4      2        0      0   BB     PRT      Online TA
## 12                    4      2        0      0   HB     PRT      Online TA
## 13                    4      2        0      0   BB     USA      Online TA
## 14                    4      2        1      0   HB     ESP      Online TA
## 15                    4      2        0      0   BB     PRT      Online TA
## 16                    4      2        0      0   BB     IRL      Online TA
## 17                    4      2        0      0   BB     PRT  Offline TA/TO
## 18                    1      2        0      0   BB     IRL      Online TA
## 19                    1      2        0      0   BB     FRA      Corporate
## 20                    4      2        0      0   BB     GBR         Direct
## 21                    4      1        0      0   BB     GBR      Online TA
## 22                    4      2        0      0   BB     PRT         Direct
## 23                    4      2        0      0   BB     PRT         Direct
## 24                    4      2        0      0   BB     PRT         Direct
## 25                    5      2        0      0   HB     GBR  Offline TA/TO
##    distribution_channel is_repeated_guest previous_cancellations
## 1                Direct                 0                      0
## 2                Direct                 0                      0
## 3                Direct                 0                      0
## 4             Corporate                 0                      0
## 5                 TA/TO                 0                      0
## 6                 TA/TO                 0                      0
## 7                Direct                 0                      0
## 8                Direct                 0                      0
## 9                 TA/TO                 0                      0
## 10                TA/TO                 0                      0
## 11                TA/TO                 0                      0
## 12                TA/TO                 0                      0
## 13                TA/TO                 0                      0
## 14                TA/TO                 0                      0
## 15                TA/TO                 0                      0
## 16                TA/TO                 0                      0
## 17                TA/TO                 0                      0
## 18                TA/TO                 0                      0
## 19            Corporate                 0                      0
## 20               Direct                 0                      0
## 21                TA/TO                 0                      0
## 22               Direct                 0                      0
## 23               Direct                 0                      0
## 24               Direct                 0                      0
## 25                TA/TO                 0                      0
##    previous_bookings_not_canceled reserved_room_type assigned_room_type
## 1                               0                  C                  C
## 2                               0                  C                  C
## 3                               0                  A                  C
## 4                               0                  A                  A
## 5                               0                  A                  A
## 6                               0                  A                  A
## 7                               0                  C                  C
## 8                               0                  C                  C
## 9                               0                  A                  A
## 10                              0                  D                  D
## 11                              0                  E                  E
## 12                              0                  D                  D
## 13                              0                  D                  E
## 14                              0                  G                  G
## 15                              0                  E                  E
## 16                              0                  D                  E
## 17                              0                  E                  E
## 18                              0                  A                  E
## 19                              0                  A                  G
## 20                              0                  G                  G
## 21                              0                  F                  F
## 22                              0                  A                  A
## 23                              0                  A                  A
## 24                              0                  D                  D
## 25                              0                  D                  I
##    booking_changes deposit_type agent company days_in_waiting_list
## 1                3   No Deposit  NULL    NULL                    0
## 2                4   No Deposit  NULL    NULL                    0
## 3                0   No Deposit  NULL    NULL                    0
## 4                0   No Deposit   304    NULL                    0
## 5                0   No Deposit   240    NULL                    0
## 6                0   No Deposit   240    NULL                    0
## 7                0   No Deposit  NULL    NULL                    0
## 8                0   No Deposit   303    NULL                    0
## 9                0   No Deposit   240    NULL                    0
## 10               0   No Deposit    15    NULL                    0
## 11               0   No Deposit   240    NULL                    0
## 12               0   No Deposit   240    NULL                    0
## 13               0   No Deposit   240    NULL                    0
## 14               1   No Deposit   241    NULL                    0
## 15               0   No Deposit   241    NULL                    0
## 16               0   No Deposit   240    NULL                    0
## 17               0   No Deposit     8    NULL                    0
## 18               0   No Deposit   240    NULL                    0
## 19               0   No Deposit  NULL     110                    0
## 20               0   No Deposit   250    NULL                    0
## 21               0   No Deposit   241    NULL                    0
## 22               1   No Deposit   250    NULL                    0
## 23               1   No Deposit   250    NULL                    0
## 24               1   No Deposit   250    NULL                    0
## 25               0   No Deposit   115    NULL                    0
##    customer_type    adr required_car_parking_spaces total_of_special_requests
## 1      Transient   0.00                           0                         0
## 2      Transient   0.00                           0                         0
## 3      Transient  75.00                           0                         0
## 4      Transient  75.00                           0                         0
## 5      Transient  98.00                           0                         1
## 6      Transient  98.00                           0                         1
## 7      Transient 107.00                           0                         0
## 8      Transient 103.00                           0                         1
## 9      Transient  82.00                           0                         1
## 10     Transient 105.50                           0                         0
## 11     Transient 123.00                           0                         0
## 12     Transient 145.00                           0                         0
## 13     Transient  97.00                           0                         3
## 14     Transient 154.77                           0                         1
## 15     Transient  94.71                           0                         0
## 16     Transient  97.00                           0                         3
## 17      Contract  97.50                           0                         0
## 18     Transient  88.20                           0                         0
## 19     Transient 107.42                           0                         0
## 20     Transient 153.00                           0                         1
## 21     Transient  97.29                           0                         1
## 22     Transient  84.67                           0                         1
## 23     Transient  84.67                           0                         1
## 24     Transient  99.67                           0                         1
## 25      Contract  94.95                           0                         1
##    reservation_status reservation_status_date
## 1           Check-Out              2015-07-01
## 2           Check-Out              2015-07-01
## 3           Check-Out              2015-07-02
## 4           Check-Out              2015-07-02
## 5           Check-Out              2015-07-03
## 6           Check-Out              2015-07-03
## 7           Check-Out              2015-07-03
## 8           Check-Out              2015-07-03
## 9            Canceled              2015-05-06
## 10           Canceled              2015-04-22
## 11           Canceled              2015-06-23
## 12          Check-Out              2015-07-05
## 13          Check-Out              2015-07-05
## 14          Check-Out              2015-07-05
## 15          Check-Out              2015-07-05
## 16          Check-Out              2015-07-05
## 17          Check-Out              2015-07-05
## 18          Check-Out              2015-07-02
## 19          Check-Out              2015-07-02
## 20          Check-Out              2015-07-05
## 21          Check-Out              2015-07-06
## 22          Check-Out              2015-07-07
## 23          Check-Out              2015-07-07
## 24          Check-Out              2015-07-07
## 25          Check-Out              2015-07-01
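
For a more condensed view than head(), base R's str() prints one line per variable with its type and first few values. A minimal sketch on a made-up data frame follows; on the real data the call would simply be applied to hotel.

```r
# Toy stand-in for the hotel data; values are made up for illustration.
bookings <- data.frame(
  hotel = c("Resort Hotel", "Resort Hotel", "City Hotel"),
  lead_time = c(342, 737, 7),
  adr = c(0, 0, 75),
  stringsAsFactors = FALSE
)

# One line per column: name, type, and the first few values.
# vec.len controls how many values are shown for each column.
str(bookings, vec.len = 2)
```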

3.5 Provide summary information about the variables of concern in your cleaned data set. Do not just print off a bunch of code chunks with str(), summary(), etc. Rather, provide me with a consolidated explanation, either with a table that provides summary info for each variable or a nicely written summary paragraph with inline code.

describe(hotel)
## hotel 
## 
##  32  Variables      119390  Observations
## --------------------------------------------------------------------------------
## hotel 
##        n  missing distinct 
##   119390        0        2 
##                                     
## Value        City Hotel Resort Hotel
## Frequency         79330        40060
## Proportion        0.664        0.336
## --------------------------------------------------------------------------------
## is_canceled 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##   119390        0        2      0.7    44224   0.3704   0.4664 
## 
## --------------------------------------------------------------------------------
## lead_time 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##   119390        0      479        1      104    112.5        0        3 
##      .25      .50      .75      .90      .95 
##       18       69      160      265      320 
## 
## lowest :   0   1   2   3   4, highest: 622 626 629 709 737
## --------------------------------------------------------------------------------
## arrival_date_year 
##        n  missing distinct     Info     Mean      Gmd 
##   119390        0        3    0.847     2016   0.7499 
##                             
## Value       2015  2016  2017
## Frequency  21996 56707 40687
## Proportion 0.184 0.475 0.341
## --------------------------------------------------------------------------------
## arrival_date_month 
##        n  missing distinct 
##   119390        0       12 
## 
## lowest : April     August    December  February  January  
## highest: March     May       November  October   September
##                                                                       
## Value          April    August  December  February   January      July
## Frequency      11089     13877      6780      8068      5929     12661
## Proportion     0.093     0.116     0.057     0.068     0.050     0.106
##                                                                       
## Value           June     March       May  November   October September
## Frequency      10939      9794     11791      6794     11160     10508
## Proportion     0.092     0.082     0.099     0.057     0.093     0.088
## --------------------------------------------------------------------------------
## arrival_date_week_number 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##   119390        0       53        1    27.17    15.68        5        8 
##      .25      .50      .75      .90      .95 
##       16       28       38       46       49 
## 
## lowest :  1  2  3  4  5, highest: 49 50 51 52 53
## --------------------------------------------------------------------------------
## arrival_date_day_of_month 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##   119390        0       31    0.999     15.8    10.13        2        4 
##      .25      .50      .75      .90      .95 
##        8       16       23       28       30 
## 
## lowest :  1  2  3  4  5, highest: 27 28 29 30 31
## --------------------------------------------------------------------------------
## stays_in_weekend_nights 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##   119390        0       17    0.879   0.9276    1.026        0        0 
##      .25      .50      .75      .90      .95 
##        0        1        2        2        2 
## 
## lowest :  0  1  2  3  4, highest: 13 14 16 18 19
##                                                                             
## Value          0     1     2     3     4     5     6     7     8     9    10
## Frequency  51998 30626 33308  1259  1855    79   153    19    60    11     7
## Proportion 0.436 0.257 0.279 0.011 0.016 0.001 0.001 0.000 0.001 0.000 0.000
##                                               
## Value         12    13    14    16    18    19
## Frequency      5     3     2     3     1     1
## Proportion 0.000 0.000 0.000 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## stays_in_week_nights 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##   119390        0       35    0.953      2.5    1.865        0        1 
##      .25      .50      .75      .90      .95 
##        1        2        3        5        5 
## 
## lowest :  0  1  2  3  4, highest: 35 40 41 42 50
## --------------------------------------------------------------------------------
## adults 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##   119390        0       14    0.569    1.856   0.4287        1        1 
##      .25      .50      .75      .90      .95 
##        2        2        2        2        3 
## 
## lowest :  0  1  2  3  4, highest: 26 27 40 50 55
##                                                                             
## Value          0     1     2     3     4     5     6    10    20    26    27
## Frequency    403 23027 89680  6202    62     2     1     1     2     5     2
## Proportion 0.003 0.193 0.751 0.052 0.001 0.000 0.000 0.000 0.000 0.000 0.000
##                             
## Value         40    50    55
## Frequency      1     1     1
## Proportion 0.000 0.000 0.000
## --------------------------------------------------------------------------------
## children 
##        n  missing distinct     Info     Mean      Gmd 
##   119390        0        5    0.201   0.1039   0.1955 
## 
## lowest :  0  1  2  3 10, highest:  0  1  2  3 10
##                                              
## Value           0      1      2      3     10
## Frequency  110800   4861   3652     76      1
## Proportion  0.928  0.041  0.031  0.001  0.000
## --------------------------------------------------------------------------------
## babies 
##        n  missing distinct     Info     Mean      Gmd 
##   119390        0        5    0.023 0.007949  0.01578 
## 
## lowest :  0  1  2  9 10, highest:  0  1  2  9 10
##                                              
## Value           0      1      2      9     10
## Frequency  118473    900     15      1      1
## Proportion  0.992  0.008  0.000  0.000  0.000
## --------------------------------------------------------------------------------
## meal 
##        n  missing distinct 
##   119390        0        5 
## 
## lowest : BB        FB        HB        SC        Undefined
## highest: BB        FB        HB        SC        Undefined
##                                                             
## Value             BB        FB        HB        SC Undefined
## Frequency      92310       798     14463     10650      1169
## Proportion     0.773     0.007     0.121     0.089     0.010
## --------------------------------------------------------------------------------
## country 
##        n  missing distinct 
##   119390        0      178 
## 
## lowest : ABW AGO AIA ALB AND, highest: VGB VNM ZAF ZMB ZWE
## --------------------------------------------------------------------------------
## market_segment 
##        n  missing distinct 
##   119390        0        8 
## 
## lowest : Aviation      Complementary Corporate     Direct        Groups       
## highest: Direct        Groups        Offline TA/TO Online TA     Undefined    
##                                                                   
## Value           Aviation Complementary     Corporate        Direct
## Frequency            237           743          5295         12606
## Proportion         0.002         0.006         0.044         0.106
##                                                                   
## Value             Groups Offline TA/TO     Online TA     Undefined
## Frequency          19811         24219         56477             2
## Proportion         0.166         0.203         0.473         0.000
## --------------------------------------------------------------------------------
## distribution_channel 
##        n  missing distinct 
##   119390        0        5 
## 
## lowest : Corporate Direct    GDS       TA/TO     Undefined
## highest: Corporate Direct    GDS       TA/TO     Undefined
##                                                             
## Value      Corporate    Direct       GDS     TA/TO Undefined
## Frequency       6677     14645       193     97870         5
## Proportion     0.056     0.123     0.002     0.820     0.000
## --------------------------------------------------------------------------------
## is_repeated_guest 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##   119390        0        2    0.093     3810  0.03191  0.06179 
## 
## --------------------------------------------------------------------------------
## previous_cancellations 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##   119390        0       15    0.154  0.08712   0.1682        0        0 
##      .25      .50      .75      .90      .95 
##        0        0        0        0        1 
## 
## lowest :  0  1  2  3  4, highest: 19 21 24 25 26
##                                                                          
## Value           0      1      2      3      4      5      6     11     13
## Frequency  112906   6051    116     65     31     19     22     35     12
## Proportion  0.946  0.051  0.001  0.001  0.000  0.000  0.000  0.000  0.000
##                                                     
## Value          14     19     21     24     25     26
## Frequency      14     19      1     48     25     26
## Proportion  0.000  0.000  0.000  0.000  0.000  0.000
## --------------------------------------------------------------------------------
## previous_bookings_not_canceled 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##   119390        0       73    0.088   0.1371   0.2708        0        0 
##      .25      .50      .75      .90      .95 
##        0        0        0        0        0 
## 
## lowest :  0  1  2  3  4, highest: 68 69 70 71 72
## --------------------------------------------------------------------------------
## reserved_room_type 
##        n  missing distinct 
##   119390        0       10 
## 
## lowest : A B C D E, highest: F G H L P
##                                                                       
## Value          A     B     C     D     E     F     G     H     L     P
## Frequency  85994  1118   932 19201  6535  2897  2094   601     6    12
## Proportion 0.720 0.009 0.008 0.161 0.055 0.024 0.018 0.005 0.000 0.000
## --------------------------------------------------------------------------------
## assigned_room_type 
##        n  missing distinct 
##   119390        0       12 
## 
## lowest : A B C D E, highest: H I K L P
##                                                                             
## Value          A     B     C     D     E     F     G     H     I     K     L
## Frequency  74053  2163  2375 25322  7806  3751  2553   712   363   279     1
## Proportion 0.620 0.018 0.020 0.212 0.065 0.031 0.021 0.006 0.003 0.002 0.000
##                 
## Value          P
## Frequency     12
## Proportion 0.000
## --------------------------------------------------------------------------------
## booking_changes 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##   119390        0       21    0.388   0.2211   0.3919        0        0 
##      .25      .50      .75      .90      .95 
##        0        0        0        1        1 
## 
## lowest :  0  1  2  3  4, highest: 16 17 18 20 21
## --------------------------------------------------------------------------------
## deposit_type 
##        n  missing distinct 
##   119390        0        3 
##                                            
## Value      No Deposit Non Refund Refundable
## Frequency      104641      14587        162
## Proportion      0.876      0.122      0.001
## --------------------------------------------------------------------------------
## agent 
##        n  missing distinct 
##   119390        0      334 
## 
## lowest : 1    10   103  104  105 , highest: 95   96   98   99   NULL
## --------------------------------------------------------------------------------
## company 
##        n  missing distinct 
##   119390        0      353 
## 
## lowest : 10   100  101  102  103 , highest: 93   94   96   99   NULL
## --------------------------------------------------------------------------------
## days_in_waiting_list 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##   119390        0      128     0.09    2.321    4.559        0        0 
##      .25      .50      .75      .90      .95 
##        0        0        0        0        0 
## 
## lowest :   0   1   2   3   4, highest: 236 259 330 379 391
## --------------------------------------------------------------------------------
## customer_type 
##        n  missing distinct 
##   119390        0        4 
##                                                                           
## Value             Contract           Group       Transient Transient-Party
## Frequency             4076             577           89613           25124
## Proportion           0.034           0.005           0.751           0.210
## --------------------------------------------------------------------------------
## adr 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##   119390        0     8879        1    101.8    51.91    38.40    50.00 
##      .25      .50      .75      .90      .95 
##    69.29    94.58   126.00   164.00   193.50 
## 
## lowest :   -6.38    0.00    0.26    0.50    1.00
## highest:  450.00  451.50  508.00  510.00 5400.00
##                                                                             
## Value          0    50   100   150   200   250   300   350   400   450   500
## Frequency   2437 35085 50975 21915  6234  2156   463    99    19     4     2
## Proportion 0.020 0.294 0.427 0.184 0.052 0.018 0.004 0.001 0.000 0.000 0.000
##                 
## Value       5400
## Frequency      1
## Proportion 0.000
## 
## For the frequency table, variable is rounded to the nearest 50
## --------------------------------------------------------------------------------
## required_car_parking_spaces 
##        n  missing distinct     Info     Mean      Gmd 
##   119390        0        5    0.175  0.06252   0.1173 
## 
## lowest : 0 1 2 3 8, highest: 0 1 2 3 8
##                                              
## Value           0      1      2      3      8
## Frequency  111974   7383     28      3      2
## Proportion  0.938  0.062  0.000  0.000  0.000
## --------------------------------------------------------------------------------
## total_of_special_requests 
##        n  missing distinct     Info     Mean      Gmd 
##   119390        0        6    0.773   0.5714   0.7684 
## 
## lowest : 0 1 2 3 4, highest: 1 2 3 4 5
##                                               
## Value          0     1     2     3     4     5
## Frequency  70318 33226 12969  2497   340    40
## Proportion 0.589 0.278 0.109 0.021 0.003 0.000
## --------------------------------------------------------------------------------
## reservation_status 
##        n  missing distinct 
##   119390        0        3 
##                                         
## Value       Canceled Check-Out   No-Show
## Frequency      43017     75166      1207
## Proportion      0.36      0.63      0.01
## --------------------------------------------------------------------------------
## reservation_status_date 
##        n  missing distinct 
##   119390        0      926 
## 
## lowest : 2014-10-17 2014-11-18 2015-01-01 2015-01-02 2015-01-18
## highest: 2017-09-08 2017-09-09 2017-09-10 2017-09-12 2017-09-14
## --------------------------------------------------------------------------------

Proposed Exploratory Data Analysis

4.1 Discuss how you plan to uncover new information in the data that is not self-evident. What are different ways you could look at this data to answer the questions you want to answer?

Questions we plan to answer:

  • Are there different times of the year when it’s more popular to stay at a resort vs. a city hotel?

  • Are there different days of the week that are more popular for a resort hotel vs. city hotel?

  • Do bookings at resort hotels include children more often than bookings at city hotels?

  • Is there a difference in the number of special requests between resort hotels and city hotels?

  • Are there any correlations between group type and cancellations? If so, we can market more heavily towards groups that don’t typically cancel.

Strategy:

We will “slice and dice” the data by dividing it into two categories: city hotels and resort hotels. We will then analyze each relevant variable within these two categories, which will let us answer the questions above by direct comparison.
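A minimal sketch of this split in base R, using a small made-up data frame in place of the real bookings data. The column names `hotel`, `children`, and `total_of_special_requests` are assumed to match the data set, and the values are illustrative only.

```r
# Toy stand-in for the bookings data (values are made up for illustration;
# column names are assumed to match the real data set).
bookings <- data.frame(
  hotel = c("Resort Hotel", "Resort Hotel", "Resort Hotel",
            "City Hotel", "City Hotel", "City Hotel"),
  children = c(2, 0, 1, 0, 0, 1),
  total_of_special_requests = c(1, 0, 2, 0, 1, 0)
)

# Divide the data into one data frame per hotel type.
by_type <- split(bookings, bookings$hotel)
names(by_type)

# Compare the two categories on several variables at once:
# one row per hotel type, one column per variable mean.
by_hotel <- aggregate(
  cbind(children, total_of_special_requests) ~ hotel,
  data = bookings,
  FUN = mean
)
print(by_hotel)
```

`split()` is convenient when each hotel type needs its own follow-up analysis, while `aggregate()` gives a compact side-by-side table.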

Do you plan to slice and dice the data in different ways, create new variables, or join separate data frames to create new summary information?

We do not expect to need new variables or joins with separate data frames. Our primary goal is to split the data by resort vs. city hotel and analyze specific variables within each group, for example the most popular day for stays at city hotels vs. resort hotels.

How could you summarize your data to answer key questions?

4.2 What types of plots and tables will help you to illustrate the findings to your questions?

Bar charts will be our primary way to visualize our insights, because most of our questions compare frequencies between resort hotels and city hotels. If we run a linear regression of cancellations on group type, we’ll likely create a scatter plot to help illustrate the regression.
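As a rough illustration of the kind of bar chart we have in mind, the sketch below plots the share of bookings at each special-request count, side by side for the two hotel types. The data frame is a made-up stand-in, and the column names `hotel` and `total_of_special_requests` are assumed to match the real data.

```r
# Toy stand-in for the bookings data (illustrative values only).
bookings <- data.frame(
  hotel = c(rep("Resort Hotel", 4), rep("City Hotel", 6)),
  total_of_special_requests = c(0, 1, 1, 2, 0, 0, 0, 1, 1, 2)
)

# Cross-tabulate request counts by hotel type, then convert each
# hotel's row to proportions so the two groups stay comparable
# even though they have different numbers of bookings.
counts <- table(bookings$hotel, bookings$total_of_special_requests)
shares <- prop.table(counts, margin = 1)

# Side-by-side bars: one bar per hotel type within each request count.
barplot(
  shares,
  beside = TRUE,
  legend.text = rownames(shares),
  xlab = "Number of special requests",
  ylab = "Share of bookings",
  main = "Special requests by hotel type"
)
```

Plotting proportions rather than raw counts matters here because the two hotel types will not have the same number of bookings.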

4.3 What do you not know how to do right now that you need to learn to answer your questions?

Although we don’t foresee a need to add another variable, we aren’t certain how we would do so if it were needed. Even if this analysis doesn’t require it, the skill would be useful in the future.
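For reference, one possible way to add a variable in base R is to assign to a new column with `$`. The sketch below derives a day-of-week column from separate year, month, and day columns. The names `arrival_date_year`, `arrival_date_month`, and `arrival_date_day_of_month` are assumptions about how the arrival date is stored, and the rows are made up.

```r
# Toy stand-in; the three arrival-date column names are assumed.
bookings <- data.frame(
  arrival_date_year = c(2015, 2016, 2017),
  arrival_date_month = c("July", "February", "August"),
  arrival_date_day_of_month = c(1, 29, 31)
)

# Build a proper Date from the three parts. Matching against the
# built-in month.name vector avoids locale-dependent month parsing.
month_number <- match(bookings$arrival_date_month, month.name)
bookings$arrival_date <- as.Date(sprintf(
  "%d-%02d-%02d",
  as.integer(bookings$arrival_date_year),
  month_number,
  as.integer(bookings$arrival_date_day_of_month)
))

# New variable: day of week. POSIXlt's wday runs 0 (Sunday) to 6 (Saturday),
# so add 1 to index into the label vector.
day_labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
bookings$arrival_weekday <- factor(
  day_labels[as.POSIXlt(bookings$arrival_date)$wday + 1],
  levels = day_labels
)
print(bookings[, c("arrival_date", "arrival_weekday")])
```

Storing the weekday as a factor with explicit levels keeps the days in calendar order in later tables and plots.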

4.4 Do you plan on incorporating any machine learning techniques (i.e. linear regression, discriminant analysis, cluster analysis) to answer your questions?

We plan to use a linear regression of cancellations on reservation type. From this, we’ll be able to see whether any specific type of reservation is associated with more cancellations. Ex: do groups tend to have a higher cancellation rate?
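A minimal sketch of such a regression, assuming a 0/1 cancellation column named `is_canceled` (a hypothetical name here) and the `customer_type` variable as the reservation type. The data are made up. Because the outcome is binary, this is a linear probability model; logistic regression via `glm(..., family = binomial)` is a common alternative.

```r
# Toy stand-in (illustrative values only); is_canceled is an assumed
# 0/1 column, 1 meaning the booking was canceled.
bookings <- data.frame(
  customer_type = rep(c("Transient", "Group", "Contract"), each = 4),
  is_canceled = c(1, 1, 0, 0,   # Transient: 2 of 4 canceled
                  0, 0, 0, 1,   # Group:     1 of 4 canceled
                  1, 0, 1, 1)   # Contract:  3 of 4 canceled
)

# Treat reservation type as a factor, with Transient as the baseline.
bookings$customer_type <- relevel(factor(bookings$customer_type),
                                  ref = "Transient")

# Regress the cancellation indicator on reservation type.
fit <- lm(is_canceled ~ customer_type, data = bookings)

# The intercept is the baseline cancellation rate; each other coefficient
# is that type's rate minus the baseline rate.
print(coef(fit))

# Cross-check against the raw cancellation rate per type.
rates <- tapply(bookings$is_canceled, bookings$customer_type, mean)
print(rates)
```

With a single categorical predictor, the fitted coefficients simply reproduce the differences in group cancellation rates, so a negative coefficient for a type means it cancels less often than the baseline.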

Formatting & Other Requirements

7.1 All code is visible, proper coding style is followed, and code is well commented (see section regarding style).

Yes, to the best of our ability.

7.2 Coding is systematic - complicated problem broken down into sub-problems that are individually much simpler. Code is efficient, correct, and minimal. Code uses appropriate data structure (list, data frame, vector/matrix/array). Code checks for common errors.

Yes, to the best of our ability.

7.3 Achievement, mastery, cleverness, creativity: Tools and techniques from the course are applied very competently and, perhaps, somewhat creatively. Perhaps student has gone beyond what was expected and required, e.g., extraordinary effort, additional tools not addressed by this course, unusually sophisticated application of tools from course.

Yes, we believe the knowledge we are currently applying to our strategy positions us well for a successful report at completion.

7.4 .Rmd fully executes without any errors and HTML produced matches the HTML report submitted by student.

Yes, the .Rmd runs without errors.