Effects of Major Events on Hotel Pricing

Author

Nate Holtz

Published

April 23, 2024

Introduction

It’s common knowledge that hotels raise their rates when a large event comes to town. Whether it’s Taylor Swift performing in Indianapolis or the annual Run for the Roses at the Kentucky Derby in Louisville, hotels in the area generally increase their prices. It’s simple supply and demand: with an influx of out-of-town attendees, it only makes sense that rates go up.

To dig deeper into how much prices actually change, hotel listings near a few major events were web-scraped from the popular hotel booking website booking.com.

Events Chosen

The results were filtered to the hotels closest to each event venue. The following 2024 events and stay dates were chosen:

Kentucky Derby (May 4th): Churchill Downs - Louisville, KY
- Check-in: May 4th
- Check-out: May 5th
- Total Nights: 1
Indianapolis 500 (May 26th): Indianapolis Motor Speedway - Indianapolis, IN
- Check-in: May 25th
- Check-out: May 27th
- Total Nights: 2
CONMEBOL Copa América Final Weekend (July 14th): Hard Rock Stadium - Miami Gardens, FL
- Check-in: July 12th
- Check-out: July 15th
- Total Nights: 3
The Eras Tour - Taylor Swift (November 1st, 2nd, 3rd): Lucas Oil Stadium - Indianapolis, IN
- Check-in: November 1st
- Check-out: November 3rd
- Total Nights: 2
Hackney Diamonds Tour - The Rolling Stones (May 7th): State Farm Stadium - Glendale, AZ
- Check-in: May 6th
- Check-out: May 8th
- Total Nights: 2

For each event, hotel rates (filtered to the hotels closest to the venue) were pulled for the dates listed above and for the same dates exactly one month later. The exception was Lucas Oil Stadium, which hosts the Big Ten Football Championship one month after the Taylor Swift concert, so for the Eras Tour, data was pulled for the month before the event instead. These later (or earlier) dates serve as the “control” dates, giving a baseline price to compare the event dates against.

For this exercise, only one control date range was selected. In practice, several control date ranges should be chosen and their prices averaged, which would produce a more reliable control group. Because booking.com URLs are difficult to construct and the extra scraping would have taken considerable time, that approach was not taken here.
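As a rough illustration of the multi-window approach, the sketch below generates several control date ranges for one event. It assumes each window keeps the event’s number of nights and is shifted by whole calendar months; `control_windows` is a hypothetical helper, not part of the original scrape. A negative shift gives earlier windows, as was needed for the Eras Tour.

```r
# Hypothetical helper: build control date ranges shifted by whole months.
# Each window keeps the same number of nights as the event stay.
control_windows <- function(checkin, checkout, months = 1:3) {
  checkin <- as.Date(checkin)
  checkout <- as.Date(checkout)
  nights <- as.integer(checkout - checkin)
  # seq() with by = "N months" moves the date N calendar months;
  # a negative N (e.g. "-1 months") moves it backwards
  shifted <- do.call(c, lapply(months, function(m) {
    seq(checkin, by = paste(m, "months"), length.out = 2)[2]
  }))
  data.frame(months_shifted = months,
             checkin = shifted,
             checkout = shifted + nights)
}

# Indianapolis 500 stay (May 25 to May 27), shifted one and two months later
control_windows("2024-05-25", "2024-05-27", months = 1:2)

# Eras Tour stay, shifted one month earlier
control_windows("2024-11-01", "2024-11-03", months = -1)
```

One design caveat: shifting by calendar months changes the day of the week (May 4, 2024 is a Saturday, while June 4, 2024 is a Tuesday), so shifting by whole weeks may give a fairer weekend-to-weekend comparison.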

Webpage Structure

When scraping data, the structure of the webpage is extremely important. The image below shows an example booking.com results page for hotels near Xavier University this weekend:

Data Points of Interest

The following data points of interest should be included in the scrape:

Check-in Date (Top)
Check-out Date (Top)
Location (Top)
Hotel Name (Located in Blue)
Room Type (Located Below Hotel Name)
Distance from Location (Right of Hotel Name)
Room Price (Bottom Right)
Review Information (Top Right)

Data Location in HTML

Scraping the hotel name was fairly simple. The R code below pulls the 25 hotel names shown on a results page:

# Load rvest (read_html, html_elements, html_text2; it also re-exports %>%)
library(rvest)

# Define url
booking_url<-
  read_html("booking.com URL")

# Scrape Hotel Name
hotel_name<-
  booking_url %>%
  html_elements("div.d6767e681c") %>%
  html_elements("h3.aab71f8e4e") %>%
  html_elements("a.a78ca197d0") %>%
  html_text2()

Each hotel listing block on the page is stored in a “div”, each hotel name is stored in an “h3”, and the name text itself (not the link URL) is stored in an “a”. Chaining those three selectors lets the script pull all 25 names into a single vector.
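To illustrate how the chained selectors narrow the search, here is a sketch run against a tiny hand-written HTML snippet that mimics the nesting described above. The hotels and links are made up; the class names are copied from the article’s code and appear to be auto-generated, so they may change whenever booking.com updates its site. A single descendant selector returns the same names in one call.

```r
library(rvest)

# Made-up snippet mimicking the nesting described above; the class names are
# copied from the article's code, the hotels and links are hypothetical
page <- minimal_html(html = '
  <div class="d6767e681c"><h3 class="aab71f8e4e">
    <a class="a78ca197d0" href="/hotel/a">Hotel A</a></h3></div>
  <div class="d6767e681c"><h3 class="aab71f8e4e">
    <a class="a78ca197d0" href="/hotel/b">Hotel B</a></h3></div>
')

# Chained calls, as in the article: each step searches inside the previous matches
chained <- page %>%
  html_elements("div.d6767e681c") %>%
  html_elements("h3.aab71f8e4e") %>%
  html_elements("a.a78ca197d0") %>%
  html_text2()

# The same path written as one descendant selector
combined <- page %>%
  html_elements("div.d6767e681c h3.aab71f8e4e a.a78ca197d0") %>%
  html_text2()

# The <a> tag's href attribute holds the link; html_text2() returns only the visible name
links <- page %>%
  html_elements("div.d6767e681c h3.aab71f8e4e a.a78ca197d0") %>%
  html_attr("href")
```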

Date and Location Issues

The check-in date, check-out date, and location required a little creativity. The date range is shown at the top of the page only once, so a repetition function was needed to create vectors the same length as the others. The two dates are also scraped together into a single vector rather than as separate fields, so some wrangling was needed to split them into separate check-in and check-out vectors. The following R code was used:

# Scrape Date Range
dates<-
  booking_url %>%
  html_elements("div.a1139161bf") %>%
  html_elements("span.a8887b152e") %>%
  html_text2()
# Separate Check-in and Check-out
checkin<-
  dates[1]
checkout<-
  dates[2]
# Replicate value to the same length as the hotel_name vector
checkin<-
  rep(checkin,length(hotel_name))
checkout<-
  rep(checkout,length(hotel_name))

Similarly, the location is stored only once on the page, so it was scraped and replicated the same way:

# Scrape and Replicate Location
location<-
  booking_url %>%
  html_elements("div.e000754250") %>%
  html_elements("input") %>%
  html_attr("value") %>%
  rep(length(hotel_name))

As a result, all scraped vectors have the same length and can be combined into a traditional data frame.
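For reference, base R’s data.frame() recycles a length-one vector to the length of the longest column, so the explicit rep() is optional. The sketch below, using made-up vectors, shows that both forms produce the same data frame; keeping rep() simply makes the intended length explicit.

```r
# Made-up scraped vectors: three hotel names but a single location string
hotel_name <- c("Hotel A", "Hotel B", "Hotel C")
location <- "Churchill Downs"

# data.frame() recycles the length-one location to match hotel_name
recycled <- data.frame(location, hotel_name)

# Explicit rep(), as in the article, gives the same result
explicit <- data.frame(location = rep(location, length(hotel_name)), hotel_name)
```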

Functions

To complete the scrape, a few functions must be created that let the computer step through each page and pull the defined information. The first function, booking_page_scrape, pulls each data point from a single webpage. The second, scrape_bookings, loops booking_page_scrape over each URL and then combines the results into a single data frame.

Before moving on, all URLs to be scraped were stored in a single vector. Unfortunately, because booking.com URLs are so unique, each one had to be searched for manually, then copied and pasted into the vector.
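Because the URLs are pasted in by hand, their order carries meaning later on: the wrangling step treats odd page_ids as event pages. A minimal sketch of one way to record that order explicitly, assuming each event’s URL is entered immediately before its control URL. `page_lookup` and the placeholder URLs are hypothetical.

```r
# Events in the order their URLs are pasted into url_list
events <- c("Kentucky Derby", "Indianapolis 500", "Copa America Final",
            "Eras Tour", "Hackney Diamonds Tour")

# Placeholder URLs: replace with the manually copied booking.com search URLs,
# keeping the pattern "event URL, then control URL" for each event
url_list <- paste0("PASTE_URL_", seq_len(2 * length(events)))

# One row per URL; page_id matches the URL's position in url_list
page_lookup <- data.frame(
  page_id = seq_along(url_list),
  event   = rep(events, each = 2),
  window  = rep(c("event", "control"), times = length(events))
)
page_lookup$event_flag <- page_lookup$window == "event"
```

After scraping, this table could be joined to the scraped rows by page_id, which attaches the event name to every hotel and makes the odd/even convention visible instead of implicit.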

Booking Page Scrape

The function booking_page_scrape pulls each data point, uses the resulting vectors to fill a data frame, and finally returns that data frame:

# Booking Page Scrape
booking_page_scrape<-function(url){
  
# Define URL
booking_url<-
  read_html(url)
  
# Retrieve Rows
hotel_name<-
  booking_url %>%
  html_elements("div.d6767e681c") %>%
  html_elements("h3.aab71f8e4e") %>%
  html_elements("a.a78ca197d0") %>%
  html_text2()
  
###### Rest of Pull ############################################

location<-
  booking_url %>%
  html_elements("div.e000754250") %>%
  html_elements("input") %>%
  html_attr("value") %>%
  rep(length(hotel_name))

# Create Data frame
booking_df<-
  data.frame(checkin,checkout,location,hotel_name,room_type,
             distance,hotel_price,hotel_score,review_count,hotel_url)
  
# Return Data frame
return(booking_df)
}

As a result, every time a booking.com URL is passed to the function, it pulls the requested information and returns booking_df.
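One fragility worth noting: each field in booking_page_scrape is selected across the whole page, so if a single listing lacks an element (for example, a new hotel with no review score yet), that vector comes back shorter than hotel_name and data.frame() usually stops with an error. A sketch of a card-by-card alternative using rvest’s html_element() (singular), which returns one result per card and NA when the element is missing. The markup and class names below are made up, not booking.com’s.

```r
library(rvest)

# Made-up mini results page: the second listing has no review score
page <- minimal_html(html = '
  <div class="card"><a class="name">Hotel A</a><span class="score">Scored 8.3</span></div>
  <div class="card"><a class="name">Hotel B</a></div>
')

# Page-wide selection finds two names but only one score,
# so combining them with data.frame() would fail
length(html_elements(page, "a.name"))
length(html_elements(page, "span.score"))

# Card-by-card selection: html_element() yields one result per card,
# and html_text2() turns a missing element into NA
cards <- html_elements(page, "div.card")
listing_df <- data.frame(
  hotel_name  = html_text2(html_element(cards, "a.name")),
  hotel_score = html_text2(html_element(cards, "span.score"))
)
listing_df
```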

URL Loop

The function scrape_bookings loops through the URLs stored in a vector and returns one final data frame containing all of the data pulled from every URL.

# Load dplyr for mutate() and bind_rows()
library(dplyr)

# Scrape Bookings Function
scrape_bookings<-function(urls){
# Create empty data frame
all_bookings<-
  data.frame()

# loop through each instance of URL
for(i in seq_along(urls)){
  print(paste("Collecting page",i,"of",length(urls),":) ",sep=" "))
  Sys.sleep(runif(1,5,15))
# Scrape the page and add a page identifier
  page_df<-
    booking_page_scrape(urls[i]) %>% 
    mutate(page_id=i)
# Append the new rows below the rows collected so far
  all_bookings<-
    bind_rows(all_bookings,page_df)
# Let user know how far along the scrape is
  print(paste(urls[i],"collected",sep=" "))
  print(paste(nrow(all_bookings),"total bookings collected so far!",sep=" "))
  }

# Returns final data frame
return(all_bookings)
}

The function first creates an empty data frame called all_bookings. Then, for each URL, it pauses for a random interval between five and fifteen seconds, so that the requests do not overwhelm the website’s server and look more like a human visitor. Each URL is then passed to the booking_page_scrape function, which returns the information from that page, and a column named page_id records which page (1, 2, 3, …) each row came from. Finally, that page’s data frame is added to the bottom of all_bookings, and once every URL has been processed, the function returns all_bookings.
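As written, the loop stops entirely if any single page fails to load or parse. A hedged alternative sketch: `scrape_bookings_safe` (a hypothetical name) collects each page into a list, skips pages that throw an error via tryCatch(), and binds everything once at the end with base R’s rbind(). The scraping function and pause range are arguments so the loop can be tested without contacting booking.com, and page_id stays tied to each URL’s position even when a page is skipped.

```r
# Hypothetical, more forgiving version of scrape_bookings
scrape_bookings_safe <- function(urls, scrape_fun, pause = c(5, 15)) {
  pages <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    message("Collecting page ", i, " of ", length(urls))
    # Random pause between requests, as in the original loop
    Sys.sleep(runif(1, pause[1], pause[2]))
    # Return NULL instead of stopping when a page fails
    page <- tryCatch(scrape_fun(urls[i]), error = function(e) {
      message("Skipping ", urls[i], ": ", conditionMessage(e))
      NULL
    })
    if (!is.null(page) && nrow(page) > 0) {
      page$page_id <- i  # position of the URL in the input vector
      pages[[i]] <- page
    }
  }
  # Drop skipped pages, then bind all collected pages in one step
  pages <- Filter(Negate(is.null), pages)
  if (length(pages) == 0) return(data.frame())
  do.call(rbind, pages)
}

# With the article's functions defined:
# all_bookings <- scrape_bookings_safe(url_list, booking_page_scrape)
```

Binding once at the end also avoids repeatedly copying a growing data frame inside the loop.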

Running Functions

Once all the functions have been defined and the URL list has been created, the scrape can be initiated.

# Run Scrape
all_bookings<-
  scrape_bookings(url_list)

As a result, the final data frame is created. That data frame includes all the information needed from the webpage to begin data wrangling. A sample of the final raw data frame is shown below:

hotel_name	room_type	checkin	checkout	distance	hotel_price	hotel_score	review_count
Best Western Premier Airport/Expo Center Hotel Opens in new window	Queen Room with Bath Tub - Disability Access	5/4/2024	5/5/2024	2.5 miles from Churchill Downs	US$1,088	Scored 8.3	951 reviews

Data Wrangling

Unfortunately, the raw scrape contains values stored in the wrong format, along with unnecessary text. Some additional columns are also needed. To handle both, the following code was used:

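# Load dplyr (mutate, case_when, arrange, select) and lubridate (mdy)
library(dplyr)
library(lubridate)

# Data Wrangling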
all_bookings<-all_bookings %>%
  mutate(
    # Store dates in Lubridate format
    checkin=mdy(checkin),
    checkout=mdy(checkout),
    # Remove "Opens in new window" from hotel name and trim leftover whitespace
    hotel_name=trimws(gsub("Opens in new window","",hotel_name)),
    # Create parent company groups for each hotel brand
    hotel_parent=case_when(
     grepl("Best Western|SureStay|Western",hotel_name)~"Best Western",
     grepl("Motel 6|Studio 6",hotel_name)~"Motel 6",
     
###### Rest of Categories ############################################

     grepl("Red Roof",hotel_name)~"Red Roof",
     grepl("My Place Hotel",hotel_name)~"My Place Hotel",
     TRUE ~ "Private"
     ), 
    # Remove unnecessary text from the remaining variables and convert to numbers
    room_type=sub("\\s*-.*","",room_type),
    distance=as.numeric(sub("\\s*miles.*", "",distance)),
    hotel_price=as.numeric(gsub("[^0-9.]","",hotel_price)),
    hotel_score=as.numeric(gsub("[^0-9.]","",hotel_score)),
    review_count=as.numeric(gsub("[^0-9.]","",review_count)),
    # Add event flag: TRUE if the row comes from the event date range.
    # This assumes url_list alternates event and control URLs,
    # so odd page_ids are event pages and even page_ids are control pages
    event_flag=page_id %% 2==1
  ) %>%
  # Sort by page_id and place it first
  arrange(page_id) %>% 
  select(page_id, everything())

After wrangling is completed, the resulting data frame is much easier to use.

page_id	checkin	checkout	location	hotel_name	room_type	distance	hotel_price	hotel_score	review_count	hotel_parent	event_flag
1	2024-05-04	2024-05-05	Churchill Downs	Best Western Premier Airport/Expo Center Hotel	Queen Room with Bath Tub	2.5	1088	8.3	951	Best Western	TRUE
1	2024-05-04	2024-05-05	Churchill Downs	SureStay Plus by Best Western Louisville Airport Expo	King Room with Sofa Bed	2.5	757	8.1	581	Best Western	TRUE
1	2024-05-04	2024-05-05	Churchill Downs	The Bellwether Hotel	King Suite	3.6	1239	7.2	5	Private	TRUE
1	2024-05-04	2024-05-05	Churchill Downs	Motel 6 Louisville, Ky- Airport/ Fair Expo	Queen Room with Two Queen Beds	4.4	338	6.1	993	Motel 6	TRUE
1	2024-05-04	2024-05-05	Churchill Downs	Baymont by Wyndham Louisville Airport South	King Room	4.6	659	5.9	587	Wyndham	TRUE
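The cleaning rules can be checked against the raw sample row shown earlier. Below is a base-R sketch applying the same regular expressions (plus trimws()) to those strings; the line break before “Opens in new window” is an assumption about how html_text2() returned the name. It also shows one pitfall: sub("\\s*-.*", "", room_type) cuts at the first hyphen, so a hypothetical room type such as “Two-Bedroom Suite” would become “Two”. Requiring whitespace around the hyphen avoids that.

```r
# Raw strings from the sample row above (the newline in the name is assumed)
raw <- data.frame(
  hotel_name   = "Best Western Premier Airport/Expo Center Hotel\nOpens in new window",
  room_type    = "Queen Room with Bath Tub - Disability Access",
  distance     = "2.5 miles from Churchill Downs",
  hotel_price  = "US$1,088",
  hotel_score  = "Scored 8.3",
  review_count = "951 reviews"
)

clean <- raw
# Drop the screen-reader text, then trim the leftover whitespace/newline
clean$hotel_name <- trimws(gsub("Opens in new window", "", raw$hotel_name))
# Article's rule: keep everything before the first hyphen
clean$room_type <- sub("\\s*-.*", "", raw$room_type)
# Keep the number before "miles"
clean$distance <- as.numeric(sub("\\s*miles.*", "", raw$distance))
# Strip everything except digits and the decimal point
clean$hotel_price  <- as.numeric(gsub("[^0-9.]", "", raw$hotel_price))
clean$hotel_score  <- as.numeric(gsub("[^0-9.]", "", raw$hotel_score))
clean$review_count <- as.numeric(gsub("[^0-9.]", "", raw$review_count))

# Pitfall: the first-hyphen rule truncates hyphenated room names...
sub("\\s*-.*", "", "Two-Bedroom Suite")
# ...while requiring whitespace around the hyphen leaves them intact
sub("\\s+-\\s.*", "", c("Two-Bedroom Suite",
                        "Queen Room with Bath Tub - Disability Access"))
```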

Analysis

Visuals are a great way to present data in a digestible format. They turn numbers into pictures and help tell a story in a way that traditional spreadsheets cannot.

Distance from Venue

The first question is how much a hotel’s distance from the venue affects its price. To answer it, hotel price was plotted against distance from the venue.

On the control dates, a hotel’s distance from the venue has little to do with its price. On the event dates, however, distance from the venue has a more noticeable impact on hotel price.
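To put a number on what the chart shows, one option (not used in the original analysis) is a linear model with an interaction between distance and the event flag, so the price-per-mile slope on control dates and on event dates can be compared directly. A sketch, assuming the wrangled columns hotel_price, distance, and a logical event_flag; `distance_effect` is a hypothetical helper.

```r
# Hypothetical helper: price-per-mile slope on control vs. event dates
distance_effect <- function(bookings) {
  # distance * event_flag fits separate intercepts and slopes
  # for control (FALSE) and event (TRUE) rows
  fit <- lm(hotel_price ~ distance * event_flag, data = bookings)
  coefs <- coef(fit)
  c(control_slope = unname(coefs["distance"]),
    event_slope   = unname(coefs["distance"] + coefs["distance:event_flagTRUE"]))
}

# With the wrangled data:
# distance_effect(all_bookings)
```

Running it per location, for example on subset(all_bookings, location == "Churchill Downs"), would show whether the event-date slope is larger in magnitude than the control slope at each venue.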

Price Change in Control and Event Dates by Event

Another question is how much prices differ between the event dates and the control dates. This can be shown by comparing the mean price for each date range, grouped by location.

The mean price difference between the two date ranges is shown in dollars. Percent changes call for a different visual.
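The mean-price comparison can be sketched in base R as below. `price_change` is a hypothetical helper that averages hotel_price by a grouping column and the event flag, then reports the dollar change and the percent change, computed as 100 × (event mean − control mean) / control mean. Passing "hotel_parent" instead of "location" gives the parent-company comparison in the next section.

```r
# Hypothetical helper: mean price per group on control vs. event dates
price_change <- function(bookings, group_col = "location") {
  # Mean price for every combination of group and date range
  means <- aggregate(bookings["hotel_price"],
                     by = list(group = bookings[[group_col]],
                               event_flag = bookings$event_flag),
                     FUN = mean, na.rm = TRUE)
  is_event <- as.logical(means$event_flag)
  control <- means[!is_event, c("group", "hotel_price")]
  event   <- means[is_event,  c("group", "hotel_price")]
  # One row per group with both means side by side
  out <- merge(control, event, by = "group", suffixes = c("_control", "_event"))
  out$dollar_change <- out$hotel_price_event - out$hotel_price_control
  out$pct_change <- 100 * out$dollar_change / out$hotel_price_control
  out
}

# With the wrangled data:
# price_change(all_bookings, "location")
# price_change(all_bookings, "hotel_parent")
```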

Two things stand out: Indianapolis Motor Speedway and Lucas Oil Stadium are both in Indianapolis, and those two venues also have the largest percent changes. Could the Indianapolis control prices be lower on average than those at the other locations, or are these events simply big enough draws to drive prices up? Taylor Swift is an obvious draw that will raise rates, but the size of the Indianapolis 500’s increase is very interesting.

Price Change in Control and Event Dates by Parent Company

Which parent companies take the most advantage of events? Two visuals help answer this, starting with the dollar change.

This visual shows that some parent companies change their rates much more than others. Next, it is important to standardize the prices by looking at the percent difference.

It appears that although every parent company raises its rates, a few outliers are greedier than the rest. Who would’ve guessed: the privately owned hotels, not the ones owned by companies whose CEOs make tens of millions of dollars, are the ones that raise their rates the least.
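Comparing group means mixes together hotels that may appear in only one of the two scrapes. A sketch of a paired alternative (not the method used for the charts above): match each hotel to itself across the event and control dates, compute that hotel’s own percent change, and take the median by parent company. It assumes one row per hotel per date range and that hotel names match exactly between the two scrapes; `paired_pct_change` is a hypothetical helper.

```r
# Hypothetical helper: per-hotel percent change, summarized by group
paired_pct_change <- function(bookings, group_col = "hotel_parent") {
  keys <- c("location", "hotel_name")
  event   <- bookings[bookings$event_flag,  c(keys, group_col, "hotel_price")]
  control <- bookings[!bookings$event_flag, c(keys, "hotel_price")]
  # Inner join: hotels present in only one date range are dropped
  pairs <- merge(event, control, by = keys, suffixes = c("_event", "_control"))
  pairs$pct_change <- 100 * (pairs$hotel_price_event - pairs$hotel_price_control) /
    pairs$hotel_price_control
  # Median is less sensitive to a few extreme markups than the mean
  aggregate(pairs["pct_change"], by = list(group = pairs[[group_col]]), FUN = median)
}

# With the wrangled data:
# paired_pct_change(all_bookings, "hotel_parent")
```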

Summary

It’s no secret that hotels raise their rates when there’s a big event in town; however, it is worth asking how much they raise them. Overall, the hotels surrounding each venue saw a dramatic increase in price. It also appears that on the control dates, a hotel’s distance to the venue has little to do with its price, whereas when the event is in town, location has a very large impact.

This is early research that only briefly explores the relationship between large events and hotel rates; more extensive research could build on it.

In conclusion, it’s to be expected that hotels raise rates when there is an event, but how much is ethical? Can raising rates by over 300% be defended, even for Taylor Swift? An accepted limit on rate increases would be extremely beneficial to consumers, and in the wake of Ticketmaster’s backlash, it will be interesting to see whether the hotel industry faces something similar.