Analyzing Soccer Ticket Pricing using data based on Seatgeek event tickets.
The event that it is analyzing is the Nashville SC at intermiami SF game held in Florida.
The analysis for Deal score and various of other variables have helped us correlate how prices change from game to game
The management of Nashville SC at Inter Miami CF looks at the ticket pricing variations and strategies to best optimize revenue.
The objective is to understand the factors that influence the ticket prices changing from game to game.
This means using data visualization to find the independent variables such as section and Deal Score(rating of a ticket).
# install.packages("readxl")
library(readxl)#Allows us to import Excel Files
options(scipen = 999)#Avoids using Scientific Notation in favor of decimals
event_data <- read_excel(file.choose())
event_data
## # A tibble: 1,000 × 27
## event_url event_id event_category event_subcategory title datetime_utc
## <chr> <dbl> <chr> <chr> <chr> <chr>
## 1 https://seatgee… 5900975 SPORTS MLS Nash… 2023/08/30 …
## 2 https://seatgee… 5900975 SPORTS MLS Nash… 2023/08/30 …
## 3 https://seatgee… 5900975 SPORTS MLS Nash… 2023/08/30 …
## 4 https://seatgee… 5900975 SPORTS MLS Nash… 2023/08/30 …
## 5 https://seatgee… 5900975 SPORTS MLS Nash… 2023/08/30 …
## 6 https://seatgee… 5900975 SPORTS MLS Nash… 2023/08/30 …
## 7 https://seatgee… 5900975 SPORTS MLS Nash… 2023/08/30 …
## 8 https://seatgee… 5900975 SPORTS MLS Nash… 2023/08/30 …
## 9 https://seatgee… 5900975 SPORTS MLS Nash… 2023/08/30 …
## 10 https://seatgee… 5900975 SPORTS MLS Nash… 2023/08/30 …
## # ℹ 990 more rows
## # ℹ 21 more variables: datetime_local <chr>, name <chr>, address <chr>,
## # city <chr>, state <chr>, country <chr>, postal_code <dbl>,
## # formatted_address <chr>, ticket_id <chr>, ticket_price <dbl>,
## # total_price <dbl>, fee <dbl>, full_section <chr>, section <chr>, row <chr>,
## # quantity <dbl>, delivery_method <chr>, marketplace <chr>, deal_label <chr>,
## # deal_score <chr>, in_hand_date <chr>
#Makes a new data frame with only relevant columns
price_data <- subset(event_data, select = c(ticket_price, section, row, quantity, deal_score))
price_data
## # A tibble: 1,000 × 5
## ticket_price section row quantity deal_score
## <dbl> <chr> <chr> <dbl> <chr>
## 1 369 130 4 2 8.8000000000000007
## 2 210 113 35 2 6.5
## 3 173 125 6 2 9.3000000000000007
## 4 345 122 1 3 7.2
## 5 275 128 21 2 3.4
## 6 215 122 15 4 7.5
## 7 298 127 4 2 6.4
## 8 260 128 25 4 3.2
## 9 335 112 2 2 7.9
## 10 383 126 1 2 5.7
## # ℹ 990 more rows
Data Description: A description of some of the features are presented in the table below.
| Variable | Definition |
|---|---|
| Price | Price of ticket |
| Row | Row of ticket (Lower is closer) |
| Deal_Score | Seatgeek’s rating of value (Highers is better) |
| Section | Seating of the ticket inside a row |
| Quantity | Number of tickets sold |
# Extract numeric part from the 'section' column. Ex: Section 89 -> 89
price_data$section <- as.numeric(gsub("[^0-9]", "", price_data$section))
# Extract numeric part from the 'row' column. Ex: Row 20 -> 20
price_data$row <- as.numeric(gsub("[^0-9]", "", price_data$row))
#Turns the column numeric and turns anything else into N/A
price_data$deal_score <- as.numeric(price_data$deal_score)
## Warning: NAs introduced by coercion
pairs(price_data)
model <- lm( ticket_price ~ row + deal_score + section + quantity, data = price_data)
summary(model)
##
## Call:
## lm(formula = ticket_price ~ row + deal_score + section + quantity,
## data = price_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -604.1 -178.5 -48.8 105.7 7037.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1336.6408 61.4915 21.737 <0.0000000000000002 ***
## row -19.3089 1.5175 -12.725 <0.0000000000000002 ***
## deal_score -100.4576 5.4885 -18.303 <0.0000000000000002 ***
## section 0.4141 0.3972 1.043 0.297
## quantity -4.2246 6.3723 -0.663 0.508
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 463.8 on 954 degrees of freedom
## (41 observations deleted due to missingness)
## Multiple R-squared: 0.3264, Adjusted R-squared: 0.3236
## F-statistic: 115.6 on 4 and 954 DF, p-value: < 0.00000000000000022
Interpretation: Only Row and Deal_Score are significant (p_value > 0.05) so must omit the rest
Keeping only significant values make new data frame
sig_price_data <- subset(price_data, select = c(ticket_price, row, deal_score))
sig_price_data
## # A tibble: 1,000 × 3
## ticket_price row deal_score
## <dbl> <dbl> <dbl>
## 1 369 4 8.8
## 2 210 35 6.5
## 3 173 6 9.3
## 4 345 1 7.2
## 5 275 21 3.4
## 6 215 15 7.5
## 7 298 4 6.4
## 8 260 25 3.2
## 9 335 2 7.9
## 10 383 1 5.7
## # ℹ 990 more rows
summary(sig_price_data)
## ticket_price row deal_score
## Min. : 138.0 Min. : 0.00 Min. :0.200
## 1st Qu.: 207.0 1st Qu.:10.00 1st Qu.:3.400
## Median : 304.0 Median :18.00 Median :6.350
## Mean : 457.6 Mean :17.76 Mean :5.598
## 3rd Qu.: 503.0 3rd Qu.:26.00 3rd Qu.:7.900
## Max. :8354.0 Max. :38.00 Max. :9.700
## NA's :31 NA's :8
Interpretation: Average price of a ticket is 457.60 dollars
pairs(sig_price_data)
cor(sig_price_data, use = "pairwise.complete.obs")
## ticket_price row deal_score
## ticket_price 1.0000000 -0.29780261 -0.46121541
## row -0.2978026 1.00000000 -0.09824359
## deal_score -0.4612154 -0.09824359 1.00000000
Price and row have a -0.298 correlation so as price goes up you go down in rows.
(Getting closer to field)
Price and deal_score have a -0.461 correlation so as price goes up deal score goes down.
(Higher deal_score shows better value so cheaper tickets are a better deal)
qqnorm(residuals(model))
qqline(residuals(model))
Assumption of Normality of Residuals:
Residuals should lie along the 45-degree line
This plot shows deviation from the line especially in the upper right tail
non-normality: significant deviation from the line
impacts confidence intervals and hypothesis tests
Extreme outliers suggest that the residuals do not follow a normal distribution
plot(model, which = 1)
Assumptions of Linearity and Homoscedasticity:
Residuals should be randomly scattered around zero
Our plot shows a funnel like shape with residuals increasing in spread as the fitted value increase
Heteroscedasticity: variance of residuals are not constant
extreme outliers indicate a few data points have an unusually high influence on the model
model <- lm( ticket_price ~ row + deal_score, data = sig_price_data)
summary(model)
##
## Call:
## lm(formula = ticket_price ~ row + deal_score, data = sig_price_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -604.9 -188.0 -46.3 110.0 7053.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1359.398 44.471 30.57 <0.0000000000000002 ***
## row -19.250 1.479 -13.01 <0.0000000000000002 ***
## deal_score -99.479 5.415 -18.37 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 462 on 966 degrees of freedom
## (31 observations deleted due to missingness)
## Multiple R-squared: 0.3247, Adjusted R-squared: 0.3233
## F-statistic: 232.2 on 2 and 966 DF, p-value: < 0.00000000000000022
Coefficient Interpretation
The expected price of a ticket if row and deal score are 0 would be 1359.40$.
The expected price would go down $19.25 for every row you move up in.
(Sitting farther away from the field makes the seat cheaper)
The expected price goes down $99.48 for every whole deal score increase.
(Better/Higher deal score = Cheaper tickets)
Adjusted R-squared: about 32.33% of the variance in the dependent variable
This means that there is a weak model fit and there may be important factors missing from the model.
Model is significant because p-value is < 0.05
Conclusion & Recommendation
- We can conclude that it is possible to predict ticket seating based on certain variables such as price
- We observed that two variables, namely row and deal score are statistically significant (p-value < 0.05)
Business Recommendation
- The stadiums can use this to implement future pricing strategies
- The ticket companies can use it for understanding the pricing decisions
- Use it to see how changes in different factors affect ticket pricing