Over the last few years there has been much discussion about attendance at Major League Baseball games. Baseball was once the dominant sport in the United States, but in terms of viewership it has since been surpassed by football and basketball. MLB attendance went down every year from 2012 to 2021; however, the last two years have seen a rise in attendance. Instead of focusing on the league as a whole, I wanted to look at each team's attendance and see what factors might have affected its 2023 attendance.
Specifically, I want to see whether each team's in-game statistics affected its attendance. For example, did teams that hit more home runs also have a higher average attendance? The data frame used in this analysis combines MLB attendance data scraped from ESPN.com with batting and pitching data scraped from MLB.com. The result is a data frame containing each team's 2023 attendance along with statistics that could have influenced that attendance. Its variables are described below:
Data Dictionary:
| Variable | Description |
|---|---|
| Rank | Team's rank by average home attendance |
| Team | Team name |
| Games | Number of 2023 home games |
| Total | Total attendance for 2023 home games |
| Average | Average attendance for 2023 home games |
| H | Total hits in the 2023 season |
| HR | Total home runs in the 2023 season |
| R | Total runs scored in the 2023 season |
| OPS | On-base plus slugging (OPS) for the 2023 season |
| W | Number of wins in the 2023 season |
| L | Number of losses in the 2023 season |
| ERA | Earned run average in the 2023 season |
| Playoff | Whether the team made the 2023 MLB playoffs |
Dataframe Summary
To get more familiar with the data, the two tables below show some key statistics.
| Rank | Team | Average | W |
|---|---|---|---|
| 1 | LA Dodgers | 47371 | 100 |
| 2 | NY Yankees | 40862 | 82 |
| 3 | San Diego | 40389 | 82 |
| 4 | St. Louis | 40013 | 71 |
| 5 | Atlanta | 39401 | 104 |
| 6 | Philadelphia | 38157 | 90 |
| 7 | Houston | 37683 | 90 |
| 8 | Toronto | 37307 | 89 |
| 9 | Chicago Cubs | 34261 | 83 |
| 10 | Seattle | 33215 | 88 |
| 11 | NY Mets | 32994 | 75 |
| 12 | Boston | 32989 | 78 |
| 13 | LA Angels | 32599 | 73 |
| 14 | Colorado | 32196 | 59 |
| 15 | Milwaukee | 31497 | 92 |
| 16 | Texas | 31272 | 90 |
| 17 | San Francisco | 30866 | 79 |
| 18 | Cincinnati | 25164 | 82 |
| 19 | Minnesota | 24371 | 87 |
| 20 | Arizona | 24212 | 84 |
| 21 | Baltimore | 23911 | 101 |
| 22 | Cleveland | 23513 | 76 |
| 23 | Washington | 23034 | 71 |
| 24 | Chicago White Sox | 21405 | 61 |
| 25 | Detroit | 20946 | 78 |
| 26 | Pittsburgh | 20131 | 76 |
| 27 | Tampa Bay | 17781 | 99 |
| 28 | Kansas City | 16136 | 56 |
| 29 | Miami | 14355 | 84 |
| 30 | Oakland | 10275 | 50 |
| Statistic | Average | W |
|---|---|---|
| Min. | 10275 | 50.00 |
| 1st Qu. | 23154 | 75.25 |
| Median | 31384 | 82.00 |
| Mean | 29277 | 81.00 |
| 3rd Qu. | 36546 | 89.75 |
| Max. | 47371 | 104.00 |
The first table lists each team's average attendance and number of wins in the 2023 season. Two things stand out. The LA Dodgers had the highest average attendance in 2023, at 47,371 per home game, and the Atlanta Braves had the most wins (104) while also ranking in the top five in attendance. The Oakland A's were last in both attendance (10,275) and wins (50).
In the second table, the key figures are the means. The mean of the teams' average home attendance was 29,277, and the mean number of wins was 81. Of the 30 Major League teams, 17 were above the mean in attendance and 17 were above the mean in wins, but only 11 were above the mean in both.
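The number of teams above the mean in attendance, in wins, and in both can be counted directly from the first table. This is a minimal base R sketch, not the report's own code. It re-enters the Team, Average, and W columns by hand instead of reading the scraped CSV. The names `team`, `average`, `wins`, and `mlb` are illustrative.

```r
# Team, Average (home attendance), and W (wins) re-entered from the 2023 table, in rank order
team <- c("LA Dodgers", "NY Yankees", "San Diego", "St. Louis", "Atlanta",
          "Philadelphia", "Houston", "Toronto", "Chicago Cubs", "Seattle",
          "NY Mets", "Boston", "LA Angels", "Colorado", "Milwaukee",
          "Texas", "San Francisco", "Cincinnati", "Minnesota", "Arizona",
          "Baltimore", "Cleveland", "Washington", "Chicago White Sox", "Detroit",
          "Pittsburgh", "Tampa Bay", "Kansas City", "Miami", "Oakland")
average <- c(47371, 40862, 40389, 40013, 39401, 38157, 37683, 37307, 34261, 33215,
             32994, 32989, 32599, 32196, 31497, 31272, 30866, 25164, 24371, 24212,
             23911, 23513, 23034, 21405, 20946, 20131, 17781, 16136, 14355, 10275)
wins <- c(100, 82, 82, 71, 104, 90, 90, 89, 83, 88,
          75, 78, 73, 59, 92, 90, 79, 82, 87, 84,
          101, 76, 71, 61, 78, 76, 99, 56, 84, 50)
mlb <- data.frame(Team = team, Average = average, W = wins)

# Flag teams strictly above each column's mean
above_attendance <- mlb$Average > mean(mlb$Average)
above_wins <- mlb$W > mean(mlb$W)

counts <- c(attendance = sum(above_attendance),
            wins = sum(above_wins),
            both = sum(above_attendance & above_wins))
print(counts)

# Teams above the mean on both measures
print(mlb$Team[above_attendance & above_wins])
```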
Analysis
Now we can look at whether or not some of these variables affect average attendance.
Attendance by Wins
To start, we will look at the relationship between average attendance and number of wins.
This graph shows the relationship between the number of wins a team had in 2023 and its average home attendance. The graph shows a clear positive relationship between wins and average attendance. Some teams with fewer wins drew more fans than more successful teams, but overall, teams that won more games tended to have higher attendance. One big outlier is Tampa Bay: in 2023 the Rays had one of the best records in MLB, yet they were in the bottom five in average attendance. There are several likely reasons for this. One is that their stadium, Tropicana Field, is considered one of the worst venues in MLB. To address this, the Rays announced plans in September for a new $1.3 billion stadium. The team with the fewest wins, Oakland, also had the lowest average attendance.
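The wins trend line can be quantified with a linear fit of average attendance on wins. This is the same ordinary least squares line that `geom_smooth(method = "lm")` draws. The fit gives a slope in fans per win and an R² value, and `cor.test()` adds a p-value. The base R sketch below re-enters the first table's Team, Average, and W columns by hand, and its variable names are illustrative. With 30 teams in a single season the result is descriptive, not causal.

```r
# Team, Average, and W re-entered from the 2023 table, in rank order
team <- c("LA Dodgers", "NY Yankees", "San Diego", "St. Louis", "Atlanta",
          "Philadelphia", "Houston", "Toronto", "Chicago Cubs", "Seattle",
          "NY Mets", "Boston", "LA Angels", "Colorado", "Milwaukee",
          "Texas", "San Francisco", "Cincinnati", "Minnesota", "Arizona",
          "Baltimore", "Cleveland", "Washington", "Chicago White Sox", "Detroit",
          "Pittsburgh", "Tampa Bay", "Kansas City", "Miami", "Oakland")
average <- c(47371, 40862, 40389, 40013, 39401, 38157, 37683, 37307, 34261, 33215,
             32994, 32989, 32599, 32196, 31497, 31272, 30866, 25164, 24371, 24212,
             23911, 23513, 23034, 21405, 20946, 20131, 17781, 16136, 14355, 10275)
wins <- c(100, 82, 82, 71, 104, 90, 90, 89, 83, 88,
          75, 78, 73, 59, 92, 90, 79, 82, 87, 84,
          101, 76, 71, 61, 78, 76, 99, 56, 84, 50)
mlb <- data.frame(Team = team, Average = average, W = wins)

fit <- lm(Average ~ W, data = mlb)
slope <- unname(coef(fit)["W"])            # change in average attendance per extra win
r_squared <- summary(fit)$r.squared         # share of attendance variation explained by wins
wins_test <- cor.test(mlb$Average, mlb$W)   # Pearson correlation with a two-sided test

cat(sprintf("slope = %.0f fans per win, R^2 = %.2f, r = %.2f, p = %.3f\n",
            slope, r_squared, unname(wins_test$estimate), wins_test$p.value))

# The most negative residuals mark teams furthest below the fitted line
mlb$residual <- resid(fit)
print(mlb[order(mlb$residual), c("Team", "W", "Average", "residual")][1:3, ])
```

If the full data frame is loaded, the same calls work for the other statistics. For example, `cor.test(MLBattendance$Average, MLBattendance$HR)` tests home runs. These tests give a concrete basis for calling a relationship weak or strong.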
Attendance by Runs Scored
Next, we will look at the relationship between average attendance and the number of runs scored.
This graph shows the relationship between the number of runs a team scored in 2023 and its average home attendance. Again, there is a clear positive relationship between runs scored and attendance. In fact, only one team that scored fewer than 700 runs ranked in the top half in attendance: the Yankees, one of the biggest franchises in all of sports. The team that scored the most runs, the Dodgers, also had the highest average attendance. The team that scored the fewest runs, Oakland, had the lowest average attendance.
Attendance by Home Runs
Now we will look at the relationship between average attendance and the number of home runs.
This graph shows the relationship between the number of home runs a team hit in 2023 and its average home attendance. There is a slight positive relationship between home runs and attendance, but I would not call it a strong one. Teams with fewer home runs generally had lower attendance, but there are many exceptions. Of the six teams that hit more than 225 home runs, only two were in the top ten in average attendance. Also, Cleveland hit the fewest home runs but was not in the bottom five in attendance.
Attendance by Earned Run Average
Next, we will look at the relationship between earned run average and average attendance.
This graph shows the relationship between a team's 2023 earned run average and its average home attendance. There is a slight relationship between ERA and attendance. However, I think this mostly reflects the fact that teams with a lower ERA tend to win more games. ERA on its own does not appear to be strongly related to attendance. Colorado had the highest ERA yet ranked in the top half in attendance, while Milwaukee had the lowest ERA but ranked only 15th.
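One way to check the idea that ERA matters mainly through wins is to compare the ERA coefficient with and without wins in the model. The sketch below uses simulated data, not the 2023 statistics. The data are built so that attendance depends only on wins while ERA tracks wins. Every number in it is an assumption chosen for illustration. With the real data, the same comparison would be `lm(Average ~ ERA, data = MLBattendance)` versus `lm(Average ~ ERA + W, data = MLBattendance)`.

```r
set.seed(2023)
n <- 300                                          # simulated team-seasons (illustrative)
W <- round(runif(n, 50, 105))                     # wins
ERA <- 7 - 0.04 * W + rnorm(n, sd = 0.25)         # assumption: better pitching staffs win more
Average <- 5000 + 300 * W + rnorm(n, sd = 4000)   # assumption: attendance depends on wins only
sim <- data.frame(W, ERA, Average)

# ERA coefficient on its own, then after controlling for wins
era_alone <- unname(coef(lm(Average ~ ERA, data = sim))["ERA"])
era_with_wins <- unname(coef(lm(Average ~ ERA + W, data = sim))["ERA"])

cat(sprintf("ERA coefficient alone: %.0f; after adding wins: %.0f\n",
            era_alone, era_with_wins))
```

In this simulation the ERA coefficient shrinks toward zero once wins are included. If the same happened with the real data, that would support reading the ERA pattern as a by-product of winning.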
Attendance by Playoff Qualification
Finally, we will look at the relationship between average attendance and whether or not a team made the playoffs.
This graph shows each team's average home attendance, colored by whether the team made the 2023 playoffs. There does not appear to be a strong relationship between making the playoffs and attendance. Playoff teams did have a higher average attendance, but not by much. Of the top four teams in attendance, only one made the playoffs. Of the bottom four, two made the playoffs.
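The playoff comparison can be summarized with group means and a Welch two-sample t-test. The tables in this report do not show the Playoff column, so this base R sketch assumes the 2023 postseason field. In the NL that is Atlanta, Philadelphia, Milwaukee, Miami, Arizona, and the Dodgers. In the AL it is Baltimore, Tampa Bay, Toronto, Houston, Minnesota, and Texas. The sketch re-enters the Team and Average columns from the first table by hand, and `playoff_teams` is an illustrative name.

```r
# Team and Average re-entered from the 2023 table, in rank order
team <- c("LA Dodgers", "NY Yankees", "San Diego", "St. Louis", "Atlanta",
          "Philadelphia", "Houston", "Toronto", "Chicago Cubs", "Seattle",
          "NY Mets", "Boston", "LA Angels", "Colorado", "Milwaukee",
          "Texas", "San Francisco", "Cincinnati", "Minnesota", "Arizona",
          "Baltimore", "Cleveland", "Washington", "Chicago White Sox", "Detroit",
          "Pittsburgh", "Tampa Bay", "Kansas City", "Miami", "Oakland")
average <- c(47371, 40862, 40389, 40013, 39401, 38157, 37683, 37307, 34261, 33215,
             32994, 32989, 32599, 32196, 31497, 31272, 30866, 25164, 24371, 24212,
             23911, 23513, 23034, 21405, 20946, 20131, 17781, 16136, 14355, 10275)
mlb <- data.frame(Team = team, Average = average)

# Assumed 2023 postseason teams (names as they appear in the attendance table)
playoff_teams <- c("Atlanta", "Philadelphia", "Milwaukee", "Miami", "Arizona", "LA Dodgers",
                   "Baltimore", "Tampa Bay", "Toronto", "Houston", "Minnesota", "Texas")
mlb$Playoff <- ifelse(mlb$Team %in% playoff_teams, "Yes", "No")

# Mean attendance by playoff status (named vector: "No", "Yes")
group_means <- sapply(split(mlb$Average, mlb$Playoff), mean)
print(round(group_means))

# Playoff teams among the top four and bottom four in attendance
ranked <- mlb[order(-mlb$Average), ]
top4_playoff <- sum(head(ranked$Playoff, 4) == "Yes")
bottom4_playoff <- sum(tail(ranked$Playoff, 4) == "Yes")

# Welch test (unequal variances is the t.test default)
welch <- t.test(Average ~ Playoff, data = mlb)
print(welch$p.value)
```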
Sentiment Analysis
Beyond each team's on-field metrics, one factor I suspect affects attendance is the team's stadium. To analyze the relationship between attendance and stadiums, we will run a sentiment analysis of reviews of Dodger Stadium and the Oakland Coliseum.
In 2023 the LA Dodgers had the highest average attendance in MLB and the Oakland A's had the lowest. The goal of this analysis is to see whether their stadiums play a role in these attendance numbers. To do this, TripAdvisor reviews of Dodger Stadium and the Oakland Coliseum are scored for sentiment. The two sets of reviews are then compared to see whether they reflect each team's attendance.
This analysis will go into emotional sentiment, particular words, and total positivity for both stadiums.
Stadium Sentiment Scores
This graph shows the emotional sentiment of the words in each stadium's reviews, giving the total number of words associated with each emotion. The bars are split by team, with green for Oakland and blue for the Dodgers.
Oakland's reviews contained more positive words, but also more negative words, than the Dodgers' reviews. This may suggest that opinions of the A's stadium were more polarized, with people either liking or disliking it, while reviews of Dodger Stadium were more neutral. Because these are raw counts, they also depend on how many reviews, and how many words, were collected for each stadium. Even so, reviews of both stadiums contained far more positive words than negative ones.
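The emotion chart comes from an inner join against the NRC lexicon. In that lexicon a single word can carry several labels (for example, an emotion and "positive"), so one word can be counted in more than one bar. This toy base R sketch shows that join using a made-up mini-lexicon and made-up review tokens, not the TripAdvisor data. Raw totals grow with the amount of review text, so the sketch also divides by each team's token count.

```r
# Hypothetical NRC-style mini-lexicon: one word can carry several labels
lexicon <- data.frame(
  word = c("great", "great", "awful", "awful", "fun"),
  sentiment = c("joy", "positive", "anger", "negative", "joy")
)

# Hypothetical tokenized reviews: one row per word, stop words already removed
tokens <- data.frame(
  Team = c("Oakland", "Oakland", "Oakland", "LA Dodgers", "LA Dodgers"),
  word = c("great", "awful", "fun", "great", "parking")
)

# Inner join: "great" and "awful" each match two lexicon rows, "parking" matches none
joined <- merge(tokens, lexicon, by = "word")

# Raw counts per team and label, the quantity plotted in the emotion chart
counts <- table(joined$Team, joined$sentiment)
print(counts)

# Divide by each team's total token count so more review text does not inflate totals
tokens_per_team <- as.numeric(table(tokens$Team)[rownames(counts)])
per_token <- sweep(counts, 1, tokens_per_team, "/")
print(round(per_token, 2))
```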
Word Analysis
This graph shows scorable words (positive or negative) and how many times each was written in each stadium's reviews. Only words that appeared 4 or more times are shown. Negative words are plotted as negative counts: a positive word that appeared 5 times is shown as 5, while a negative word that appeared 5 times is shown as -5.
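The 4-occurrence cutoff and the sign flip can be sketched in a few lines of base R. The word counts below are hypothetical, not the real review counts. Only the treatment of "cheap" and "expensive" as negative follows this report's discussion.

```r
# Hypothetical scored word counts per team (not the real review counts)
word_counts <- data.frame(
  Team = c("Oakland", "Oakland", "Oakland", "LA Dodgers", "LA Dodgers"),
  word = c("fun", "cheap", "loud", "expensive", "beautiful"),
  sentiment = c("positive", "negative", "negative", "negative", "positive"),
  n = c(9, 5, 3, 6, 12)
)

# Keep words used at least 4 times (the chart code writes this as n > 3)
shown <- subset(word_counts, n >= 4)

# Flip the sign of negative words so they plot on the other side of zero
shown$signed_n <- ifelse(shown$sentiment == "negative", -shown$n, shown$n)

# Order within each team from most negative to most positive
shown <- shown[order(shown$Team, shown$signed_n), ]
print(shown)
```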
The first thing to note is that very few words were repeated 4 or more times in either set of reviews. This could be a good sign, as it may mean reviewers put in more effort and wrote more personal reviews. Oakland clearly had more repeated negative words. However, some words scored as negative, such as "cheap" and "concession", may have been used with a positive meaning. The word "fans" appeared most often across both stadiums. This is interesting, as a good fan culture might make up for a subpar stadium. One notable contrast is that Dodger Stadium was described as "expensive" while Oakland's stadium was described as "cheap". Both words are scored as negative, but either can be used in a positive way. Overall, many more positive words were written than negative ones.
Positivity by Day of the Week
This graph shows the positivity score for each day of the week for each stadium, which can suggest the best days to visit each stadium. The positivity score is the number of positive words minus the number of negative words. Note that the graph uses the day a review was posted, which may differ from the day the reviewer attended a game or event.
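The positivity score can be computed by giving each positive word +1 and each negative word -1, then summing per team and posting day. That sum equals positive minus negative. The sketch uses hypothetical scored words and dates in base R. It takes the weekday from `as.POSIXlt()$wday` (0 = Sunday) so the labels do not depend on the system locale. The report's own chart uses lubridate's `wday(label = TRUE)` instead.

```r
# Hypothetical sentiment-scored words with the date each review was posted
scored <- data.frame(
  Team = c("Oakland", "Oakland", "Oakland", "LA Dodgers", "LA Dodgers"),
  review_date = as.Date(c("2023-09-02", "2023-09-02", "2023-09-07",
                          "2023-09-07", "2023-09-07")),
  sentiment = c("positive", "positive", "negative", "positive", "negative")
)

# Weekday of the posting date; POSIXlt wday runs 0 (Sunday) to 6 (Saturday)
day_names <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
scored$day <- factor(day_names[as.POSIXlt(scored$review_date)$wday + 1],
                     levels = day_names)

# +1 per positive word and -1 per negative word, so each sum is positive minus negative
scored$score <- ifelse(scored$sentiment == "positive", 1, -1)
positivity <- aggregate(score ~ Team + day, data = scored, FUN = sum)
print(positivity)
```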
For Oakland, the day with the highest positivity score is Saturday, by a wide margin. For the Dodgers, the highest positivity score comes on Thursday. Overall, Oakland had more positivity on weekends and the Dodgers had more on weekdays. There are a couple of plausible reasons for this. First, Oakland holds most if not all of its promotional days on weekends, whereas the Dodgers spread their promotions across the week. Second, ticket prices may play a role. Because Oakland's tickets are cheaper than the Dodgers', fans can more easily attend on the weekend, when they have more time to enjoy a game. Weekend tickets are usually more expensive, and for a team like the Dodgers this may put tickets out of many fans' price range. Those fans may find it easier to afford a weekday ticket, when prices are lower.
Average Rating by Stadium Review
The last thing we will look at is the relationship between a team's average attendance and its average stadium review rating. For this comparison I wanted a somewhat larger sample. Instead of only the Dodgers and Oakland, I included the top three teams in average attendance (the Dodgers, the Yankees, and San Diego) and the bottom three (Oakland, Miami, and Kansas City).
The results of this graph are interesting. On average, the teams with higher attendance did have higher average ratings. However, the team with the third-worst attendance in MLB, the Kansas City Royals, had the highest average stadium rating in the group. Miami and Oakland were as expected, with both the lowest attendance and the lowest average stadium ratings. Among the top three teams in attendance, San Diego had the highest average rating. San Diego won far fewer games than the Dodgers (82 versus 100), so its stadium may have made up for this when it came to attendance.
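After attendance is joined onto review-level rows, each team appears once per review. Collapsing to one row per team before plotting gives exactly one point per team. The sketch below does this with base R `aggregate()`. The review rows and ratings are hypothetical. Only the two Average values (Kansas City 16,136 and Oakland 10,275) come from the attendance table.

```r
# Hypothetical review-level rows after the attendance join: each review repeats
# its team's season Average attendance (the ratings here are made up)
reviews <- data.frame(
  Team = c("Kansas City", "Kansas City", "Oakland", "Oakland", "Oakland"),
  Average = c(16136, 16136, 10275, 10275, 10275),
  review_rating = c(5, 4, 3, NA, 2)
)

# One row per team; the formula method drops the NA rating (like na.rm = TRUE)
per_team <- aggregate(review_rating ~ Team + Average, data = reviews, FUN = mean)
names(per_team)[names(per_team) == "review_rating"] <- "Avg_rating"
print(per_team)
```

In dplyr, the equivalent step would be `group_by(Team, Average) %>% summarize(Avg_rating = mean(review_rating, na.rm = TRUE))` before calling `ggplot()`.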
Conclusion
To start, we looked at the relationship between each team's in-game statistics and its attendance. All of the variables we examined appeared to have some relationship with attendance, but not all of these relationships were strong. Of all the variables, I think wins matter most for attendance: teams that won more games generally had higher average attendance. This seems intuitive, as people are more likely to go to games when the team is good. For some of the other variables, much of their apparent relationship with attendance may come from how each stat affects winning rather than attendance directly. A natural next step would be to examine how each variable affects wins, and through wins, attendance. Of course, many factors go into how often people go to games, but winning clearly seems to be one of them.
As for stadiums, this is a small sample of fan sentiment, but the results of the sentiment analysis were much closer than expected. As mentioned earlier, the Oakland A's had the lowest average attendance in 2023. Yet their stadium reviews contained more positive words than those of the LA Dodgers, who had the highest average attendance. One finding that may help explain the attendance gap is that the A's reviews also contained more negative words than the Dodgers' reviews. This could be for many reasons, but Oakland's stadium seemed to be more of a love-it-or-hate-it venue. Attendance was clearly a problem for Oakland and MLB, and the stadium was cited as a reason. In November 2023, MLB approved the A's relocation to a new stadium in Las Vegas for the 2028 season.
When the top three and bottom three teams in attendance were included, attendance did appear to be related to the stadium. Kansas City had the highest average stadium rating while ranking third-worst in average attendance. Even so, overall the teams with higher average attendance had higher average ratings. Stadiums affecting attendance could also explain how San Diego ranked near the top in attendance despite not being a very successful team in 2023. As mentioned earlier, Tampa Bay had one of the best records in MLB but ranked near the bottom in attendance. Furthermore, as with Oakland, the team and Major League Baseball seem to consider the stadium a problem, since the Rays have also announced plans to move to a new stadium in the coming years. How much the stadiums affected attendance will become much clearer once these teams move. For now, I think how much a team wins and the quality of its stadium both heavily influence its attendance.
Source Code
```yaml
---
title: "MLB Attendance Analysis"
author: "Isaiah Johnson"
format:
  html:
    code-tools: TRUE
    self-contained: TRUE
execute:
  warning: FALSE  # Prevent all warnings from displaying in the output.
  message: FALSE  # Prevent all messages from displaying in the output.
  echo: FALSE
---
```

```r
library(tidyverse)  # The tidyverse collection of packages
# install.packages("httr")
library(httr)       # Useful for web authentication
# install.packages("rvest")
library(rvest)      # Useful tools for working with HTML and XML
library(lubridate)  # Working with dates
# install.packages("magrittr")
library(magrittr)   # Piping output easily with loops
library(knitr)
# install.packages("ggrepel")
library(ggrepel)    # Helps with ggplot cleanliness
library(readr)
library(tidytext)
library(textdata)

MLBattendance <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/johnsoni4_xavier_edu/ET-jknNMHhBBsORmk2XIyMoB3oUX5wnBNr-osNGsWwxeZw?download=1")
MLB_reviews <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/johnsoni4_xavier_edu/ETD0h33I7RZIgk5IoDdTljMBOSUfceB6pmRs7WwVYWePGg?download=1")

# Parse review dates and number the reviews in chronological order
MLB_reviews$review_date <- ymd(MLB_reviews$review_date)
MLB_reviews <- MLB_reviews %>%
  arrange(review_date) %>%
  mutate(review_id = row_number())

# One row per word, with common stop words removed
tidy_MLB <- MLB_reviews %>%
  unnest_tokens(word, review_content) %>%
  anti_join(stop_words, by = "word")

# Bing lexicon: each word is labeled "positive" or "negative"
bing <- get_sentiments("bing")
MLB_counts <- tidy_MLB %>%
  count(Team, word) %>%
  inner_join(bing, by = "word")

# Attendance joined onto the reviews on their shared columns (including Team);
# the result has one row per review for reviewed teams
MLBdata <- MLBattendance %>%
  left_join(MLB_reviews)
```
```r
# Dataframe Summary: team table and summary statistics
table_data <- MLBattendance %>%
  select(Rank, Team, Average, W)
kable(table_data)

MLBattendance %>%
  select(Average, W) %>%
  summary()
```
```r
# Attendance by Wins
MLBattendance %>%
  select(Team, Average, W) %>%
  ggplot(aes(x = W, y = Average, color = Team)) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +  # one overall trend line
  geom_point() +
  geom_text_repel(aes(label = Team), size = 2.5) +
  scale_x_continuous(expand = c(0.1, 0)) +
  scale_y_continuous(expand = c(0.1, 0)) +
  labs(title = "MLB Attendance by Wins",
       x = "Number of Wins",
       y = "Average Attendance") +
  theme_minimal() +
  theme(legend.position = "none")  # points are already labeled with team names
```
```r
# Attendance by Runs Scored
MLBattendance %>%
  select(Team, Average, R) %>%
  ggplot(aes(x = R, y = Average, color = Team)) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +  # one overall trend line
  geom_point() +
  geom_text_repel(aes(label = Team), size = 2.5) +
  scale_x_continuous(expand = c(0.1, 0)) +
  scale_y_continuous(expand = c(0.1, 0)) +
  labs(title = "MLB Attendance by Runs Scored",
       x = "Number of Runs Scored",
       y = "Average Attendance") +
  theme(legend.position = "none")  # points are already labeled with team names
```

```r
# Attendance by Home Runs
MLBattendance %>%
  select(Team, Average, HR) %>%
  ggplot(aes(x = HR, y = Average, color = Team)) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  geom_point() +
  geom_text_repel(aes(label = Team), size = 2.5) +
  scale_x_continuous(expand = c(0.1, 0)) +
  scale_y_continuous(expand = c(0.1, 0)) +
  labs(title = "MLB Attendance by Home Runs",
       x = "Number of Home Runs",
       y = "Average Attendance") +
  theme(legend.position = "none")
```
```r
# Attendance by Earned Run Average
MLBattendance %>%
  select(Team, Average, ERA) %>%
  ggplot(aes(x = ERA, y = Average, color = Team)) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +  # one overall trend line
  geom_point() +
  geom_text_repel(aes(label = Team), size = 2.5) +
  scale_x_continuous(expand = c(0.1, 0.1)) +
  scale_y_continuous(expand = c(0.1, 0)) +
  labs(title = "MLB Attendance by ERA",
       x = "Earned Run Average",
       y = "Average Attendance") +
  theme(legend.position = "none")  # points are already labeled with team names
```
```r
# Attendance by Playoff Qualification
MLBattendance %>%
  select(Team, Average, Playoff) %>%
  # Bars sorted by attendance; fill (not color) shades the whole bar by playoff status
  ggplot(aes(x = reorder(Team, -Average), y = Average, fill = Playoff)) +
  geom_col() +
  scale_x_discrete(expand = c(0.1, 0)) +
  scale_y_continuous(expand = c(0.1, 0)) +
  labs(title = "MLB Attendance by Playoff Qualification",
       x = "Team",
       y = "Average Attendance") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
```
```r
# Stadium Sentiment Scores (NRC emotion lexicon; a word can map to several emotions)
nrc <- get_sentiments("nrc")
tidy_MLB %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%
  filter(Team %in% c("Oakland", "LA Dodgers")) %>%
  count(sentiment, Team) %>%
  ggplot(aes(x = sentiment, y = n, fill = Team)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("LA Dodgers" = "blue", "Oakland" = "green")) +
  labs(title = "Stadium Sentiment Scores",
       subtitle = "Total number of emotive words scored",
       y = "Total Number of Words",
       x = "Emotional Sentiment",
       fill = "Team") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
```
```r
# Word Analysis: Bing words used at least 4 times; negative words plotted as negative counts
MLB_counts %>%
  filter(n > 3, Team %in% c("Oakland", "LA Dodgers")) %>%
  mutate(n = ifelse(sentiment == "negative", -n, n),
         word = reorder_within(word, n, Team)) %>%  # order words within each team's panel
  ggplot(aes(x = n, y = word, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = n, hjust = ifelse(n < 0, 1.2, -0.2)), size = 3) +  # label past each bar end
  facet_wrap(~Team, ncol = 2, scales = "free_y") +
  scale_y_reordered() +
  scale_x_continuous(expand = expansion(mult = 0.15)) +
  labs(title = "Positive and Negative Words for Oakland and the Dodgers",
       subtitle = "Only words appearing at least 4 times are shown",
       x = "Times used (negative words shown as negative values)",
       y = NULL)
```
```r
# Positivity by Day of the Week (day the review was posted)
tidy_MLB %>%
  inner_join(bing, by = "word") %>%
  filter(Team %in% c("Oakland", "LA Dodgers")) %>%
  mutate(day = wday(review_date, label = TRUE)) %>%
  count(Team, day, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative) %>%  # positivity = positive words minus negative words
  ggplot(aes(x = day, y = sentiment, fill = Team)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("LA Dodgers" = "blue", "Oakland" = "green")) +
  labs(title = "Stadium Positivity Scores by Day of the Week",
       subtitle = "Positivity score is the total number of positive words minus total negative words",
       y = "Total Positivity Score",
       x = "Day of the Week",
       fill = "Team")
```
```r
# Average Rating by Stadium Review (top 3 and bottom 3 teams in attendance)
MLBdata %>%
  filter(Team %in% c("Oakland", "LA Dodgers", "NY Yankees",
                     "San Diego", "Kansas City", "Miami")) %>%
  group_by(Team, Average) %>%
  summarize(Avg_rating = mean(review_rating, na.rm = TRUE), .groups = "drop") %>%  # one row per team
  ggplot(aes(x = Avg_rating, y = Average, color = Team)) +
  geom_point(size = 3) +
  labs(title = "Average Attendance by Stadium Review Rating",
       y = "Average Attendance",
       x = "Average Rating")
```