For sports fans, passion runs beyond wins and losses. It’s about the community, the traditions, and the highs and lows that come with being a die-hard fan. For my Business Analytics Capstone Final Project, I’ve decided to explore game data for my favorite team, the Philadelphia Eagles, since 2003, the year I was born. I’m going to analyze different aspects of the Eagles’ performance over the last 21 seasons.
After my analysis of the Eagles game data, I’m going to scrape the Wikipedia pages for the Eagles and their long-time rival, the Dallas Cowboys. By breaking down each Wikipedia page word by word, I can run a sentiment analysis and see which team’s article carries more positive sentiment.
Data Dictionary
| Variable  | Description                              |
|-----------|------------------------------------------|
| Week      | The NFL season week the game occurred in |
| Date      | Calendar Date of Game                    |
| Result    | Win or Loss                              |
| Opponent  | Philadelphia Eagles’ Opponent            |
| PHI_score | Philadelphia Eagles Game Score           |
| OPP_score | Opponent Game Score                      |
| Year      | Season of Game                           |
| Half      | First or Second Half of the NFL season   |
| Month     | Month that game occurred in              |
Descriptive Analysis
Wins By Year
The Eagles have been fairly consistent over this period, with a few bad years in 2005, 2012, and 2020. Their best regular seasons were also the years they made the Super Bowl: they lost it after the 2004 and 2022 seasons and won it after the 2017 season. In most other seasons, they finished at or around 9 or 10 wins.
Average Win Margin
This chart is interesting because it has two big outliers. In 2008, even though the previous chart shows only 9 wins, the average win margin was a little over 20 points per game. In 2012, when they won only 4 games, the average win margin was around 2 points per game, which is very slim for football.
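To make the calculation behind this chart concrete, here is a minimal sketch on a made-up game log. The scores are hypothetical and are not actual Eagles results; only the column names follow the data dictionary. It uses base R instead of the tidyverse so it runs without extra packages.

```r
# Hypothetical toy game log (NOT real results); columns follow the data dictionary.
games <- data.frame(
  year      = c(2008, 2008, 2008, 2012, 2012),
  result    = c("W", "W", "L", "W", "W"),
  PHI_score = c(38, 27, 14, 17, 24),
  OPP_score = c(3, 14, 36, 16, 23)
)

# Keep wins only, then compute the point margin of each win.
wins <- games[games$result == "W", ]
wins$margin <- wins$PHI_score - wins$OPP_score

# Average margin of victory per season.
avg_margin <- aggregate(margin ~ year, data = wins, FUN = mean)
print(avg_margin)
```

Because the average only looks at wins, a season with a handful of blowouts can show a large margin even when the win total is modest, which is one possible reading of the 2008 outlier.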
Records Against Each Team
Based on this chart, the Eagles have dominated the Giants since 2003. They’ve performed far better against them than against any other team. I also found it interesting that they’re undefeated against the Texans and winless against the Bengals.
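Raw win counts favor division rivals like the Giants, since the Eagles play them twice a season. As a rough cross-check, a win percentage per opponent puts every team on the same scale. The sketch below is an assumption about how that could be done, not part of the original analysis, and the game log in it is made up.

```r
# Hypothetical toy game log (NOT real results).
games <- data.frame(
  opponent = c("Giants", "Giants", "Giants", "Giants",
               "Texans", "Texans", "Bengals", "Bengals"),
  result   = c("W", "W", "W", "L", "W", "W", "L", "T")
)

# Games played and games won per opponent (ties count as games played).
played <- table(games$opponent)
won    <- tapply(games$result == "W", games$opponent, sum)

record <- data.frame(
  opponent = names(played),
  games    = as.integer(played),
  wins     = as.integer(won[names(played)])
)

# Win percentage makes opponents with different game counts comparable.
record$win_pct <- record$wins / record$games
record <- record[order(-record$win_pct), ]
print(record)
```

With only a few games against teams like the Texans or Bengals, the percentage is noisy, so it is worth reading it next to the games column.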
Best Performing Months
Before analyzing whether the Eagles are a better first-half or second-half team, I thought it would be interesting to look at which months they performed best in. December was their best month, but October and September are right behind it. My guess based on this information is that they’re a better first-half team.
First Half vs. Second Half Performance
I would say that, based on this chart, my guess was fairly right. 2023 was the biggest example of the Eagles having a first-half surge and a second-half fall-off. In 8 of the 21 years I used, the Eagles became a better team in the second half of the season.
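The count of seasons where the second half was better can be computed directly instead of read off the chart. Here is a minimal base R sketch on made-up data; the results are hypothetical, and the week 9 cutoff is the one used in the source code of this report.

```r
# Hypothetical toy game log (NOT real results): two seasons, six games each.
games <- data.frame(
  year   = rep(c(2021, 2023), each = 6),
  week   = rep(c(1, 5, 9, 10, 14, 18), times = 2),
  result = c("L", "L", "W", "W", "W", "L",
             "W", "W", "W", "L", "L", "W")
)

# Same split as the report: weeks 1-9 are the first half.
games$half <- ifelse(games$week <= 9, "First Half", "Second Half")
games$win  <- games$result == "W"

# Win percentage per season and half, as a year-by-half matrix.
pct <- tapply(games$win, list(games$year, games$half), mean)

# Seasons where the second half beat the first half.
improved <- pct[, "Second Half"] > pct[, "First Half"]
print(pct)
print(sum(improved))
```

Run on the full game log, `sum(improved)` would give the number of second-half-improvement seasons to compare against the 8 read from the chart.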
Secondary Data Source
For my secondary data source, I decided to look at the biggest rivalry the Eagles have: the one with the Cowboys. It’s been a rivalry full of tension and pettiness for decades, and I wanted to know which team had more positive sentiment on its Wikipedia page.
Based on the sentiment data, the Eagles have more of both positive and negative words on their Wikipedia page, but for the sake of the rivalry we’ll just say that the Eagles beat them in positive words, so they win this battle :).
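Since one page can simply be longer than the other, raw word counts are hard to compare. One way to level the field, sketched below as an assumption and not as something this project did, is to look at the share of sentiment words that are positive. The counts are hypothetical and are not the values from the table above.

```r
# Hypothetical counts (NOT the actual output of the sentiment table).
sentiment <- data.frame(
  team     = c("Cowboys", "Eagles"),
  negative = c(120, 150),
  positive = c(180, 210)
)

# Share of sentiment-bearing words that are positive, per team.
sentiment$total          <- sentiment$negative + sentiment$positive
sentiment$positive_share <- sentiment$positive / sentiment$total

# Net sentiment: positive minus negative word count.
sentiment$net <- sentiment$positive - sentiment$negative
print(sentiment)
```

In this made-up example the Eagles have more positive and more negative words, yet the Cowboys have the higher positive share, which shows why the two measures can disagree.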
Thank you for reading my Business Analytics Capstone Project!
Source Code
---
title: "BAIS 462 Final Project"
toc: true # Generates an automatic table of contents.
format: # Options related to formatting.
  html: # Options related to HTML output.
    code-tools: TRUE # Allow the code tools option showing in the output.
    embed-resources: TRUE # Embeds all components into a single HTML file.
execute: # Options related to the execution of code chunks.
  warning: FALSE # FALSE: Code chunk warnings are hidden by default.
  message: FALSE # FALSE: Code chunk messages are hidden by default.
  echo: true
---

## Introduction

For sports fans, passion runs beyond wins and losses. It's about the community, the traditions, and the highs and lows that come with being a die-hard fan. For my Business Analytics Capstone Final Project, I've decided to explore game data for my favorite team, the Philadelphia Eagles, since 2003, the year I was born. I'm going to analyze different aspects of the Eagles' performance over the last 21 seasons.

After my analysis of the Eagles game data, I'm going to scrape the Wikipedia pages for the Eagles and their long-time rival, the Dallas Cowboys.
By breaking down each Wikipedia page word by word, I can run a sentiment analysis and see which team's article carries more positive sentiment.

### Data Dictionary

| Variable  | Description                              |
|-----------|------------------------------------------|
| Week      | The NFL season week the game occurred in |
| Date      | Calendar Date of Game                    |
| Result    | Win or Loss                              |
| Opponent  | Philadelphia Eagles' Opponent            |
| PHI_score | Philadelphia Eagles Game Score           |
| OPP_score | Opponent Game Score                      |
| Year      | Season of Game                           |
| Half      | First or Second Half of the NFL season   |
| Month     | Month that game occurred in              |

```{r}
#| include: false
library(rvest)
library(tidyverse)
library(dplyr)
library(magrittr)
library(ggplot2)
library(tidytext)
library(httr)
library(jsonlite)

# One list slot per season, filled in by the loop below.
all_data <- vector("list", length(2003:2023))
years <- 2003:2023

for (i in seq_along(years)) {
  year <- years[i]
  url <- paste0("https://www.pro-football-reference.com/teams/phi/", year, "/gamelog/")
  webpage <- read_html(url)

  # The first table on the page is the game log.
  games <- webpage %>%
    html_table() %>%
    extract2(1)

  # Keep the columns of interest by position and drop repeated header rows.
  games <- games %>%
    select(week = 1, date = 3, result = 5, opponent = 8, PHI_score = 9, OPP_score = 10) %>%
    mutate(year = year) %>%
    filter(week != "Week")

  all_data[[i]] <- games
}

all_games <- bind_rows(all_data)
all_games <- all_games %>%
  mutate(PHI_score = as.integer(PHI_score), OPP_score = as.integer(OPP_score))
```

## Descriptive Analysis

### [Wins By Year]{.underline}

```{r}
#| echo: false
wins_by_year <- all_games %>%
  filter(result == "W") %>%
  group_by(year) %>%
  summarize(total_wins = n())

ggplot(wins_by_year, aes(x = factor(year), y = total_wins, fill = factor(year))) +
  geom_col() +
  labs(title = "Philadelphia Eagles Regular Season Wins by Year", x = "Year", y = "Wins") +
  theme_minimal()
```

The Eagles have been fairly consistent over this period, with a few bad years in 2005, 2012, and 2020. Their best regular seasons were also the years they made the Super Bowl: they lost it after the 2004 and 2022 seasons and won it after the 2017 season.
In most other seasons, they finished at or around 9 or 10 wins.

### [Average Win Margin]{.underline}

```{r}
#| echo: false
# Average point margin across wins only, per season.
win_margin <- all_games %>%
  filter(result == "W") %>%
  mutate(margin = PHI_score - OPP_score) %>%
  group_by(year) %>%
  summarize(avg_margin = mean(margin))

ggplot(win_margin, aes(x = factor(year), y = avg_margin, group = 1)) +
  geom_line(color = "forestgreen") +
  geom_point() +
  labs(title = "Average Win Margin by Year", x = "Year", y = "Average Margin") +
  theme_minimal()
```

This chart is interesting because it has two big outliers. In 2008, even though the previous chart shows only 9 wins, the average win margin was a little over 20 points per game. In 2012, when they won only 4 games, the average win margin was around 2 points per game, which is very slim for football.

### [Records Against Each Team]{.underline}

```{r}
#| echo: false
# Wins and losses per opponent (ties are not counted in either column).
record_by_opponent <- all_games %>%
  group_by(opponent) %>%
  summarize(wins = sum(result == "W"), losses = sum(result == "L"))

record_by_opponent_long <- record_by_opponent %>%
  pivot_longer(cols = c("wins", "losses"), names_to = "result", values_to = "count")

ggplot(record_by_opponent_long, aes(x = reorder(opponent, -count), y = count, fill = result)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(title = "Record Against Each Team", x = "Opponent", y = "Count") +
  theme_minimal()
```

Based on this chart, the Eagles have dominated the Giants since 2003. They've performed far better against them than against any other team.
I also found it interesting that they're undefeated against the Texans and winless against the Bengals.

### [Best Performing Months]{.underline}

```{r}
#| echo: false
# Build a full date from the "Month Day" string and the season year, then extract the month.
all_games <- all_games %>%
  mutate(
    date = paste(date, year),
    date = as.Date(date, format = "%B %d %Y"),
    month = format(date, "%B")
  )

performance_by_month <- all_games %>%
  group_by(month) %>%
  summarize(
    games_played = n(),
    wins = sum(result == "W"),
    win_percentage = wins / games_played
  ) %>%
  arrange(desc(win_percentage))

ggplot(performance_by_month, aes(x = reorder(month, -win_percentage), y = win_percentage, fill = month)) +
  geom_col() +
  labs(title = "Win Percentage by Month", x = "Month", y = "Win Percentage") +
  theme_minimal()
```

Before analyzing whether the Eagles are a better first-half or second-half team, I thought it would be interesting to look at which months they performed best in. December was their best month, but October and September are right behind it. My guess based on this information is that they're a better first-half team.

### [First Half vs. Second Half Performance]{.underline}

```{r}
#| echo: false
# Weeks 1-9 count as the first half of the season, later weeks as the second half.
all_games <- all_games %>%
  mutate(
    week = as.numeric(week),
    half = ifelse(week <= 9, "First Half", "Second Half")
  )

performance_by_half <- all_games %>%
  group_by(year, half) %>%
  summarize(
    games_played = n(),
    wins = sum(result == "W"),
    win_percentage = wins / games_played
  )

ggplot(performance_by_half, aes(x = factor(year), y = win_percentage, fill = half)) +
  geom_col(position = "dodge") +
  labs(title = "First Half vs. Second Half Performance", x = "Year", y = "Win Percentage") +
  theme_minimal()
```

I would say that, based on this chart, my guess was fairly right. 2023 was the biggest example of the Eagles having a first-half surge and a second-half fall-off. In 8 of the 21 years I used, the Eagles became a better team in the second half of the season.

## Secondary Data Source

For my secondary data source, I decided to look at the biggest rivalry the Eagles have: the one with the Cowboys.
It's been a rivalry full of tension and pettiness for decades, and I wanted to know which team had more positive sentiment on its Wikipedia page.

### [Eagles vs Cowboys Sentiment]{.underline}

```{r}
#| echo: false
eagles_url <- "https://en.wikipedia.org/wiki/Philadelphia_Eagles"
cowboys_url <- "https://en.wikipedia.org/wiki/Dallas_Cowboys"

eagles_page <- read_html(eagles_url)
cowboys_page <- read_html(cowboys_url)

# Pull the text of every paragraph on each page.
eagles_text <- eagles_page %>% html_nodes("p") %>% html_text()
cowboys_text <- cowboys_page %>% html_nodes("p") %>% html_text()

team_data <- data.frame(
  team = c(rep("Eagles", length(eagles_text)), rep("Cowboys", length(cowboys_text))),
  text = c(eagles_text, cowboys_text),
  stringsAsFactors = FALSE
)

# One row per word, with common stop words removed.
team_words <- team_data %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)

# Count positive and negative words per team using the Bing lexicon.
sentiment <- team_words %>%
  inner_join(get_sentiments("bing")) %>%
  count(team, sentiment) %>%
  spread(sentiment, n, fill = 0)

print(sentiment)
```

Based on the sentiment data, the Eagles have more of both positive and negative words on their Wikipedia page, but for the sake of the rivalry we'll just say that the Eagles beat them in positive words, so they win this battle :).

### Thank you for reading my Business Analytics Capstone Project!