Hello! For this assignment, I will be exploring NFL passing stats from 2017 onward, using data scraped from Pro Football Reference (https://www.pro-football-reference.com/years/2024/passing.htm).
Questions
I have two questions for this data: who are the best passers, and do more pass attempts lead to higher totals in other stats such as touchdowns and interceptions? I think the results will be interesting because even in the last 6 years, the game of football has changed and the passing stats should illustrate that.
Cleaning and Wrangling
No data cleaning was required for this data set. The only wrangling I performed was to keep these columns: Player Name, Age, Team, Games Started, Completions, Attempts, Touchdowns, Interceptions, Passer Rating, and Quarterback Rating.
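The column selection described above can be sketched with dplyr's `select()`. This is only an illustration on a made-up two-row table: `raw_stats` and `passing_subset` are hypothetical names, and every column name other than `Player`, `Att`, `TD`, and `Int` (which appear in the plotting code) is an assumption about how the scraped table is labeled.

```r
library(dplyr)

# Toy stand-in for the scraped table. Values are made up, and every column
# name except Player, Att, TD, and Int is an assumption about the real data.
raw_stats <- data.frame(
  Player = c("QB A", "QB B"),
  Age    = c(27, 34),
  Team   = c("AAA", "BBB"),
  Pos    = c("QB", "QB"),
  GS     = c(16, 9),
  Cmp    = c(380, 190),
  Att    = c(580, 310),
  Yds    = c(4300, 2100),
  TD     = c(31, 12),
  Int    = c(9, 8),
  Rate   = c(101.2, 84.5),
  QBR    = c(68.1, 47.3),
  stringsAsFactors = FALSE
)

# Keep only the ten columns used in the analysis and drop the rest
passing_subset <- raw_stats %>%
  select(Player, Age, Team, GS, Cmp, Att, TD, Int, Rate, QBR)

print(names(passing_subset))
```

Listing the columns by name makes the script fail loudly if the source table ever renames one, which is usually preferable to silently selecting by position.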
The next thing I want to do is filter for quarterbacks with at least 200 passing attempts. This keeps out stat lines that are unusually low or high because they come from only a handful of throws. Having 200 passing attempts means that the quarterback has played around 8 games or so, which is a sufficient sample size for what I’m trying to find.
passing_stats <- passing_stats %>%
  filter(Att >= 200)
Visualizations
1) Distribution of Touchdowns
ggplot(passing_stats, aes(x = TD)) +
  geom_histogram(bins = 20, fill = "forestgreen", color = "white") +
  labs(title = "Distribution of Touchdowns", x = "Touchdowns", y = "Frequency")
2) Touchdowns vs. Passing Attempts
ggplot(passing_stats, aes(x = Att, y = TD)) +
  geom_point(color = "brown") +
  labs(title = "Touchdowns vs. Passing Attempts", x = "Passing Attempts", y = "Touchdowns")
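A scatter plot shows the trend visually, but a number helps answer whether more attempts really go with more touchdowns. Here is a minimal base R sketch on made-up data: `toy_stats`, `att_td_cor`, `att_td_fit`, and `td_per_att` are hypothetical names, and the values are not taken from the scraped data set.

```r
# Toy data standing in for passing_stats; the numbers are made up
toy_stats <- data.frame(
  Att = c(220, 310, 405, 480, 560, 610),
  TD  = c(9, 14, 20, 22, 30, 33)
)

# Pearson correlation between attempts and touchdowns (between -1 and 1)
att_td_cor <- cor(toy_stats$Att, toy_stats$TD)

# Simple linear fit: the slope estimates extra touchdowns per extra attempt
att_td_fit <- lm(TD ~ Att, data = toy_stats)
td_per_att <- unname(coef(att_td_fit)["Att"])

print(att_td_cor)
print(td_per_att)
```

Swapping `toy_stats` for the real `passing_stats` would apply the same check to the plotted data, and the same two calls work for interceptions by replacing `TD` with `Int`. A strong correlation here would only show that the two counts rise together, not that throwing more causes better play.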
3) Interceptions vs. Passing Attempts
ggplot(passing_stats, aes(x = Att, y = Int)) +
  geom_point(color = "forestgreen") +
  labs(title = "Interceptions vs. Passing Attempts", x = "Passing Attempts", y = "Interceptions")
4) Top 10 QBs by Touchdown / Interception Ratio
passing_stats <- passing_stats %>%
  mutate(TD_by_INT = ifelse(Int > 0, TD / Int, NA)) %>%
  arrange(desc(TD_by_INT))
top_10_qbs <- passing_stats %>%
  head(10)
ggplot(top_10_qbs, aes(x = reorder(Player, TD_by_INT), y = TD_by_INT)) +
  geom_bar(stat = "identity", fill = "brown") +
  coord_flip() +
  labs(title = "Top 10 Quarterbacks by TD/INT Ratio", x = "Player", y = "TD/INT Ratio")
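To see how the ratio step treats a quarterback with zero interceptions, here is a small sketch on made-up rows. `toy_stats` and `ranked` are hypothetical names, and `if_else()` with `NA_real_` is used here as a type-strict stand-in for the `ifelse()` call above.

```r
library(dplyr)

# Toy rows; "QB B" has zero interceptions, so a plain TD / Int would be Inf
toy_stats <- data.frame(
  Player = c("QB A", "QB B", "QB C", "QB D"),
  TD     = c(30, 12, 25, 8),
  Int    = c(10, 0, 5, 8),
  stringsAsFactors = FALSE
)

ranked <- toy_stats %>%
  # Use NA instead of Inf when there are no interceptions to divide by
  mutate(TD_by_INT = if_else(Int > 0, TD / Int, NA_real_)) %>%
  # arrange() places NA values last, even inside desc()
  arrange(desc(TD_by_INT))

print(ranked)
```

Because missing ratios sort to the bottom, a quarterback with no interceptions drops out of the top 10 rather than leading it, which is worth keeping in mind when reading the chart.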
5) Passer Rating by Year