Assignment 7: Ethical Web Scraping

Author

Kyle Lark

Research Question

As a Browns fan, I wanted to find out, at the level of team stats, just how bad the Cleveland Browns have been since 2018.

Fun fact and semi-spoiler: yesterday, 11/23/2025, Shedeur Sanders became the first Browns quarterback since 1999 to win his first start. The previous quarterbacks were 0-17 in their first starts.

How will I answer this question?

To answer this question, I will scrape team stats from Pro Football Reference, using a loop to collect seven seasons of data (2018 through 2024). Seven seasons is a useful window because the rules have changed considerably over the last few years, so we can see whether the Browns are terrible no matter what the rules are or whether they can be decent here and there.

Data Wrangling + Transformation

The first step is to install and load the packages needed for scraping, transforming, and visualizing the data.

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Warning: package 'polite' was built under R version 4.5.2


Attaching package: 'rvest'

The following object is masked from 'package:readr':

    guess_encoding


Attaching package: 'magrittr'

The following object is masked from 'package:purrr':

    set_names

The following object is masked from 'package:tidyr':

    extract
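The startup messages above suggest that tidyverse, polite, rvest, and magrittr were attached, in that order. A minimal sketch of that setup step, assuming those four packages are the full list (the original chunk is not shown), could look like this:

```r
# Packages inferred from the startup messages above; the exact list is an assumption.
pkgs <- c("tidyverse", "polite", "rvest", "magrittr")

# Check which packages are available before trying to attach them.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}

# Attach the available packages in the same order as the output above.
for (pkg in pkgs[installed]) {
  library(pkg, character.only = TRUE)
}
```

Any package reported as not installed can be added with `install.packages()` before rerunning the chunk.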

Function Creation

First, we will create a function to scrape the data that we need.
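The function itself is not shown in this write-up, so the following is only a sketch of what a polite scraper for one season could look like. The function name `scrape_team_stats`, the `years/<year>/` path, and the `table#AFC` and `table#NFC` selectors are assumptions about the site layout, not the original code.

```r
# Hypothetical helper: scrape the AFC and NFC standings tables for one season.
# `session` is expected to be a polite session created with polite::bow().
scrape_team_stats <- function(year, session) {
  # nod() points the existing session at the season page while
  # keeping the robots.txt check and crawl delay from bow().
  page <- polite::scrape(
    polite::nod(session, path = paste0("years/", year, "/"))
  )

  # Assumed selectors: one standings table per conference.
  tables <- rvest::html_table(
    rvest::html_elements(page, "table#AFC, table#NFC")
  )

  # Stack both conferences and record which season the rows came from.
  standings <- do.call(rbind, tables)
  standings$year <- year
  standings
}

# Usage (requires network access, so it is left commented out):
# session <- polite::bow(
#   "https://www.pro-football-reference.com/",
#   user_agent = "student scraping project",  # placeholder user agent
#   delay = 5
# )
# standings_2018 <- scrape_team_stats(2018, session)
```

Using `bow()` once and `nod()` per page means the site's robots.txt is consulted up front and every request respects the crawl delay, which is the point of the polite package.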

Loop Creation

Then we need to create a loop that will allow us to scrape multiple years’ worth of data dating back to 2018. We have to go back to 2018 in order to get 100+ instances of team stats.
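A loop of this kind can be sketched as follows. The helper `scrape_years` and the stand-in `fake_scraper` are hypothetical names used so the example runs without network access; in the real report, the scraping function would be passed in instead.

```r
# Hypothetical helper: call a scraping function once per year and stack the results.
scrape_years <- function(years, scrape_fn, pause = 0) {
  results <- vector("list", length(years))
  for (i in seq_along(years)) {
    message("Scraping year: ", years[i])
    results[[i]] <- scrape_fn(years[i])
    Sys.sleep(pause)  # optional extra wait between requests
  }
  do.call(rbind, results)
}

# Stand-in scraper that returns two placeholder rows per season.
fake_scraper <- function(year) {
  data.frame(year = year, Tm = c("Team A", "Team B"))
}

all_years <- scrape_years(2018:2024, fake_scraper)
nrow(all_years)  # 7 seasons x 2 placeholder rows = 14
```

With 32 teams per season, seven seasons give 224 team-season rows, which comfortably clears the 100+ target.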

Scraping year: 2018

No encoding supplied: defaulting to UTF-8.

Scraping year: 2019

No encoding supplied: defaulting to UTF-8.

Scraping year: 2020

No encoding supplied: defaulting to UTF-8.

Scraping year: 2021

No encoding supplied: defaulting to UTF-8.

Scraping year: 2022

No encoding supplied: defaulting to UTF-8.

Scraping year: 2023

No encoding supplied: defaulting to UTF-8.

Scraping year: 2024

No encoding supplied: defaulting to UTF-8.

Cleaning + Mutation

Finally, we can clean and mutate the data to make it much more user-friendly for the visualizations and any further analysis later on.

Because the scraped data includes full rows of division names that act as dividers between divisions, we have to remove those rows, along with the * and + characters appended to team names to mark playoff teams (division winners and wild cards, respectively).
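As a rough illustration of this cleaning step, the sketch below uses a small made-up data frame. The column names `Tm`, `W`, and `L` and the placeholder teams are assumptions, and the original chunk may well have used dplyr and stringr instead of base R.

```r
# Toy input that mimics the scraped layout: a divider row plus marked team names.
raw <- data.frame(
  Tm = c("AFC North", "Team A*", "Team B+", "Cleveland Browns"),
  W  = c("AFC North", "12", "10", "3"),
  L  = c("AFC North", "5", "7", "14"),
  stringsAsFactors = FALSE
)

# Divider rows repeat the division name, e.g. "AFC North" or "NFC West".
is_divider <- grepl("^(AFC|NFC) (East|North|South|West)$", trimws(raw$Tm))
clean <- raw[!is_divider, ]

# Strip the * and + playoff markers from team names.
clean$Tm <- gsub("[*+]", "", clean$Tm)

# Divider rows forced every column to character, so convert the stats back to numbers.
clean[c("W", "L")] <- lapply(clean[c("W", "L")], as.numeric)
rownames(clean) <- NULL

clean
```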

The last step is to make a column that will help us see where the Browns are in some of our visuals.
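One simple way to build such a column is a logical flag, sketched here with the hypothetical name `is_browns` and a three-team example:

```r
# Small example frame; only the team name column matters here.
standings <- data.frame(
  Tm = c("Baltimore Ravens", "Cleveland Browns", "Pittsburgh Steelers"),
  stringsAsFactors = FALSE
)

# TRUE for Browns rows, FALSE for everyone else.
standings$is_browns <- standings$Tm == "Cleveland Browns"

standings
```

A flag like this can then be mapped to colour in a plot or used to filter out the Browns' rows.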

Analysis + Visuals

Visualization #1

This scatter plot shows the relationship between the number of points a team scores in a season and its win %. The Cleveland Browns are highlighted in blue. The Browns tend to sit either close to the regression line or below it, meaning that relative to how many points they score, they tend to underperform in win %.

`geom_smooth()` using formula = 'y ~ x'
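A plot along these lines can be sketched with ggplot2. The data below is synthetic and exists only to make the example runnable. The column names `PF`, `win_pct`, and `is_browns` are assumptions, and `method = "lm"` is inferred from the mention of a regression line.

```r
library(ggplot2)

# Synthetic team-seasons (not real NFL data): points scored and a noisy win %.
set.seed(1)
teams <- data.frame(
  PF = round(runif(32, 250, 520)),
  is_browns = c(TRUE, rep(FALSE, 31))
)
teams$win_pct <- pmin(pmax(0.5 + (teams$PF - 385) / 600 + rnorm(32, sd = 0.08), 0), 1)

p_scatter <- ggplot(teams, aes(x = PF, y = win_pct)) +
  # Colour is mapped only in the point layer so a single line is fitted to all teams.
  geom_point(aes(colour = is_browns)) +
  geom_smooth(method = "lm", se = FALSE, colour = "grey40") +
  scale_colour_manual(values = c("TRUE" = "blue", "FALSE" = "grey70"), guide = "none") +
  labs(x = "Points scored", y = "Win %")
```

Points below the fitted line are team-seasons that won less often than their scoring would predict.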

Visualization #2

This histogram shows the point differential for every NFL team-season, that is, how many more or fewer points a team scored than its opponents that season. The vertical red lines mark the Browns. Of the seven seasons on record, the Browns had a negative point differential in six, the worst being 2024, when their point differential was -177, which equates to losing by an average of 10.4 points per game!

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
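The sketch below shows one way to draw such a histogram with marker lines, using `linewidth` rather than the deprecated `size` aesthetic mentioned in the warning. The data is synthetic, and the column names `PD` and `is_browns` are assumptions. Only the -177 figure comes from the text above.

```r
library(ggplot2)

# Synthetic point differentials for 224 team-seasons (not real NFL data);
# the first 7 rows stand in for the Browns' seasons.
set.seed(2)
diffs <- data.frame(
  PD = round(rnorm(224, mean = 0, sd = 90)),
  is_browns = rep(c(TRUE, FALSE), times = c(7, 217))
)

p_hist <- ggplot(diffs, aes(x = PD)) +
  geom_histogram(binwidth = 25, fill = "grey70", colour = "white") +
  # One red line per flagged season; linewidth replaces size for lines.
  geom_vline(
    data = subset(diffs, is_browns),
    aes(xintercept = PD),
    colour = "red", linewidth = 0.8
  ) +
  labs(x = "Point differential", y = "Team-seasons")

# Per-game margin for the 2024 figure quoted above, over a 17-game season.
avg_margin_2024 <- round(-177 / 17, 1)
avg_margin_2024  # -10.4
```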

Visualization #3

This bar chart shows the percentile rank of the Browns’ win % in each season since 2018. From 2018 to 2023, they were below the 40th percentile in 4 of the 6 seasons. You may be wondering why there is no bar for 2024. Well, they were tied for the worst record in the league that year at 3-14, so they were awarded a spot in the 0th percentile for win percentage.
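The 0th percentile result follows from how percentile rank treats ties. The sketch below uses the usual minimum-rank definition, which is also how `dplyr::percent_rank()` is defined. Whether the original chunk used that function is an assumption, and the five teams are placeholders.

```r
# Percentile rank: rescale the minimum rank to the range [0, 1].
pct_rank <- function(x) {
  (rank(x, ties.method = "min") - 1) / (length(x) - 1)
}

# Placeholder league of five teams, three of them tied at 3-14.
win_pct <- c(team_a = 3 / 17, team_b = 3 / 17, team_c = 3 / 17,
             team_d = 8 / 17, team_e = 15 / 17)

round(pct_rank(win_pct), 2)
# team_a team_b team_c team_d team_e
#   0.00   0.00   0.00   0.75   1.00
```

Every team tied for the worst record shares rank 1, so each lands at exactly 0, and a bar of height 0 does not show up on the chart.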

Visualization #4

You might be thinking, “Wow, the Browns have been subpar in every category we have looked at so far”, and this is true, but I am going to try to help their case here. To do this, we are going to look at strength of schedule, which measures how difficult a team’s schedule was in a given season, based on how good its opponents were. Looking at this chart, we can see that the Browns had an above-average strength of schedule in 5 of the last 7 years. This could come down to many things, but I think a good starting place would be the AFC North.

Since 2018, the AFC North has sent 1 team to the playoffs in 2 of the 7 years, 2 teams in 3 of the 7 years, and 3 teams in 2 of the 7 years. This comes out to an average of 2 AFC North teams in the playoffs per year, which is 0.25 teams higher than the per-division average of 1.75 under the current 14-team playoff format.
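The arithmetic behind these averages can be checked in a few lines. The 1.75 baseline corresponds to a 14-team playoff field spread over 8 divisions. The field was 12 teams in 2018 and 2019, so the sketch also computes a blended baseline for the seven-year window as an extra point of comparison. That blended figure is not part of the original analysis.

```r
# AFC North playoff teams per season, as tallied in the text.
teams_in_playoffs <- c(1, 2, 3)
seasons           <- c(2, 3, 2)

# Weighted average over the 7 seasons: (1*2 + 2*3 + 3*2) / 7
afc_north_avg <- sum(teams_in_playoffs * seasons) / sum(seasons)
afc_north_avg  # 2

# Baseline used in the text: 14 playoff teams across 8 divisions.
baseline_14 <- 14 / 8
baseline_14  # 1.75

# Blended baseline: 12-team field in 2018-2019, 14-team field from 2020 on.
playoff_field    <- c(12, 12, 14, 14, 14, 14, 14)
baseline_blended <- mean(playoff_field) / 8
round(baseline_blended, 2)  # 1.68

afc_north_avg - baseline_14  # 0.25
```

Against the blended baseline the AFC North's edge is slightly larger, about 0.32 teams per year, which only strengthens the point about the division being tough.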

Because of this, I think we can cut the Browns a little slack for playing in a tough division.

Visualization #5

Finally, we are going to look at the one part of the Browns that seems to be a bright spot every year: the defense. This plot shows the number of points each team gave up in a season, with the black line marking the league average and the blue dots marking the Browns. Across the last seven seasons, the number of points scored has fluctuated noticeably. However, the Browns consistently sit near the league average, indicating that their defense has been about average since 2018. Now, this isn’t great, but compared to the other stats we have looked at, this is far and away the Browns’ best statistical performance.

Conclusion

Across all five visualizations, we can see that the Browns are either average (rarely) or below average in most team-level performance metrics. They score fewer points, underperform relative to their point totals, and regularly finish with a negative point differential. This is reflected in their win percentage, as they almost always end up at or below the 40th percentile. Some of this hardship could be attributed to playing in a tougher division, but I don’t think this does much to help their case.

Overall, the Cleveland Browns have been abysmal since 2018, and I would assume that if I gathered more data from the 2000s and 2010s, it would hurt their case even more. From what we have seen, their failure to compete regularly has come from a weak offense, poor efficiency, and a sprinkle of tough competition, with the defense being the only area keeping them semi-afloat.