My Code

I will be using NFL Standings data in this week’s data dive.

I first want to import my data, then get a better idea of what I am working with by running head(). This will let me see the first few rows of the data.

standings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-04/standings.csv')

## Rows: 638 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): team, team_name, playoffs, sb_winner
## dbl (11): year, wins, loss, points_for, points_against, points_differential,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(standings)

## # A tibble: 6 × 15
##   team         team_name  year  wins  loss points_for points_against
##   <chr>        <chr>     <dbl> <dbl> <dbl>      <dbl>          <dbl>
## 1 Miami        Dolphins   2000    11     5        323            226
## 2 Indianapolis Colts      2000    10     6        429            326
## 3 New York     Jets       2000     9     7        321            321
## 4 Buffalo      Bills      2000     8     8        315            350
## 5 New England  Patriots   2000     5    11        276            338
## 6 Tennessee    Titans     2000    13     3        346            191
## # ℹ 8 more variables: points_differential <dbl>, margin_of_victory <dbl>,
## #   strength_of_schedule <dbl>, simple_rating <dbl>, offensive_ranking <dbl>,
## #   defensive_ranking <dbl>, playoffs <chr>, sb_winner <chr>

Looking at the column specification and the first few rows, I see 4 character columns and 11 double columns.

How many teams are there in the NFL?

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

standings |>
  distinct(team_name) |>
  count()

## # A tibble: 1 × 1
##       n
##   <int>
## 1    32

How many wins does a team need to reach the 75th percentile of wins (the top quarter of team seasons)?

standings |>
  pluck('wins') |>
  quantile()

##   0%  25%  50%  75% 100% 
##    0    6    8   10   16
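To make the percentile reading concrete, here is a minimal sketch that pulls out only the 75th percentile and checks what share of values reach it. The `wins` vector is invented for illustration and is not taken from `standings`.

```r
# Toy win totals, invented purely for illustration (not from `standings`).
wins <- c(4, 6, 8, 10, 12)

# Ask quantile() for just the 75th percentile instead of all five cut points.
cutoff <- quantile(wins, probs = 0.75)

# Share of the toy totals that reach the cutoff.
share_at_or_above <- mean(wins >= cutoff)

cutoff
share_at_or_above
```

With the real data, the same call on the `wins` column should return the 10 shown in the `75%` slot above.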

What is the average number of wins per team in a season?

standings |>
  pluck('wins') |>
  mean() |>
  round()

## [1] 8
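An average of 8 is what I would expect in a 16-game season, since (ties aside) every game hands out one win and one loss. The sketch below checks that reasoning on a made-up four-team table; the team labels and records are hypothetical.

```r
library(dplyr)

# Hypothetical records for a 16-game season (not real NFL data).
toy_records <- tibble(
  team_name = c("A", "B", "C", "D"),
  wins      = c(12, 9, 7, 4),
  loss      = c(4, 7, 9, 12)
)

# Mean wins, mean losses, and mean games played across the toy teams.
toy_summary <- toy_records |>
  summarise(
    mean_wins  = mean(wins),
    mean_loss  = mean(loss),
    mean_games = mean(wins + loss)
  )

toy_summary
```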

How many times has each team made it to the playoffs, and of those playoff teams, how many won the Super Bowl?

I want to start by counting how many times each team made it to the playoffs.

standings |>
  group_by(team_name) |>
  filter(playoffs == 'Playoffs') |>
  count()

## # A tibble: 32 × 2
## # Groups:   team_name [32]
##    team_name      n
##    <chr>      <int>
##  1 49ers          6
##  2 Bears          5
##  3 Bengals        7
##  4 Bills          2
##  5 Broncos        9
##  6 Browns         1
##  7 Buccaneers     5
##  8 Cardinals      4
##  9 Chargers       7
## 10 Chiefs         9
## # ℹ 22 more rows
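A possible alternative, sketched on made-up data: filtering first and then passing the column to `count()` gives the same tallies without leaving the result grouped, and `sort = TRUE` puts the largest counts first. The `toy_standings` table is hypothetical.

```r
library(dplyr)

# Hypothetical team seasons (not real NFL data).
toy_standings <- tibble(
  team_name = c("A", "A", "A", "B", "B", "C"),
  playoffs  = c("Playoffs", "Playoffs", "No Playoffs",
                "Playoffs", "No Playoffs", "No Playoffs")
)

# Keep playoff seasons, then count rows per team, largest count first.
toy_counts <- toy_standings |>
  filter(playoffs == "Playoffs") |>
  count(team_name, sort = TRUE)

toy_counts
```

Note that a team with no playoff seasons (team "C" here) drops out of the result entirely rather than appearing with a count of 0.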

I want to plot this to better understand how each team performed.

playoff_counts <- standings |>
  group_by(team_name) |>
  filter(playoffs == 'Playoffs') |>
  count()

# There are too many teams to display at once, so I will only display teams whose playoff counts are above the 75th percentile.

playoff_counts |>
  pluck('n') |>
  quantile()

##    0%   25%   50%   75%  100% 
##  1.00  4.75  7.00  9.00 17.00

high_playoff_counts <- playoff_counts |>
  filter(n > 9)

high_playoff_counts |>
  ggplot(aes(x = reorder(team_name, n), y = n)) +
  geom_bar(stat = "identity", fill = 'blue') +
  theme_minimal() +
  labs(title = "Top NFL Team Playoff Appearances",
       x = "Team",
       y = "Number of Playoff Appearances") +
  coord_flip()
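The cutoff of 9 used for this chart was read off the `quantile()` output by hand. A minimal sketch of computing it instead, on a made-up counts table, is below. One caveat I am assuming matters here: `playoff_counts` is still grouped by `team_name`, so calling `quantile(n, 0.75)` inside `filter()` would be evaluated once per team; computing the cutoff from the column first (or calling `ungroup()`) avoids that.

```r
library(dplyr)

# Hypothetical playoff counts (not real NFL data).
toy_counts <- tibble(
  team_name = c("A", "B", "C", "D", "E"),
  n         = c(1L, 4L, 7L, 9L, 17L)
)

# Compute the 75th percentile once, over all teams.
cutoff <- quantile(toy_counts$n, probs = 0.75)

# Keep only teams strictly above the cutoff.
toy_high <- toy_counts |>
  filter(n > cutoff)

toy_high
```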

Next, I want to find the number of Super Bowl wins for each team. I will use a process similar to the one I used to find the playoff counts.

superbowl_wins <- standings |>
  group_by(team_name) |>
  filter(sb_winner == 'Won Superbowl') |>
  count() |>
  arrange(desc(n))

Next, I want to visualize the number of Super Bowl wins alongside the number of playoff appearances for each Super Bowl-winning team.

I need to combine the playoff counts with the Superbowl win counts.

playoff_counts |>
  inner_join(superbowl_wins, by = 'team_name') |>
  ggplot(aes(y = reorder(team_name, n.x))) +
  geom_bar(mapping = aes(x = n.x), stat = 'identity', fill = 'blue') +
  geom_bar(mapping = aes(x = n.y), stat = 'identity', fill = 'red') +
  theme_minimal() +
  labs(title = "NFL Playoff Appearances and Super Bowl Wins",
       y = "Team",
       x = "Number of Playoff Appearances (Blue) and Super Bowl Wins (Red)")
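Because both tables have a column named `n`, the join renames them `n.x` and `n.y`. A small sketch on made-up data shows how the `suffix` argument of `inner_join()` could give more readable names; the tables and suffixes here are my own illustrative choices.

```r
library(dplyr)

# Hypothetical counts (not real NFL data).
toy_playoffs <- tibble(team_name = c("A", "B", "C"), n = c(9L, 5L, 2L))
toy_sb_wins  <- tibble(team_name = c("A", "C"),      n = c(2L, 1L))

# inner_join() keeps only teams present in both tables;
# suffix renames the clashing `n` columns.
toy_joined <- toy_playoffs |>
  inner_join(toy_sb_wins, by = "team_name",
             suffix = c("_playoffs", "_sb_wins"))

toy_joined
```

Team "B" has no Super Bowl wins in the toy data, so the inner join drops it, which matches the goal of showing only Super Bowl-winning teams.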

What range of years does ‘standings’ capture?

standings |>
  pluck('year') |>
  range()

## [1] 2000 2019

As a Baltimore Ravens fan, I found 2000 and 2019 to be two very memorable seasons. In 2000 (and 2012), the Baltimore Ravens won the Super Bowl…

standings |>
  filter(team == 'Baltimore') |>
  filter(sb_winner == 'Won Superbowl') |>
  pluck('year')

## [1] 2000 2012

… and in 2019, Lamar Jackson, the Ravens’ quarterback, won the Most Valuable Player award. Many fans believed the team would go on to win the Super Bowl. However, the Ravens’ season was cut short in their first playoff game, leaving fans very disappointed.

standings |>
  filter(team == 'Baltimore') |>
  filter(year == 2019) |>
  pluck('playoffs')

## [1] "Playoffs"

standings |>
  filter(team == 'Baltimore') |>
  filter(year == 2019) |>
  pluck('sb_winner')

## [1] "No Superbowl"
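The two lookups above could also be done in one pass. Here is a sketch on a tiny stand-in table: the Baltimore 2019 row copies the values printed above, while the "Other" row is a hypothetical placeholder.

```r
library(dplyr)

# Stand-in table: the Baltimore row mirrors the output above,
# the "Other" row is a made-up placeholder.
toy_standings <- tibble(
  team      = c("Baltimore", "Other"),
  year      = c(2019, 2019),
  playoffs  = c("Playoffs", "No Playoffs"),
  sb_winner = c("No Superbowl", "No Superbowl")
)

# Several conditions inside one filter() are combined with AND;
# select() then keeps both columns of interest side by side.
ravens_2019 <- toy_standings |>
  filter(team == "Baltimore", year == 2019) |>
  select(year, playoffs, sb_winner)

ravens_2019
```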

Let’s further examine the Baltimore Ravens in both of their Super Bowl seasons and in Lamar Jackson’s 2019 MVP season.

good_ravens_seasons <- standings |>
  filter(team == 'Baltimore') |>
  filter(year %in% c(2000, 2012, 2019))

Let’s compare these seasons by visualizing points_for and points_against.

good_ravens_seasons |>
  ggplot(mapping = aes(x = points_for, y = points_against, color = as.factor(year))) +
  geom_point() +
  labs(title = 'Baltimore Ravens, Points For and Against, Memorable Seasons', color = "Year", x = "Points For", y = "Points Against") +
  theme_minimal()

The 2019 Baltimore Ravens scored more points than both of their Super Bowl-winning teams and conceded fewer points than the 2012 team. Let’s look beyond the Ravens and see how points scored and points allowed relate to postseason success across the league.

standings |>
  ggplot(mapping = aes(x = points_for, y = points_against, color = factor(playoffs))) +
  geom_point() +
  scale_color_manual(values = c('red', 'blue')) +
  labs(title = 'NFL Points For VS Points Against: What it takes to make the playoffs', color = '', x = "Points For", y = "Points Against") +
  theme_minimal()

standings |>
  ggplot(mapping = aes(x = points_for, y = points_against, color = factor(sb_winner))) +
  geom_point() +
  scale_color_manual(values = c('red', 'blue')) +
  labs(title = 'NFL Points For VS Points Against: What it takes to win a Super Bowl', color = 'Super Bowl Winner', x = "Points For", y = "Points Against") +
  theme_minimal()
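As a rough numeric companion to these scatter plots, the sketch below averages the gap between points scored and points allowed for each playoff status. The four rows are invented for illustration; with the real data, `standings` would take the place of `toy_standings`.

```r
library(dplyr)

# Hypothetical team seasons (not real NFL data).
toy_standings <- tibble(
  playoffs       = c("Playoffs", "Playoffs", "No Playoffs", "No Playoffs"),
  points_for     = c(400, 380, 300, 320),
  points_against = c(300, 320, 380, 340)
)

# Average point gap (scored minus allowed) per playoff status.
toy_gap <- toy_standings |>
  group_by(playoffs) |>
  summarise(mean_gap = mean(points_for - points_against))

toy_gap
```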

Week 2 | Data Dive - Summaries

Due: Monday Jan 22, 2024
