HW06

Author

Xiangzhe Li

Question 1

library(tidyverse)
data1 <- read_csv("https://raw.githubusercontent.com/vaiseys/dav-course/refs/heads/main/Data/nfl_salaries.csv")
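`read_csv()` guesses every column of this file as a double (`dbl`). You can declare those types up front by passing `col_types`. This states the expected types explicitly and also suppresses the column-specification message. The sketch below assumes all columns should be doubles. The inline CSV is a made-up, three-column stand-in for the real file, wrapped in `I()` so readr treats the string as literal data rather than a path.

```r
library(readr)

# Hypothetical stand-in for the NFL salary file (not real values)
csv_text <- "year,Quarterback,Safety\n2011,1000000,500000\n2012,2000000,NA\n"

# Declare every column as double instead of letting readr guess
df <- read_csv(I(csv_text), col_types = cols(.default = col_double()))
df
```

For the real file, the same `col_types` argument would be added to the `read_csv()` call above.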

Question 2

data2 <- data1 |>
  pivot_longer(cols = -year, names_to = "position", values_to = "salary")
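The sketch below shows the reshape on a tiny table. It uses hypothetical values, not the NFL data. Each non-`year` column is turned into rows of (`position`, `salary`), so a table with 2 years and 2 position columns becomes 4 rows.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: one row per year, one column per position
wide <- tibble(
  year        = c(2011, 2012),
  Quarterback = c(1.0e6, 2.0e6),
  Safety      = c(0.5e6, 0.6e6)
)

# Every column except `year` becomes a (position, salary) pair
long <- wide |>
  pivot_longer(cols = -year, names_to = "position", values_to = "salary")
long
```

Applied to `data1`, the same call yields one row per year and position, which is what makes `filter(position == ...)` and `group_by(year, position)` possible in the later questions.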

Question 3

data3 <- data2 |>
  filter(position == "Quarterback")

data3 |>
  ggplot(aes(x = salary)) +
  # $1M-wide bins starting at 0, each including its left edge
  geom_histogram(binwidth = 1e6, boundary = 0, closed = "left", na.rm = TRUE) +
  facet_wrap(~ year, ncol = 3) +  # one panel per year
  scale_x_continuous(
    breaks = scales::breaks_pretty(n = 6),
    labels = scales::label_number(scale = 1e-6, accuracy = 1)  # show dollars as millions
  ) +
  labs(
    title = "Quarterback Salaries by Year",
    x = "Salary (millions of $)",
    y = "Count"
  )
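With `boundary = 0`, `binwidth = 1e6` and `closed = "left"`, the histogram's bins are roughly half-open intervals [0, 1M), [1M, 2M), and so on. The sketch below reproduces that assignment with base R's `cut()` on a few hypothetical salaries. It is an approximation of how the bars are filled, not ggplot2's internal code; edge handling at the outermost bin may differ.

```r
# Hypothetical salaries in dollars (not from the NFL data)
salaries <- c(4e5, 999999, 1e6, 2.5e6)

# right = FALSE makes each interval [a, b), like closed = "left"
bins <- cut(salaries, breaks = seq(0, 3e6, by = 1e6), right = FALSE)
table(bins)
```

A salary of exactly $1,000,000 lands in the second bar, so the leftmost bar counts only salaries strictly below $1 million.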

What patterns do you notice?

In every year in the data, the leftmost bar (salaries below $1 million) is the tallest. Roughly 40% of quarterbacks earned less than $1 million, though that share fluctuates from year to year.
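The "roughly 40%" reading can be checked numerically instead of by eye. Below is a minimal sketch. The helper name `share_under()` is hypothetical, and the small tibble is made-up data used only to show the output shape. The helper drops missing salaries, then reports, per year, how many salaries remain and what fraction fall below the cutoff.

```r
library(dplyr)

# Hypothetical helper: per-year share of non-missing salaries below `cutoff`
share_under <- function(df, cutoff = 1e6) {
  df |>
    filter(!is.na(salary)) |>
    group_by(year) |>
    summarize(n = n(), share = mean(salary < cutoff), .groups = "drop")
}

# Made-up example data (not real salaries)
toy <- tibble(
  year   = c(2011, 2011, 2011, 2012, 2012),
  salary = c(5e5, 2e6, NA, 9e5, 8e5)
)
res <- share_under(toy)
res
```

Calling `share_under(data3)` would give the per-year fraction behind the leftmost bars. Because `salary < 1e6` is strict, it matches the `[0, 1M)` bin used in the histogram.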

Question 4

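data4 <- data2 |>
  group_by(year, position) |>
  summarize(
    avg_salary = mean(salary, na.rm = TRUE),  # mean salary per year and position, ignoring NAs
    .groups = "drop"                          # return an ungrouped tibble
  ) |>
  arrange(year, position)
data4  # print the table; view() only opens a viewer in interactive sessions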
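The sketch below runs the same `group_by()` + `summarize()` pattern on a tiny made-up table. It also shows one caveat of `na.rm = TRUE`. A group whose salaries are all missing gets `NaN` as its average, because the mean of an empty vector is undefined. `.groups = "drop"` leaves the result ungrouped.

```r
library(dplyr)

# Made-up example data (not real salaries)
toy <- tibble(
  year     = c(2011, 2011, 2011, 2012),
  position = c("QB", "QB", "RB", "QB"),
  salary   = c(1e6, 3e6, NA, 2e6)
)

res <- toy |>
  group_by(year, position) |>
  summarize(avg_salary = mean(salary, na.rm = TRUE), .groups = "drop")
res  # 2011/QB = 2e6, 2011/RB = NaN (all missing), 2012/QB = 2e6
```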

Question 5

data4 |>
  ggplot(aes(x = year, y = avg_salary, color = position)) +
  geom_line(linewidth = 1) +
  geom_point(size = 1.5) +
  scale_y_continuous(
    labels = scales::label_number(scale = 1e-6, accuracy = 0.1)
  ) +
  labs(
    title = paste0("Average NFL Salaries by Position (",
                   min(data4$year), "–", max(data4$year), ")"),
    x = "Year",
    y = "Average Salary (millions of $)",
    color = "Position"
  ) 

Trend 1: In every year, quarterbacks earn a higher average salary than almost every other position.
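To make Trend 1 precise, one can rank positions by average salary within each year and read off the quarterback's rank. Below is a minimal sketch. The helper `qb_rank_by_year()` is hypothetical and expects a table shaped like `data4` (`year`, `position`, `avg_salary`). The toy numbers are invented purely to show the output.

```r
library(dplyr)

# Hypothetical helper: the quarterback's salary rank within each year (1 = highest)
qb_rank_by_year <- function(avg_df) {
  avg_df |>
    group_by(year) |>
    mutate(rank = min_rank(desc(avg_salary))) |>
    ungroup() |>
    filter(position == "Quarterback") |>
    select(year, rank)
}

# Made-up averages (not real data)
toy <- tibble(
  year       = rep(c(2011, 2012), each = 3),
  position   = rep(c("Quarterback", "Offensive Lineman", "Safety"), 2),
  avg_salary = c(3e6, 3.2e6, 1e6, 4e6, 3.5e6, 1.2e6)
)
qb <- qb_rank_by_year(toy)
qb
```

Running `qb_rank_by_year(data4)` would show in which years, if any, quarterbacks are not ranked first.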

Trend 2: Safeties, special teamers, tight ends, and wide receivers lag well behind the other positions, both in average salary and in how much that average grows over time.
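The "growth" half of Trend 2 can be quantified as the percent change in each position's average salary from its first to its last year. Below is a minimal sketch. The helper `growth_by_position()` is hypothetical and assumes a `data4`-shaped input. It uses dplyr's `first()`/`last()` with `order_by = year`, so row order does not matter. The toy values are invented.

```r
library(dplyr)

# Hypothetical helper: percent change in average salary, first year to last year
growth_by_position <- function(avg_df) {
  avg_df |>
    group_by(position) |>
    summarize(
      start_avg  = first(avg_salary, order_by = year),
      end_avg    = last(avg_salary, order_by = year),
      pct_change = 100 * (end_avg / start_avg - 1),
      .groups = "drop"
    ) |>
    arrange(pct_change)  # slowest-growing positions first
}

# Made-up averages, deliberately not sorted by year
toy <- tibble(
  year       = c(2011, 2012, 2012, 2011),
  position   = c("Quarterback", "Quarterback", "Safety", "Safety"),
  avg_salary = c(2e6, 3e6, 1.1e6, 1e6)
)
growth <- growth_by_position(toy)
growth
```

Applied to `data4`, the top rows of the result would list the positions with the weakest growth.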