Final Project

Author

Alana Zarneke

Load Packages

First, I will load the packages necessary for this project.

library(tidyverse)
library(readxl)

Read-in Data

Next, I will read in my data. The dataset contains my basketball stats from the past three seasons, with one row per game.

bballstats <- read_excel("data/bballstats.xlsx")
bballstats
# A tibble: 85 × 27
   date                opponent     site  started   min   fgm   fga fgper `3fgm`
   <dttm>              <chr>        <chr> <chr>   <dbl> <dbl> <dbl> <dbl>  <dbl>
 1 2023-11-11 00:00:00 Valley City… Home  No         14     2     6 0.333      0
 2 2023-11-18 00:00:00 Parkside     Away  No          7     4     5 0.8        4
 3 2023-11-25 00:00:00 Northern Mi… Home  No          5     0     0 0          0
 4 2023-11-26 00:00:00 Michigan Te… Home  No          2     0     1 0          0
 5 2023-12-01 00:00:00 Crookston    Home  No          5     1     5 0.2        1
 6 2023-12-02 00:00:00 Minot        Home  No          2     0     3 0          0
 7 2023-12-08 00:00:00 Northern     Away  No          3     1     2 0.5        0
 8 2023-12-09 00:00:00 Moorhead     Away  No          8     0     1 0          0
 9 2023-12-12 00:00:00 Mary         Away  No          6     0     1 0          0
10 2024-01-05 00:00:00 Sioux Falls  Away  No          6     2     2 1          2
# ℹ 75 more rows
# ℹ 18 more variables: `3fga` <dbl>, `3fgper` <dbl>, ftm <dbl>, fta <dbl>,
#   ftper <dbl>, offreb <dbl>, defreb <dbl>, totreb <dbl>, avgreb <dbl>,
#   pf <dbl>, assists <dbl>, turnovers <dbl>, blocks <dbl>, steals <dbl>,
#   points <dbl>, avgpoints <dbl>, eff <dbl>, outcome <chr>

Displaying and Analyzing Data

This first graph is a box plot of my points per game by site. It helps me see where I typically score the most. No site clearly stands out.

bballstats |>
  ggplot(mapping = aes(x = site, y = points)) +
  geom_boxplot() +
  labs(title = "Points per Game for Home, Away, and Neutral Games",
       x = "Home, Away, or Neutral",
       y = "Points per Game")
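A box plot shows medians and spread rather than averages, so a small summary table can sit alongside it. The sketch below is only an illustration: `toy` is a made-up stand-in for `bballstats`, and the names `site_summary`, `games`, `med_points`, and `avg_points` are my own assumptions.

```r
library(tidyverse)

# Hypothetical toy data standing in for bballstats (values are made up)
toy <- tibble(
  site   = c("Home", "Home", "Away", "Away", "Neutral", "Neutral"),
  points = c(4, 10, 6, 8, 3, 11)
)

# Number of games, median points, and mean points at each site
site_summary <- toy |>
  group_by(site) |>
  summarize(
    games      = n(),
    med_points = median(points, na.rm = TRUE),
    avg_points = mean(points, na.rm = TRUE)
  )

site_summary
```

Swapping `toy` for `bballstats` should give the same table for the real games, assuming the `site` and `points` columns shown above.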

Comparing my points in wins and losses shows whether I contribute more in one or the other. Overall, I tend to score slightly more in wins, but the plot alone does not show that the difference is statistically significant, and scoring above a certain number of points does not guarantee a win.

bballstats |>
  ggplot(mapping = aes(x = outcome, y = points)) +
  geom_boxplot() +
  labs(title = "Points per Game for Win vs. Loss",
       x = "Win or Loss",
       y = "Points per Game")
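One way to check statistical significance, rather than reading it off the box plot, is a two-sample test. The sketch below uses a Wilcoxon rank-sum test from base R, which is my choice for illustration and not something this project ran. The data frame `toy` is made up, so its p-value says nothing about the real games.

```r
# Hypothetical toy data standing in for bballstats (values are made up)
toy <- data.frame(
  outcome = c("Win", "Win", "Win", "Win", "Loss", "Loss", "Loss", "Loss"),
  points  = c(12, 9, 15, 7, 8, 5, 10, 3)
)

# Wilcoxon rank-sum test: do points differ between wins and losses?
test_result <- wilcox.test(points ~ outcome, data = toy)

# A small p-value (commonly below 0.05) would suggest a real difference
test_result$p.value
```

Even a significant result would only describe a tendency. It would not mean that a given point total guarantees a win.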

This graph shows my points for each game in chronological order. The trend shows how my scoring has changed over time: whether it has gone up, gone down, or stayed the same. Here, it went up from the first season to the second and then fluctuated quite a bit in the third, meaning my scoring was not consistent throughout this last season.

bballstats |>
  ggplot(mapping = aes(x = date, y = points)) +
  geom_line() +
  labs(title = "Points Over Time",
       x = "Date",
       y = "Points")
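Because a single line connects the last game of one season to the first game of the next, it can help to draw each season as its own line. The sketch below is one possible approach under a stated assumption: a season is taken to start in August, which I chose for illustration. The `toy` data and the `start_year` and `season` columns are hypothetical.

```r
library(tidyverse)

# Hypothetical toy data standing in for bballstats (values are made up)
toy <- tibble(
  date   = as.Date(c("2023-11-11", "2024-01-05", "2024-11-08",
                     "2025-01-11", "2025-11-14", "2026-02-05")),
  points = c(4, 6, 9, 25, 19, 9)
)

# Label each game with its season, assuming a season starts in August
toy_seasons <- toy |>
  mutate(
    yr         = as.integer(format(date, "%Y")),
    mon        = as.integer(format(date, "%m")),
    start_year = if_else(mon >= 8L, yr, yr - 1L),
    season     = paste0(start_year, "-", start_year + 1L)
  )

# One line per season, so the off-season gap is not connected
season_plot <- toy_seasons |>
  ggplot(mapping = aes(x = date, y = points, color = season, group = season)) +
  geom_line() +
  geom_point() +
  labs(title = "Points Over Time by Season",
       x = "Date",
       y = "Points",
       color = "Season")

season_plot
```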

This next plot combines several variables, since looking at these stats together can reveal a pattern, for example a game with few points, few or no assists, and few or no rebounds that ends in a loss. I do not see a very apparent trend, but the two major outliers I see both have one or more values well above the median.

bballstats |>
  ggplot(mapping = aes(x = assists, y = points, size = totreb,
                       color = outcome)) +
  geom_point() +
  labs(title = "Points, Assists, and Rebounds for Wins and Losses",
       x = "Assists",
       y = "Points",
       color = "Outcome",
       size = "Total Rebounds")

Plotting my 2-point field goal percentage against my 2-point attempts gives me an idea of whether shooting more helps or hurts my percentage. I colored the points by 2-point makes so they are easier to read, although makes can also be worked out from attempts and percentage. Clearly, the more shots I take, the more I can make, but free throws and 3-pointers are not counted here, so my total points cannot be calculated from this plot.

bballstats |>
  mutate(
    # 2-point attempts and makes are all field goals minus 3-pointers
    `2fga` = fga - `3fga`,
    `2fgm` = fgm - `3fgm`,
    `2fgper` = `2fgm` / `2fga`,
    # Games with no 2-point attempts give 0 / 0 (NaN); treat them as 0
    `2fgper` = if_else(is.na(`2fgper`), 0, round(`2fgper`, 3))) |>
  ggplot(mapping = aes(x = `2fga`, y = `2fgper`, color = `2fgm`)) +
  geom_point() +
  labs(title = "Field Goal Percentage for 2-pointers with Attempts and Makes",
       x = "2-Point Field Goals Attempted",
       y = "2-Point Field Goal Percentage",
       color = "2-Point Field Goals Made")
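Setting the percentage to 0 for games with no 2-point attempts makes those games look the same as games where every 2-point shot missed. As an alternative sketch, not the approach used above, the percentage can be left missing when there are no attempts. The `toy` data below is made up, and `two_pt` is a hypothetical name.

```r
library(tidyverse)

# Hypothetical toy data standing in for bballstats (values are made up)
toy <- tibble(
  fgm    = c(2, 4, 0, 3),
  fga    = c(6, 5, 1, 7),
  `3fgm` = c(0, 4, 0, 1),
  `3fga` = c(2, 5, 1, 3)
)

two_pt <- toy |>
  mutate(
    `2fga` = fga - `3fga`,
    `2fgm` = fgm - `3fgm`,
    # Leave the percentage as NA when no 2-point shots were attempted
    `2fgper` = if_else(`2fga` > 0, round(`2fgm` / `2fga`, 3), NA_real_)
  )

two_pt
```

With this version, `geom_point()` would drop the games with a missing percentage (with a warning) instead of plotting them at 0.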

Calculating the median of each stat below shows the values that are “typical” for me and that I want to reach or beat in each game (or stay at or below, for turnovers). Seeing these values on their own is a good summary of my typical game over the past three seasons.

bballstats |>
  summarize(
    med_points = median(points, na.rm = TRUE),
    med_reb = median(totreb, na.rm = TRUE),
    med_ast = median(assists, na.rm = TRUE),
    med_to = median(turnovers, na.rm = TRUE))
# A tibble: 1 × 4
  med_points med_reb med_ast med_to
       <dbl>   <dbl>   <dbl>  <dbl>
1          8       3       2      2
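When the same summary function is applied to several columns, `across()` from dplyr can do it in one call. The sketch below is only an illustration on made-up data. The `toy` tibble and the `med_` column names it produces are assumptions, and they differ slightly from the hand-written names above.

```r
library(tidyverse)

# Hypothetical toy data standing in for bballstats (values are made up)
toy <- tibble(
  points    = c(2, 8, 15, 20),
  totreb    = c(1, 3, 4, 7),
  assists   = c(0, 2, 2, 5),
  turnovers = c(1, 2, 2, 4)
)

# Median of every listed column, named med_<column>
toy_medians <- toy |>
  summarize(
    across(c(points, totreb, assists, turnovers),
           \(x) median(x, na.rm = TRUE),
           .names = "med_{.col}")
  )

toy_medians
```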

Now, I want to see which games met or beat the median on every criterion I chose: at or above the median for points, rebounds, and assists, and at or below the median for turnovers.

bballstats |>
  filter(points >= median(points) &
           totreb >= median(totreb) &
           assists >= median(assists) &
           turnovers <= median(turnovers))
# A tibble: 11 × 27
   date                opponent     site  started   min   fgm   fga fgper `3fgm`
   <dttm>              <chr>        <chr> <chr>   <dbl> <dbl> <dbl> <dbl>  <dbl>
 1 2024-11-08 00:00:00 Central Okl… Home  Yes        31     3     7 0.429      0
 2 2024-12-14 00:00:00 Northern     Home  Yes        29     3     7 0.429      2
 3 2025-01-11 00:00:00 Duluth       Away  Yes        28     7    13 0.538      3
 4 2025-01-24 00:00:00 Sioux Falls  Home  Yes        27     4    14 0.286      3
 5 2025-01-25 00:00:00 SMSU         Home  Yes        31     5    19 0.263      3
 6 2025-02-15 00:00:00 Augie        Away  Yes        30     6    11 0.545      4
 7 2025-02-22 00:00:00 Winona       Home  Yes        25     7    17 0.412      3
 8 2025-11-14 00:00:00 Washburn     Neut… Yes        26     8    14 0.571      2
 9 2025-11-20 00:00:00 Northern Mi… Away  Yes        35     9    21 0.429      1
10 2026-02-05 00:00:00 SMSU         Home  Yes        32     4    10 0.4        1
11 2026-02-14 00:00:00 Augie        Away  Yes        27     7    11 0.636      5
# ℹ 18 more variables: `3fga` <dbl>, `3fgper` <dbl>, ftm <dbl>, fta <dbl>,
#   ftper <dbl>, offreb <dbl>, defreb <dbl>, totreb <dbl>, avgreb <dbl>,
#   pf <dbl>, assists <dbl>, turnovers <dbl>, blocks <dbl>, steals <dbl>,
#   points <dbl>, avgpoints <dbl>, eff <dbl>, outcome <chr>

Now that I can see there are 11 games in which I matched or beat my median in points, rebounds, assists, and turnovers, I also want to see the outcome right away and keep only the relevant variables.

bballstats |>
  filter(points >= median(points) &
           totreb >= median(totreb) &
           assists >= median(assists) &
           turnovers <= median(turnovers)) |>
  select(date, opponent, outcome, site, points, totreb, assists, turnovers)
# A tibble: 11 × 8
   date                opponent    outcome site  points totreb assists turnovers
   <dttm>              <chr>       <chr>   <chr>  <dbl>  <dbl>   <dbl>     <dbl>
 1 2024-11-08 00:00:00 Central Ok… Win     Home       9      6       3         1
 2 2024-12-14 00:00:00 Northern    Win     Home       8      4       2         2
 3 2025-01-11 00:00:00 Duluth      Win     Away      25     13       2         0
 4 2025-01-24 00:00:00 Sioux Falls Loss    Home      15      7       5         1
 5 2025-01-25 00:00:00 SMSU        Loss    Home      13      5       2         2
 6 2025-02-15 00:00:00 Augie       Win     Away      20      7       7         2
 7 2025-02-22 00:00:00 Winona      Win     Home      17      7       2         2
 8 2025-11-14 00:00:00 Washburn    Win     Neut…     19      3       3         2
 9 2025-11-20 00:00:00 Northern M… Loss    Away      22      3       2         1
10 2026-02-05 00:00:00 SMSU        Win     Home       9      4       4         2
11 2026-02-14 00:00:00 Augie       Win     Away      19      8       2         2

Now I can see that my most well-rounded games, those where I scored 8 or more points, had 3 or more rebounds, had 2 or more assists, and had 2 or fewer turnovers, resulted in more wins than losses (8 wins and 3 losses). This leads me to believe that when I am not only scoring more but also assisting my teammates, grabbing rebounds, and taking care of the ball, my team typically plays better.
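To judge whether that record is better than usual, the win rate in well-rounded games could be compared with the win rate in all other games. The sketch below shows one way to do that on made-up data. The `toy` tibble, the `well_rounded` flag, and the `win_rates` table are hypothetical names, and the toy numbers say nothing about my real results.

```r
library(tidyverse)

# Hypothetical toy data standing in for bballstats (values are made up)
toy <- tibble(
  outcome   = c("Win", "Loss", "Win", "Win", "Loss", "Loss"),
  points    = c(12, 3, 9, 20, 15, 2),
  totreb    = c(5, 1, 4, 2, 6, 0),
  assists   = c(3, 0, 2, 1, 4, 1),
  turnovers = c(1, 3, 2, 0, 2, 4)
)

win_rates <- toy |>
  mutate(
    # Same criteria as the filter above, stored as a TRUE/FALSE flag
    well_rounded = points >= median(points) &
      totreb >= median(totreb) &
      assists >= median(assists) &
      turnovers <= median(turnovers)
  ) |>
  group_by(well_rounded) |>
  summarize(
    games    = n(),
    wins     = sum(outcome == "Win"),
    win_rate = mean(outcome == "Win")
  )

win_rates
```

If the win rate in the `TRUE` row is clearly higher than in the `FALSE` row on the real data, that would support the conclusion above. If the two rates are similar, the well-rounded games may simply reflect the team's overall record.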