Introduction

In this project, we started out with a bunch of different ideas for how we could make the metric provided better. Our first thought was to attempt to only measure the distance between the ball carrier and the players from the other team that weren’t being covered by anyone. However, this turned out to be too difficult to complete within the time that we had to work on the assignment. We then wanted to add more halos to the visualization around players on the opposite team, with different colors representing different levels of danger for the ball carrier. This again proved to be too big a goal for the time period, and also didn’t really improve the code provided, it would’ve just added to it. In the end, we decided on changing the way that the distance between the players was calculated - in the code given, this was done using a Euclidean measure, but we changed it to use t-statistics.

Description of Project

The end result of our project is a visualization that is similar to the one provided, but changed by using a different way (t-statistics) of comparing the distances between the players to end up with the traffic meter/measuring the danger of each of the players in relation to the ball carrier. Using t-statistics allows us to find the statistical significance of the mean distance away from the ball carrier, and use that information to improve the traffic metric. Euclidean distances are used in geometric situations, and in this case, used to find the physical distance between players on the field. Our measure instead uses the correlations between means to analyze when the ball carrier is in danger on the field.

Data Visualization

The plot that is visualized in the end is the same as the one we have been using in class. It can be used to visualize any game/play ID, but the way that the distance between players is determined/analyze is changed.

setwd("/Users/rubysullivan/Desktop/Sports Analytics")
file.path(getwd(), "Data/NFLBDB2022")
## [1] "/Users/rubysullivan/Desktop/Sports Analytics/Data/NFLBDB2022"
dir.exists(file.path(getwd(), "Data/NFLBDB2022"))
## [1] TRUE
dir.create(file.path(getwd(), "Data/NFLBDB2022"), showWarnings = F, recursive = T)

# ---------- supplied functions start here -----------------------------------------------------------------

# ---------------------------------------------------------------------------
# --------------------------- Load Libraries --------------------------------
# ---------------------------------------------------------------------------
# supply a vector of desired packages. Packages will be installed if not
# already installed and available to RStudio

load_packages <- function(packages = c()) {
  if (length(packages) == 0) {
    print('You did not specify any packages/libraries to load')
  }
  else {
    for (i in packages){
      if(! i %in% installed.packages()){
        install.packages(i, dependencies = TRUE)
      }
      suppressMessages(suppressWarnings(library(i, character.only=T)))
    }
  }
}


# -----------------------------------------------------------------------------------------------
# ------------ Checks for a valid playId and gameId; optionally check for valid frameId ---------
# -----------------------------------------------------------------------------------------------
check_playId_gameId <- function(game_Id, 
                                play_Id, 
                                frame_Id = c(), 
                                data) {
  if (!game_Id %in% unique(data$gameId)) {
    return("the game_Id you have used is not in the data frame.")
  }
  if (!play_Id %in% unique(data$playId)) {
    return("the play_Id you have used is not in the data frame.")
  }
  if(!missing(frame_Id)) {
    if (!frame_Id %in% unique(data$frameId)) {
      return("the frame_Id you have used is not in the data frame.")
    } 
  }  
  return("ok")
}


# ---------------------------------------------------------------------------
# ------------------ Visualize an Individual Play (with player) -------------
# ---------------------------------------------------------------------------
# provide a merged data frame, a game Id, a play Id, and a player.
visualize_play <- function(df,
                           df_team1,
                           df_team2, 
                           playerNumber = NA) 
{
  
  source('https://raw.githubusercontent.com/mlfurman3/gg_field/main/gg_field.R')
  
  g <- ggplot(data = df, aes(x = x, y = y)) +
    
    # customize colors, shapes, and sizes of players and the football
    scale_size_manual(values = c(6, 4, 6), guide = "none") +
    scale_shape_manual(values = c(21, 16, 21), guide = "none") +
    scale_fill_manual(values = c("dodgerblue1", "#663300", "firebrick1"), guide = "none") +
    scale_colour_manual(values = c("black", "#663300", "black"), guide = "none") +
    
    gg_field(yardmin = -5, yardmax = 125) +
    
    # add home team logo to the center point
    geom_image(data = df_team1, aes(x = 59, y = 53.33/2, image = team_logo_wikipedia), size = 0.15) +
    
    # add end zone team names
    annotate("text", x = 4, y = 27, label = df_team2$team_name, angle = 90, size = 5, fontface = 2, color = "white") +
    annotate("text", x = 115, y = 27, label = df_team1$team_name, angle = 270, size = 5, fontface = 2, color = "white") +
    
    # add informational text at the top of the field
    annotate("text", x = -3, y = 56, label = "Traffic Meter Scale:", colour = "white", vjust = 0, hjust = 0 ) +
    annotate("text", x = 38, y = 56, label = "Value:"              , colour = "white", vjust = 0, hjust = 0 ) +
    annotate("text", x = 52, y = 56, label = "Distance traveled by receiver (yards):", colour = "white", vjust = 0, hjust = 0 ) +
    annotate("text", x = 92.5, y = 56, label = "Speed (yards / sec):", colour = "white", vjust = 0, hjust = 0 ) +
    
    # add convex hulls for both teams
    geom_encircle( data = df %>% filter(team != "football"), 
                   aes(fill = team,
                       group = team,
                       color = team),
                   s_shape = 1,
                   expand = 0,
                   alpha = 0.5,
                   show.legend = TRUE) +
    
    # add halo around receiver
    geom_point(data = df %>% filter(jerseyNumber == playerNumber),
               aes(x = x, y = y, alpha = alpha), 
               shape = 21,
               size = 10,
               stroke = 1, # width of the circle border
               fill = "black",
               colour = "black") +
    
    # add points to plot for all players and the football
    geom_point(data = df, aes(x = x, y = y, shape = team, colour = team, size = team, fill = team) ) +
    
    # insert jersey number for each player
    geom_text( data = df %>% filter(team != "football"),
               aes(x = x, y = y, 
                   label = jerseyNumber), colour = "white", size = 3.5, vjust = 0.36 ) +
    
    # report out key metrics (total yards ran by receiver)
    geom_text( data = df %>% filter(jerseyNumber == playerNumber),
               aes(x = 85.5, y = 56, label = format(dist_running_total, nsmall = 2),
                   vjust = 0, hjust = 0), color = "white" ) +
    
    # report out key metrics (mean dist to player)
    geom_text( data = df %>% filter(jerseyNumber == playerNumber),
               aes(x = 44.5, y = 56, label = format(round(100 * (1 - (mean_dist_to_player/113.33)),2), nsmall = 2),
                   vjust = 0, hjust = 0), color = "white" ) +
    
    # report out key metrics (speed of receiver)
    geom_text( data = df %>% filter(jerseyNumber == playerNumber),
               aes(x = 111.9, y = 56, label = s, vjust = 0, hjust = 0), color = "white" ) +
    
    # add some labels to report the play description
    labs(title = df$playDescription) +
    
    # set the theme to dark green to color the areas beyond the end zones
    theme(panel.background = element_rect(fill = "forestgreen", 
                                          color = "forestgreen"), panel.grid = element_blank()) +
    annotate("rect", xmin = 15, xmax = 35, ymin = 56, ymax = 57, color = "black", fill = "white") +
    
    geom_rect(data = df %>% filter(jerseyNumber == playerNumber),
              aes(xmin = 15, xmax = 15 + (alpha*20), ymin = 56, ymax = 57, 
                  alpha = alpha), color = "black", fill = "black" ) +
    guides(alpha = "none") +
    transition_time(frameId)
  
  g
  
}


# ----------  main code starts here ----------------------------------------------------------------------

load_packages(c("ggplot2", "ggalt", "ggforce", "hms", "gganimate", "lubridate", "data.table", "dplyr", "nflfastR", "gifski", "png", "ggimage"))

df_tracking <- fread("Data/NFLBDB2022/NFL2022/tracking2020.csv")
df_plays    <- fread("Data/NFLBDB2022/NFL2022/plays.csv")
df_games    <- fread("Data/NFLBDB2022/NFL2022/games.csv")
df_players  <- fread("Data/NFLBDB2022/NFL2022/players.csv")

# you are creating a metric for a player in a play (playId) in a game (gameId)
my_gameId <- 2021010300
my_playId <- 1586 
my_playerNumber <- 19

df <- df_tracking %>%
  filter(gameId == my_gameId & playId == my_playId ) %>%
  left_join(df_plays, by = c("playId" = "playId", "gameId" = "gameId")) %>%
  left_join(df_games, by = c("gameId")) %>%
  select(x, y, displayName, jerseyNumber, team, gameId, playId, frameId, time, 
         nflId, dis, playDescription, s) %>%
  data.frame()

team1 <- as.character(  df_games[df_games$gameId == my_gameId, "homeTeamAbbr"]  )
team2 <- as.character(  df_games[df_games$gameId == my_gameId, "visitorTeamAbbr"]  )

df_team1 <- teams_colors_logos %>%
  filter(team_abbr == team1)

df_team2 <- teams_colors_logos %>%
  filter(team_abbr == team2)

df_player1 <- df %>%
  filter(jerseyNumber == my_playerNumber) %>%
  select(gameId, playId, frameId, x, y, dis, team) %>%
  data.frame()


df <- df %>%
  left_join(df_player1, by = c( "playId" = "playId", "gameId" = "gameId", "frameId" = "frameId")) %>%
  
# -------------------------------------start your code below ----------------------------------------

#mutate(new_metric = ifelse( mean_dist_to_player >= 50 & team.x == "away", NA, mean_dist_to_player    )) 

mutate(mu = 5) %>%
  #mu <- 5 # Assume the population mean is 5, adjust as necessary#
  
  mutate(made_up_dist = 1 + sample(1:100, n(), replace = TRUE) / 10) %>%
  
  group_by(gameId, playId, frameId) %>%
  mutate(made_up_mean_dist_to_player = mean(made_up_dist), alpha = 0.5) %>%
  ungroup() %>%
  mutate(
    mean_sample = mean(made_up_mean_dist_to_player),
    sd_sample = sd(made_up_mean_dist_to_player),
    n = n(),
    se = sd_sample / sqrt(n),
    t_statistic = (mean_sample - mu) / se
  ) 
  
  # ------------------------------------- your code ends here -----------------------------------------
# compute a running total of the distance traveled
dffinal <- df %>%
  arrange(nflId, frameId) %>%
  group_by(nflId) %>%
  mutate( dist_running_total = cumsum(dis.x)) %>%
  data.frame()

# run the line below if you want to visualize the play
#visualize_play(df, df_team1, df_team2, playerNumber = 19)
<hr style="height:2px;border-width:0;color:black;background-color:black">

## Conclusion

Overall, in this project, it was interesting to be able to try and take code that already worked and improve upon it. One of the toughest parts for us was trying to decide on a logical and plausible idea to get started on. There were moments of frustration in times where we couldn't get the code that was provided to us and the code that we wanted to add to work together, but we got everything figured out in the end when we took the time to go through the code line by line and dataframe by dataframe to see where our problems were. In the end, we were very happy with our result. 
#knitr::include_graphics("c:/Users/pptallon/Dropbox/G/Personal/Tallon005.jpg")