Introduction

For this assignment, we were given a program that shows the mean distance of every defensive player to a punt returner on a specific play, and were tasked to derive a better statistic. To do this, we recognized that not every defensive player is in a position to make a tackle on the punt returner. We decided that if a defensive player is within 5 yards to the returner, he is a threat to stop the play. Further, another threat to Isaiah McKenzie, the punt returner, is the amount of people in his 5 yard radius. Naturally, 2 people being in a position to tackle him is a more imposing threat than just one. Our traffic meter statistic is made up of 2 equally weighted variables, the amount of defensive players within a 5 yard radius, and the average distance to McKenzie of all of thse players.

Description of Project

Firstly, we made a boolean variable indicating if a specific defensive player is inside 5 yards. To do this, we needed an if statement noting if the distance variable is less than 5, and the player is assigned to the away team. From this, we made a dataframe to calculate our traffic meter variable. We wanted to group this by frameId, as every instance of frameId has a different amount of traffic, since it fluctuates from second to second. Through our boolean variable, for each frame, we summated the amount of players with a value of 1 to calculate how many defensive players are in a 5 yard radius to Isaiah McKenzie. We then divided this number by 11 to make it a percentage. This variable, which we called w1, served as the first component of our statistic. Then, we derived a variable which only considers the distance from McKenzie if that player is within 5 yards. We took this variable, got its average, divided it by 5 and subtracted it from 1 so that we would have a number that would always lie between 0 and 1. We called this new statistic w2. Finally, our traffic meter statistic is the quotient of the sum of w1 and w2, so that heavy traffic would be 1, and no traffic would be 0.

Data Visualization

There is a vizualization of the play that was used to derrive the statistic. Also there are a pair of density plots, one representing the original distance metric given and the other is our created metric.

setwd("/Users/smesaros/Desktop/IS470Data")

# ---------- supplied functions start here -----------------------------------------------------------------


# ---------------------------------------------------------------------------
# --------------------------- Load Libraries --------------------------------
# ---------------------------------------------------------------------------
# supply a vector of desired packages. Packages will be installed if not
# already installed and available to RStudio


load_packages <- function(packages = c()) {
  if (length(packages) == 0) {
    print('You did not specify any packages/libraries to load')
  }
  else {
    for (i in packages){
      if(! i %in% installed.packages()){
        install.packages(i, dependencies = TRUE)
      }
      suppressMessages(suppressWarnings(library(i, character.only=T)))
    }
  }
}




# -----------------------------------------------------------------------------------------------
# ------------ Checks for a valid playId and gameId; optionally check for valid frameId ---------
# -----------------------------------------------------------------------------------------------
check_playId_gameId <- function(game_Id, 
                                play_Id, 
                                frame_Id = c(), 
                                data) {
  if (!game_Id %in% unique(data$gameId)) {
    return("the game_Id you have used is not in the data frame.")
  }
  if (!play_Id %in% unique(data$playId)) {
    return("the play_Id you have used is not in the data frame.")
  }
  if(!missing(frame_Id)) {
    if (!frame_Id %in% unique(data$frameId)) {
      return("the frame_Id you have used is not in the data frame.")
    } 
  }  
  return("ok")
}




# ---------------------------------------------------------------------------
# ------------------ Visualize an Individual Play (with player) -------------
# ---------------------------------------------------------------------------
# provide a merged data frame, a game Id, a play Id, and a player.
visualize_play <- function(df,
                           df_team1,
                           df_team2, 
                           playerNumber = NA) 
{
  
  source('https://raw.githubusercontent.com/mlfurman3/gg_field/main/gg_field.R')
  
  
  g <- ggplot(data = df, aes(x = x, y = y)) +
    
    # customize colors, shapes, and sizes of players and the football
    scale_size_manual(values = c(6, 4, 6), guide = "none") +
    scale_shape_manual(values = c(21, 16, 21), guide = "none") +
    scale_fill_manual(values = c("dodgerblue1", "#663300", "firebrick1"), guide = "none") +
    scale_colour_manual(values = c("black", "#663300", "black"), guide = "none") +
    
    gg_field(yardmin = -5, yardmax = 125) +
    
    # add home team logo to the center point
    geom_image(data = df_team1, aes(x = 59, y = 53.33/2, image = team_logo_wikipedia), size = 0.15) +
    
    # add end zone team names
    annotate("text", x = 4, y = 27, label = df_team2$team_name, angle = 90, size = 5, fontface = 2, color = "white") +
    annotate("text", x = 115, y = 27, label = df_team1$team_name, angle = 270, size = 5, fontface = 2, color = "white") +
    
    # add informational text at the top of the field
    annotate("text", x = -3, y = 56, label = "Traffic Meter Scale:", colour = "white", vjust = 0, hjust = 0 ) +
    annotate("text", x = 38, y = 56, label = "Value:"              , colour = "white", vjust = 0, hjust = 0 ) +
    annotate("text", x = 52, y = 56, label = "Distance traveled by receiver (yards):", colour = "white", vjust = 0, hjust = 0 ) +
    annotate("text", x = 92.5, y = 56, label = "Speed (yards / sec):", colour = "white", vjust = 0, hjust = 0 ) +
    
    # add convex hulls for both teams
    geom_encircle( data = df %>% filter(team != "football"), 
                   aes(fill = team,
                       group = team,
                       color = team),
                   s_shape = 1,
                   expand = 0,
                   alpha = 0.5,
                   show.legend = TRUE) +
    
    # add halo around receiver
    geom_point(data = df %>% filter(jerseyNumber == playerNumber),
               aes(x = x, y = y, alpha = alpha), 
               shape = 21,
               size = 10,
               stroke = 1, # width of the circle border
               fill = "black",
               colour = "black") +
    
    # add points to plot for all players and the football
    geom_point(data = df, aes(x = x, y = y, shape = team, colour = team, size = team, fill = team) ) +
    
    # insert jersey number for each player
    geom_text( data = df %>% filter(team != "football"),
               aes(x = x, y = y, 
                   label = jerseyNumber), colour = "white", size = 3.5, vjust = 0.36 ) +
    
    # report out key metrics (total yards ran by receiver)
    geom_text( data = df %>% filter(jerseyNumber == playerNumber),
               aes(x = 85.5, y = 56, label = format(dist_running_total, nsmall = 2),
                   vjust = 0, hjust = 0), color = "white" ) +
    
    # report out key metrics (mean dist to player)
    geom_text( data = df %>% filter(jerseyNumber == playerNumber),
               aes(x = 44.5, y = 56, label = format(round(100 * (1 - (mean_dist_to_player/113.33)),2), nsmall = 2),
                   vjust = 0, hjust = 0), color = "white" ) +
    
    # report out key metrics (speed of receiver)
    geom_text( data = df %>% filter(jerseyNumber == playerNumber),
               aes(x = 111.9, y = 56, label = s, vjust = 0, hjust = 0), color = "white" ) +
    
    # add some labels to report the play description
    labs(title = df$playDescription) +
    
    # set the theme to dark green to color the areas beyond the end zones
    theme(panel.background = element_rect(fill = "forestgreen", 
                                          color = "forestgreen"), panel.grid = element_blank()) +
    annotate("rect", xmin = 15, xmax = 35, ymin = 56, ymax = 57, color = "black", fill = "white") +
    
    geom_rect(data = df %>% filter(jerseyNumber == playerNumber),
              aes(xmin = 15, xmax = 15 + (alpha*20), ymin = 56, ymax = 57, 
                  alpha = alpha), color = "black", fill = "black" ) +
    guides(alpha = "none") +
    transition_time(frameId)
  
  
  g
  
  
}




# ----------  main code starts here ----------------------------------------------------------------------

load_packages(c("ggplot2", "ggalt", "ggforce", "hms", "gganimate", "lubridate", "data.table", "dplyr", "nflfastR", "gifski", "png", "ggimage"))

df_tracking <- fread("NFLBDB2022/tracking2020.csv")
df_plays    <- fread("NFLBDB2022/plays.csv")
df_games    <- fread("NFLBDB2022/games.csv")


# you are creating a metric for a player in a play (playId) in a game (gameId)
my_gameId <- 2021010300
my_playId <- 1586 
my_playerNumber <- 19


df <- df_tracking %>%
  filter(gameId == my_gameId & playId == my_playId ) %>%
  left_join(df_plays, by = c("playId" = "playId", "gameId" = "gameId")) %>%
  select(x, y, displayName, jerseyNumber, team, gameId, playId, frameId, time, 
         nflId, dis, playDescription, s) %>%
  data.frame()


check_playId_gameId(my_gameId, my_playId, 1, df)
## [1] "ok"
check_playId_gameId(game_Id = my_gameId, play_Id = my_playId, data = df)
## [1] "ok"
team1 <- as.character(  df_games[df_games$gameId == my_gameId, "homeTeamAbbr"]  )
team2 <- as.character(  df_games[df_games$gameId == my_gameId, "visitorTeamAbbr"]  )


df_team1 <- teams_colors_logos %>%
  filter(team_abbr == team1)


df_team2 <- teams_colors_logos %>%
  filter(team_abbr == team2)


df_player1 <- df %>%
  filter(jerseyNumber == my_playerNumber) %>%
  select(gameId, playId, frameId, x, y, dis, team) %>%
  data.frame()


df <- df %>%
  left_join(df_player1, by = c( "playId" = "playId", "gameId" = "gameId", "frameId" = "frameId")) %>%
  
  # ----------------------------- here is the code that I created in class -----------------------------
# compute the distance to each player on the opposing team (not to your teammates and not to the ball)
mutate(dist = ifelse( team.x != team.y & team.x != "football", 
                      sqrt( (x.x - x.y)^2 + (y.x - y.y)^2), 
                      NA)) %>%
  
  group_by(gameId, playId, frameId) %>%
  
  # compute the average of all distances
  mutate(mean_dist_to_player = mean(dist, na.rm = TRUE ),
         alpha = 1 - (mean_dist_to_player/113.33)  ) %>%
  ungroup() %>%
  # -------------------------------------- my code ends here ------------------------------------------
# -------------------------------------start your code below ----------------------------------------


  mutate(is_inside_5_yards = ifelse(dist <= 5 & df$team == "away", 1, 0)) %>%
  
  
  
  
  # ------------------------------------- your code ends here -----------------------------------------
# compute a running total of the distance traveled
arrange(nflId, frameId) %>%
  group_by(nflId) %>%
  mutate( dist_running_total = cumsum(dis.x)) %>%
  rename( "x" = "x.x",
          "y" = "y.x",
          "team" = "team.x",
          "dis" = "dis.x") %>%
  data.frame()

traffic_meter_df <- df %>%
  select(nflId, displayName, dist, is_inside_5_yards, frameId) %>%
  group_by(frameId) %>%
  arrange(frameId) %>%
  mutate(count_inside_5_yards = sum(is_inside_5_yards)) %>%
  mutate(w1 = count_inside_5_yards / 11)%>%
  mutate(dist_of_player_inside_5_yards = ifelse(is_inside_5_yards == TRUE, dist,NA))%>%
  mutate(w2 = 1-(mean(dist_of_player_inside_5_yards, na.rm =TRUE)/5))%>%
  mutate(w2 = ifelse(is.na(w2),0,w2))%>%
  mutate(traffic_meter = (w1 + w2)/2)%>%
  data.frame()
  
frame_df <- traffic_meter_df%>%
  select(frameId, traffic_meter)%>%
  arrange(frameId)%>%
  distinct()%>%
  data.frame()

cor(traffic_meter_df$traffic_meter, df$dist, method = c("pearson", "kendall", "spearman"), use = "complete.obs")
## [1] -0.1027823
t.test(df$dist, traffic_meter_df$traffic_meter, alternative = "two.sided", var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  df$dist and traffic_meter_df$traffic_meter
## t = 71.399, df = 2540.2, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  26.18121 27.65990
## sample estimates:
##  mean of x  mean of y 
## 27.0856206  0.1650662

Density Plot for our Metric

ggplot(data = df[!is.na(traffic_meter_df$traffic_meter),], aes(traffic_meter_df$traffic_meter, fill = "red"), color = "black") +
  geom_density(position = "identity", alpha=0.5)

Conclusion

According to the t-test our metric, statistically speaking, is signifcantly different from the original distance metric given to us.