Executive Summary

This document presents the Expected Goals (xG) Calculator developed for CBU Men’s Soccer. The model uses a logistic regression framework, in line with the approach to shot-quality modelling popularized by organizations such as StatsBomb and Opta.

By analyzing key shot features (distance, angle, body part, and defensive pressure), the tool assigns each shot a probability of being scored. This gives coaches a quantitative measure of shot quality for evaluating finishing efficiency and tactical positioning.

Methodology

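The model is built on established sports analytics research and uses the following shot features: distance to goal, shot angle, body part (foot or header), foot strength (strong or weak), and defensive pressure (low, medium, or high).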

Model Performance

Accuracy: MAE = 0.039; 86% of predictions within +/- 0.10.
Validation: Evaluated against a dataset of 35 professional-level shots.
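
As a rough illustration of how two figures of this kind can be computed, the sketch below compares a vector of model predictions with a vector of reference xG values and returns the mean absolute error and the share of predictions within a tolerance. The helper name evaluate_xg, the choice of reference values, and the sample numbers are assumptions made for illustration; they are not the 35-shot validation set.

```r
# Hypothetical helper: compares model xG against reference xG values.
evaluate_xg <- function(predicted, reference, tolerance = 0.10) {
  stopifnot(length(predicted) == length(reference))
  abs_err <- abs(predicted - reference)
  list(
    mae          = mean(abs_err),               # mean absolute error
    share_within = mean(abs_err <= tolerance)   # fraction within +/- tolerance
  )
}

# Illustrative values only (not the validation data)
predicted <- c(0.50, 0.62, 0.17, 0.79)
reference <- c(0.45, 0.60, 0.30, 0.79)
metrics <- evaluate_xg(predicted, reference)
```

With these made-up inputs, three of the four predictions fall within the tolerance, so share_within is 0.75.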

Core xG Calculation Engine

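The following R function implements the logistic regression model. Inputs outside the supported range (distance outside (0, 60] or angle outside [0, 90] degrees) return NA, and the final probability is clamped to [0.01, 0.95] and rounded to four decimals. Penalties are handled as a special case: a low-pressure foot shot from a distance of 10.5 to 11.5 at an angle of at least 85 degrees is treated as a penalty and assigned a fixed xG of 0.79, based on historical conversion rates.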

calculate_xg <- function(distance, angle, body_part = "foot",
                         foot_strength = "strong", pressure = "low") {

  if (is.na(distance) || distance <= 0 || distance > 60)
    return(NA)
  if (is.na(angle) || angle < 0 || angle > 90)
    return(NA)

  # Penalty special case
  if (distance >= 10.5 && distance <= 11.5 &&
      angle >= 85 &&
      tolower(body_part) == "foot" &&
      tolower(pressure) == "low") {
    return(0.79)
  }

  # Logistic regression coefficients
  b0 <-  0.526451
  b1 <- -0.038901   # distance
  b2 <-  0.419586   # sin(angle)
  b3 <- -0.005172   # distance^2
  b4 <- -0.027214   # distance * sin(angle)
  b5 <- -0.934813   # header
  b6 <- -0.062250   # weak foot
  b7 <- -0.935583   # pressure level

  angle_rad <- angle * pi / 180

  linear <- b0 +
            b1 * distance +
            b2 * sin(angle_rad) +
            b3 * distance^2 +
            b4 * distance * sin(angle_rad) +
            b5 * (if (tolower(body_part) %in% c("header","head")) 1 else 0) +
            b6 * (if (tolower(body_part) == "foot" && tolower(foot_strength) == "weak") 1 else 0) +
            b7 * switch(tolower(pressure), "low" = 0, "medium" = 1, "high" = 2, 0)

  return(round(max(0.01, min(0.95, 1 / (1 + exp(-linear)))), 4))
}
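
To make the arithmetic concrete, here is a worked example of the same formula for a single shot, written out without the function wrapper. It assumes a strong-foot shot under low pressure, so the header, weak-foot, and pressure terms are all zero; the coefficients are copied from calculate_xg above.

```r
# Worked example: distance 8.6, angle 81 degrees, strong foot, low pressure.
# Header, weak-foot and pressure indicators are 0, so b5, b6 and b7 drop out.
distance  <- 8.6
angle_rad <- 81 * pi / 180

linear <- 0.526451 +
  (-0.038901) * distance +
  0.419586 * sin(angle_rad) +
  (-0.005172) * distance^2 +
  (-0.027214) * distance * sin(angle_rad)

xg <- 1 / (1 + exp(-linear))   # logistic (sigmoid) transform of the linear predictor

round(linear, 4)  # -0.0074
round(xg, 4)      # 0.4982
```

A linear predictor close to zero maps to a probability close to 0.5, which is why this shot comes out at roughly an even chance.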

Visualization: Shot Map

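Visualizing shot data is a key part of the tool. The create_shot_map function plots each shot on a top-down view of the attacking third, with point size and color scaled by xG and point shape indicating whether the shot was scored. It requires the ggplot2 package and a data frame with distance, angle, goal, and xg columns.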

library(ggplot2)  # provides ggplot(), geom_*(), scale_*() and theme functions used below

create_shot_map <- function(shots_df, match_info = NULL) {
  if (nrow(shots_df) == 0) return(NULL)

  # Standardize column names to lowercase to prevent "object not found" errors
  colnames(shots_df) <- tolower(colnames(shots_df))
  
  # Coordinate calculation
  safe_angle <- pmax(shots_df$angle, 1)
  shots_df$y <- shots_df$distance
  shots_df$x <- shots_df$distance * tan((90 - safe_angle) * pi / 180)
  
  # Ensure 'goal' column is treated as a factor for mapping
  shots_df$goal <- factor(ifelse(tolower(shots_df$goal) == "yes", "Yes", "No"), 
                         levels = c("No", "Yes"))

  pitch_color <- "#196B2D"
  line_color  <- "white"

  ggplot() +
    # Pitch background and lines
    geom_rect(aes(xmin = -34, xmax = 34, ymin = -2, ymax = 35),
              fill = pitch_color, color = NA) +
    geom_segment(aes(x = -34, xend = -34, y = -2, yend = 35), color = line_color, linewidth = 1.2) +
    geom_segment(aes(x =  34, xend =  34, y = -2, yend = 35), color = line_color, linewidth = 1.2) +
    geom_segment(aes(x = -34, xend = 34, y = 35, yend = 35),
                 color = line_color, linewidth = 0.7, linetype = "dashed") +
    geom_rect(aes(xmin = -16.5, xmax = 16.5, ymin = 0, ymax = 16.5),
              fill = NA, color = line_color, linewidth = 1.2) +
    geom_rect(aes(xmin = -5.5, xmax = 5.5, ymin = 0, ymax = 5.5),
              fill = NA, color = line_color, linewidth = 1.2) +
    geom_rect(aes(xmin = -3.66, xmax = 3.66, ymin = -1.5, ymax = 0),
              fill = "white", color = line_color, linewidth = 1.8) +
    geom_point(aes(x = 0, y = 11), color = line_color, size = 1.8) +
    
    # Shot points:
    # Shape 21: Filled circle (Goal)
    # Shape 1: Hollow circle (Missed)
    geom_point(data = shots_df,
               aes(x = x, y = y, size = xg, color = xg, shape = goal, fill = xg),
               stroke = 1.5) +
    
    # Scales and Legend
    scale_color_gradient(low = "#FFF176", high = "#D32F2F", name = "xG Value", limits = c(0, 1)) +
    scale_fill_gradient(low = "#FFF176", high = "#D32F2F", name = "xG Value", limits = c(0, 1)) +
    scale_size_continuous(range = c(2, 8), guide = "none") +
    scale_shape_manual(values = c("No" = 1, "Yes" = 21),
                       name = "Outcome",
                       labels = c("No" = "Missed", "Yes" = "Goal")) +
    
    # Appearance
    coord_fixed(ratio = 1, xlim = c(-28, 28), ylim = c(-2, 35)) +
    labs(
      title    = "Shot Map Analysis",
      subtitle = "CBU Men's Soccer - Performance Visualization",
      x = NULL, y = NULL
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 16),
      panel.grid = element_blank(),
      axis.text = element_blank(),
      legend.position = "bottom",
      legend.box = "horizontal"
    ) +
    # Ensure legend shows the distinction clearly
    guides(
      shape = guide_legend(override.aes = list(size = 4, color = "black", fill = "black")),
      color = guide_colorbar(title.position = "top", title.hjust = 0.5),
      fill = "none"
    )
}
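
The coordinate step inside create_shot_map can be looked at in isolation. The sketch below repeats that conversion as a standalone helper; the name shot_coordinates is hypothetical, and it follows the function above in treating distance as depth from the goal line and an angle of 90 degrees as straight in front of goal.

```r
# Hypothetical helper mirroring the coordinate step in create_shot_map.
shot_coordinates <- function(distance, angle) {
  safe_angle <- pmax(angle, 1)   # floor at 1 degree so tan(90 degrees) is never evaluated
  data.frame(
    x = distance * tan((90 - safe_angle) * pi / 180),  # lateral offset from goal centre
    y = distance                                        # depth from the goal line
  )
}

# Central shot, 45 degree shot, and a shot with angle 0 (floored to 1 degree)
coords <- shot_coordinates(distance = c(11, 10, 5), angle = c(90, 45, 0))
```

Because the angle carries no left or right information, every shot is placed at x >= 0; a side column would be needed to mirror shots onto the other half of the pitch. Very small angles also produce large x values that fall outside the plotted area.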

Model Validation & Examples

Below are sample shots processed through the model to show how the xG value responds to different match situations.

example_shots <- data.frame(
  Player   = c("Penalty", "Kennedy", "Beto", "Welbeck", "Jimenez", "Borre", "Gustavo", "Gyokeres"),
  Distance = c(11.0, 8.6, 5.0, 4.7, 5.0, 5.2, 5.5, 16.5),
  Angle    = c(90, 81, 83, 86, 70, 54, 85, 89),
  Body     = c("foot", "foot", "foot", "foot", "foot", "foot", "foot", "foot"),
  Pressure = c("low", "low", "low", "medium", "medium", "medium", "high", "medium"),
  Goal     = c("Yes", "No", "Yes", "No", "No", "Yes", "No", "No")
)

example_shots$xg <- mapply(calculate_xg,
                           distance  = example_shots$Distance,
                           angle     = example_shots$Angle,
                           body_part = example_shots$Body,
                           pressure  = example_shots$Pressure,  # pass pressure so it is not left at the default "low"
                           MoreArgs  = list(foot_strength = "strong"))

Visualizing Sample Data

Data Input Summary

The following table summarizes the input data and the resulting xG calculations for the analyzed shots.

Summary of Shot Data Inputs and Calculated xG
Player    Distance  Angle  Body  Pressure  Goal  xg
Penalty   11.0      90     foot  low       Yes   0.7900
Kennedy   8.6       81     foot  low       No    0.4982
Beto      5.0       83     foot  low       Yes   0.6187
Welbeck   4.7       86     foot  medium    No    0.3977
Jimenez   5.0       70     foot  medium    No    0.3854
Borre     5.2       54     foot  medium    Yes   0.3714
Gustavo   5.5       85     foot  high      No    0.1906
Gyokeres  16.5      89     foot  medium    No    0.0767
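
One common way to turn per-shot xG into a finishing-efficiency view is to compare each player's goals with their summed xG. The sketch below shows the idea with base R; the player names and values are invented for illustration and are not taken from the table above.

```r
# Illustrative data only: hypothetical players and xG values.
shots <- data.frame(
  player = c("Player A", "Player A", "Player B", "Player B", "Player B"),
  xg     = c(0.50, 0.25, 0.10, 0.40, 0.25),
  goal   = c("Yes", "No", "No", "Yes", "Yes")
)
shots$scored <- as.integer(shots$goal == "Yes")

# Sum xG and goals per player, then compare the two totals
summary_df <- aggregate(cbind(xg, scored) ~ player, data = shots, FUN = sum)
summary_df$goals_minus_xg <- summary_df$scored - summary_df$xg
```

A positive goals_minus_xg suggests a player scored more than the chances were worth, although with only a handful of shots the difference is mostly noise.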

Conclusion

The CBU xG Calculator serves as a robust tool for objective performance analysis. By bridging the gap between raw match data and actionable insights, it allows for a deeper understanding of scoring opportunities and player efficiency.


Developed by Luigi Demasi for CBU Men’s Soccer.