This document presents the Expected Goals (xG) Calculator developed for CBU Men’s Soccer. The model utilizes a logistic regression framework, consistent with industry standards set by organizations such as StatsBomb and Opta.
By analyzing key shot features - distance, angle, body part, and defensive pressure - this tool provides a quantitative measure of shot quality, enabling coaches to evaluate finishing efficiency and tactical positioning.
The model is built on established sports analytics research. In addition to the four features named above, it accounts for foot strength (strong or weak) on shots taken with the foot.
Accuracy: MAE = 0.039, with 86% of predictions within ±0.10. Validation: a dataset of 35 professional-level shots.
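As a rough illustration of how the two accuracy figures above are typically computed, the sketch below compares a vector of model predictions with a vector of reference xG values for the same shots. The source does not say what the predictions were compared against, so the reference values are an assumption. The names `accuracy_summary`, `predicted`, and `reference` are hypothetical, and the numbers are placeholders, not the 35-shot validation set.

```r
# Hypothetical helper: summarize agreement between model xG and reference xG.
accuracy_summary <- function(predicted, reference, tolerance = 0.10) {
  stopifnot(length(predicted) == length(reference))
  abs_err <- abs(predicted - reference)
  list(
    mae = mean(abs_err),                       # mean absolute error
    share_within = mean(abs_err <= tolerance)  # fraction within +/- tolerance
  )
}

# Placeholder values for illustration only (not the validation data).
predicted <- c(0.50, 0.62, 0.40, 0.08)
reference <- c(0.45, 0.60, 0.55, 0.10)

result <- accuracy_summary(predicted, reference)
print(result)
```

Reporting both numbers is useful because the MAE can hide a few large misses that the within-tolerance share exposes.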
The following R function implements the logistic regression model. It includes a special case for penalties, which are assigned a fixed xG of 0.79 based on historical conversion rates.
calculate_xg <- function(distance, angle, body_part = "foot",
                         foot_strength = "strong", pressure = "low") {
  # Reject missing or out-of-range inputs (distance in meters, angle in degrees)
  if (is.na(distance) || distance <= 0 || distance > 60)
    return(NA_real_)
  if (is.na(angle) || angle < 0 || angle > 90)
    return(NA_real_)
  # Penalty special case: a central, unpressured foot shot from about 11 m
  if (distance >= 10.5 && distance <= 11.5 &&
      angle >= 85 &&
      tolower(body_part) == "foot" &&
      tolower(pressure) == "low") {
    return(0.79)
  }
  # Logistic regression coefficients
  b0 <- 0.526451
  b1 <- -0.038901 # distance
  b2 <- 0.419586 # sin(angle)
  b3 <- -0.005172 # distance^2
  b4 <- -0.027214 # distance * sin(angle)
  b5 <- -0.934813 # header
  b6 <- -0.062250 # weak foot
  b7 <- -0.935583 # pressure level
  angle_rad <- angle * pi / 180
  # Linear predictor (log-odds of scoring)
  linear <- b0 +
    b1 * distance +
    b2 * sin(angle_rad) +
    b3 * distance^2 +
    b4 * distance * sin(angle_rad) +
    b5 * (if (tolower(body_part) %in% c("header", "head")) 1 else 0) +
    b6 * (if (tolower(body_part) == "foot" && tolower(foot_strength) == "weak") 1 else 0) +
    # Unrecognized pressure labels fall back to 0 (treated as low)
    b7 * switch(tolower(pressure), "low" = 0, "medium" = 1, "high" = 2, 0)
  # Logistic transform, clamped to [0.01, 0.95] and rounded to 4 decimals
  return(round(max(0.01, min(0.95, 1 / (1 + exp(-linear)))), 4))
}
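Written out, the function above evaluates the expression below for non-penalty shots. The symbols are introduced here for illustration only: $d$ is the distance, $\theta$ the angle converted to radians, $H = 1$ for headers, $W = 1$ for weak-foot shots, and $P \in \{0, 1, 2\}$ for low, medium, and high pressure. The coefficients $\beta_0, \dots, \beta_7$ are `b0` to `b7` in the code.

$$
z = \beta_0 + \beta_1 d + \beta_2 \sin\theta + \beta_3 d^2 + \beta_4\, d \sin\theta + \beta_5 H + \beta_6 W + \beta_7 P,
\qquad
\mathrm{xG} = \min\!\Bigl(0.95,\ \max\Bigl(0.01,\ \frac{1}{1 + e^{-z}}\Bigr)\Bigr)
$$

For example, a strong-foot shot from $d = 8.6$ at $\theta = 81^\circ$ under low pressure gives $z \approx -0.0074$ and therefore $\mathrm{xG} \approx 0.498$.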
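Visualizing shot data is a key part of the tool. The create_shot_map function draws the attacking third of the pitch (penalty area, six-yard box, goal, and penalty spot) and plots each shot on it, with point size and color scaled by xG and point shape indicating whether the shot was scored.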
library(ggplot2)
create_shot_map <- function(shots_df, match_info = NULL) {
  # Note: match_info is currently unused
  if (nrow(shots_df) == 0) return(NULL)
  # Standardize column names to lowercase to prevent "object not found" errors
  colnames(shots_df) <- tolower(colnames(shots_df))
  # Coordinate calculation: distance is plotted as depth from the goal line (y);
  # the lateral offset (x) follows from the angle, floored at 1 degree so it stays bounded
  safe_angle <- pmax(shots_df$angle, 1)
  shots_df$y <- shots_df$distance
  shots_df$x <- shots_df$distance * tan((90 - safe_angle) * pi / 180)
  # Ensure 'goal' column is treated as a factor for mapping
  shots_df$goal <- factor(ifelse(tolower(shots_df$goal) == "yes", "Yes", "No"),
                          levels = c("No", "Yes"))
  pitch_color <- "#196B2D"
  line_color <- "white"
  ggplot() +
    # Pitch background and lines
    geom_rect(aes(xmin = -34, xmax = 34, ymin = -2, ymax = 35),
              fill = pitch_color, color = NA) +
    geom_segment(aes(x = -34, xend = -34, y = -2, yend = 35), color = line_color, linewidth = 1.2) +
    geom_segment(aes(x = 34, xend = 34, y = -2, yend = 35), color = line_color, linewidth = 1.2) +
    geom_segment(aes(x = -34, xend = 34, y = 35, yend = 35),
                 color = line_color, linewidth = 0.7, linetype = "dashed") +
    # Penalty area, six-yard box, goal, and penalty spot
    geom_rect(aes(xmin = -16.5, xmax = 16.5, ymin = 0, ymax = 16.5),
              fill = NA, color = line_color, linewidth = 1.2) +
    geom_rect(aes(xmin = -5.5, xmax = 5.5, ymin = 0, ymax = 5.5),
              fill = NA, color = line_color, linewidth = 1.2) +
    geom_rect(aes(xmin = -3.66, xmax = 3.66, ymin = -1.5, ymax = 0),
              fill = "white", color = line_color, linewidth = 1.8) +
    geom_point(aes(x = 0, y = 11), color = line_color, size = 1.8) +
    # Shot points:
    # Shape 21: Filled circle (Goal)
    # Shape 1: Hollow circle (Missed)
    geom_point(data = shots_df,
               aes(x = x, y = y, size = xg, color = xg, shape = goal, fill = xg),
               stroke = 1.5) +
    # Scales and Legend
    scale_color_gradient(low = "#FFF176", high = "#D32F2F", name = "xG Value", limits = c(0, 1)) +
    scale_fill_gradient(low = "#FFF176", high = "#D32F2F", name = "xG Value", limits = c(0, 1)) +
    scale_size_continuous(range = c(2, 8), guide = "none") +
    scale_shape_manual(values = c("No" = 1, "Yes" = 21),
                       name = "Outcome",
                       labels = c("No" = "Missed", "Yes" = "Goal")) +
    # Appearance
    coord_fixed(ratio = 1, xlim = c(-28, 28), ylim = c(-2, 35)) +
    labs(
      title = "Shot Map Analysis",
      subtitle = "CBU Men's Soccer - Performance Visualization",
      x = NULL, y = NULL
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 16),
      panel.grid = element_blank(),
      axis.text = element_blank(),
      legend.position = "bottom",
      legend.box = "horizontal"
    ) +
    # Ensure legend shows the distinction clearly
    guides(
      shape = guide_legend(override.aes = list(size = 4, color = "black", fill = "black")),
      color = guide_colorbar(title.position = "top", title.hjust = 0.5),
      fill = "none"
    )
}
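The coordinate step inside create_shot_map can be checked on its own. The sketch below isolates it in a hypothetical helper, `shot_xy`, which is not part of the original code. It assumes, as the function does, that `distance` is plotted as depth from the goal line and that an angle of 90 degrees means a shot from directly in front of goal.

```r
# Hypothetical helper mirroring the coordinate step in create_shot_map.
# Assumption: distance in meters, angle in degrees (90 = straight on).
shot_xy <- function(distance, angle) {
  safe_angle <- pmax(angle, 1)  # floor at 1 degree so the offset stays bounded
  data.frame(
    x = distance * tan((90 - safe_angle) * pi / 180),  # lateral offset from goal center
    y = distance                                       # depth from the goal line
  )
}

coords <- shot_xy(distance = c(11, 10, 5.2), angle = c(90, 45, 54))
print(round(coords, 2))
```

Because the angle carries no left or right information, every shot is drawn on the same side of the goal. If `distance` is instead meant as the straight-line distance to the center of the goal, `distance * cos(angle)` and `distance * sin(angle)` would be the matching conversion; which reading applies depends on how the shots were recorded.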
Below are sample shots processed through the model to demonstrate its accuracy and responsiveness to different match situations.
example_shots <- data.frame(
  Player = c("Penalty", "Kennedy", "Beto", "Welbeck", "Jimenez", "Borre", "Gustavo", "Gyokeres"),
  Distance = c(11.0, 8.6, 5.0, 4.7, 5.0, 5.2, 5.5, 16.5),
  Angle = c(90, 81, 83, 86, 70, 54, 85, 89),
  Body = c("foot", "foot", "foot", "foot", "foot", "foot", "foot", "foot"),
  Pressure = c("low", "low", "low", "medium", "medium", "medium", "high", "medium"),
  Goal = c("Yes", "No", "Yes", "No", "No", "Yes", "No", "No")
)
# Pressure is passed by name so each shot is scored with its own pressure level
example_shots$xg <- mapply(calculate_xg,
                           example_shots$Distance,
                           example_shots$Angle,
                           example_shots$Body,
                           pressure = example_shots$Pressure,
                           MoreArgs = list(foot_strength = "strong"))
The following table summarizes the input data and the resulting xG calculations for the analyzed shots.
| Player | Distance (m) | Angle (°) | Body | Pressure | Goal | xG |
|---|---|---|---|---|---|---|
| Penalty | 11.0 | 90 | foot | low | Yes | 0.7900 |
| Kennedy | 8.6 | 81 | foot | low | No | 0.4982 |
| Beto | 5.0 | 83 | foot | low | Yes | 0.6187 |
| Welbeck | 4.7 | 86 | foot | medium | No | 0.3977 |
| Jimenez | 5.0 | 70 | foot | medium | No | 0.3854 |
| Borre | 5.2 | 54 | foot | medium | Yes | 0.3714 |
| Gustavo | 5.5 | 85 | foot | high | No | 0.1906 |
| Gyokeres | 16.5 | 89 | foot | medium | No | 0.0767 |
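To see how the pressure term alone moves a prediction, the sketch below applies the pressure coefficient from calculate_xg (`b7 = -0.935583`) to a fixed baseline. The baseline of 0.5 for the low-pressure linear predictor is an arbitrary illustrative value, and `pressure_codes` and `pressure_effect` are hypothetical names.

```r
b7 <- -0.935583       # pressure coefficient from calculate_xg
baseline <- 0.5       # assumed linear predictor at low pressure (illustrative)
pressure_codes <- c(low = 0, medium = 1, high = 2)

# plogis() is base R's logistic function: 1 / (1 + exp(-z))
pressure_effect <- round(plogis(baseline + b7 * pressure_codes), 4)
print(pressure_effect)
```

Each step up in pressure subtracts the same amount from the linear predictor, so the size of the drop in probability depends on where the shot sits on the logistic curve.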
The CBU xG Calculator serves as a robust tool for objective performance analysis. By bridging the gap between raw match data and actionable insights, it allows for a deeper understanding of scoring opportunities and player efficiency.
Developed by Luigi Demasi for CBU Men’s Soccer.