# A tibble: 45 × 4
Comp Season post_VAR n
<chr> <chr> <dbl> <int>
1 de Bundesliga 2016-2017 0 18
2 de Bundesliga 2017-2018 1 18
3 de Bundesliga 2018-2019 1 18
4 de Bundesliga 2019-2020 1 18
5 de Bundesliga 2020-2021 1 18
6 de Bundesliga 2021-2022 1 18
7 de Bundesliga 2022-2023 1 18
8 de Bundesliga 2023-2024 1 18
9 de Bundesliga 2024-2025 1 18
10 eng Premier League 2016-2017 0 20
11 eng Premier League 2017-2018 0 20
12 eng Premier League 2018-2019 0 20
13 eng Premier League 2019-2020 1 20
14 eng Premier League 2020-2021 1 20
15 eng Premier League 2021-2022 1 20
16 eng Premier League 2022-2023 1 20
17 eng Premier League 2023-2024 1 20
18 eng Premier League 2024-2025 1 20
19 es La Liga 2016-2017 0 20
20 es La Liga 2017-2018 0 20
21 es La Liga 2018-2019 1 20
22 es La Liga 2019-2020 1 20
23 es La Liga 2020-2021 1 20
24 es La Liga 2021-2022 1 20
25 es La Liga 2022-2023 1 20
26 es La Liga 2023-2024 1 20
27 es La Liga 2024-2025 1 20
28 fr Ligue 1 2016-2017 0 20
29 fr Ligue 1 2017-2018 0 20
30 fr Ligue 1 2018-2019 1 20
31 fr Ligue 1 2019-2020 1 20
32 fr Ligue 1 2020-2021 1 20
33 fr Ligue 1 2021-2022 1 20
34 fr Ligue 1 2022-2023 1 20
35 fr Ligue 1 2023-2024 1 18
36 fr Ligue 1 2024-2025 1 18
37 it Serie A 2016-2017 0 20
38 it Serie A 2017-2018 1 20
39 it Serie A 2018-2019 1 20
40 it Serie A 2019-2020 1 20
41 it Serie A 2020-2021 1 20
42 it Serie A 2021-2022 1 20
43 it Serie A 2022-2023 1 20
44 it Serie A 2023-2024 1 20
45 it Serie A 2024-2025 1 20
The Effect of VAR Technology on Referee Decisions in European Soccer
Abstract
In 2018, professional soccer saw an enormous change when the VAR system was introduced. VAR stands for Video Assistant Referee. The VAR system comprises a video assistant referee and other human assistants, who review key events and decisions throughout the game. The system relies on video replay to correct clear errors made by the referee on the field. Our work seeks to identify and discuss the effect of the introduction of VAR on referee decisions (second yellow cards, red cards, penalty kicks, and offsides) in professional European soccer leagues, comparing seasons without VAR and seasons with VAR adoption.
The introduction of the VAR system to professional soccer between the years 2018 and 2020 was met with controversy. Whether bringing the technology into the game would be beneficial was long debated, and is still being debated to this day (chimichangaboii, “Debate.”).
While public opinion of VAR seems to be improving, some hold the belief that this technology benefits certain teams over others (Kuvvetli et al., “Is Video Assistant Referee (VAR) a Disadvantage for the Strong and a Protection for the Weak?”).
With AI and technology in general at the forefront of every news article and discussion, the conversation around the impact of implementing technology in sports could not come at a more appropriate time. We hope the results of our research can help guide these conversations further by providing some statistical evidence around the impact of the VAR system.
Some have tried to quantify the impact of the VAR system using data from a single league (Brown, “The Impact of Video Assistant Referee (VAR) on the English Premier League.”) or even a single tournament (Zhang et al., “The Effect of the Video Assistant Referee (VAR) on Referees’ Decisions at FIFA Women’s World Cups.”).
Our attempt at answering this question, by contrast, covers multiple professional leagues across multiple years. Our goal is to see whether the VAR system has had a statistically significant impact not just on one professional soccer league but on several across Europe. Other models have also failed to address the potential for multicollinearity in the data. For example, fouls and yellow or red cards are closely related, since most cards are awarded for fouls, so treating them as independent can overrepresent the same underlying behavior. Our models account for this possibility as best we can, which should in theory yield a more robust estimate of the impact of the VAR system.
1. Data and Variable Construction
1.1 Data Sources
Our dataset was sourced from FBref and contains 878 team-season observations across five major European leagues: the English Premier League, Italian Serie A, German Bundesliga, Spanish La Liga, and French Ligue 1. The data spans nine seasons from 2016-17 through 2024-25, with each row representing a single team’s performance statistics for a given season.
1.2 Variable Construction
From the full dataset, we selected ten variables relevant to our research question: season, team, league, yellow cards, red cards, fouls committed, offsides, penalties won, penalties conceded, and possession percentage.
Red cards contained 63 missing values concentrated in early seasons, which we addressed through median imputation. We then constructed several additional variables. The binary indicator post_VAR equals one for all team-season observations falling on or after a league’s first VAR season and zero otherwise. A COVID indicator flags the 2020-21 season, which was played under unusual conditions due to the pandemic and is included as a control in all models. We also constructed season_num as a numeric time index, first_treat as a numeric variable indicating the first treated season for each league’s adoption cohort, and team_id as a numeric team identifier. All three are required for the Callaway & Sant’Anna estimator introduced in Section 5.
1.3 Treatment Variable Verification
VAR adoption dates varied across leagues: the Bundesliga and Serie A were the earliest adopters in 2017-18, followed by Ligue 1 and La Liga in 2018-19, and the Premier League last in 2019-20. This staggered rollout is central to our identification strategy, as it provides natural variation in treatment timing across leagues.
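The code that builds these indicators is not shown in the report. A minimal base R sketch of the construction described above might look like the following. It assumes season labels of the form `2016-2017` and the league names used in the `Comp` column; `panel`, `adoption`, and `first_season` are hypothetical names, and the grid has one row per league-season rather than per team-season.

```r
# Season labels in the sample, indexed 1 (2016-17) through 9 (2024-25).
seasons <- c("2016-2017", "2017-2018", "2018-2019", "2019-2020", "2020-2021",
             "2021-2022", "2022-2023", "2023-2024", "2024-2025")

# First VAR season per league, as stated in the text.
adoption <- data.frame(
  Comp = c("de Bundesliga", "it Serie A", "fr Ligue 1", "es La Liga",
           "eng Premier League"),
  first_season = c("2017-2018", "2017-2018", "2018-2019", "2018-2019",
                   "2019-2020"),
  stringsAsFactors = FALSE
)

# One row per league-season (a stand-in for the team-season data).
panel <- expand.grid(Comp = adoption$Comp, Season = seasons,
                     stringsAsFactors = FALSE)

# Numeric time index and first treated period for each league.
panel$season_num  <- match(panel$Season, seasons)
panel$first_treat <- match(
  adoption$first_season[match(panel$Comp, adoption$Comp)], seasons
)

# Treatment and COVID indicators.
panel$post_VAR <- as.integer(panel$season_num >= panel$first_treat)
panel$COVID    <- as.integer(panel$Season == "2020-2021")

# Verification: number of pre-VAR seasons per league.
pre_seasons <- tapply(1 - panel$post_VAR, panel$Comp, sum)
print(pre_seasons)
```

Under these assumptions the grid has 45 league-season rows, with one pre-VAR season for the Bundesliga and Serie A, two for Ligue 1 and La Liga, and three for the Premier League, which agrees with the adoption dates stated above.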
1.4 Sample Description
The final dataset contains 878 team-season observations. Table 1 presents summary statistics for our key outcome variables split by VAR status.
| Variable | Pre-VAR (N=178) Mean | Pre-VAR (N=178) Std. Dev. | Post-VAR (N=700) Mean | Post-VAR (N=700) Std. Dev. |
|---|---|---|---|---|
| CrdY | 76.2 | 18.2 | 76.2 | 17.8 |
| CrdR | 2.7 | 2.0 | 3.6 | 2.3 |
| Fls | 472.4 | 73.6 | 453.4 | 69.1 |
| Off | 83.4 | 18.8 | 66.5 | 16.0 |
| PKwon | 4.9 | 2.7 | 4.5 | 2.4 |
| PKcon | 5.8 | 2.7 | 6.0 | 2.8 |
| Poss | 50.0 | 6.9 | 50.0 | 6.3 |
Yellow cards showed virtually no change between the pre- and post-VAR periods, averaging 76.2 per team-season in both groups. Red cards rose from 2.7 to 3.6 per team-season after VAR adoption, an increase of roughly one third in the raw means. Fouls committed declined modestly from 472 to 453 per team-season post-VAR. The most notable shift was in offsides, which dropped from 83.4 to 66.5 per team-season. Penalty outcomes remained relatively stable across both periods. These are raw comparisons that do not control for team or season effects.
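The raw shifts described above can be reproduced directly from the Table 1 means. The sketch below is only an illustration: it re-enters the published means by hand (the data frame `summary_stats` is a hypothetical name) and computes absolute and percentage differences. These are descriptive gaps, not causal estimates, because post-VAR observations also come from later seasons.

```r
# Pre- and post-VAR means copied from Table 1.
summary_stats <- data.frame(
  variable  = c("CrdY", "CrdR", "Fls", "Off", "PKwon", "PKcon", "Poss"),
  pre_mean  = c(76.2, 2.7, 472.4, 83.4, 4.9, 5.8, 50.0),
  post_mean = c(76.2, 3.6, 453.4, 66.5, 4.5, 6.0, 50.0)
)

# Raw difference and percentage change relative to the pre-VAR mean.
summary_stats$diff     <- summary_stats$post_mean - summary_stats$pre_mean
summary_stats$pct_diff <- round(
  100 * summary_stats$diff / summary_stats$pre_mean, 1
)

print(summary_stats)
```

The raw offsides gap (about -16.9 per team-season) is noticeably larger than the fixed effects estimates reported later, which is one reason the regression models are needed.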
2. Empirical Strategy and Assumption Testing
Our identification strategy relies on a two-way fixed effects difference-in-differences design, exploiting the staggered adoption of VAR across leagues. The estimating equation is:
\[Y_{it} = \beta_0 + \beta_1 \text{VAR}_{lt} + \beta_2 \text{COVID}_t + \gamma_i + \lambda_t + \varepsilon_{it}\]
where \(Y_{it}\) is the outcome for team \(i\) in season \(t\), \(\text{VAR}_{lt}\) is an indicator for whether league \(l\) has adopted VAR in season \(t\), \(\text{COVID}_t\) is a binary indicator for the 2020-21 season, which was played under unusual conditions and is included to avoid conflating pandemic effects with VAR effects, \(\gamma_i\) are team fixed effects, and \(\lambda_t\) are season fixed effects. Standard errors are clustered at the league level.
The key identifying assumption is parallel trends: in the absence of VAR adoption, treated and control groups would have followed the same trend over time. We test this assumption using event studies and formal pre-trends tests. As an additional robustness check we also implement the Callaway and Sant’Anna estimator in Section 5, which accounts for the staggered nature of VAR adoption across leagues and avoids using already-treated leagues as controls.
2.1 Event Study Analysis
We estimate event study specifications with leads and lags relative to VAR adoption to formally test for pre-treatment trends and examine treatment dynamics.
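In equation form, and using notation that is introduced here only for illustration (\(g_l\) for the first VAR season of league \(l\), \(k\) for event time, \(\delta_k\) for the lead and lag coefficients), the event study specification can be written as

\[Y_{it} = \beta_0 + \sum_{k \neq -1} \delta_k \, \mathbf{1}[t - g_l = k] + \beta_2 \text{COVID}_t + \gamma_i + \lambda_t + \varepsilon_{it}\]

with the season before adoption (\(k = -1\)) as the omitted reference period. The construction of the event-time variable `time_to_VAR` is not shown in the report. A minimal sketch, assuming it is simply `season_num - first_treat` with seasons indexed 1 (2016-17) to 9 (2024-25) and no binning of endpoints, is:

```r
# First treated season index per league (2 = 2017-18, 3 = 2018-19, 4 = 2019-20).
first_treat <- c("de Bundesliga" = 2, "it Serie A" = 2, "fr Ligue 1" = 3,
                 "es La Liga" = 3, "eng Premier League" = 4)

# One row per league-season; `grid` is a hypothetical stand-in for the data.
grid <- expand.grid(Comp = names(first_treat), season_num = 1:9,
                    stringsAsFactors = FALSE)

# Event time: seasons elapsed since (or remaining until) VAR adoption.
grid$time_to_VAR <- grid$season_num - unname(first_treat[grid$Comp])

# Which leagues contribute to each pre-treatment event time?
leads <- subset(grid, time_to_VAR < 0)
print(table(leads$Comp, leads$time_to_VAR))
```

Under this construction the lead at -3 is identified by the Premier League alone and the lead at -2 by three leagues, which is worth keeping in mind when reading the pre-trend tests.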
# Event Study: Red Cards
library(fixest)  # provides feols(), iplot(), wald(), and etable()
event_study_red <- feols(
  CrdR ~ i(time_to_VAR, ref = -1) + COVID | Team + Season,
  data = df, cluster = ~Comp
)
iplot(event_study_red,
  main = "Event Study: Effect of VAR on Red Cards",
  xlab = "Seasons Relative to VAR Adoption",
  ylab = "Coefficient Estimate"
)

The event study for red cards shows pre-treatment coefficients clustered near zero with wide confidence intervals crossing zero in all periods. This gives no evidence of systematic divergence between leagues before VAR adoption, which is consistent with the parallel trends assumption.
# Event Study: Yellow Cards
event_study_yellow <- feols(
  CrdY ~ i(time_to_VAR, ref = -1) + COVID | Team + Season,
  data = df, cluster = ~Comp
)
iplot(event_study_yellow,
  main = "Event Study: Effect of VAR on Yellow Cards",
  xlab = "Seasons Relative to VAR Adoption",
  ylab = "Coefficient Estimate"
)

The event study for yellow cards similarly shows pre-treatment coefficients near zero with no discernible trend. The wide confidence intervals throughout reflect the high variance in yellow card totals across teams and seasons, consistent with our null result in the main models.
# Event Study: Offsides
event_study_off <- feols(
  Off ~ i(time_to_VAR, ref = -1) + COVID | Team + Season,
  data = df, cluster = ~Comp
)
iplot(event_study_off,
  main = "Event Study: Effect of VAR on Offsides",
  xlab = "Seasons Relative to VAR Adoption",
  ylab = "Coefficient Estimate"
)

The event study for offsides shows a more nuanced pattern. Pre-treatment coefficients are slightly negative and show a mild downward drift before VAR adoption, which explains the borderline parallel trends test result (p = 0.071). This suggests that offsides were already declining modestly before VAR was introduced, and readers should interpret the post-adoption estimates cautiously as they may partially reflect a continuation of this pre-existing trend rather than a pure VAR effect.
2.2 Parallel Trends Test Results
Table 2 presents joint Wald tests of the null hypothesis that pre-treatment coefficients (t = -3 and t = -2) are jointly zero.
# Joint test for BOTH pre-treatment periods
wald_red <- wald(event_study_red, c("time_to_VAR::-3", "time_to_VAR::-2"))
Wald test, H0: joint nullity of time_to_VAR::-3 and time_to_VAR::-2
stat = 0.461194, p-value = 0.660341, on 2 and 4 DoF, VCOV: Clustered (Comp).

wald_yellow <- wald(event_study_yellow, c("time_to_VAR::-3", "time_to_VAR::-2"))
Wald test, H0: joint nullity of time_to_VAR::-3 and time_to_VAR::-2
stat = 0.750984, p-value = 0.528547, on 2 and 4 DoF, VCOV: Clustered (Comp).

wald_off <- wald(event_study_off, c("time_to_VAR::-3", "time_to_VAR::-2"))
Wald test, H0: joint nullity of time_to_VAR::-3 and time_to_VAR::-2
stat = 5.51081, p-value = 0.070907, on 2 and 4 DoF, VCOV: Clustered (Comp).

# event_study_pk is the analogous event-study model for penalties;
# its estimation code is not shown in this section.
wald_pk <- wald(event_study_pk, c("time_to_VAR::-3", "time_to_VAR::-2"))
Wald test, H0: joint nullity of time_to_VAR::-3 and time_to_VAR::-2
stat = 12.031, p-value = 0.020317, on 2 and 4 DoF, VCOV: Clustered (Comp).

# Create summary table
pretrend_table <- data.frame(
Outcome = c("Red Cards", "Yellow Cards", "Offsides", "Penalties"),
F_statistic = c(
round(wald_red$stat, 3),
round(wald_yellow$stat, 3),
round(wald_off$stat, 3),
round(wald_pk$stat, 3)
),
P_value = c(
round(wald_red$p, 3),
round(wald_yellow$p, 3),
round(wald_off$p, 3),
round(wald_pk$p, 3)
),
Assessment = c("Pass", "Pass", "Pass (borderline)*", "Fail"),
Main_Analysis = c("Yes", "Yes", "Yes", "No")
)
kable(pretrend_table,
caption = "Table 2: Joint Wald Test of Pre-Treatment Coefficients",
col.names = c("Outcome", "F-Statistic", "P-Value", "Assessment", "Main Analysis?"),
align = c('l', 'r', 'r', 'c', 'c'))

| Outcome | F-Statistic | P-Value | Assessment | Main Analysis? |
|---|---|---|---|---|
| Red Cards | 0.461 | 0.660 | Pass | Yes |
| Yellow Cards | 0.751 | 0.529 | Pass | Yes |
| Offsides | 5.511 | 0.071 | Pass (borderline)* | Yes |
| Penalties | 12.031 | 0.020 | Fail | No |
Red cards and yellow cards show no evidence of differential pre-trends (p = 0.660 and p = 0.529 respectively), which is consistent with the parallel trends assumption. Offsides show borderline evidence of pre-existing trends (p = 0.071): the test does not reject at the conventional 5% level but does at the 10% level, in line with the mild downward drift visible in the event study plot. We include offsides in our main analysis but interpret results cautiously. Penalties conceded show clear evidence of pre-existing trends (p = 0.020) and are excluded from causal analysis, though results are presented in Appendix A.
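As a sanity check, the p-values in Table 2 can be recovered from the reported statistics, since each is the upper tail of an F distribution with 2 and 4 degrees of freedom (the "2 and 4 DoF" shown in the Wald output). The sketch below uses only base R; the vector names are hypothetical, and the penalties statistic is taken from Table 2.

```r
# Joint Wald (F) statistics as reported in the output and in Table 2.
f_stats <- c(red = 0.461194, yellow = 0.750984,
             offsides = 5.51081, penalties = 12.031)

# Upper-tail probability of an F(2, 4) distribution.
p_vals <- pf(f_stats, df1 = 2, df2 = 4, lower.tail = FALSE)

print(round(p_vals, 3))
```

With only four denominator degrees of freedom, which reflects the five league clusters, the F distribution has heavy tails, so even a statistic above 5 does not reject at the 5% level.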
3. Models and Results
We estimate two-way fixed effects models for each outcome, with standard errors clustered at the league level to account for arbitrary correlation among teams within the same league over time.
3.1 Effect on Red Cards
We find no evidence that VAR adoption affects red card frequency.
rcard_model1 <- feols(CrdR ~ post_VAR + COVID | Team + Season,
data = df, cluster = ~Comp)
rcard_model2 <- feols(CrdR ~ post_VAR + Fls + COVID | Team + Season,
data = df, cluster = ~Comp)
etable(rcard_model1, rcard_model2,
  title = "Table 3: Effect of VAR on Red Cards",
  dict = c("post_VAR" = "VAR Adoption",
           "Fls" = "Fouls Committed",
           "COVID" = "COVID Season"))

                   rcard_model1     rcard_model2
Dependent Var.:            CrdR             CrdR

VAR Adoption    0.0637 (0.3239) -0.0782 (0.3343)
Fouls Committed                 0.0074* (0.0021)
Fixed-Effects:  --------------- ----------------
Team                        Yes              Yes
Season                      Yes              Yes
_______________ _______________ ________________
S.E.: Clustered        by: Comp         by: Comp
Observations                857              857
R2                      0.35725          0.37440
Within R2               4.52e-5          0.02673
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The equation for our final red card model can be represented as
\[CrdR_{it} = \beta_0 + \beta_1 VAR_{lt} + \beta_2 Fls_{it} + \gamma_i + \lambda_t + \varepsilon_{it}\]
The estimated coefficient of -0.078 in our preferred specification represents a negligible 2.8% decrease in red cards relative to the pre-VAR mean and is not statistically significant (p = 0.826). Fouls committed is positively associated with red cards, as expected, and is significant at the 5% level. The VAR coefficient is close to zero in both specifications (0.064 without the fouls control and -0.078 with it), so the null result is not sensitive to the inclusion of controls. Note that the COVID indicator is absorbed by the season fixed effects and therefore does not appear separately in the output.
3.2 Effect on Yellow Cards
Yellow cards show a small, statistically insignificant decrease following VAR adoption.
                  ycard_model1       ycard_model2
Dependent Var.:           CrdY               CrdY

VAR Adoption    -2.350 (5.630)     -5.116 (4.099)
Fouls Committed                0.1439*** (0.0152)
Fixed-Effects:  -------------- ------------------
Team                       Yes                Yes
Season                     Yes                Yes
_______________ ______________ __________________
S.E.: Clustered       by: Comp           by: Comp
Observations               857                857
R2                     0.64363            0.74386
Within R2              0.00170            0.28249
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The equation for our final yellow card model can be represented as
\[CrdY_{it} = \beta_0 + \beta_1 VAR_{lt} + \beta_2 Fls_{it} + \gamma_i + \lambda_t + \varepsilon_{it}\]
The coefficient of -5.116 in our preferred specification represents a 6.7% decrease in yellow cards relative to the pre-VAR mean, though this effect is not statistically significant (p = 0.280). Fouls committed is strongly positively associated with yellow cards, significant at the 0.1% level, which makes intuitive sense as referees book players primarily for fouls. The VAR coefficient is negative and statistically insignificant in both specifications (-2.350 without the fouls control and -5.116 with it). This null result is consistent with the institutional design of VAR, which does not permit review of yellow card decisions under current IFAB rules, and serves as a validation of our empirical approach.
3.3 Effect on Offsides
Offsides show a substantial and statistically significant decline following VAR adoption.
off_model1 <- feols(Off ~ post_VAR + COVID | Team + Season,
data = df, cluster = ~Comp)
off_model2 <- feols(Off ~ post_VAR + Poss + COVID | Team + Season,
data = df, cluster = ~Comp)
etable(off_model1, off_model2,
  title = "Table 5: Effect of VAR on Offsides",
  dict = c("post_VAR" = "VAR Adoption",
           "Poss" = "Possession %",
           "COVID" = "COVID Season"))

                     off_model1       off_model2
Dependent Var.:             Off              Off

VAR Adoption    -9.373* (2.844)  -9.298* (2.772)
Possession %                    0.2954. (0.1282)
Fixed-Effects:  --------------- ----------------
Team                        Yes              Yes
Season                      Yes              Yes
_______________ _______________ ________________
S.E.: Clustered        by: Comp         by: Comp
Observations                857              857
R2                      0.57627          0.57846
Within R2               0.02251          0.02758
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The equation for our final offsides model can be represented as
\[Off_{it} = \beta_0 + \beta_1 VAR_{lt} + \beta_2 Poss_{it} + \gamma_i + \lambda_t + \varepsilon_{it}\]
The coefficient of -9.298 in our preferred specification represents an 11.1% reduction in offsides following VAR adoption and is statistically significant (p = 0.028). Possession is positively associated with offsides at the 10% level, which makes sense as teams with more possession push higher up the pitch and trigger more offside calls. The VAR coefficient is stable across both specifications. However, given the borderline parallel trends test (p = 0.071) and the mild downward drift visible in the event study plot, this estimate should be interpreted cautiously. The observed decline may reflect either a causal effect of VAR’s precise offside detection technology or a continuation of pre-existing tactical shifts toward higher defensive lines.
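The reported p-values can be approximately reconstructed from the coefficients and clustered standard errors in the regression tables. The sketch below assumes that the reference distribution is Student's t with 4 degrees of freedom (the number of league clusters minus one), which matches the "2 and 4 DoF" reported by the Wald tests but is not stated explicitly in the report; the vector names are hypothetical.

```r
# Coefficients and clustered standard errors from the preferred
# specifications (second column of each regression table).
est <- c(red = -0.0782, yellow = -5.116, offsides = -9.298)
se  <- c(red = 0.3343, yellow = 4.099, offsides = 2.772)

# Assumption: t reference distribution with (clusters - 1) = 4 degrees
# of freedom, since standard errors are clustered on five leagues.
n_clusters <- 5
t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), df = n_clusters - 1)

print(round(cbind(t_stat, p_val), 3))
```

Because the inputs are rounded, the reconstructed p-values agree with the reported ones only up to rounding error.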
3.4 Effect Size Interpretation
Table 6 presents effect sizes as both percentage changes relative to pre-VAR means and standardized effects, computed as the coefficient divided by the pre-VAR standard deviation (analogous to Cohen’s d).
red_model <- feols(CrdR ~ post_VAR + Fls | Team + Season,
data = df, cluster = ~Comp)
yellow_model <- feols(CrdY ~ post_VAR + Fls | Team + Season,
data = df, cluster = ~Comp)
off_model <- feols(Off ~ post_VAR + Poss | Team + Season,
data = df, cluster = ~Comp)
pre_var_means <- df %>%
filter(post_VAR == 0) %>%
summarise(
mean_red = mean(CrdR, na.rm = TRUE),
sd_red = sd(CrdR, na.rm = TRUE),
mean_yellow = mean(CrdY, na.rm = TRUE),
sd_yellow = sd(CrdY, na.rm = TRUE),
mean_off = mean(Off, na.rm = TRUE),
sd_off = sd(Off, na.rm = TRUE)
)
red_coef <- coef(red_model)["post_VAR"]
red_se <- se(red_model)["post_VAR"]
red_p <- pvalue(red_model)["post_VAR"]
yellow_coef <- coef(yellow_model)["post_VAR"]
yellow_se <- se(yellow_model)["post_VAR"]
yellow_p <- pvalue(yellow_model)["post_VAR"]
off_coef <- coef(off_model)["post_VAR"]
off_se <- se(off_model)["post_VAR"]
off_p <- pvalue(off_model)["post_VAR"]
red_pct <- (red_coef / pre_var_means$mean_red) * 100
yellow_pct <- (yellow_coef / pre_var_means$mean_yellow) * 100
off_pct <- (off_coef / pre_var_means$mean_off) * 100
red_std <- red_coef / pre_var_means$sd_red
yellow_std <- yellow_coef / pre_var_means$sd_yellow
off_std <- off_coef / pre_var_means$sd_off
effect_sizes <- data.frame(
Outcome = c("Red Cards", "Yellow Cards", "Offsides"),
Pre_VAR_Mean = round(c(pre_var_means$mean_red,
pre_var_means$mean_yellow,
pre_var_means$mean_off), 2),
Coefficient = round(c(red_coef, yellow_coef, off_coef), 3),
Std_Error = round(c(red_se, yellow_se, off_se), 3),
P_value = round(c(red_p, yellow_p, off_p), 3),
Pct_Change = paste0(round(c(red_pct, yellow_pct, off_pct), 1), "%"),
Std_Effect = round(c(red_std, yellow_std, off_std), 3),
Interpretation = c("Negligible effect", "Small decrease", "Small decrease")
)
kable(effect_sizes,
caption = "Table 6: Effect Sizes and Practical Significance",
align = c('l', 'r', 'r', 'r', 'r', 'r', 'r', 'l'))

| Outcome | Pre_VAR_Mean | Coefficient | Std_Error | P_value | Pct_Change | Std_Effect | Interpretation |
|---|---|---|---|---|---|---|---|
| Red Cards | 2.75 | -0.078 | 0.334 | 0.826 | -2.8% | -0.039 | Negligible effect |
| Yellow Cards | 76.22 | -5.116 | 4.099 | 0.280 | -6.7% | -0.282 | Small decrease |
| Offsides | 83.42 | -9.298 | 2.772 | 0.028 | -11.1% | -0.495 | Small decrease |
Red cards show a negligible 2.8% decrease that is not statistically significant (p = 0.826). Yellow cards show a small 6.7% decrease, also not significant (p = 0.280). Offsides show an 11.1% decrease that is statistically significant (p = 0.028), with a standardized effect size of -0.495 SD approaching a medium effect. These effect sizes should be interpreted in light of the parallel trends caveat for offsides discussed in Section 2.2.
4. Robustness Checks
4.1 Sensitivity to Sample Composition
We test whether results are driven by a single league by re-estimating each model while sequentially excluding each of the five leagues.
leagues <- unique(df$Comp)
robustness_results <- data.frame()
# Red Cards
for (league in leagues) {
temp_df <- df %>% filter(Comp != league)
model <- feols(CrdR ~ post_VAR + Fls | Team + Season,
data = temp_df, cluster = ~Comp)
robustness_results <- rbind(robustness_results, data.frame(
Outcome = "Red Cards",
Excluded_League = league,
Coefficient = round(coef(model)["post_VAR"], 3),
Std_Error = round(se(model)["post_VAR"], 3),
P_value = round(pvalue(model)["post_VAR"], 3)
))
}
# Yellow Cards
for (league in leagues) {
temp_df <- df %>% filter(Comp != league)
model <- feols(CrdY ~ post_VAR + Fls | Team + Season,
data = temp_df, cluster = ~Comp)
robustness_results <- rbind(robustness_results, data.frame(
Outcome = "Yellow Cards",
Excluded_League = league,
Coefficient = round(coef(model)["post_VAR"], 3),
Std_Error = round(se(model)["post_VAR"], 3),
P_value = round(pvalue(model)["post_VAR"], 3)
))
}
# Offsides
for (league in leagues) {
temp_df <- df %>% filter(Comp != league)
model <- feols(Off ~ post_VAR + Poss | Team + Season,
data = temp_df, cluster = ~Comp)
robustness_results <- rbind(robustness_results, data.frame(
Outcome = "Offsides",
Excluded_League = league,
Coefficient = round(coef(model)["post_VAR"], 3),
Std_Error = round(se(model)["post_VAR"], 3),
P_value = round(pvalue(model)["post_VAR"], 3)
))
}
kable(robustness_results,
caption = "Table 7: Robustness to Excluding Individual Leagues",
align = c('l', 'l', 'r', 'r', 'r'),
row.names = FALSE)

| Outcome | Excluded_League | Coefficient | Std_Error | P_value |
|---|---|---|---|---|
| Red Cards | eng Premier League | 0.409 | 0.568 | 0.524 |
| Red Cards | it Serie A | 0.107 | 0.382 | 0.797 |
| Red Cards | fr Ligue 1 | -0.070 | 0.514 | 0.900 |
| Red Cards | es La Liga | -0.361 | 0.193 | 0.158 |
| Red Cards | de Bundesliga | -0.221 | 0.119 | 0.159 |
| Yellow Cards | eng Premier League | -11.013 | 3.299 | 0.044 |
| Yellow Cards | it Serie A | -6.885 | 5.053 | 0.266 |
| Yellow Cards | fr Ligue 1 | -3.441 | 3.717 | 0.423 |
| Yellow Cards | es La Liga | -5.465 | 4.048 | 0.270 |
| Yellow Cards | de Bundesliga | -1.013 | 2.745 | 0.737 |
| Offsides | eng Premier League | -10.875 | 4.545 | 0.096 |
| Offsides | it Serie A | -11.979 | 2.145 | 0.011 |
| Offsides | fr Ligue 1 | -7.728 | 2.415 | 0.049 |
| Offsides | es La Liga | -9.591 | 3.108 | 0.054 |
| Offsides | de Bundesliga | -7.075 | 1.815 | 0.030 |
Red card coefficients range from -0.361 to 0.409 across all league exclusions, none statistically significant, confirming the null result is not driven by any single league. Yellow cards show some sensitivity when excluding the Premier League (coefficient of -11.013, p = 0.044), though all other specifications remain non-significant. Offsides coefficients are negative in every exclusion, ranging from -11.979 to -7.075; they are significant at the 5% level in three of the five exclusions and at the 10% level in all five, which supports the robustness of that finding.
4.2 Sensitivity to Control Variables
Table 8 compares specifications with and without control variables.
red_no_controls <- feols(CrdR ~ post_VAR | Team + Season,
data = df, cluster = ~Comp)
red_with_controls <- feols(CrdR ~ post_VAR + Fls | Team + Season,
data = df, cluster = ~Comp)
yellow_no_controls <- feols(CrdY ~ post_VAR | Team + Season,
data = df, cluster = ~Comp)
yellow_with_controls <- feols(CrdY ~ post_VAR + Fls | Team + Season,
data = df, cluster = ~Comp)
off_no_controls <- feols(Off ~ post_VAR | Team + Season,
data = df, cluster = ~Comp)
off_with_controls <- feols(Off ~ post_VAR + Poss | Team + Season,
data = df, cluster = ~Comp)
modelsummary(
list(
"Red (No Controls)" = red_no_controls,
"Red (Controls)" = red_with_controls,
"Yellow (No Controls)" = yellow_no_controls,
"Yellow (Controls)" = yellow_with_controls,
"Offsides (No Controls)" = off_no_controls,
"Offsides (Controls)" = off_with_controls
),
stars = c('*' = .1, '**' = .05, '***' = .01),
coef_map = c("post_VAR" = "VAR Adoption",
"Fls" = "Fouls Committed",
"Poss" = "Possession %"),
gof_map = c("nobs", "r.squared"),
title = "Table 8: Robustness to Control Variables"
)

| | Red (No Controls) | Red (Controls) | Yellow (No Controls) | Yellow (Controls) | Offsides (No Controls) | Offsides (Controls) |
|---|---|---|---|---|---|---|
| VAR Adoption | 0.064 | -0.078 | -2.350 | -5.116 | -9.373** | -9.298** |
| | (0.324) | (0.334) | (5.630) | (4.099) | (2.844) | (2.772) |
| Fouls Committed | | 0.007** | | 0.144*** | | |
| | | (0.002) | | (0.015) | | |
| Possession % | | | | | | 0.295* |
| | | | | | | (0.128) |
| Num.Obs. | 857 | 857 | 857 | 857 | 857 | 857 |
| R2 | 0.357 | 0.374 | 0.644 | 0.744 | 0.576 | 0.578 |

* p < 0.1, ** p < 0.05, *** p < 0.01
The inclusion of control variables does not change the conclusions for any outcome. The VAR coefficient for red cards moves from 0.064 to -0.078, for yellow cards from -2.350 to -5.116, and for offsides from -9.373 to -9.298. The sign of the red card estimate flips, but both values are close to zero and statistically insignificant; for yellow cards and offsides the direction and significance of the VAR estimate are unchanged. Note that Table 8 uses a different star convention (* p < 0.1, ** p < 0.05) from the regression tables in Section 3.
5. Callaway & Sant’Anna Estimator
The standard two-way fixed effects estimator assumes a homogeneous treatment effect across all leagues and time periods. With staggered adoption, however, this assumption may be violated: already-treated (earlier-adopting) leagues can be used as implicit controls for later-adopting ones in ways that distort estimates. To address this, we implement the Callaway and Sant’Anna (2021) estimator, which computes average treatment effects separately for each adoption cohort and compares each cohort only to not-yet-treated leagues at the time of adoption.
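The core comparison can be illustrated with a simplified, unconditional version of the group-time average treatment effect, in which cohort \(g\) in period \(t\) is compared with units not yet treated by \(t\):

\[ATT(g,t) = E[Y_t - Y_{g-1} \mid G = g] - E[Y_t - Y_{g-1} \mid G > t]\]

The sketch below computes this quantity by hand on a small made-up panel. It is not the doubly robust estimator used in the report, and all names (`toy`, `att_gt_manual`) and values are hypothetical.

```r
# Toy panel: 6 hypothetical units, 4 periods, three adoption cohorts.
toy <- expand.grid(id = 1:6, t = 1:4)
toy$g <- c(2, 2, 3, 3, 4, 4)[toy$id]        # first treated period per unit

unit_fe  <- c(5, 7, 6, 8, 4, 9)[toy$id]     # made-up unit effects
time_fe  <- c(0, 1, 3, 2)[toy$t]            # made-up common period effects
true_att <- -10                             # made-up constant treatment effect
toy$y <- unit_fe + time_fe + true_att * (toy$t >= toy$g)

# Unconditional ATT(g, t) using not-yet-treated units as controls.
att_gt_manual <- function(data, g, t) {
  base <- g - 1                             # last pre-treatment period
  wide <- merge(data[data$t == t,    c("id", "g", "y")],
                data[data$t == base, c("id", "y")],
                by = "id", suffixes = c("_t", "_base"))
  wide$dy  <- wide$y_t - wide$y_base        # change since the base period
  treated  <- wide$g == g
  controls <- wide$g > t                    # not yet treated at period t
  mean(wide$dy[treated]) - mean(wide$dy[controls])
}

print(att_gt_manual(toy, g = 2, t = 2))
print(att_gt_manual(toy, g = 3, t = 3))
print(att_gt_manual(toy, g = 4, t = 4))     # NaN: no not-yet-treated controls
```

In this toy example the last cohort has no not-yet-treated comparison group, so its effect cannot be computed. The same logic would apply to the last-adopting league in a design that uses only not-yet-treated controls.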
# Aggregate to overall ATT
agg_red <- aggte(cs_red, type = "simple")
agg_yellow <- aggte(cs_yellow, type = "simple")
agg_off <- aggte(cs_off, type = "simple")
# Print summary
summary(agg_red)
Call:
aggte(MP = cs_red, type = "simple")
Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
ATT Std. Error [ 95% Conf. Int.]
0.5628 0.4353 -0.2903 1.416
---
Signif. codes: `*' confidence band does not cover 0
Control Group: Not Yet Treated, Anticipation Periods: 0
Estimation Method: Doubly Robust
The Callaway and Sant’Anna estimator yields an ATT of 0.563 for red cards, a small positive estimate that is not statistically significant, as the confidence interval includes zero. The sign flips relative to the TWFE estimate of -0.078, which may indicate that the standard estimator was influenced by contaminated comparisons in which already-treated leagues serve as controls for later-treated ones, although both estimates are statistically indistinguishable from zero. The overall conclusion of no detectable VAR effect on red cards therefore remains unchanged.
summary(agg_yellow)
Call:
aggte(MP = cs_yellow, type = "simple")
Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
ATT Std. Error [ 95% Conf. Int.]
-0.0782 2.6148 -5.2031 5.0467
---
Signif. codes: `*' confidence band does not cover 0
Control Group: Not Yet Treated, Anticipation Periods: 0
Estimation Method: Doubly Robust
The C&S estimator yields an ATT of -0.078 for yellow cards, essentially zero and not statistically significant. This is consistent with the TWFE result and further confirms that VAR adoption had no detectable effect on yellow card frequency, as expected given that VAR cannot review yellow card decisions under current rules.
summary(agg_off)
Call:
aggte(MP = cs_off, type = "simple")
Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
ATT Std. Error [ 95% Conf. Int.]
-13.0354 3.0387 -18.9911 -7.0797 *
---
Signif. codes: `*' confidence band does not cover 0
Control Group: Not Yet Treated, Anticipation Periods: 0
Estimation Method: Doubly Robust
The C&S estimator yields an ATT of -13.035 for offsides, larger in magnitude than the TWFE estimate of -9.298 and statistically significant, as the confidence interval excludes zero (-18.991 to -7.080). This suggests the TWFE estimate was, if anything, conservative: once we account for staggered adoption and use only not-yet-treated leagues as controls, the estimated decline in offsides is stronger. This finding strengthens our conclusion that VAR adoption is associated with a meaningful reduction in offsides calls.
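The confidence intervals printed in the three summaries can be checked against the point estimates and standard errors. The sketch below assumes the intervals are simple normal-approximation intervals (estimate plus or minus the 97.5% normal quantile times the standard error), which reproduces the printed bounds to rounding; the object names are hypothetical.

```r
# Aggregated ATTs and standard errors as printed by the three summaries.
att <- c(red = 0.5628, yellow = -0.0782, offsides = -13.0354)
se  <- c(red = 0.4353, yellow = 2.6148, offsides = 3.0387)

# Assumption: pointwise 95% intervals based on the normal distribution.
z  <- qnorm(0.975)
ci <- cbind(lower = att - z * se, upper = att + z * se)

print(round(ci, 3))
```

Only the offsides interval lies entirely below zero, which is why it is the only estimate flagged with `*` in the output.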
ggdid(cs_red, title = "C&S Event Study: Red Cards")

The cohort-level plots show pre-treatment estimates close to zero for both adoption cohorts, with post-treatment estimates that include zero in their confidence intervals. This indicates no significant VAR effect on red cards for either the early adopters (Bundesliga and Serie A) or the mid adopters (Ligue 1 and La Liga).

ggdid(cs_yellow, title = "C&S Event Study: Yellow Cards")

Group 2 shows a negative pre-treatment estimate and a near-zero post-treatment estimate. Group 3 shows a positive pre-treatment estimate, which is a mild concern for parallel trends at the cohort level, though the post-treatment estimate is close to zero. Overall, neither cohort shows a significant VAR effect on yellow cards.

ggdid(cs_off, title = "C&S Event Study: Offsides")

Both adoption cohorts show substantial negative post-treatment estimates for offsides, with confidence intervals that largely exclude zero. Group 2 (Bundesliga and Serie A) shows a post-treatment estimate around -8, while Group 3 (Ligue 1 and La Liga) shows an even larger decline around -15. The pre-treatment estimate for Group 3 is negative and wide, which is consistent with the borderline parallel trends concern noted earlier. Nevertheless, the consistent direction of the post-treatment effects across both cohorts strengthens confidence that VAR adoption is associated with a meaningful reduction in offsides calls, corroborating the aggregated ATT of -13.035.
6. Discussion
6.1 Main Findings
First, VAR adoption has no meaningful impact on red card frequency. The TWFE estimate of -0.078 represents a negligible 2.8% decrease and is not statistically significant (p = 0.826). The C&S estimator yields an ATT of 0.563, also not significant. The sign difference between the two estimators suggests the TWFE may have been influenced by contaminated comparisons, though neither result supports a causal effect. The lack of effect likely reflects offsetting mechanisms: VAR enables referees to catch violent conduct they initially missed while also allowing them to overturn incorrect red card decisions.
Second, yellow cards show no significant change following VAR adoption. The TWFE estimate of -5.116 and the C&S ATT of -0.078 are both statistically insignificant. This null result is consistent with the institutional design of VAR, which does not permit review of yellow card decisions under current IFAB rules, and serves as a validation of our empirical approach.
Third, offsides decline substantially following VAR adoption. The TWFE estimate of -9.298 is statistically significant (p = 0.028), and the C&S estimator yields a larger ATT of -13.035 that is also statistically significant. The consistency of this finding across both estimators strengthens confidence in the result, though the borderline parallel trends test (p = 0.071) requires cautious interpretation.
6.2 Mechanisms
Why No Effect on Cards?
The null findings for disciplinary cards are notable given VAR’s mandate to review potential red card offenses. For red cards, offsetting mechanisms likely explain the null result: VAR catches serious foul play that referees miss (increasing cards) but also overturns mistaken red card decisions when replays show minimal contact (decreasing cards). Our data cannot separately identify these two channels, but the overall null effect across both TWFE and C&S estimators suggests they roughly balance.
For yellow cards, the null result is expected since yellow card decisions are not reviewable by VAR under current IFAB rules. VAR can only intervene for clear and obvious errors in red card decisions, penalty decisions, goals, and mistaken identity. The fact that we find no effect on yellow cards across both estimators serves as a validation of our empirical approach.
Offsides: Technology or Tactics?
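The substantial reduction in offsides is consistent across both estimators, with the C&S ATT of -13.035 being larger in magnitude than the TWFE estimate of -9.298. This suggests the standard TWFE estimate was, if anything, conservative. VAR-supported offside review, including the semi-automated offside technology adopted in later seasons, detects offside positions far more precisely than real-time judgment, potentially catching marginal offsides that assistant referees previously missed. Taken alone, that detection channel would raise rather than lower the number of calls, so the observed decline may instead reflect behavioral responses, such as attackers timing their runs more carefully, which our data cannot separate from detection.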
However, our borderline parallel trends test (p = 0.071) suggests offsides were already declining before VAR adoption. This pre-existing trend likely reflects tactical evolution as teams adopted higher defensive lines and faster buildup play. Our estimates may capture both VAR’s detection effect and the continuation of this tactical trend, which we cannot fully disentangle with the available data.
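To make concrete what it means for the C&S approach to avoid already-treated units as controls, the base R sketch below computes a single unconditional group-time ATT using only not-yet-treated units as the comparison group. The toy data, the column names (`unit`, `first_treat`, `y1` to `y4`), and the helper `att_gt_nyt` are all hypothetical; this is only the basic building block, and the `did` package adds covariate adjustment, inference, and aggregation that are omitted here.

```r
# Toy wide panel: one row per unit, made-up offside counts in periods 1-4.
# first_treat is the first period in which the unit is treated (its cohort).
toy <- data.frame(
  unit        = c("u1", "u2", "u3", "u4", "u5", "u6"),
  first_treat = c(2, 2, 3, 3, 4, 4),
  y1 = c(100, 96, 90, 94, 88, 92),
  y2 = c( 90, 88, 88, 92, 86, 90),
  y3 = c( 88, 84, 76, 80, 84, 88),
  y4 = c( 86, 82, 74, 78, 70, 76)
)

# Group-time ATT for cohort g in period t (t >= g), hypothetical helper:
# change in outcome since period g - 1 for cohort g, minus the same change
# for units that are still untreated in period t.
att_gt_nyt <- function(data, g, t) {
  change  <- data[[paste0("y", t)]] - data[[paste0("y", g - 1)]]
  treated <- data$first_treat == g
  not_yet <- data$first_treat > t   # excludes already-treated units
  stopifnot(any(treated), any(not_yet))
  mean(change[treated]) - mean(change[not_yet])
}

att_gt_nyt(toy, g = 2, t = 2)  # -7
att_gt_nyt(toy, g = 2, t = 3)  # -8
att_gt_nyt(toy, g = 3, t = 3)  # -10
```

Because the comparison group shrinks as more cohorts adopt, no estimate is available for a period in which every unit is already treated; the helper stops with an error in that case rather than falling back on treated units.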
6.3 Limitations
Several limitations should be acknowledged:
Short Pre-Treatment Period. Our pre-treatment window is relatively brief (1-3 seasons depending on league), limiting our ability to assess long-run trends. Early-adopting leagues (Bundesliga, Serie A) have only one pre-VAR season in our data, making it difficult to establish stable baseline trends.
Borderline Parallel Trends for Offsides. While the joint pre-trend test for offsides does not reject parallel trends at the 5% level (p = 0.071), the result is marginal, and failing to reject is not evidence that the assumption holds. Individual pre-treatment coefficients show some evidence of declining offsides before VAR. Readers should interpret the offsides result as suggestive rather than definitive, even after accounting for staggered adoption through the C&S estimator.
Aggregated Data. We observe team-season aggregates rather than match-level data, preventing analysis of heterogeneous effects across match contexts such as high-stakes matches, different referee experience levels, or specific types of incidents.
Balanced Panel Restriction. The Callaway and Sant’Anna estimator requires a balanced panel, which led to dropping 50 teams that did not appear in every season due to promotion and relegation. This may introduce some selection bias if relegated or promoted teams differ systematically in their response to VAR.
Potential Confounders. While our two-way fixed effects specification controls for time-invariant team characteristics and common time trends, we cannot fully rule out league-specific changes that coincided with VAR adoption such as new referee training or disciplinary protocols.
External Validity. Our findings are specific to elite European soccer and may not generalize to lower divisions, other confederations, or other sports considering video review systems.
Penalties Excluded. Penalties conceded violate parallel trends (p = 0.020) and are excluded from our main causal analysis. This is precisely the type of decision VAR is designed to influence, so the exclusion limits our ability to assess VAR’s full impact.
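The balanced panel restriction noted among the limitations can be illustrated with a short base R sketch. It keeps only teams observed in every season and reports which teams are dropped, so the two groups can be compared for signs of selection. The data frame, team labels, and the `Off` column are hypothetical stand-ins, and the sketch assumes one row per team-season.

```r
# Toy team-season panel; team labels and offside counts are made up.
panel <- data.frame(
  Team   = c("A", "A", "A", "B", "B", "C", "C", "C"),
  Season = c("2016-2017", "2017-2018", "2018-2019",
             "2016-2017", "2018-2019",
             "2016-2017", "2017-2018", "2018-2019"),
  Off    = c(80, 75, 60, 90, 70, 85, 82, 66)
)

# Number of distinct seasons in the data and per team.
n_seasons        <- length(unique(panel$Season))
seasons_per_team <- tapply(panel$Season, panel$Team,
                           function(s) length(unique(s)))

# Teams present in every season form the balanced panel; the rest are dropped.
balanced_teams <- names(seasons_per_team)[as.vector(seasons_per_team) == n_seasons]
dropped_teams  <- setdiff(unique(panel$Team), balanced_teams)
balanced_panel <- panel[panel$Team %in% balanced_teams, ]

# Simple selection check: compare mean outcomes for kept and dropped teams.
mean_kept    <- mean(panel$Off[panel$Team %in% balanced_teams])
mean_dropped <- mean(panel$Off[panel$Team %in% dropped_teams])
```

A large gap between `mean_kept` and `mean_dropped` in the real data would be one informal sign that promoted and relegated teams differ from the teams retained in the balanced sample.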
7. Conclusions
This study provides quasi-experimental evidence on VAR’s impact on referee decisions in professional soccer. Using difference-in-differences with staggered treatment adoption across five major European leagues, and validating our results with the Callaway and Sant’Anna estimator, we find that VAR has no detectable effect on red or yellow card frequency, though it is associated with a meaningful reduction in offsides.
The consistency of the offsides finding across both the standard TWFE estimator (-9.298) and the C&S estimator (-13.035) strengthens confidence in this result. The C&S estimator, which avoids using already-treated leagues as controls, actually produces a larger estimated decline, suggesting the TWFE was conservative rather than inflated. However, the borderline parallel trends result for offsides (p = 0.071) means this finding should still be interpreted cautiously.
For cards, the null results are robust across all specifications and both estimators. The yellow card null result in particular serves as a useful validation of our approach, since VAR cannot review yellow card decisions by design.
These findings challenge simple narratives about VAR’s impact. Rather than uniformly changing disciplinary decisions, VAR’s effect appears nuanced, with opposing mechanisms potentially balancing out for cards and possible confounding from tactical evolution for offsides.
Future research with longer pre-treatment periods, match-level data, and more granular measures of decision types could provide deeper insights into how video review technology shapes officiating in sports.
Bibliography
Akdağ, Eren, Ali Işın, Alberto Lorenzo Calvo, Enrique Alonso Pérez Chao, and Sergio L. Jiménez Sáiz. “Evaluating the Impact of Video Assistant Referee Implementation in Football: A Four-Season Analysis of Match Performance Trends.” Applied Sciences 15, no. 9 (2025): 4789. https://doi.org/10.3390/app15094789.
Brown, Jack. “The Impact of Video Assistant Referee (VAR) on the English Premier League.” 2024. https://doi.org/10.13140/RG.2.2.11917.96480.
chimichangaboii. “Debate: What do you think about VAR being used in games.” Reddit Post. R/Football, April 19, 2023. https://www.reddit.com/r/football/comments/12s5u5p/debate_what_do_you_think_about_var_being_used_in/.
Goodwin, Paul. “The Impact of VAR, One Year On….” Scottish Football Supporters Association - SFSA, December 10, 2023. https://scottishfsa.org/the-impact-of-var-one-year-on/.
Kuvvetli, Ümit, Esin Firuzan, and Ali Riza Firuzan. “Is Video Assistant Referee (VAR) a Disadvantage for the Strong and a Protection for the Weak? The Case of Turkish Super League.” Psychology of Sport and Exercise 80 (September 2025): 102924. https://doi.org/10.1016/j.psychsport.2025.102924.
Rogerson, Mike, Daniel Knight, Reinhold Scherer, Ben Jones, Chris McManus, Sally Waterworth, Kelly Murray, and Ed Hope. “Meta-Analysis of the Effects of VAR on Goals Scored and Home Advantage in Football.” Proceedings of the Institution of Mechanical Engineers, Part P: Journal of Sports Engineering and Technology, April 8, 2024. https://journals.sagepub.com/doi/10.1177/17543371241242914.
The Evolution of VAR and Its Influence on Formative Football. ENG-BLOG. October 3, 2025. https://soccerinteraction.com/evolution-var-influence-football.
Zhang, Yeqin, Danyang Li, Miguel-Ángel Gómez-Ruano, Daniel Memmert, Chunman Li, and Ming Fu. “The Effect of the Video Assistant Referee (VAR) on Referees’ Decisions at FIFA Women’s World Cups.” Frontiers in Psychology 13 (August 2022): 984367. https://doi.org/10.3389/fpsyg.2022.984367.
Appendix A: Penalties (Parallel Trends Violated)
Penalties conceded show evidence of pre-existing trends across leagues (joint Wald test p = 0.020), violating the parallel trends assumption required for causal inference. We present results for completeness but caution that standard difference-in-differences estimates are likely biased and should not be interpreted as causal effects. The Callaway and Sant’Anna estimator was not applied to penalties given the clear parallel trends violation.
                 pk_model1          pk_model2
Dependent Var.:  PKcon              PKcon
VAR Adoption     0.3307 (0.2402)    0.2984 (0.1765)
Possession %     -0.0568. (0.0263)  -0.0383 (0.0267)
Fouls Committed                     0.0040 (0.0034)
Yellow Cards                        0.0183. (0.0082)
Red Cards                           0.0500* (0.0116)
Fixed-Effects:   -----------------  ----------------
Team             Yes                Yes
Season           Yes                Yes
_______________  _________________  ________________
S.E.: Clustered  by: Comp           by: Comp
Observations     857                857
R2               0.34770            0.36323
Within R2        0.00609            0.02975
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The observed post-VAR coefficients may reflect any of the following: (1) a causal effect of VAR on penalty decisions, (2) continuation of pre-existing league-specific trends, or (3) other concurrent changes in penalty enforcement. Without credible parallel trends, we cannot disentangle these explanations.
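For readers unfamiliar with the joint Wald test referred to in this appendix, the base R sketch below shows the generic calculation for testing that all pre-treatment (lead) coefficients are zero. The coefficient vector and covariance matrix are hypothetical placeholders, not estimates from this study, and the exact test variant used for the reported p-values (for example, an F form with clustered standard errors) may differ.

```r
# Hypothetical pre-treatment event-study coefficients and their covariance
# matrix. These values are illustrative only, not the paper's estimates.
beta_pre <- c(-2.0, -4.5)
vcov_pre <- matrix(c(4.0, 1.0,
                     1.0, 6.0), nrow = 2, byrow = TRUE)

# Wald statistic for H0: all pre-treatment coefficients equal zero,
# W = b' V^{-1} b, compared against a chi-squared with length(b) df.
wald_stat <- as.numeric(t(beta_pre) %*% solve(vcov_pre) %*% beta_pre)
df_wald   <- length(beta_pre)
p_value   <- pchisq(wald_stat, df = df_wald, lower.tail = FALSE)

cat(sprintf("Wald = %.3f, df = %d, p = %.3f\n", wald_stat, df_wald, p_value))
```

A small p-value, such as the 0.020 reported for penalties, indicates that the pre-treatment coefficients are jointly distinguishable from zero, which is why the parallel trends assumption is treated as violated for that outcome.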