In this project, we explore preferred foot (right vs. left) as a binary variable to see how it relates to player skills and physical attributes. Using logistic regression, we’ll interpret the effects of key attributes, build a confidence interval for one coefficient, and discuss the insights gained along with potential follow-up questions.
This column is easy to convert to a binary value: Left Foot = 0, Right Foot = 1. Preferred foot can influence a player’s performance in certain positions or play styles, so it might correlate with overall ratings, specific skill metrics (like crossing or finishing), or market value.
# Encode preferred foot as 1 (Right) and 0 (Left).
player_data <- player_data |>
  mutate(preferred_foot_binary = ifelse(preferred_foot == "Right", 1, 0))
# Select the explanatory variables
explanatory_vars <- c("agility", "ball_control", "crossing", "dribbling")
model <- glm(preferred_foot_binary ~ agility + ball_control + crossing + dribbling,
             data = player_data, family = binomial(link = "logit"))
summary(model)
##
## Call:
## glm(formula = preferred_foot_binary ~ agility + ball_control +
## crossing + dribbling, family = binomial(link = "logit"),
## data = player_data)
##
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)   1.997097   0.099173  20.137   <2e-16 ***
## agility       0.005034   0.002010   2.505   0.0122 *
## ball_control  0.030941   0.003536   8.750   <2e-16 ***
## crossing     -0.054515   0.002186 -24.938   <2e-16 ***
## dribbling    -0.002077   0.003434  -0.605   0.5452
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 19469 on 17953 degrees of freedom
## Residual deviance: 18417 on 17949 degrees of freedom
## AIC: 18427
##
## Number of Fisher Scoring iterations: 4
exp(coef(model))
##  (Intercept)      agility ball_control     crossing    dribbling
##    7.3676334    1.0050472    1.0314251    0.9469442    0.9979247
Intercept: 1.9971. Odds: exp(1.9971) ≈ 7.37. When all four attributes equal 0, the odds of a player being right-footed are about 7.37 to 1 (a probability of about 0.88). A score of 0 on every attribute is unlikely to occur in the data, so this baseline is an extrapolation rather than a description of a typical player.
Agility: 0.0050. Odds ratio: exp(0.0050) ≈ 1.005. Holding the other attributes constant, each one-point increase in agility raises the odds of being right-footed by about 0.5%.
Ball Control: 0.0309. Odds ratio: exp(0.0309) ≈ 1.031. Holding the other attributes constant, each additional point in ball control increases the odds of being right-footed by approximately 3.1%, so ball control is positively associated with right-foot preference.
Crossing: -0.0545. Odds ratio: exp(-0.0545) ≈ 0.947. Holding the other attributes constant, each additional point in crossing decreases the odds of being right-footed by about 5.3%, suggesting that players with higher crossing ability are more likely to prefer their left foot.
Dribbling: -0.0021. Odds ratio: exp(-0.0021) ≈ 0.998. With an odds ratio near 1 and a p-value of 0.5452, dribbling shows no statistically significant association with foot preference once the other attributes are accounted for.
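The conversions used in the interpretations above can be written out explicitly. This is a worked sketch using the coefficients from the model summary; the 10-point change is an illustrative extension, not a result reported by the model output.

```latex
% Odds ratio and percent change in odds for a one-point increase in predictor j
\mathrm{OR}_j = e^{\hat\beta_j},
\qquad
\%\Delta\,\text{odds} = 100\,\bigl(e^{\hat\beta_j} - 1\bigr)

% Crossing, one-point increase
e^{-0.054515} \approx 0.9469
\;\Rightarrow\; 100\,(0.9469 - 1) \approx -5.3\%

% Crossing, ten-point increase (effects multiply on the odds scale)
e^{10 \times (-0.054515)} = e^{-0.54515} \approx 0.580
\;\Rightarrow\; \text{odds about } 42\% \text{ lower}

% Intercept: converting baseline odds to a probability
p = \frac{\text{odds}}{1 + \text{odds}} = \frac{7.3676}{1 + 7.3676} \approx 0.880
```

Because the model is linear on the log-odds scale, a k-point change multiplies the odds by the one-point odds ratio raised to the power k, which is why a small per-point effect such as crossing can add up over a wide range of scores.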
Ball control and crossing have very small p-values (< 2e-16), and agility is also significant at the 5% level (p = 0.0122), so all three show a statistically significant association with preferred foot. Dribbling, with a p-value of 0.5452, does not show a statistically significant association with whether a player is right-footed.
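To make the coefficients more concrete, the following minimal sketch turns them into a predicted probability for a single player. The coefficients are copied from the summary output above; the player's attribute values are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(model) output above
beta <- c(intercept = 1.997097, agility = 0.005034, ball_control = 0.030941,
          crossing = -0.054515, dribbling = -0.002077)

# Hypothetical player (illustrative attribute values, not taken from the data)
player <- c(intercept = 1, agility = 70, ball_control = 65,
            crossing = 60, dribbling = 65)

# Linear predictor (log-odds), then the inverse logit to get a probability
log_odds <- sum(beta * player)
prob_right <- plogis(log_odds)

# Same player with crossing raised by 20 points
player_high_cross <- replace(player, "crossing", 80)
prob_right_high_cross <- plogis(sum(beta * player_high_cross))

print(round(c(prob_right, prob_right_high_cross), 3))
```

For these assumed values the predicted probability of being right-footed is about 0.72, and it drops to about 0.47 when crossing is raised from 60 to 80, which illustrates how strongly crossing separates the two groups in this model. With the fitted model object available, `predict(model, newdata = ..., type = "response")` would give the same kind of prediction without copying coefficients by hand.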
player_data$predicted_prob <- predict(model, type = "response")
# Plot the full-model predicted probabilities against agility
ggplot(player_data, aes(x = agility, y = predicted_prob)) +
  geom_point(aes(color = preferred_foot), alpha = 0.5) +  # Predicted probability for each player
  # Logistic curve of the 0/1 outcome on agility alone; fitting it to the 0/1 column
  # rather than to predicted_prob avoids the "non-integer #successes" warning
  geom_smooth(aes(y = preferred_foot_binary), method = "glm",
              method.args = list(family = binomial), se = FALSE, color = "blue") +
  labs(title = "Logistic Regression: Predicted Probability of Being Right-Footed",
       x = "Agility Score",
       y = "Predicted Probability of Being Right-Footed") +
  scale_color_manual(values = c("lightblue", "lightcoral"),
                     name = "Preferred Foot",
                     labels = c("Left-Footed", "Right-Footed")) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
### Interpretation
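In this plot, the blue curve relates the probability of being right-footed to agility alone, and it falls as agility rises, so higher-agility players appear somewhat more likely to be left-footed. This is a marginal pattern that ignores the other attributes. It differs from the agility coefficient in the full model, which is positive once ball control, crossing, and dribbling are held constant.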
ggplot(player_data, aes(x = preferred_foot, y = agility, fill = preferred_foot)) +
  geom_boxplot() +
  labs(title = "Distribution of Agility by Foot Preference",
       x = "Foot Preference",
       y = "Agility Score") +
  theme_minimal() +
  scale_fill_manual(values = c("lightblue", "lightcoral"))
coef_agility <- coef(model)["agility"]
se_agility <- summary(model)$coefficients["agility", "Std. Error"]
# Calculate the 95% Confidence Interval for agility coefficient
ci_lower <- coef_agility - 1.96 * se_agility
ci_upper <- coef_agility + 1.96 * se_agility
# Exponentiation of the CI to interpret it as an odds ratio range
ci_odds_ratio <- exp(c(ci_lower, ci_upper))
# Print results
cat("Odds Ratio for Agility:", exp(coef_agility), "\n")
## Odds Ratio for Agility: 1.005047
cat("95% Confidence Interval for Odds Ratio of Agility:", ci_odds_ratio, "\n")
## 95% Confidence Interval for Odds Ratio of Agility: 1.001096 1.009014
Holding ball control, crossing, and dribbling constant, each one-point increase in agility raises the odds of a player being right-footed by approximately 0.5%.
The 95% confidence interval for the odds ratio runs from 1.0011 to 1.0090: we are 95% confident that this range contains the true odds ratio for agility in the population from which the sample was drawn.
Since the whole interval lies above 1, agility is positively associated with being right-footed: after adjusting for the other attributes, players with higher agility are slightly more likely to be right-footed.
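As a cross-check on the interval above, the same Wald calculation can be reproduced from the printed summary alone. This is a minimal sketch that copies the agility estimate and standard error from the coefficient table and uses `qnorm(0.975)` in place of the rounded 1.96; small differences in the last digit come from the rounding of the printed values.

```r
# Estimate and standard error for agility, copied from the summary(model) output
est <- 0.005034
se  <- 0.002010

# Exact 97.5% standard normal quantile (about 1.96)
z <- qnorm(0.975)

# Wald interval on the log-odds scale, then exponentiated to the odds-ratio scale
ci_log_odds <- est + c(-1, 1) * z * se
ci_or <- exp(ci_log_odds)

print(round(ci_or, 4))
```

With the fitted model object available, `exp(confint.default(model, "agility"))` computes the same kind of Wald interval directly. Note that `confint(model)` may instead return a profile-likelihood interval, which can differ slightly from the Wald one.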
How do these findings vary across different nationalities?
Do they have the same impact for players with high ratings as for players with low ratings?
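One possible way to approach the rating question is an interaction term, which lets the effect of an attribute change with rating. The sketch below uses simulated stand-in data, not the player data: the column name `overall` for the rating is an assumption, and the simulated outcome is generated with no true interaction, so the numbers only illustrate how the model would be set up and read.

```r
# Simulated stand-in data; `overall` is a hypothetical rating column name
set.seed(42)
n <- 2000
sim <- data.frame(
  agility = round(runif(n, 30, 95)),
  overall = round(runif(n, 45, 90))
)

# Simulated 0/1 outcome with no true interaction (illustration only)
sim$preferred_foot_binary <- rbinom(n, size = 1,
                                    prob = plogis(1.2 + 0.005 * sim$agility))

# agility * overall expands to agility + overall + agility:overall
interaction_model <- glm(preferred_foot_binary ~ agility * overall,
                         data = sim, family = binomial(link = "logit"))

# The agility:overall row tests whether the agility effect changes with rating
print(coef(summary(interaction_model))["agility:overall", ])
```

On the real data, a small p-value for the interaction row would suggest that the association between agility and foot preference differs between high-rated and low-rated players. The nationality question could be explored in a similar way by adding nationality as a factor, though many nationalities with few players would make those estimates unstable.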