Introduction

In this project, we explore preferred foot (right vs. left) as a binary variable to see how it relates to player skills and physical attributes. Using logistic regression, we’ll interpret the effects of key attributes, build a confidence interval for one coefficient, and discuss the insights gained along with potential follow-up questions.

Why Preferred Foot?

This column converts easily into a binary value: Left Foot = 0, Right Foot = 1. Preferred foot may influence a player’s performance in certain positions or play styles, which might correlate with overall ratings, specific skill metrics (like crossing or finishing), or market value.

Logistic Regression Model for Preferred Foot

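library(dplyr)    # mutate()
library(ggplot2)  # plots further below

# Encode preferred foot as 1 (Right) or 0 (any other value, i.e. Left)
player_data <- player_data |>
  mutate(preferred_foot_binary = ifelse(preferred_foot == "Right", 1, 0))

# Explanatory variables used in the model formula below
explanatory_vars <- c("agility", "ball_control", "crossing", "dribbling")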


model <- glm(preferred_foot_binary ~ agility + ball_control + crossing + dribbling,
             data = player_data, family = binomial(link = "logit"))


summary(model)
## 
## Call:
## glm(formula = preferred_foot_binary ~ agility + ball_control + 
##     crossing + dribbling, family = binomial(link = "logit"), 
##     data = player_data)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.997097   0.099173  20.137   <2e-16 ***
## agility       0.005034   0.002010   2.505   0.0122 *  
## ball_control  0.030941   0.003536   8.750   <2e-16 ***
## crossing     -0.054515   0.002186 -24.938   <2e-16 ***
## dribbling    -0.002077   0.003434  -0.605   0.5452    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 19469  on 17953  degrees of freedom
## Residual deviance: 18417  on 17949  degrees of freedom
## AIC: 18427
## 
## Number of Fisher Scoring iterations: 4
exp(coef(model))
##  (Intercept)      agility ball_control     crossing    dribbling 
##    7.3676334    1.0050472    1.0314251    0.9469442    0.9979247

Explaining the Coefficients

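Intercept: 1.9971. Odds: exp(1.9971) ≈ 7.37. When all four predictors equal zero, the odds of a player being right-footed are about 7.37 to 1. Because a score of zero on every attribute is unlikely to occur in the data, this baseline is best read as a reference point rather than as a typical player.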
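
The intercept's odds can also be read as a probability. The sketch below is a minimal illustration that assumes only the intercept estimate printed in the summary above; `intercept`, `baseline_odds`, and `baseline_prob` are names introduced here for illustration. It describes a hypothetical player with all four scores at zero, not a typical one.

```r
# Intercept copied from the model summary above (log-odds scale)
intercept <- 1.997097

# Odds of being right-footed when every predictor is zero
baseline_odds <- exp(intercept)

# Convert odds to a probability: p = odds / (1 + odds)
baseline_prob <- baseline_odds / (1 + baseline_odds)

# plogis() is the inverse logit and gives the same value directly
stopifnot(isTRUE(all.equal(baseline_prob, plogis(intercept))))

round(baseline_prob, 3)
```

Odds of about 7.37 to 1 correspond to a probability of roughly 0.88.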

Agility: 0.0050. Odds Ratio: exp(0.0050) ≈ 1.005. Each one-point increase in agility raises the odds of being right-footed by about 0.5%, holding the other predictors constant.

Ball Control: 0.0309. Odds Ratio: exp(0.0309) ≈ 1.031. For each additional point in ball control, the odds of being right-footed increase by approximately 3.1%, indicating that ball control is positively associated with right-foot preference.

Crossing: -0.0545. Odds Ratio: exp(-0.0545) ≈ 0.947. Each additional point in crossing skill decreases the odds of being right-footed by about 5.3%, suggesting that players with higher crossing ability may be more likely to prefer their left foot.

Dribbling: -0.0021. Odds Ratio: exp(-0.0021) ≈ 0.998. With an odds ratio near 1 and a p-value of 0.5452, there is no evidence that dribbling is associated with foot preference once the other predictors are accounted for.
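
Per-point odds ratios can look negligible, so it may help to scale them to a larger difference in score. The sketch below is a minimal illustration, assuming the coefficients printed in the summary above; the helper `odds_ratio_for` and the ten-point step are choices made here for illustration.

```r
# Coefficients copied from the model summary above (log-odds per one-point change)
coefs <- c(agility = 0.005034, ball_control = 0.030941,
           crossing = -0.054515, dribbling = -0.002077)

# Odds ratio for a k-point increase: exp(k * beta),
# i.e. the one-point odds ratio raised to the power k
odds_ratio_for <- function(beta, k = 1) exp(k * beta)

one_point <- odds_ratio_for(coefs)          # same as exp(coef(model)) for the predictors
ten_point <- odds_ratio_for(coefs, k = 10)  # effect of a ten-point difference

round(rbind(one_point, ten_point), 3)
```

A ten-point gap in crossing multiplies the odds of being right-footed by about 0.58, so a coefficient that looks small per point can still add up over a wider gap in scores.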

P-Values

Ball control and crossing have very small p-values (< 2e-16), and agility has a p-value of 0.0122, so all three are statistically significant at the 0.05 level. Dribbling has a p-value of 0.5452, well above 0.05, so there is no evidence that it influences whether a player is right-footed once the other predictors are in the model.
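
The p-values in the summary come from Wald z tests: each estimate is divided by its standard error and compared with a standard normal distribution. The sketch below recomputes them from the printed estimates and standard errors; the vector names are introduced here for illustration, and small differences from the summary are due to rounding of the printed inputs.

```r
# Estimates and standard errors copied from the model summary above
estimate  <- c(agility = 0.005034, ball_control = 0.030941,
               crossing = -0.054515, dribbling = -0.002077)
std_error <- c(agility = 0.002010, ball_control = 0.003536,
               crossing = 0.002186, dribbling = 0.003434)

# Wald z statistic: estimate divided by its standard error
z_value <- estimate / std_error

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z_value))

# Flag predictors that are significant at the 5% level
data.frame(z_value = round(z_value, 3),
           p_value = signif(p_value, 3),
           significant = p_value < 0.05)
```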

Plotting the model using Agility

player_data$predicted_prob <- predict(model, type = "response")

# Plot each player's predicted probability against agility
ggplot(player_data, aes(x = agility, y = predicted_prob)) +
  geom_point(aes(color = preferred_foot), alpha = 0.5) +  # One point per player, colored by actual foot
  # Smooth the predicted probabilities with a logistic curve. quasibinomial accepts
  # non-integer responses, so it avoids the "non-integer #successes" warning.
  geom_line(stat = "smooth", method = "glm", method.args = list(family = "quasibinomial"), color = "blue") +
  labs(title = "Logistic Regression: Predicted Probability of Being Right-Footed",
       x = "Agility Score",
       y = "Predicted Probability of Being Right-Footed") +
  scale_color_manual(values = c("lightblue", "lightcoral"), 
                     name = "Preferred Foot", 
                     labels = c("Left-Footed", "Right-Footed")) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

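Interpretation

In the plot, the predicted probability of being right-footed falls as the agility score increases, so more agile players are more likely to be left-footed. This is a marginal trend across players whose other skills also change with agility. It differs in sign from the agility coefficient in the model, which is positive once ball control, crossing, and dribbling are held constant.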
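
The curve above averages over players whose other skills vary along with agility. To see what the fitted model says about agility alone, one option is to hold the other predictors fixed and vary only agility. The sketch below does this with the coefficients printed in the summary; the fixed values (ball control 60, crossing 50, dribbling 55), the agility grid, and the function name `prob_right_footed` are illustrative assumptions, not values taken from the data.

```r
# Coefficients copied from the model summary above
coefs <- c(intercept = 1.997097, agility = 0.005034, ball_control = 0.030941,
           crossing = -0.054515, dribbling = -0.002077)

# Predicted probability of being right-footed for given attribute scores.
# The default values for the other predictors are hypothetical placeholders.
prob_right_footed <- function(agility, ball_control = 60, crossing = 50, dribbling = 55) {
  log_odds <- coefs[["intercept"]] +
    coefs[["agility"]] * agility +
    coefs[["ball_control"]] * ball_control +
    coefs[["crossing"]] * crossing +
    coefs[["dribbling"]] * dribbling
  plogis(log_odds)  # inverse logit: log-odds -> probability
}

# Vary agility only, keeping the other scores at their placeholder values
agility_grid <- seq(20, 90, by = 10)
partial_curve <- data.frame(agility = agility_grid,
                            prob_right = prob_right_footed(agility_grid))
partial_curve
```

With the other scores fixed, the probability rises gently with agility, consistent with the positive agility coefficient. With the fitted model object available, the same idea could be expressed through `predict()` with a `newdata` data frame and `type = "response"`.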

Confidence Interval and Standard Error for Agility

ggplot(player_data, aes(x = preferred_foot, y = agility, fill = preferred_foot)) +
  geom_boxplot() +
  labs(title = "Distribution of Agility by Foot Preference",
       x = "Foot Preference",
       y = "Agility Score") +
  theme_minimal() +
  scale_fill_manual(values = c("lightblue", "lightcoral"))

coef_agility <- coef(model)["agility"]
se_agility <- summary(model)$coefficients["agility", "Std. Error"]


# Calculate the 95% Confidence Interval for agility coefficient
ci_lower <- coef_agility - 1.96 * se_agility
ci_upper <- coef_agility + 1.96 * se_agility

# Exponentiation of the CI to interpret it as an odds ratio range
ci_odds_ratio <- exp(c(ci_lower, ci_upper))

# Print results
cat("Odds Ratio for Agility:", exp(coef_agility), "\n")
## Odds Ratio for Agility: 1.005047
cat("95% Confidence Interval for Odds Ratio of Agility:", ci_odds_ratio, "\n")
## 95% Confidence Interval for Odds Ratio of Agility: 1.001096 1.009014

Interpretation

For each one-point increase in agility, the odds of a player being right-footed increase by approximately 0.5%, holding the other predictors constant.

The 95% confidence interval for the odds ratio runs from 1.0011 to 1.0090. We are 95% confident that this range contains the true odds ratio of agility in the population from which the sample was drawn.

Since the whole interval lies above 1, agility is positively associated with being right-footed: players with higher agility are slightly more likely to be right-footed.

Further questions

How do these findings vary across different nationalities?

Do these attributes have the same impact for highly rated players as for players with low ratings?

THE END