Exercise 10

#1: Relationships of Interest

The main question is whether the team won more because it was at home, or whether the home record only looks better because the home schedule was easier. The team went 6-2 at home and 2-6 away, so home field looks strong. The catch is that the home opponents were weaker, so opponent strength has to be part of the answer too.

Exercise 9 model: Win = β0 + β1(Home) + β2(Opp_win_pct) + error

Since Win is coded as 1 for a win and 0 for a loss, I read this as a linear probability model. The coefficients show how win likelihood changes based on location and opponent strength.

Home/Away Summary

| Group | Games | Wins | Losses | Win % | Avg Opp Win % |
|---|---|---|---|---|---|
| Home Games | 8 | 6 | 2 | 0.75 | 0.388 |
| Away Games | 8 | 2 | 6 | 0.25 | 0.662 |
| Overall | 16 | 8 | 8 | 0.50 | 0.525 |

The raw split is clear. The team had a 75% win rate at home and a 25% win rate away. The schedule was not even, though. The average home opponent had a win percentage of 0.388, while the average away opponent was at 0.662.

Relationships From Exercise 9

| Relationship | Correlation | What It Means |
|---|---|---|
| Home and Win | 0.500 | Home games lined up with more wins. |
| Opponent Win % and Win | -0.320 | Tougher opponents lined up with fewer wins. |
| Home and Opponent Win % | -0.704 | Home games were usually against easier opponents. |

The correlations say the same thing. Home and Win were positively related, while Opp_win_pct and Win were negatively related. The biggest issue is that Home and Opp_win_pct had a correlation of -0.704, meaning the home games were usually the easier games. That is why the regression matters.
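As a quick check, the Home and Win correlation can be reproduced from the records alone. This is a minimal sketch, assuming the 16 games are rebuilt as 0/1 values from the 6-2 home and 2-6 away records; the game order and the helper name `pearson` are my own choices, and the order does not affect the result. The other two correlations need game-by-game opponent winning percentages, which are not listed here, so they are not reproduced.

```python
from math import sqrt

# Rebuild the 16 outcomes from the reported records (6-2 home, 2-6 away).
# The ordering of games is an assumption and does not change the correlation.
home = [1] * 8 + [0] * 8
win = [1] * 6 + [0] * 2 + [1] * 2 + [0] * 6


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


r_home_win = pearson(home, win)
print(round(r_home_win, 3))  # matches the 0.500 reported in the table
```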

#2: Home Field and Win Likelihood

Exercise 10 Linear Probability Model Results

| Variable | Estimate | Std. Error | t-stat | p-value | Practical Meaning |
|---|---|---|---|---|---|
| Home | 0.545 | 0.338 | 1.612 | 0.131 | Home vs away: +54.5 percentage points |
| Opponent Win % | 0.163 | 0.865 | 0.188 | 0.854 | 0.100 increase: +1.6 percentage points |

The Home coefficient was 0.545. Holding opponent strength constant, being home was associated with about a 54.5 percentage point increase in predicted win likelihood.

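That is a big estimate, but I would not call it proof. The p-value was 0.131, which is above 0.05, so the Home estimate was not statistically significant at the 0.05 level. Home field looked helpful, but the standard error of 0.338 is large relative to the estimate, so the result is too imprecise to rule out no effect.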
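The t-statistics and p-values in the table can be checked from the estimates and standard errors. This is a rough sketch, not the software output itself: it assumes 13 residual degrees of freedom (16 games minus 2 predictors minus the intercept) and integrates the t density numerically with Simpson's rule so that only the standard library is needed. The function names `t_pdf` and `two_sided_p` are my own.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central_area


# Assumed residual degrees of freedom: n - predictors - 1.
df_resid = 16 - 2 - 1

t_home = 0.545 / 0.338  # estimate / standard error
t_opp = 0.163 / 0.865
p_home = two_sided_p(t_home, df_resid)
p_opp = two_sided_p(t_opp, df_resid)

print(round(t_home, 3), round(p_home, 3))
print(round(t_opp, 3), round(p_opp, 3))
```

Because the inputs are rounded to three decimals, the results can differ from the table in the last digit.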

#3: Opponent Strength and Win Likelihood

The opponent strength result is where the model gets messy. The Opp_win_pct coefficient was 0.163, so a 0.100 increase in opponent winning percentage was associated with about a 1.6 percentage point increase in predicted win likelihood.

That sign is the opposite of what I would expect. Stronger opponents should usually make winning harder, and the raw correlation of -0.320 pointed that way. Since the p-value was 0.854, I would not read much into the positive coefficient. To me, this mostly shows how small and uneven the sample is: with Home and Opp_win_pct correlated at -0.704, the model has very little independent variation to separate the two effects.

Linear Model Fit

| R-squared | Adjusted R-squared | F-statistic | Model p-value |
|---|---|---|---|
| 0.252 | 0.137 | 2.19 | 0.151 |

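The R-squared was 0.252, so the model explained about 25.2% of the variation in wins, leaving roughly three quarters unexplained. The model p-value of 0.151 also means the two variables together were not statistically significant at the 0.05 level.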
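The fit statistics are tied together by standard formulas, so the adjusted R-squared, F-statistic, and model p-value can be recovered from R-squared, the 16 games, and the 2 predictors. This is a small sketch under those assumptions. The p-value line uses a closed form that only holds when the numerator degrees of freedom equal 2, which happens to be the case here.

```python
r_squared = 0.252
n_games = 16
n_predictors = 2
df_resid = n_games - n_predictors - 1  # 13

# Adjusted R-squared penalizes for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_games - 1) / df_resid

# Overall F-statistic: explained variance per predictor over
# unexplained variance per residual degree of freedom.
f_stat = (r_squared / n_predictors) / ((1 - r_squared) / df_resid)

# Upper-tail probability of F(2, df_resid); this shortcut is valid
# only because the numerator degrees of freedom are exactly 2.
model_p = (1 + 2 * f_stat / df_resid) ** (-df_resid / 2)

print(round(adj_r_squared, 3), round(f_stat, 2), round(model_p, 3))
```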


Exercise 11

#1: Logistic Regression Model

For Exercise 11, I used the same variables but switched to logistic regression:

logit(P(Win = 1)) = β0 + β1(Home) + β2(Opp_win_pct)

This fits better because Win is binary. The team either won or lost, and logistic regression keeps predicted probabilities between 0 and 1, which a linear probability model does not guarantee.

Exercise 11 Logistic Regression Results

| Variable | Logit Coefficient | Odds Ratio | p-value |
|---|---|---|---|
| Home | 2.45 | 11.584 | 0.150 |
| Opponent Win % | 0.89 | 2.435 | 0.835 |

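For Home, the logit coefficient was 2.45, the odds ratio was 11.584, and the p-value was 0.150. Holding opponent strength constant, the estimated odds of winning at home were about 11.6 times the odds of winning away. The direction stayed positive, but it was not statistically significant.

For Opp_win_pct, the logit coefficient was 0.89 and the p-value was 0.835. The odds ratio of 2.435 in the table is for a full 1.000 increase in opponent winning percentage. For a more realistic 0.100 increase, the odds ratio was about 1.093. That was not statistically significant either.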
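The odds ratios come from exponentiating the logit coefficients. The short sketch below uses the rounded coefficients from the table, so the Home odds ratio comes out near 11.59 rather than exactly 11.584; I assume the small gap is rounding in the reported coefficient. The variable names are my own.

```python
from math import exp

# Rounded logit coefficients from the results table.
b_home = 2.45
b_opp = 0.89

or_home = exp(b_home)             # home vs away
or_opp_full = exp(b_opp)          # per 1.000 increase in opponent win %
or_opp_tenth = exp(b_opp * 0.100)  # per 0.100 increase in opponent win %

print(round(or_home, 2), round(or_opp_full, 3), round(or_opp_tenth, 3))
```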

Predicted Probabilities

Logit coefficients are hard to read, so predicted probabilities are more useful here.

Predicted Win Probability at Average Opponent Strength

| Scenario | Opponent Win % | Predicted Win Probability |
|---|---|---|
| Away game vs average opponent | 0.525 | 22.7% |
| Home game vs average opponent | 0.525 | 77.3% |

At the average opponent winning percentage of 0.525, the model predicted a 22.7% win probability for an away game and a 77.3% win probability for a home game. That is a difference of about 54.6 percentage points.

Predicted Probabilities With a 0.100 Increase in Opponent Win %

| Scenario | Opponent Win % | Predicted Win Probability |
|---|---|---|
| Away game, average opponent | 0.525 | 22.7% |
| Away game, opponent win % + 0.100 | 0.625 | 24.3% |
| Home game, average opponent | 0.525 | 77.3% |
| Home game, opponent win % + 0.100 | 0.625 | 78.8% |

A 0.100 increase in opponent winning percentage changed predicted win probability by about 1.6 percentage points for an away game and about 1.5 percentage points for a home game. The direction is still slightly positive, but I would not take that literally with this sample.
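The predicted probabilities follow from applying the inverse logit to the fitted equation. The intercept is not reported in the results table, so this sketch backs it out from the 22.7% away prediction at the average opponent winning percentage of 0.525. That backed-out intercept is an assumption for illustration, not a value from the model output, and the helper names are my own.

```python
from math import exp, log


def logit(p):
    """Log-odds of a probability."""
    return log(p / (1 - p))


def inv_logit(z):
    """Convert log-odds back to a probability."""
    return 1 / (1 + exp(-z))


b_home = 2.45
b_opp = 0.89

# Assumption: intercept implied by the reported 22.7% away prediction
# at the average opponent win % of 0.525 (not taken from model output).
b0 = logit(0.227) - b_opp * 0.525


def predict_win(home, opp_win_pct):
    """Predicted win probability for a home (1) or away (0) game."""
    return inv_logit(b0 + b_home * home + b_opp * opp_win_pct)


for home in (0, 1):
    for opp in (0.525, 0.625):
        label = "Home" if home else "Away"
        print(label, opp, f"{predict_win(home, opp):.1%}")
```

The other three scenarios in the table are then predictions rather than inputs, and they should land within rounding of the reported values.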

The visual matches the tables. The home line is above the away line, so the model still favors home field. The opponent strength slope is not something I would trust much with only 16 games.

#2: Significance Comparison

Linear Model vs. Logistic Model Significance

| Variable | Linear p-value | Logistic p-value | Significant in Linear Model? | Significant in Logistic Model? |
|---|---|---|---|---|
| Home | 0.131 | 0.150 | No | No |
| Opponent Win % | 0.854 | 0.835 | No | No |

The significance results did not really change. Neither Home nor Opp_win_pct was statistically significant at the 0.05 level in the linear model, and neither was significant in the logistic model.

So the model changed, but the conclusion did not. Home field stayed positive and opponent strength stayed unclear.

Summary

Home field looked like the stronger variable here. The team went 6-2 at home and 2-6 away, and both models gave home field a positive effect.

I would not go too far with it, though. It is only 16 games, and the home games came against easier opponents. So my takeaway is pretty simple: home field looked helpful, opponent strength was still messy, and switching to logistic regression did not really change the results.