The main question is whether the team won more because it was at home, or whether the home record only looks better because the home schedule was easier. The team went 6-2 at home and 2-6 away, so home field looks strong. The catch is that the home opponents were weaker, so opponent strength has to be part of the answer too.
Exercise 9 model:
Win = β0 + β1(Home) + β2(Opp_win_pct) + error
Since Win is coded as 1 for a win and 0 for a loss, I
read this as a linear probability model. Each coefficient is the estimated
change in win probability for a one-unit change in that variable, holding
the other variable constant.
| Group | Games | Wins | Losses | Win % | Avg Opp Win % |
|---|---|---|---|---|---|
| Home Games | 8 | 6 | 2 | 0.75 | 0.388 |
| Away Games | 8 | 2 | 6 | 0.25 | 0.662 |
| Overall | 16 | 8 | 8 | 0.50 | 0.525 |
The raw split is clear. The team had a 75% win rate at home and a 25% win rate away. The schedule was not even, though. The average home opponent had a win percentage of 0.388, while the average away opponent was at 0.662.
| Relationship | Correlation | What It Means |
|---|---|---|
| Home and Win | 0.500 | Home games lined up with more wins. |
| Opponent Win % and Win | -0.320 | Tougher opponents lined up with fewer wins. |
| Home and Opponent Win % | -0.704 | Home games were usually against easier opponents. |
The correlations say the same thing. Home and
Win were positively related, while Opp_win_pct
and Win were negatively related. The biggest issue is that
Home and Opp_win_pct had a correlation of
-0.704, meaning the home games were usually the easier games. That is
why the regression matters.
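As a quick check on the Home and Win correlation, the 0.500 can be reproduced directly from the 6-2 and 2-6 records, because the Pearson correlation between two 0/1 variables reduces to the phi coefficient of the 2x2 table. This is a minimal sketch; the function name `phi_coefficient` is my own and not part of the exercise.

```python
import math


def phi_coefficient(a, b, c, d):
    """Pearson correlation between two 0/1 variables from a 2x2 table.

    a = home wins, b = home losses, c = away wins, d = away losses.
    """
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Records from the summary table: 6-2 at home, 2-6 away.
home_win_corr = phi_coefficient(6, 2, 2, 6)
print(f"Home and Win correlation: {home_win_corr:.3f}")
```

The other two correlations involve Opp_win_pct, which is continuous, so they need the game-by-game data and cannot be rebuilt from the summary table alone.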
| Variable | Estimate | Std. Error | t-stat | p-value | Practical Meaning |
|---|---|---|---|---|---|
| Home | 0.545 | 0.338 | 1.612 | 0.131 | Home vs away: +54.5 percentage points |
| Opponent Win % | 0.163 | 0.865 | 0.188 | 0.854 | 0.100 increase: +1.6 percentage points |
The Home coefficient was 0.545. Holding opponent
strength constant, being home was associated with about a 54.5
percentage point increase in predicted win likelihood.
That is a big estimate, but I would not call it proof. The p-value was 0.131, which is above 0.05. Home field looked helpful, but it was not statistically significant.
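To show where the t-statistics come from, and how wide the uncertainty is, here is a small sketch that divides each estimate by its standard error and builds a rough 95% confidence interval. It assumes 13 residual degrees of freedom (16 games minus 3 estimated parameters) and uses 2.160, the standard two-sided 95% t-table value for 13 degrees of freedom. The intervals are my own arithmetic from the table, not part of the regression output.

```python
# Assumption: 16 games - 3 parameters = 13 residual degrees of freedom.
# 2.160 is the two-sided 95% critical value of the t distribution with 13 df.
T_CRIT_13_DF = 2.160


def t_statistic(estimate, std_error):
    """t-statistic for testing whether a coefficient equals zero."""
    return estimate / std_error


def confidence_interval(estimate, std_error, t_crit=T_CRIT_13_DF):
    """Approximate 95% confidence interval: estimate +/- t_crit * SE."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Estimates and standard errors copied from the linear model table.
coefficients = {"Home": (0.545, 0.338), "Opp_win_pct": (0.163, 0.865)}

for name, (est, se) in coefficients.items():
    low, high = confidence_interval(est, se)
    print(f"{name}: t = {t_statistic(est, se):.3f}, "
          f"95% CI = ({low:.3f}, {high:.3f})")
```

Both intervals include zero, which is the same message as the p-values above 0.05.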
The opponent strength result is where the model gets messy. The
Opp_win_pct coefficient was 0.163, so a 0.100 increase in
opponent winning percentage was associated with about a 1.6 percentage
point increase in predicted win likelihood.
That sign runs the wrong way. Stronger opponents should usually make winning harder, and the raw correlation of -0.32 pointed that way. Since the p-value was 0.854, I would not read much into the positive coefficient. Home and Opp_win_pct were correlated at -0.704, so the model has a hard time separating the two effects. To me, this mostly shows how small and uneven the sample is.
| R-squared | Adjusted R-squared | F-statistic | Model p-value |
|---|---|---|---|
| 0.252 | 0.137 | 2.19 | 0.151 |
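The R-squared was 0.252, so the model explained about 25.2% of the variation in wins. That helps, but plenty is still left out, and the overall model p-value of 0.151 means the two variables together were not statistically significant at the 0.05 level.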
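The fit statistics can be cross-checked by hand. The sketch below rebuilds R-squared from the three pairwise correlations (a standard identity for a model with two predictors), then derives adjusted R-squared and the F-statistic with n = 16 games and k = 2 predictors. The function names are my own, and the inputs are the rounded values from the tables, so small rounding differences are possible.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R-squared of y on x1 and x2, from the three pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-statistic for a model with k predictors and an intercept."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


N_GAMES, N_PREDICTORS = 16, 2

# Correlations from the table: Home-Win, Opp-Win, Home-Opp.
r2_from_corr = r_squared_two_predictors(0.500, -0.320, -0.704)
print(f"R-squared from correlations: {r2_from_corr:.3f}")

# Use the reported R-squared for the remaining statistics.
print(f"Adjusted R-squared: {adjusted_r_squared(0.252, N_GAMES, N_PREDICTORS):.3f}")
print(f"F-statistic: {f_statistic(0.252, N_GAMES, N_PREDICTORS):.2f}")
```

With these inputs the results line up with the reported 0.252, 0.137, and 2.19.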
For Exercise 11, I used the same variables but switched to logistic regression:
logit(P(Win = 1)) = β0 + β1(Home) + β2(Opp_win_pct)
Logistic regression is a better match for the outcome because Win is
binary. The team either won or lost, and a logistic model keeps every
predicted probability between 0 and 1, which a regular linear model does
not guarantee.
| Variable | Logit Coefficient | Odds Ratio | p-value |
|---|---|---|---|
| Home | 2.45 | 11.584 | 0.150 |
| Opponent Win % | 0.89 | 2.435 | 0.835 |
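For Home, the logit coefficient was 2.45 and the odds ratio was 11.584, so the estimated odds of winning at home were about 11.6 times the odds of winning away, holding opponent strength constant. The p-value was 0.150, so the direction stayed positive, but it was not statistically significant.
For Opp_win_pct, the logit coefficient was 0.89, and the p-value was 0.835. The odds ratio of 2.435 in the table is for a full 1.000 increase in opponent winning percentage, which is not a realistic change. For a 0.100 increase in opponent winning percentage, the odds ratio was about 1.093. That was not statistically significant either.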
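The odds ratios are just the exponentiated logit coefficients, and rescaling to a 0.100 change means multiplying the coefficient by 0.100 before exponentiating. A minimal sketch, using the rounded coefficients from the table (the helper name `odds_ratio` is my own):

```python
import math


def odds_ratio(logit_coef, change=1.0):
    """Odds ratio for a given change in the predictor: exp(coef * change)."""
    return math.exp(logit_coef * change)


# Rounded logit coefficients from the logistic regression table.
HOME_COEF = 2.45
OPP_COEF = 0.89

print(f"Home vs away: {odds_ratio(HOME_COEF):.3f}")
print(f"Opp_win_pct, +1.000: {odds_ratio(OPP_COEF):.3f}")
print(f"Opp_win_pct, +0.100: {odds_ratio(OPP_COEF, change=0.100):.3f}")
```

Because the Home coefficient is rounded to 2.45 here, its odds ratio comes out a few thousandths away from the 11.584 in the table, which was presumably computed from the unrounded coefficient.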
Logit coefficients are hard to read, so predicted probabilities are more useful here.
| Scenario | Opponent Win % | Predicted Win Probability |
|---|---|---|
| Away game vs average opponent | 0.525 | 22.7% |
| Home game vs average opponent | 0.525 | 77.3% |
At the average opponent winning percentage of 0.525, the model predicted a 22.7% win probability for an away game and a 77.3% win probability for a home game. That is a difference of about 54.6 percentage points.
| Scenario | Opponent Win % | Predicted Win Probability |
|---|---|---|
| Away game, average opponent | 0.525 | 22.7% |
| Away game, opponent win % + 0.100 | 0.625 | 24.3% |
| Home game, average opponent | 0.525 | 77.3% |
| Home game, opponent win % + 0.100 | 0.625 | 78.8% |
A 0.100 increase in opponent winning percentage changed predicted win probability by about 1.6 percentage points for an away game and about 1.5 percentage points for a home game. The direction is still slightly positive, but I would not take that literally with this sample.
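The four predicted probabilities can be reproduced without the intercept, which is not reported here. The sketch below starts from the 22.7% away baseline at the average opponent, converts it to log-odds, shifts it by the rounded logit coefficients, and converts back. Treating the baseline this way is my own shortcut, not the output of the fitted model.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))


# Baseline and rounded coefficients taken from the tables.
AWAY_BASELINE = 0.227   # away game vs average opponent (0.525)
HOME_COEF = 2.45
OPP_COEF = 0.89


def predicted_win_prob(home, opp_change=0.0):
    """Win probability relative to the away, average-opponent baseline."""
    log_odds = logit(AWAY_BASELINE) + HOME_COEF * home + OPP_COEF * opp_change
    return inv_logit(log_odds)


for label, home in (("Away", 0), ("Home", 1)):
    base = predicted_win_prob(home)
    tougher = predicted_win_prob(home, opp_change=0.100)
    print(f"{label}: {base:.1%} at average, {tougher:.1%} at +0.100 "
          f"(change {100 * (tougher - base):.1f} points)")
```

The same log-odds shift gives a slightly different percentage point change for home and away games because the logistic curve is not a straight line.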
The visual matches the tables. The home line is above the away line, so the model still favors home field. The opponent strength slope is not something I would trust much with only 16 games.
| Variable | Linear p-value | Logistic p-value | Significant in Linear Model? | Significant in Logistic Model? |
|---|---|---|---|---|
| Home | 0.131 | 0.150 | No | No |
| Opponent Win % | 0.854 | 0.835 | No | No |
The significance results did not really change. Neither Home nor
Opp_win_pct was statistically significant at the 0.05 level in the
linear model, and neither was significant in the logistic model.
So the model changed, but the conclusion did not. Home field stayed positive and opponent strength stayed unclear.
Home field looked like the stronger variable here. The team went 6-2 at home and 2-6 away, and both models gave home field a positive effect.
I would not go too far with it, though. It is only 16 games, and the home games came against easier opponents. So my takeaway is pretty simple: home field looked helpful, opponent strength was still messy, and switching to logistic regression did not really change the results.