The dummy variable entered into R is:
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Home has 16 values. A value of 1 means the game was at
home, and a value of 0 means the game was away.
The outcome variable entered into R is:
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Win also has 16 values. A value of 1 means the team won,
and a value of 0 means the team lost.
The opponent winning percentage variable entered into R is:
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
Opp_win_pct measures opponent strength. It is a regular
numeric variable, not a dummy variable. The higher the number, the
stronger the opponent.
| Variable | What It Measures | How To Read It | Main Summary |
|---|---|---|---|
| Home | Game location | 1 = home, 0 = away | 8 home games, 8 away games |
| Win | Game result | 1 = win, 0 = loss | 8 wins, 8 losses |
| Opp_win_pct | Opponent strength | Higher number = tougher opponent | Average opponent win percentage was 0.525 |
These variables are simple, but they tell the basic story.
Home tells us where the game was played, Win
tells us the result, and Opp_win_pct gives us a way to
account for opponent strength.
| Variable | Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
|---|---|---|---|---|---|---|
| Home | 0.0 | 0.0 | 0.5 | 0.500 | 1.0 | 1.0 |
| Win | 0.0 | 0.0 | 0.5 | 0.500 | 1.0 | 1.0 |
| Opp_win_pct | 0.2 | 0.4 | 0.5 | 0.525 | 0.7 | 0.8 |
The sample is balanced by location and result. There were 8 home games and 8 away games, along with 8 wins and 8 losses. That gives the team an overall winning percentage of 0.500.
The average opponent winning percentage was 0.525. Since an average opponent would sit at about 0.500, the team played a slightly tougher-than-average schedule overall.
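As a rough sketch, the summary numbers above could be reproduced in R along these lines. The data frame name `games` is an assumption made for this example; only the three vectors come from the report.

```r
# Data exactly as entered in the report
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)

# `games` is a hypothetical name used only in this sketch
games <- data.frame(Home, Win, Opp_win_pct)

# Min, quartiles, median, mean, and max for each variable
summary(games)

# Counts of away (0) vs. home (1) games and losses (0) vs. wins (1)
table(games$Home)
table(games$Win)

# Overall winning percentage and average opponent winning percentage
overall_win_pct <- mean(games$Win)
avg_opp_win_pct <- mean(games$Opp_win_pct)
overall_win_pct
avg_opp_win_pct
```

Because Home and Win are coded 0/1, their means are just the share of home games and the share of wins.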
| Location | Games | Wins | Losses | Win % | Avg Opp Win % |
|---|---|---|---|---|---|
| Away | 8 | 2 | 6 | 0.25 | 0.662 |
| Home | 8 | 6 | 2 | 0.75 | 0.388 |
The home and away split is where the data starts to get more interesting. The team went 6-2 at home and 2-6 away. That comes out to a 0.750 winning percentage at home and a 0.250 winning percentage away.
At first, that makes home field look like a big factor. The team was 50 percentage points better at home than away.
The issue is that the schedule was not even. The average home opponent had a 0.388 winning percentage, while the average road opponent had a 0.662 winning percentage. So the team won more at home, but it also played easier opponents at home.
The visual shows the same pattern. The team won more at home, while the home schedule was also easier. That makes it hard to give home field all the credit.
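The home and away split can be sketched in R as follows. This is one possible way to build the table, not necessarily how the original was produced; `games` and `split_tbl` are names assumed for this example.

```r
# Data exactly as entered in the report
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
games <- data.frame(Home, Win, Opp_win_pct)

# tapply() groups by Home; groups come back in order 0 (away), then 1 (home)
split_tbl <- data.frame(
  Location = c("Away", "Home"),
  Games = as.vector(tapply(games$Win, games$Home, length)),
  Wins = as.vector(tapply(games$Win, games$Home, sum)),
  Win_pct = as.vector(tapply(games$Win, games$Home, mean)),
  Avg_opp_win_pct = as.vector(tapply(games$Opp_win_pct, games$Home, mean))
)
split_tbl$Losses <- split_tbl$Games - split_tbl$Wins

print(split_tbl)

# Raw home-minus-away gap in winning percentage
home_gap <- split_tbl$Win_pct[2] - split_tbl$Win_pct[1]
home_gap
```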
| Relationship | Correlation |
|---|---|
| Home and Win | 0.500 |
| Opponent Win % and Win | -0.320 |
| Home and Opponent Win % | -0.704 |
The correlations add more context. The correlation between Home and Win was 0.500, which means home games had a positive relationship with winning in this sample. That matches the home/away split: the team went 6-2 at home and 2-6 away.
The correlation between Opp_win_pct and Win was -0.320. That means stronger opponents were generally connected with fewer wins in the raw data, which makes sense.
The bigger issue is the relationship between Home and Opp_win_pct. That correlation was -0.704, which means the home games tended to come against weaker opponents. So even though the team won more at home, the schedule was also easier at home. That is why the home-field result needs more context.
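A short sketch of how the three correlations could be computed with base R's `cor()`. The object names `r_home_win`, `r_opp_win`, and `r_home_opp` are assumptions for this example.

```r
# Data exactly as entered in the report
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)

# Pairwise Pearson correlations (the default method for cor())
r_home_win <- cor(Home, Win)
r_opp_win <- cor(Opp_win_pct, Win)
r_home_opp <- cor(Home, Opp_win_pct)

round(c(Home_Win = r_home_win, Opp_Win = r_opp_win, Home_Opp = r_home_opp), 3)

# The full correlation matrix shows the same values in one table
round(cor(cbind(Home, Win, Opp_win_pct)), 3)
```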
For Part 2, the regression model is:
Win = β0 + β1(Home) + β2(Opp_win_pct) + error
This model looks at home field while also controlling for opponent strength. That matters here because the home games were generally easier than the road games.
Since Win is coded as 0 or 1, this is a linear probability model. The
coefficients can be read as changes in predicted win probability. It is
not perfect, since a linear probability model can give predicted values
below 0 or above 1 and 16 games is a small sample, but it works for
a simple regression setup.
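A minimal sketch of fitting this model with `lm()`, assuming ordinary least squares on the 0/1 outcome. The names `games` and `fit` are chosen for this example.

```r
# Data exactly as entered in the report
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
games <- data.frame(Home, Win, Opp_win_pct)

# Linear probability model: OLS with a binary outcome
fit <- lm(Win ~ Home + Opp_win_pct, data = games)

# Estimates, standard errors, t-statistics, and p-values
round(coef(summary(fit)), 3)

# Fitted values are predicted win probabilities for each game
range(fitted(fit))
```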
| Term | Estimate | Std. Error | t-stat | p-value |
|---|---|---|---|---|
| Intercept | 0.142 | 0.598 | 0.238 | 0.816 |
| Home | 0.545 | 0.338 | 1.612 | 0.131 |
| Opponent Win % | 0.163 | 0.865 | 0.188 | 0.854 |
The coefficient for Home is 0.545.
Holding opponent winning percentage constant, home games were associated
with about a 54.5 percentage point higher predicted chance of winning
compared to away games.
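One way to make the Home coefficient concrete is to compare predicted win probabilities for a home game and an away game against the same opponent. The sketch below holds opponent strength at its sample mean, which is a choice made for illustration; `new_games` and `pred` are hypothetical names.

```r
# Data exactly as entered in the report
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
games <- data.frame(Home, Win, Opp_win_pct)
fit <- lm(Win ~ Home + Opp_win_pct, data = games)

# Same opponent strength (the sample mean), different location
new_games <- data.frame(Home = c(0, 1),
                        Opp_win_pct = mean(games$Opp_win_pct))
pred <- predict(fit, newdata = new_games)
names(pred) <- c("Away", "Home")
round(pred, 3)

# The gap between the two predictions equals the Home coefficient
home_effect <- unname(pred["Home"] - pred["Away"])
home_effect
```

Because the model is linear, the gap is the same at any opponent winning percentage, not just at the mean.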
The coefficient for Opp_win_pct is
0.163. That is a little odd because the raw
relationship between opponent strength and winning was negative. I would
not read too much into that because this is only 16 games, and home
field and opponent strength are tied together pretty closely.
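The sign change can be checked by fitting the model with and without Home. This is a sketch for illustration; the model names `simple` and `full` are assumptions, and the one-predictor model is not part of the original analysis.

```r
# Data exactly as entered in the report
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
games <- data.frame(Home, Win, Opp_win_pct)

# Opponent strength alone: slope reflects the raw (negative) relationship
simple <- lm(Win ~ Opp_win_pct, data = games)

# Opponent strength with Home in the model
full <- lm(Win ~ Home + Opp_win_pct, data = games)

slope_simple <- unname(coef(simple)["Opp_win_pct"])
slope_full <- unname(coef(full)["Opp_win_pct"])
c(without_Home = slope_simple, with_Home = slope_full)

# The two predictors are strongly correlated, which makes it hard
# to separate their effects in a 16-game sample
cor(games$Home, games$Opp_win_pct)
```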
The intercept is 0.142. That is not the main takeaway because it represents an away game against an opponent with a 0.000 winning percentage, which is not a realistic comparison.
The p-values are also worth noting. The Home coefficient
is positive, but its p-value is 0.131, so it is not
statistically significant at the 0.05 level. Opp_win_pct
has a p-value of 0.854, so that one is not
statistically significant either.
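A small sketch of pulling the p-values out of the fitted model and checking them against the 0.05 level. The names `coefs` and `p_vals` are assumptions for this example.

```r
# Data exactly as entered in the report
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
games <- data.frame(Home, Win, Opp_win_pct)
fit <- lm(Win ~ Home + Opp_win_pct, data = games)

# Coefficient table from summary(): one row per term
coefs <- coef(summary(fit))

# Each t-statistic is the estimate divided by its standard error
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p-values reported by summary()
p_vals <- coefs[, "Pr(>|t|)"]
round(p_vals, 3)

# TRUE would mean significant at the 0.05 level
p_vals < 0.05
```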
That does not make the model useless. It just means I would treat this as a small-sample result, not proof that home field caused the team to win more.
| R-squared | Adjusted R-squared | F-statistic | p-value |
|---|---|---|---|
| 0.252 | 0.137 | 2.19 | 0.151 |
The R-squared is 0.252.
That means Home and Opp_win_pct explain
about 25.2% of the variation in wins. That is useful,
but it also leaves a lot unexplained. That makes sense because winning
depends on more than just location and opponent winning percentage.
The adjusted R-squared is 0.137, which is lower because it penalizes each added predictor, and with two predictors and only 16 observations that penalty is noticeable.
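As a sketch, the fit statistics can be recomputed by hand from R-squared using the standard formulas for adjusted R-squared and the overall F-test. The names `n`, `k`, `r2`, `adj_r2`, `f_stat`, and `f_p` are chosen for this example.

```r
# Data exactly as entered in the report
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
games <- data.frame(Home, Win, Opp_win_pct)
fit <- lm(Win ~ Home + Opp_win_pct, data = games)
s <- summary(fit)

n <- nrow(games)  # 16 observations
k <- 2            # two predictors: Home and Opp_win_pct
r2 <- s$r.squared

# Adjusted R-squared scales the unexplained share by (n - 1) / (n - k - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F-statistic and its p-value on (k, n - k - 1) degrees of freedom
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
f_p <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

round(c(R2 = r2, Adj_R2 = adj_r2, F = f_stat, p = f_p), 3)
```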
The main takeaway is that home field looked important in this sample, but it was not the whole story.
The team went 6-2 at home and 2-6 away, so the raw split makes home
field look pretty strong. After controlling for opponent winning
percentage, the Home coefficient was still positive at
0.545.
The opponent strength result was messier. Stronger opponents were
connected with fewer wins in the raw data, but the regression
coefficient for Opp_win_pct turned slightly positive. I
would not take that as proof that tougher opponents made winning easier.
To me, it mostly shows that the sample was small and the schedule was
uneven.
The p-values also keep the results in perspective. Neither
Home nor Opp_win_pct was statistically
significant at the 0.05 level, so I would not overstate the model.
Overall, home field was associated with winning, but I would not give home field all the credit. The team also played weaker opponents at home and stronger opponents away. The R-squared was 0.252, so the model explains part of winning, but plenty is still left out.