Part 1: Input the Data and Summarize Each Variable

1. Enter the following dummy variable in R

The dummy variable entered into R is:

Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)

Home has 16 values. A value of 1 means the game was at home, and a value of 0 means the game was away.

2. Enter the following outcome variable in R

The outcome variable entered into R is:

Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)

Win also has 16 values. A value of 1 means the team won, and a value of 0 means the team lost.

3. Enter the opponent winning percentage variable in R

The opponent winning percentage variable entered into R is:

Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)

Opp_win_pct measures opponent strength. It is a regular numeric variable, not a dummy variable. The higher the number, the stronger the opponent.

4. Run this code and then summarize each variable

Variable      What It Measures    How To Read It                     Main Summary
Home          Game location       1 = home, 0 = away                 8 home games, 8 away games
Win           Game result         1 = win, 0 = loss                  8 wins, 8 losses
Opp_win_pct   Opponent strength   Higher number = tougher opponent   Average opponent win percentage was 0.525

These variables are simple, but they tell the basic story. Home tells us where the game was played, Win tells us the result, and Opp_win_pct gives us a way to account for opponent strength.

Variable      Min   1st Qu.   Median   Mean    3rd Qu.   Max
Home          0.0   0.0       0.5      0.500   1.0       1.0
Win           0.0   0.0       0.5      0.500   1.0       1.0
Opp_win_pct   0.2   0.4       0.5      0.525   0.7       0.8

The sample is balanced by location and result. There were 8 home games and 8 away games, along with 8 wins and 8 losses. That gives the team an overall winning percentage of 0.500.

The average opponent winning percentage was 0.525. So overall, the team played a slightly above-average schedule.
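A minimal sketch of how the summaries above could be produced in R, assuming the three vectors from steps 1 to 3. The data frame name games is an illustrative choice, not something the assignment specifies.

```r
# Data as entered in steps 1-3
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)

# Collect the vectors in one data frame (the name "games" is illustrative)
games <- data.frame(Home, Win, Opp_win_pct)

# Min, quartiles, median, mean, and max for each variable
print(summary(games))

# Counts of away (0) vs home (1) games, and of losses (0) vs wins (1)
print(table(games$Home))
print(table(games$Win))

# Column means; for a 0/1 variable the mean is the share of 1s
print(colMeans(games))
```

For the two dummy variables, the mean doubles as a proportion, which is why a mean of 0.500 for Win is the same thing as a 0.500 winning percentage.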

Location   Games   Wins   Losses   Win %   Avg Opp Win %
Away       8       2      6        0.250   0.662
Home       8       6      2        0.750   0.388

The home and away split is where the data starts to get more interesting. The team went 6-2 at home and 2-6 away. That comes out to a 0.750 winning percentage at home and a 0.250 winning percentage away.

At first, that makes home field look like a big factor. The team was 50 percentage points better at home than away.

The issue is that the schedule was not even. The average home opponent had a 0.388 winning percentage, while the average road opponent had a 0.662 winning percentage. So the team won more at home, but it also played easier opponents at home.

The visual shows the same pattern. The team won more at home, while the home schedule was also easier. That makes it hard to give home field all the credit.
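The home and away split can be reproduced with a short grouped summary. This is a sketch under the same assumptions as the data entry in Part 1; the object names (games, win_pct_by_loc, opp_pct_by_loc, record_by_loc) are illustrative.

```r
# Data as entered in Part 1
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
games <- data.frame(Home, Win, Opp_win_pct)

# Winning percentage by location (group "0" = away, group "1" = home)
win_pct_by_loc <- tapply(games$Win, games$Home, mean)

# Average opponent winning percentage by location
opp_pct_by_loc <- tapply(games$Opp_win_pct, games$Home, mean)

# Cross-tabulation of location by result, which gives the win-loss records
record_by_loc <- table(Home = games$Home, Win = games$Win)

print(win_pct_by_loc)
print(opp_pct_by_loc)
print(record_by_loc)
```

The unrounded away average is 0.6625, which sits exactly on a rounding boundary, so it can display as either 0.662 or 0.663 depending on how the rounding is done.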

Relationship                 Correlation
Home and Win                  0.500
Opponent Win % and Win       -0.320
Home and Opponent Win %      -0.704

The correlations add more context. The correlation between Home and Win was 0.500, which means home games had a positive relationship with winning in this sample. That matches the home/away split: the team went 6-2 at home and 2-6 away.

The correlation between Opp_win_pct and Win was -0.320. That means stronger opponents were generally connected with fewer wins in the raw data, which makes sense.

The bigger issue is the relationship between Home and Opp_win_pct. That correlation was -0.704, which means the home games tended to come against weaker opponents. So even though the team won more at home, the schedule was also easier at home. That is why the home-field result needs more context.
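The three correlations can be checked with cor(), which computes the Pearson correlation by default. A minimal sketch, again assuming the Part 1 vectors; the result names are illustrative.

```r
# Data as entered in Part 1
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)

# Pairwise Pearson correlations
r_home_win <- cor(Home, Win)
r_opp_win <- cor(Opp_win_pct, Win)
r_home_opp <- cor(Home, Opp_win_pct)

print(round(c(Home_Win = r_home_win,
              Opp_Win = r_opp_win,
              Home_Opp = r_home_opp), 3))

# The full correlation matrix shows the same numbers in one table
print(round(cor(data.frame(Home, Win, Opp_win_pct)), 3))
```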

Part 2: Regression

1. Regress Win on Home and Opp_win_pct

For Part 2, the regression model is:

Win = β0 + β1(Home) + β2(Opp_win_pct) + error

This model looks at home field while also controlling for opponent strength. That matters here because the home games were generally easier than the road games.

Since Win is coded as 0 or 1, this is a linear probability model. The coefficients can be read as changes in predicted win probability. It is not perfect: a linear model can produce predicted probabilities below 0 or above 1, and 16 games is a small sample. It still works for a simple regression setup.
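A minimal sketch of fitting this model with lm(), assuming the Part 1 vectors. The names games and fit are illustrative choices.

```r
# Data as entered in Part 1
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
games <- data.frame(Home, Win, Opp_win_pct)

# Ordinary least squares with a 0/1 outcome (a linear probability model)
fit <- lm(Win ~ Home + Opp_win_pct, data = games)

# Estimated intercept and slopes (beta0, beta1, beta2 in the equation above)
print(round(coef(fit), 3))
```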

2. Report the results

Term             Estimate   Std. Error   t-stat   p-value
Intercept        0.142      0.598        0.238    0.816
Home             0.545      0.338        1.612    0.131
Opponent Win %   0.163      0.865        0.188    0.854

The coefficient for Home is 0.545. Holding opponent winning percentage constant, home games were associated with about a 54.5 percentage point higher predicted chance of winning compared to away games.

The coefficient for Opp_win_pct is 0.163. That is a little odd because the raw relationship between opponent strength and winning was negative. I would not read too much into the sign. This is only 16 games, the standard error (0.865) is about five times the size of the estimate, and Home and Opp_win_pct are strongly correlated (-0.704), which makes it hard to separate the two effects.

The intercept is 0.142. That is not the main takeaway because it represents an away game against an opponent with a 0.000 winning percentage, which is not a realistic comparison.
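Because the intercept describes an unrealistic game, one way to make the coefficients more concrete is to predict at a realistic opponent strength. The sketch below uses the sample average opponent winning percentage; that choice, and the names new_games and pred, are assumptions made for illustration only.

```r
# Data as entered in Part 1
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
games <- data.frame(Home, Win, Opp_win_pct)
fit <- lm(Win ~ Home + Opp_win_pct, data = games)

# One away game and one home game against an average-strength opponent
new_games <- data.frame(Home = c(0, 1),
                        Opp_win_pct = mean(games$Opp_win_pct))

# Predicted win probabilities from the linear probability model
pred <- predict(fit, newdata = new_games)
print(round(pred, 3))

# The gap between the two predictions equals the Home coefficient
print(round(unname(pred[2] - pred[1]), 3))
```

Holding opponent strength fixed at any value gives the same home-minus-away gap, because the model has no interaction between Home and Opp_win_pct.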

The p-values are also worth noting. The Home coefficient is positive, but its p-value is 0.131, so it is not statistically significant at the 0.05 level. Opp_win_pct has a p-value of 0.854, so that one is not statistically significant either.

That does not make the model useless. It just means I would treat this as a small-sample result, not proof that home field caused the team to win more.
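The estimates, standard errors, t-statistics, and p-values in the table can be pulled straight from the fitted model. A minimal sketch, assuming the same data and model as above; coef_table is an illustrative name.

```r
# Data as entered in Part 1
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
games <- data.frame(Home, Win, Opp_win_pct)
fit <- lm(Win ~ Home + Opp_win_pct, data = games)

# Matrix with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coef_table <- summary(fit)$coefficients
print(round(coef_table, 3))

# Which terms clear the 0.05 significance threshold?
print(coef_table[, "Pr(>|t|)"] < 0.05)
```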

3. What is the R-squared?

R-squared   Adjusted R-squared   F-statistic   p-value (F-test)
0.252       0.137                2.19          0.151

The R-squared is 0.252.

That means Home and Opp_win_pct explain about 25.2% of the variation in wins. That is useful, but it leaves roughly three-quarters of the variation unexplained. That makes sense because winning depends on more than just location and opponent winning percentage.

The adjusted R-squared is 0.137. It is lower because adjusted R-squared penalizes each added predictor, and that penalty is larger in a small sample. Here the model has two predictors and only 16 observations.
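The fit statistics can be read from summary(), and the adjusted R-squared can be recomputed by hand to see where the penalty comes from. A sketch under the same assumptions as the earlier code; the names s, n, k, adj_by_hand, and f_p_value are illustrative.

```r
# Data as entered in Part 1
Home <- c(0,1,1,0,1,1,0,0,0,0,1,1,0,1,0,1)
Win <- c(0,0,1,1,1,1,0,0,0,0,0,1,1,1,0,1)
Opp_win_pct <- c(0.8,0.5,0.4,0.7,0.5,0.4,0.7,0.8,0.8,0.3,0.2,0.4,0.6,0.5,0.6,0.2)
games <- data.frame(Home, Win, Opp_win_pct)
fit <- lm(Win ~ Home + Opp_win_pct, data = games)

s <- summary(fit)
print(round(c(R2 = s$r.squared, Adj_R2 = s$adj.r.squared), 3))

# Adjusted R-squared by hand: 1 - (1 - R2) * (n - 1) / (n - k - 1)
n <- nrow(games)  # 16 observations
k <- 2            # two predictors
adj_by_hand <- 1 - (1 - s$r.squared) * (n - 1) / (n - k - 1)
print(round(adj_by_hand, 3))

# Overall F-statistic and its p-value from the F distribution
f <- s$fstatistic  # named vector: value, numdf, dendf
f_p_value <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
print(round(unname(c(f["value"], f_p_value)), 3))
```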

Summary

The main takeaway is that home field looked important in this sample, but it was not the whole story.

The team went 6-2 at home and 2-6 away, so the raw split makes home field look pretty strong. After controlling for opponent winning percentage, the Home coefficient was still positive at 0.545.

The opponent strength result was messier. Stronger opponents were connected with fewer wins in the raw data, but the regression coefficient for Opp_win_pct turned slightly positive. I would not take that as proof that tougher opponents made winning easier. To me, it mostly shows that the sample was small and the schedule was uneven.

The p-values also keep the results in perspective. Neither Home nor Opp_win_pct was statistically significant at the 0.05 level, so I would not overstate the model.

Overall, home field was associated with winning, but I would not give home field all the credit. The team also played weaker opponents at home and stronger opponents away. The R-squared was 0.252, so the model explains part of winning, but plenty is still left out.