2024-09-22
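Simple linear regression models the relationship between a dependent variable and a single independent variable. The goal is to find the best-fitting straight line, typically in the least-squares sense (the line that minimizes the sum of squared vertical distances between the data points and the line), and use it to predict the dependent variable from the independent variable.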
We’ll use an example to demonstrate how this works.
Imagine we want to predict the number of goals a soccer team will score based on its average shots per game. In this case:

- The independent variable is the average shots per game.
- The dependent variable is the number of goals scored.
We will fit a regression line to the data and see how well the average shots per game predict the number of goals scored.
Let’s start by plotting the relationship between average shots per game and goals scored. This will help us see whether there is a pattern we can model with simple linear regression.
The red line in the plot is the regression line. This line shows the best fit for the relationship between shots per game and goals scored.
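The post does not show its plotting code or the underlying data. The following is therefore only a minimal sketch of how a scatter plot with a red regression line could be drawn in base R. The data frame below holds hypothetical values for ten teams, invented for illustration. These are not the data behind the plot or the model output in this post.

```r
# Hypothetical data for ten teams (illustrative only, NOT the post's dataset)
data <- data.frame(
  shots_per_game = c(8.5, 9.2, 10.1, 11.0, 11.8, 12.5, 13.3, 14.0, 15.2, 16.1),
  goals_scored   = c(2.3, 2.6, 2.7, 3.2, 3.3, 3.8, 3.9, 4.2, 4.5, 4.9)
)

# Fit the least-squares line that will be drawn over the points
fit <- lm(goals_scored ~ shots_per_game, data = data)

# Scatter plot of the raw observations
plot(goals_scored ~ shots_per_game, data = data,
     xlab = "Average shots per game", ylab = "Goals scored",
     pch = 19, main = "Goals scored vs. shots per game")

# Overlay the fitted regression line in red
abline(fit, col = "red", lwd = 2)
```

If the original figure was made with ggplot2, `geom_smooth(method = "lm", se = FALSE, colour = "red")` is a common equivalent. The post does not say which approach it used.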
Next, we’ll calculate the equation for this regression line.
To calculate the equation of the regression line, we use the lm() function in R, which stands for linear model.
The equation of the line takes the form y = mx + b, where:

- m is the slope: how many additional goals are expected for each additional shot per game.
- b is the y-intercept: the predicted number of goals when shots per game is zero.
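For a single predictor, the least-squares slope and intercept have a closed form:

- \(m = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\)
- \(b = \bar{y} - m\bar{x}\)

The sketch below computes both by hand on hypothetical data (not the post's dataset) and checks that they match the coefficients `lm()` estimates.

```r
# Hypothetical data (illustrative only, NOT the post's dataset)
x <- c(8.5, 9.2, 10.1, 11.0, 11.8, 12.5, 13.3, 14.0, 15.2, 16.1)  # shots per game
y <- c(2.3, 2.6, 2.7, 3.2, 3.3, 3.8, 3.9, 4.2, 4.5, 4.9)          # goals scored

# Closed-form least-squares estimates
m <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b <- mean(y) - m * mean(x)                                      # intercept

# Compare the hand-computed values with lm()
fit <- lm(y ~ x)
c(intercept = b, slope = m)
coef(fit)
```

The slope formula is equivalent to `cov(x, y) / var(x)`. This makes clear that the slope measures how strongly the two variables move together, relative to the spread of the predictor.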
The output below shows the detailed results of the linear regression model:
```
Call:
lm(formula = goals_scored ~ shots_per_game, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.35622 -0.14587 -0.03618  0.19411  0.28386

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    -0.40445    0.20308  -1.992   0.0816 .
shots_per_game  0.32004    0.01556  20.572 3.26e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2284 on 8 degrees of freedom
Multiple R-squared:  0.9814,	Adjusted R-squared:  0.9791
F-statistic: 423.2 on 1 and 8 DF,  p-value: 3.264e-08
```
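Each number in this summary can also be read programmatically from the fitted object. The sketch below uses hypothetical data, so its values will differ from the output above. It does three things:

- extracts the coefficient table and R-squared;
- recomputes each t value as the estimate divided by its standard error;
- recomputes R-squared as \(1 - SS_{res}/SS_{tot}\).

```r
# Hypothetical data (illustrative only, NOT the post's dataset)
data <- data.frame(
  shots_per_game = c(8.5, 9.2, 10.1, 11.0, 11.8, 12.5, 13.3, 14.0, 15.2, 16.1),
  goals_scored   = c(2.3, 2.6, 2.7, 3.2, 3.3, 3.8, 3.9, 4.2, 4.5, 4.9)
)
model <- lm(goals_scored ~ shots_per_game, data = data)
s <- summary(model)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
coef_table

# t value = Estimate / Std. Error
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res <- sum(residuals(model)^2)
ss_tot <- sum((data$goals_scored - mean(data$goals_scored))^2)
r2_manual <- 1 - ss_res / ss_tot

c(reported = s$r.squared, manual = r2_manual)
```

In simple linear regression the overall F-statistic equals the square of the slope's t value. In the output above, 20.572² ≈ 423.2.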
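The linear regression analysis shows the following key points:

- Slope: each additional shot per game is associated with about 0.32 more goals scored (estimate 0.32004, p = 3.26e-08), so the relationship is highly statistically significant.
- Intercept: the estimated intercept is -0.404 (p = 0.0816), which is not significant at the 5% level. Because a team with zero shots cannot score negative goals, it should not be read literally; it mainly positions the line within the range of the observed data.
- Fit: a multiple R-squared of 0.9814 means about 98% of the variation in goals scored is explained by shots per game in this sample. The residual standard error is 0.2284 goals.
- Sample size: 8 residual degrees of freedom implies 10 observations (n - 2 = 8).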
In conclusion, more shots per game are strongly associated with more goals scored.
The equation for the regression line is given by:
\[ y = mx + b \]
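Where:

- \(m\) is the slope (0.320 in our case).
- \(b\) is the intercept (-0.404 in our case).

So the fitted line is \(y = 0.320x - 0.404\), where \(y\) is goals scored and \(x\) is average shots per game.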
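Plugging a value into the fitted equation gives a predicted number of goals. The sketch below uses the coefficients printed in the model summary. The shot values 10, 12 and 15 are arbitrary illustrative inputs. Predictions are only meaningful within the range of shots actually observed, which the post does not report.

```r
# Coefficients reported in the model summary
b <- -0.40445  # intercept
m <- 0.32004   # slope: extra goals per additional shot per game

# Illustrative (hypothetical) values of average shots per game
shots <- c(10, 12, 15)

# Predicted goals from y = m * x + b
predicted_goals <- m * shots + b
predicted_goals
```

For example, a team averaging 12 shots per game is predicted to score 0.32004 × 12 - 0.40445 ≈ 3.44 goals. With the fitted model object itself, `predict(model, newdata = data.frame(shots_per_game = c(10, 12, 15)))` should give the same values, up to the rounding of the printed coefficients.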
Here is the R code used to calculate the regression line:
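```r
# Fit the simple linear regression model: goals_scored as a function of shots_per_game
# (`data` is a data frame with columns goals_scored and shots_per_game)
model <- lm(goals_scored ~ shots_per_game, data = data)

# Display the model summary (coefficients, R-squared, F-statistic)
summary(model)
```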