Simple Linear Regression is used for modeling the linear relationship between two variables: a response variable and a single explanatory variable.
As an example, we use R's built-in iris dataset, where Petal.Length is used as an explanatory variable for Petal.Width. Our simple linear regression model for these data is given by this equation: \[\text{Petal Width} = \beta_0 + \beta_1\cdot \text{(Petal Length)} + \varepsilon; \hspace{1cm} \varepsilon \sim \mathcal{N}(0, \sigma^2)\] where \({\varepsilon}\) is the error term.
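To make the role of each term concrete, here is a small R sketch that simulates data from this model and then fits it. The parameter values `beta0`, `beta1`, and `sigma` are hypothetical, picked only for illustration; they are not estimates from iris. Note that `rnorm()` takes the standard deviation \(\sigma\), not the variance \(\sigma^2\).

```r
# Simulate n points from  y = beta0 + beta1 * x + epsilon,  epsilon ~ N(0, sigma^2)
# All parameter values are hypothetical, chosen only to illustrate the model.
set.seed(42)                            # make the simulation reproducible
n     <- 150
beta0 <- -0.4                           # hypothetical intercept
beta1 <-  0.4                           # hypothetical slope
sigma <-  0.2                           # hypothetical error standard deviation

x   <- runif(n, min = 1, max = 7)       # explanatory variable (think: petal length)
eps <- rnorm(n, mean = 0, sd = sigma)   # error term; rnorm() takes the SD, not sigma^2
y   <- beta0 + beta1 * x + eps          # response (think: petal width)

sim <- data.frame(x = x, y = y)

# Fitting the model to the simulated data should give estimates near beta0 and beta1
coef(lm(y ~ x, data = sim))
```

Rerunning with a larger `n` or a smaller `sigma` should, on average, bring the estimates closer to the values used to generate the data.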
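Here \({x}\) is Petal Length and \({y}\) is Petal Width. The least squares estimates can be computed by hand in four steps:
- Step 1: For each \({(x,y)}\) point, calculate \({x^2}\) and \({xy}\).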
- Step 2: Sum all \({x}\), \({y}\), \({xy}\), and \({x^2}\), which gives us \({\Sigma x}\), \({\Sigma y}\), \({\Sigma xy}\), and \({\Sigma x^2}\).
- Step 3: Find \({\hat{\beta}_1}\) where \({n}\) is the number of data points:
\[\hat{\beta}_1 = {n\Sigma(xy) - \Sigma x\Sigma y\over n \Sigma(x^2)-(\Sigma x)^2}\]
- Step 4: Find \({\hat{\beta}_0}\):
\[\hat{\beta}_0 = {\Sigma y - \hat{\beta}_1\Sigma x\over n }\]
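Steps 1 to 4 translate directly into R. This sketch takes \({x}\) to be `Petal.Length` and \({y}\) to be `Petal.Width` from the built-in `iris` data; the intermediate variable names (`sum_x`, `beta1_hat`, and so on) are only illustrative.

```r
# Estimate beta0 and beta1 by hand, following Steps 1-4
x <- iris$Petal.Length
y <- iris$Petal.Width
n <- length(x)                     # number of data points (150 for iris)

# Step 1: x^2 and xy for each point
x2 <- x^2
xy <- x * y

# Step 2: the four sums
sum_x  <- sum(x)
sum_y  <- sum(y)
sum_xy <- sum(xy)
sum_x2 <- sum(x2)

# Step 3: slope
beta1_hat <- (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x^2)

# Step 4: intercept
beta0_hat <- (sum_y - beta1_hat * sum_x) / n

c(beta0_hat = beta0_hat, beta1_hat = beta1_hat)
```

If the arithmetic is right, `beta0_hat` and `beta1_hat` should agree with the coefficients that `lm()` reports for the same model (about -0.363 and 0.416).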
A horizontal line at the mean of y would balance the errors above and below the line (its residuals sum to zero), but it would not keep the overall error small.
A line passing through as many points as possible could reduce the error terms overall, but it could leave a disproportionate number of points above the line, so the errors would no longer balance.
The best way to meet both conditions is the least squares method: it calculates the differences between the actual values and the predicted values, squares them, and finds the line that makes the sum of the squared differences as small as possible.
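The least squares idea can be checked numerically. This sketch defines a helper `sse()` (a hypothetical name, not a built-in) that returns the sum of squared differences for a candidate intercept and slope, then compares three lines: the horizontal line at the mean of y, a line found by minimizing `sse()` with base R's general-purpose optimizer `optim()`, and the fit from `lm()`.

```r
x <- iris$Petal.Length
y <- iris$Petal.Width

# Sum of squared differences between actual y and the line b[1] + b[2] * x
sse <- function(b) sum((y - (b[1] + b[2] * x))^2)

mean_line <- c(mean(y), 0)                   # horizontal line at the mean of y
opt       <- optim(par = c(0, 0), fn = sse)  # numerical minimization (Nelder-Mead by default)
ls_line   <- unname(coef(lm(Petal.Width ~ Petal.Length, data = iris)))

# The mean line has a much larger SSE; optim() should land close to lm()'s fit
c(mean_line = sse(mean_line), optim = sse(opt$par), lm = sse(ls_line))

# Residuals of both the mean line and the least squares line sum to (numerically) zero
c(mean_line = sum(y - mean_line[1]),
  lm        = sum(y - (ls_line[1] + ls_line[2] * x)))
```

Both lines balance their residuals, but only the least squares line also minimizes the total squared error, which is why it satisfies both conditions at once.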
To run a simple linear regression in R, use the following commands, substituting your own variables and dataset:
```r
model1 <- lm(dependentvar ~ independentvar, data = dataset)  # placeholder names: use your own
summary(model1)                                              # coefficients, standard errors, R-squared, etc.
```
For example, regressing Petal Width on Petal Length:
```r
mod <- lm(Petal.Width ~ Petal.Length, data = iris)  # Petal.Width is the response
summary(mod)
```
```
Call:
lm(formula = Petal.Width ~ Petal.Length, data = iris)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.56515 -0.12358 -0.01898  0.13288  0.64272 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.363076   0.039762  -9.131  4.7e-16 ***
Petal.Length  0.415755   0.009582  43.387  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2065 on 148 degrees of freedom
Multiple R-squared:  0.9271,  Adjusted R-squared:  0.9266 
F-statistic:  1882 on 1 and 148 DF,  p-value: < 2.2e-16
```
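Two other numbers in this output follow from standard definitions and can be recomputed from the residuals: the residual standard error is \(\sqrt{\text{SSE}/(n-2)}\), with \(n - 2 = 148\) degrees of freedom here, and multiple R-squared is \(1 - \text{SSE}/\text{SST}\), where SST is the total sum of squares of Petal.Width around its mean. A short sketch (the names `ss_res`, `ss_tot`, `rse`, and `r2` are illustrative):

```r
mod <- lm(Petal.Width ~ Petal.Length, data = iris)

y      <- iris$Petal.Width
n      <- length(y)
ss_res <- sum(residuals(mod)^2)    # SSE: sum of squared residuals
ss_tot <- sum((y - mean(y))^2)     # SST: total sum of squares around the mean of y

rse <- sqrt(ss_res / (n - 2))      # residual standard error (n - 2 = 148 df)
r2  <- 1 - ss_res / ss_tot         # multiple R-squared

# Compare with the values stored in the summary object
c(rse = rse, summary_sigma = summary(mod)$sigma,
  r2 = r2, summary_r2 = summary(mod)$r.squared)
```

These should match the 0.2065 and 0.9271 printed in the summary.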
R prints out a lot of information you may or may not need. Right now, we are only looking for \({\hat{\beta}_0}\) and \({\hat{\beta}_1}\). Recall the fitted model: \[\widehat{\text{Petal Width}} = \hat{\beta}_0 + \hat{\beta}_1\cdot \text{(Petal Length)}\] The values we want are the estimated coefficients of the intercept and of Petal Length. They appear in the Estimate column of the summary output above, but you can also extract them with the code shown below.
```r
coefficients(mod)
```

```
 (Intercept) Petal.Length 
  -0.3630755    0.4157554 
```
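Once extracted, the coefficients can be plugged into the fitted equation to estimate Petal Width for a given Petal Length. The sketch below uses a hypothetical petal length of 4 cm, chosen only for illustration, and checks the hand calculation against base R's `predict()`.

```r
mod <- lm(Petal.Width ~ Petal.Length, data = iris)
b   <- coefficients(mod)

new_length <- 4   # hypothetical petal length (cm), for illustration only

# Fitted value by hand: beta0_hat + beta1_hat * (Petal Length)
by_hand <- b[["(Intercept)"]] + b[["Petal.Length"]] * new_length

# Same prediction with predict(); newdata must use the predictor's column name
by_predict <- predict(mod, newdata = data.frame(Petal.Length = new_length))

c(by_hand = by_hand, by_predict = unname(by_predict))
```

With the estimates above, both approaches give about \(-0.363 + 0.416 \times 4 \approx 1.30\) cm.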