Simple linear regression models the relationship between two variables by fitting a linear equation to observed data. It is useful in many disciplines, for example to predict petal length from sepal length, or stock returns from interest rates.
\[ Y = \beta_0 + \beta_1 X + \varepsilon \]
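Here \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon\) is the random error term. We find the \(\beta_0\) and \(\beta_1\) that minimize the sum of squared residuals: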
\[ \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \]
This gives us the line of best fit.
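For a single predictor, this minimization has a well-known closed-form solution: \(\hat{\beta}_1 = \operatorname{cov}(x, y) / \operatorname{var}(x)\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). The sketch below computes these by hand on the built-in iris data and compares them with `lm()`. The variable names `x`, `y`, `b0`, `b1`, and `fit` are illustrative choices, not part of the original analysis.

```r
# Closed-form least-squares estimates for simple linear regression,
# computed on the built-in iris data.
x <- iris$Sepal.Length
y <- iris$Petal.Length

# Slope: sample covariance of x and y divided by the sample variance of x
b1 <- cov(x, y) / var(x)

# Intercept: chosen so the line passes through the point of means
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# Cross-check against lm(); the two should agree up to floating-point error
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))
```

In practice `lm()` is the more convenient route, since it also reports standard errors and test statistics. The hand computation is only meant to show what is being minimized.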
data(iris)
model <- lm(Petal.Length ~ Sepal.Length, data = iris)
summary(model)
##
## Call:
## lm(formula = Petal.Length ~ Sepal.Length, data = iris)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.47747 -0.59072 -0.00668  0.60484  2.49512
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -7.10144    0.50666  -14.02   <2e-16 ***
## Sepal.Length  1.85843    0.08586   21.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8678 on 148 degrees of freedom
## Multiple R-squared:   0.76,  Adjusted R-squared:  0.7583
## F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16
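Here we model petal length as a function of sepal length in iris flowers. The estimated slope of about 1.86 means that each additional centimeter of sepal length is associated with roughly 1.86 cm more petal length, and the model explains about 76% of the variance in petal length (R-squared = 0.76).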
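Once the model is fitted, it can be used to predict petal length for a new flower. The following is a minimal sketch using `predict()`. The sepal length of 6 cm and the names `new_flower` and `pred` are hypothetical, chosen only for illustration.

```r
# Refit the model so this snippet runs on its own
model <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Estimated intercept and slope
coef(model)

# A hypothetical new flower with a sepal length of 6 cm
new_flower <- data.frame(Sepal.Length = 6)

# Point prediction: intercept + slope * 6, roughly 4.05 cm
# given the coefficients reported in the summary
pred <- predict(model, newdata = new_flower)
pred

# The same prediction with a 95% prediction interval
predict(model, newdata = new_flower, interval = "prediction", level = 0.95)
```

The column name in `newdata` has to match the predictor name used in the model formula, here `Sepal.Length`.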
In a plot of residuals against fitted values for this model, the residuals follow a curved pattern. This suggests that the relationship between the predictor and the response may not be perfectly linear, which can limit the accuracy of a linear regression model.
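A residuals vs. fitted plot of this kind can be drawn with base R graphics. The sketch below is one way to do it, not necessarily how the original figure was produced. The smoothing line via `lowess()` is an added convenience to make any curvature easier to see.

```r
# Refit the model so this snippet runs on its own
model <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Residuals plotted against fitted values
plot(fitted(model), resid(model),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")

# Reference line at zero: residuals should scatter evenly around it
abline(h = 0, lty = 2)

# Smoothed trend of the residuals; a clear bend hints at non-linearity
lines(lowess(fitted(model), resid(model)), col = "red")
```

Base R also provides a built-in version of this diagnostic through `plot(model, which = 1)`.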
Simple linear regression is a useful tool for modeling and interpreting relationships between variables. While it can capture general trends, diagnostic plots like residuals vs. fitted values help reveal when the relationship may not be truly linear, as seen in the iris dataset example.