2026-04-12

Simple Linear Regression

Simple linear regression is a statistical method for modeling the relationship between two variables, X and Y.

  • X is known as the predictor variable
  • Y is known as the response variable

Why is this useful?

  • establishes trends
  • makes predictions
  • shows how variables affect each other

Data set example

The data set used for this analysis is R's built-in iris data set.

  • X (predictor): petal length
  • Y (response): petal width

We will use linear regression to examine the relationship between these two variables.
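
Before fitting anything, it can help to look at the two columns directly. The following is a minimal sketch using only base R; the object names `petals` and `petal_cor` are illustrative and not taken from the slides.

```r
# Keep only the predictor and response columns used in this analysis.
petals <- iris[, c("Petal.Length", "Petal.Width")]

# Structure and five-number summaries of both variables.
str(petals)
summary(petals)

# Pearson correlation: a first indication of how linear the relationship is.
petal_cor <- cor(petals$Petal.Length, petals$Petal.Width)
petal_cor
```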

Linear Regression Equation

The formulas below show the theoretical equation and the fitted regression equation.

Theoretical model:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

  • \(\beta_0\): intercept
  • \(\beta_1\): slope
  • \(\epsilon\): error term

Fitted model:

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]

This is also referred to as the prediction equation.
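
As a worked instance, plugging in the rounded coefficient estimates from the R summary output at the end of this document (intercept about -0.363, slope about 0.416) gives the fitted line for the iris petal data. The petal length of 4 cm in the second line is an arbitrary illustrative value, not one taken from the slides.

\[ \hat{y} = -0.363 + 0.416\,x \]

\[ \hat{y}(4) = -0.363 + 0.416 \times 4 \approx 1.30 \]

So a flower with a 4 cm petal length would be predicted to have a petal width of roughly 1.30 cm.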

Strong positive relationship (ggplot visualization)

The graph shows a strong positive relationship between petal length and petal width: the points lie close to the regression line.
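
The original plotting code is not shown on the slide. A plot of this kind could be produced roughly as follows, assuming the ggplot2 package is installed; the axis labels and the object name `p` are illustrative choices.

```r
library(ggplot2)  # assumption: ggplot2 is installed

# Scatter plot of the petal measurements with a least-squares line on top.
p <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Petal length (cm)", y = "Petal width (cm)")

p
```

Setting `se = FALSE` hides the confidence band so that only the fitted line is drawn.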

Weak relationship

The graph shows a weak relationship between length and width: the points are widely scattered rather than following a clear line.
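
The slide does not say which length and width are plotted here. One pair in iris that tends to behave this way is sepal length and sepal width, so the sketch below uses those columns as an assumption, not as the author's choice. It compares R-squared for that pair against the petal model.

```r
# Assumption: sepal measurements stand in for the unnamed "length" and "width".
weak_fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)

# R-squared is the share of variance in the response explained by the line;
# a value near 0 indicates a weak linear relationship.
weak_r2 <- summary(weak_fit)$r.squared
weak_r2

# For comparison, the petal model used elsewhere in this document.
strong_r2 <- summary(lm(Petal.Width ~ Petal.Length, data = iris))$r.squared
strong_r2
```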

Plotly Graph
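
The interactive graph itself is not reproduced in this text. A comparable one might be built as sketched below, assuming the plotly package is installed; the helper data frame `plot_data` and the trace names are illustrative.

```r
library(plotly)  # assumption: plotly is installed

# Fit the petal model and collect observed and fitted values in one data frame.
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
plot_data <- data.frame(
  length = iris$Petal.Length,
  width  = iris$Petal.Width,
  fitted = fitted(fit)
)

# Interactive scatter plot of the observations.
fig <- plot_ly(plot_data, x = ~length, y = ~width,
               type = "scatter", mode = "markers", name = "Observed")

# Overlay the fitted regression line as a second trace.
fig <- add_lines(fig, y = ~fitted, name = "Fitted line")

fig
```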

R Code Slide
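
The summary that follows can be reproduced with a call along these lines. The formula and data come from the `Call:` line of the output; the object name `fit` is an assumption.

```r
# `fit` is an illustrative name; the formula and data match the Call: line.
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

# Print residual quantiles, coefficient table, and goodness-of-fit statistics.
summary(fit)
```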

## 
## Call:
## lm(formula = Petal.Width ~ Petal.Length, data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56515 -0.12358 -0.01898  0.13288  0.64272 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.363076   0.039762  -9.131  4.7e-16 ***
## Petal.Length  0.415755   0.009582  43.387  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2065 on 148 degrees of freedom
## Multiple R-squared:  0.9271, Adjusted R-squared:  0.9266 
## F-statistic:  1882 on 1 and 148 DF,  p-value: < 2.2e-16
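
Reading the output: the slope estimate of about 0.416 means each additional centimeter of petal length is associated with roughly 0.42 cm more petal width, and the R-squared of 0.9271 means the line accounts for about 93% of the variance in petal width. The sketch below shows one way to pull these quantities out of the fitted model and make a prediction; the 4 cm petal length and the names `new_flower` and `pred` are hypothetical.

```r
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

# Estimated intercept and slope (the Estimate column of the summary).
coef(fit)

# Predicted petal width for a hypothetical flower with a 4 cm petal length.
new_flower <- data.frame(Petal.Length = 4)
pred <- predict(fit, newdata = new_flower)
pred

# In simple linear regression, R-squared equals the squared correlation
# between the predictor and the response.
r2 <- summary(fit)$r.squared
r2_from_cor <- cor(iris$Petal.Length, iris$Petal.Width)^2
all.equal(r2, r2_from_cor)
```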