2025-06-03

Simple Linear Regression

What is Regression Analysis?

  • A method for modeling relationships between variables

  • Lets us predict one variable from one or more other variables

Simple Linear Regression

  • A method to understand the relationship between two variables

  • One is the predictor variable (x-axis) and the other is the response variable (y-axis)

Below is a table of orange tree ages (in days) and trunk circumferences (in mm)

age circumference
118 30
484 58
664 87
1004 115
1231 120
1372 142
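The values above match the first six rows of R's built-in `Orange` dataset, which the later `Orange$age` calls also point to. Assuming that is the source, a minimal sketch for reproducing the table could look like this (the name `orange_head` is only illustrative):

```r
# Orange ships with base R (datasets package), so no extra packages are needed.
# Assumption: the table shows the first six rows without the Tree column.
orange_df <- as.data.frame(Orange)  # drop the grouped-data classes
orange_head <- orange_df[1:6, c("age", "circumference")]
print(orange_head)
```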

Let’s look at a visualization of the data

The figure shows a scatter plot of age vs. circumference
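A scatter plot like the one described could be drawn roughly as follows. This is a sketch that assumes the ggplot2 package is installed; the axis labels and the name `scatter_plot` are illustrative choices, not taken from the original figure.

```r
library(ggplot2)  # assumption: ggplot2 is installed

# Age on the x-axis (predictor), circumference on the y-axis (response)
scatter_plot <- ggplot(Orange, aes(x = age, y = circumference)) +
  geom_point() +
  labs(x = "Age (days)", y = "Circumference (mm)")  # units per the Orange help page

print(scatter_plot)
```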

Performing the analysis

  • We can calculate how strongly the variables are related using a correlation function:

cor(Orange$age, Orange$circumference)
[1] 0.9135189

The result is close to +1, meaning there is a strong positive linear correlation between the two variables
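To see where this number comes from, the sketch below computes Pearson's correlation by hand as the covariance divided by the product of the two standard deviations, and compares it with `cor()`. The variable names are illustrative.

```r
x <- Orange$age
y <- Orange$circumference

# Pearson's r: covariance scaled by the product of the standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Built-in function (Pearson is the default method)
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

Both values should agree with the 0.9135189 shown above.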

  • We can fit a linear regression to the data:

LinMod <- lm(circumference ~ age, data = Orange)

  • We will use this model to add a regression line to our scatter plot and to predict values that are outside of the data range

The simple linear regression model equation

\[Y = \beta_0 + \beta_1 \cdot X + \epsilon\]

  • \(Y\) is our response variable

  • \(X\) is our predictor variable

  • \(\beta_0\) is our intercept

  • \(\beta_1\) is our slope

  • \(\epsilon\) is our error term
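As an illustration of how the intercept and slope are estimated, the sketch below applies the standard least-squares formulas directly: the slope is the covariance of \(X\) and \(Y\) divided by the variance of \(X\), and the intercept makes the line pass through the point of means. The names `beta0_hat` and `beta1_hat` are illustrative; `lm()` arrives at the same estimates.

```r
x <- Orange$age            # predictor
y <- Orange$circumference  # response

# Least-squares slope: cov(X, Y) / var(X)
beta1_hat <- cov(x, y) / var(x)

# Least-squares intercept: the line passes through (mean(x), mean(y))
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Residuals are the estimated error terms, one per observation
residuals_hat <- y - (beta0_hat + beta1_hat * x)

print(c(intercept = beta0_hat, slope = beta1_hat))
```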

We can view a summary of the linear model:

  • The value in red, the estimate in the (Intercept) row, is the intercept

  • The value in blue, the estimate in the row for age, is the slope
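A minimal sketch for producing the summary and pulling out the two estimates by name is shown below. It assumes the model is fitted with `data = Orange`, so that the slope row is labelled `age`; the names `lin_mod`, `intercept`, and `slope` are illustrative.

```r
# Assumption: fitting with data = Orange so the slope coefficient is named "age"
lin_mod <- lm(circumference ~ age, data = Orange)

# Full summary: coefficient table, residual standard error, R-squared
print(summary(lin_mod))

# Extract the two estimates directly
estimates <- coef(lin_mod)
intercept <- estimates[["(Intercept)"]]
slope <- estimates[["age"]]
print(c(intercept = intercept, slope = slope))
```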

Our equation then becomes

\[Y = 17.4 + 0.11 \cdot X + \epsilon\]

We can use this equation to calculate values outside of the data range
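As a worked example, the sketch below plugs a hypothetical age of 2000 days (beyond the oldest tree in the data, 1582 days) into the rounded equation and compares the result with `predict()` on the fitted model. The error term is left out of the prediction because it is assumed to average zero. The function name and the chosen age are illustrative only.

```r
# Rounded coefficients taken from the equation above
predict_circumference <- function(age_days) {
  17.4 + 0.11 * age_days
}

# Hypothetical age, chosen only for illustration
rounded_estimate <- predict_circumference(2000)

# Same prediction from the fitted model, using unrounded coefficients
lin_mod <- lm(circumference ~ age, data = Orange)
model_estimate <- predict(lin_mod, newdata = data.frame(age = 2000))

print(c(rounded = rounded_estimate, model = unname(model_estimate)))
```

The two numbers differ by a few millimetres because rounding the slope to 0.11 adds up over large ages, so `predict()` is the safer choice in practice. Predictions far outside the observed ages should also be treated with caution, since the linear trend may not continue.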

Here is our linear model fitted to the plot
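A plot of this kind could be produced along these lines. This is a sketch assuming ggplot2 is installed; the original figure's exact settings are not known, so the labels and the name `fitted_plot` are illustrative.

```r
library(ggplot2)  # assumption: ggplot2 is installed

fitted_plot <- ggplot(Orange, aes(x = age, y = circumference)) +
  geom_point() +
  # method = "lm" draws the least-squares line; stating the formula
  # explicitly avoids the "using formula = 'y ~ x'" console message
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Age (days)", y = "Circumference (mm)")

print(fitted_plot)
```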
