Simple Linear Regression

2025-03-13

Why Use Simple Linear Regression?

A simple linear regression tests whether two variables have a linear relationship: when one increases or decreases, the other changes at a roughly constant rate.

If the linear relationship between two variables is strong enough, we can build a predictive model that estimates the unknown value of one variable from a known value of the other.

Linear Regression in R

To fit a linear regression in R (for example, in RStudio), we use the lm() function:

model = lm(Petal.Length ~ Petal.Width, data = iris)

This fits a linear model that predicts Petal.Length (the response, left of the ~) from Petal.Width (the predictor, right of the ~).

Let's start by plotting the two variables against each other.

Petal Width vs. Petal Length Plotted

library(ggplot2)

ggplot(data = iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
  geom_point()

Visual Analysis

Looking at this plot, there is likely a strong linear correlation. The points are clustered tightly together, and Petal.Length increases with Petal.Width at a roughly constant (linear) rate. Moving in a single direction (monotonic) is not enough on its own; it is the roughly constant rate of increase that suggests a linear relationship.
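To put a number on this visual impression, a minimal sketch using base R's cor(), which by default computes the Pearson correlation coefficient r. The variable name r is our own; for a simple regression with an intercept, r squared should match the Multiple R-squared that summary(model) reports.

```r
# Pearson correlation between the two petal measurements in the built-in iris data
r = cor(iris$Petal.Width, iris$Petal.Length)
r
# For simple linear regression with an intercept, R-squared equals r^2
r^2
```

A value of r close to 1 indicates a strong positive linear relationship.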

Linear Model

summary(model)

## 
## Call:
## lm(formula = Petal.Length ~ Petal.Width, data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.33542 -0.30347 -0.02955  0.25776  1.39453 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.08356    0.07297   14.85   <2e-16 ***
## Petal.Width  2.22994    0.05140   43.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4782 on 148 degrees of freedom
## Multiple R-squared:  0.9271, Adjusted R-squared:  0.9266 
## F-statistic:  1882 on 1 and 148 DF,  p-value: < 2.2e-16

Important Model Takeaways

While all of the information in the model summary is useful, in a regression we typically care most about a few specific values: the coefficients and the multiple R-squared.
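Rather than copying these values by hand from the printed output, we can pull them out of the fitted model. A minimal sketch using base R's coef() and the components of the summary.lm object; the variable name model_summary is our own choice.

```r
# Refit the same model so this block runs on its own
model = lm(Petal.Length ~ Petal.Width, data = iris)
model_summary = summary(model)

# Intercept and slope estimates (the "Estimate" column of the Coefficients table)
coef(model)

# Full coefficient table: estimate, std. error, t value, p-value
model_summary$coefficients

# Multiple R-squared and adjusted R-squared
model_summary$r.squared
model_summary$adj.r.squared
```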

The coefficients give us what we need to write down and plot our linear model. Here we are given the slope (the Estimate for our input, Petal.Width) and the intercept (the constant term of our linear equation). In machine-learning terms these are the weight and the bias.
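Where do these numbers come from? For one predictor, the least-squares slope and intercept have a textbook closed form. The sketch below computes them directly. lm() itself solves least squares through a QR decomposition, so this is an illustration of the formula rather than how lm() works internally, but the results should agree.

```r
x = iris$Petal.Width
y = iris$Petal.Length
# Least-squares slope: covariance of x and y divided by the variance of x
slope = cov(x, y) / var(x)
# Least-squares intercept: the fitted line passes through (mean(x), mean(y))
intercept = mean(y) - slope * mean(x)
c(intercept = intercept, slope = slope)
```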

So our model's equation is

\[ y = 2.22994x + 1.08356 \]

where \(x\) is the input Petal.Width and \(y\) is the predicted Petal.Length.
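To use the equation as a predictive model, we can let R evaluate it for new inputs with predict(). A minimal sketch; the petal widths in new_flowers are hypothetical illustrative values, not measurements from the data set.

```r
model = lm(Petal.Length ~ Petal.Width, data = iris)
# Hypothetical new flowers with known petal widths (illustrative values only)
new_flowers = data.frame(Petal.Width = c(0.5, 1.5, 2.0))
# predict() evaluates intercept + slope * Petal.Width for each new row
predict(model, newdata = new_flowers)
# The same estimates computed by hand from the rounded coefficients
1.08356 + 2.22994 * new_flowers$Petal.Width
```

The two results differ only by the rounding of the printed coefficients.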

Our Linear Model Plotted

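library(ggplot2)
# Petal.Width in iris ranges from 0.1 to 2.5, so evaluate the line over that range
x = seq(from = 0, to = 2.5, by = 0.01)
y = 2.22994 * x + 1.08356
ggplot(data = data.frame(x, y), aes(x = x, y = y)) + geom_line()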

Multiple R-squared

As the previous slide shows, we have built a linear equation from our linear model, but how do we know whether it is any good? Besides comparing the line to our data points by eye, we can use the multiple R-squared to measure how well the model fits the data.

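Multiple R-squared is the proportion of the variance in the response (here Petal.Length) that the model explains. For a least-squares fit with an intercept it lies between 0 and 1: 0 means the model explains none of the variation (no linear relationship, so the model is useless), and 1 means every point lies exactly on the line. A common rule of thumb treats an R-squared above 0.7 as a fairly good fit, although what counts as good depends on the field. Multiple R-squared is calculated as follows: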

\[ R^2 = 1 - \frac { \sum_{i}(y_i - \hat{y}_i)^2 }{ \sum_{i}(y_i - \bar{y})^2 } \]

Multiple R-squared Continued

In this equation, \(y_i\) are the observed values, \(\hat{y}_i\) are the values estimated by our model, and \(\bar{y}\) is the mean of the observed values.
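The formula translates almost term by term into R. A minimal sketch, using base R's fitted() for the model estimates; the intermediate variable names are our own.

```r
model = lm(Petal.Length ~ Petal.Width, data = iris)
y = iris$Petal.Length     # observed values y_i
y_hat = fitted(model)     # model estimates (the y-hat_i values)
y_bar = mean(y)           # mean of the observed values
ss_res = sum((y - y_hat)^2)  # residual sum of squares (numerator)
ss_tot = sum((y - y_bar)^2)  # total sum of squares (denominator)
r_squared = 1 - ss_res / ss_tot
r_squared
```

This should reproduce the Multiple R-squared shown in the model summary.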

Our R-squared is 0.9271, meaning the model explains about 93% of the variance in Petal.Length, which indicates a very good fit.

Now we can put everything together and overlay our model on the points to check whether the fit looks as good as the R-squared suggests.

Putting it all together
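One way to draw this overlay, sketched with ggplot2's geom_abline() and the coefficients taken directly from the fitted model (so nothing is copied by hand). The variable name overlay_plot is our own; geom_smooth(method = "lm") is a common alternative that refits the line inside the plot.

```r
library(ggplot2)
model = lm(Petal.Length ~ Petal.Width, data = iris)
overlay_plot = ggplot(data = iris, aes(x = Petal.Width, y = Petal.Length)) +
  geom_point(aes(color = Species)) +
  # Fitted line: intercept and slope come straight from lm()
  geom_abline(intercept = coef(model)[[1]], slope = coef(model)[[2]])
overlay_plot
```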

Conclusion

As we can see, we created a fairly well-fitting linear model that estimates Petal Length from a given Petal Width.

Thank you