2025-06-08

Introduction

  • What is Simple Linear Regression (SLR)?
    • A statistical method to model the relationship between a dependent variable (\(y\)) and a single independent variable (\(x\)).
  • Objective: Predict \(y\) using a linear equation
    \[ y = \beta_0 + \beta_1 x + \epsilon \]
    • \(\beta_0\): Intercept
    • \(\beta_1\): Slope
    • \(\epsilon\): Error term
    • In multiple linear regression, there are several independent variables, each with its own coefficient, but the model is still linear in the coefficients.
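To show how the three terms of the equation fit together, here is a minimal R sketch that simulates data from the model and checks that `lm()` roughly recovers the coefficients. The "true" values \(\beta_0 = 2\), \(\beta_1 = 0.5\), the uniform \(x\) values and the normal errors with standard deviation 1 are all assumptions chosen only for illustration.

```r
# Hypothetical simulation: beta0 = 2, beta1 = 0.5 and sd = 1 are made-up values
set.seed(42)                          # make the random draws reproducible
n     <- 1000
beta0 <- 2                            # assumed true intercept
beta1 <- 0.5                          # assumed true slope

x   <- runif(n, min = 0, max = 10)    # independent variable
eps <- rnorm(n, mean = 0, sd = 1)     # error term epsilon
y   <- beta0 + beta1 * x + eps        # y = beta0 + beta1 * x + epsilon

sim_fit <- lm(y ~ x)                  # ordinary least squares fit
coef(sim_fit)                         # estimates should land close to 2 and 0.5
```

Because of the error term, the estimates will not match the true values exactly; they get closer as `n` grows.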

Example Dataset

  • For this example, we will use the iris dataset to predict Petal Length from Sepal Length using SLR.
head(iris[, c("Sepal.Length", "Petal.Length")])
##   Sepal.Length Petal.Length
## 1          5.1          1.4
## 2          4.9          1.4
## 3          4.7          1.3
## 4          4.6          1.5
## 5          5.0          1.4
## 6          5.4          1.7

Scatter plot of iris

Interactive Scatterplot (using Plotly)

  • Since this is a 3D plot, we add a third variable, Sepal.Width. Points are still colored by species.
library(plotly)  # provides plot_ly() and re-exports the %>% pipe

plot_ly(iris, 
        x = ~Sepal.Length, 
        y = ~Sepal.Width, 
        z = ~Petal.Length,
        color = ~Species,
        colors = c("#8C1D40", "#1D8C39", "#1D398C"),  # Custom colors
        type = "scatter3d", 
        mode = "markers",
        marker = list(size = 2)) %>%
  layout(scene = list(xaxis = list(title = 'Sepal Length'),
                      yaxis = list(title = 'Sepal Width'),
                      zaxis = list(title = 'Petal Length')),
         title = "3D View of Iris Measurements")


Model Summary

model <- lm(Petal.Length ~ Sepal.Length, data = iris)
summary(model)
## 
## Call:
## lm(formula = Petal.Length ~ Sepal.Length, data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.47747 -0.59072 -0.00668  0.60484  2.49512 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -7.10144    0.50666  -14.02   <2e-16 ***
## Sepal.Length  1.85843    0.08586   21.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8678 on 148 degrees of freedom
## Multiple R-squared:   0.76,  Adjusted R-squared:  0.7583 
## F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16
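The estimates in the summary can be reproduced by hand. This sketch uses the standard closed-form least-squares formulas for simple linear regression, \(\hat\beta_1 = \operatorname{cov}(x, y) / \operatorname{var}(x)\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\), plus \(R^2 = 1 - SS_{res}/SS_{tot}\); the variable names `b0`, `b1`, `y_hat` and `r_sq` are just illustrative choices.

```r
x <- iris$Sepal.Length
y <- iris$Petal.Length

# Closed-form least-squares estimates for simple linear regression
b1 <- cov(x, y) / var(x)          # slope: sample covariance over sample variance of x
b0 <- mean(y) - b1 * mean(x)      # intercept: the line passes through (mean(x), mean(y))
c(b0 = b0, b1 = b1)               # about -7.10144 and 1.85843

# R-squared: 1 - residual sum of squares / total sum of squares
y_hat <- b0 + b1 * x
r_sq  <- 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
r_sq                              # about 0.76, the "Multiple R-squared"

# Compare with lm()
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
all.equal(unname(coef(fit)), c(b0, b1))   # TRUE
```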

Scatter Plot with the Line (using ggplot)

  • Based on our summary, the equation of the fitted regression line is: \[ \hat{y} = -7.10144 + 1.85843\, x \] where \(\hat{y}\) is the predicted petal length and \(x\) is the sepal length.
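As a worked example of using the fitted line, this sketch predicts petal length for a hypothetical flower with a sepal length of 6 cm, once with `predict()` and once by plugging into the equation by hand: \(-7.10144 + 1.85843 \times 6 \approx 4.05\). The name `new_flower` and the value 6 are assumptions for illustration.

```r
model <- lm(Petal.Length ~ Sepal.Length, data = iris)

new_flower <- data.frame(Sepal.Length = 6)   # hypothetical flower, sepal length 6 cm

# Prediction with predict()
predict(model, newdata = new_flower)         # roughly 4.05

# Same value from the fitted equation y_hat = b0 + b1 * x
b <- coef(model)
b[["(Intercept)"]] + b[["Sepal.Length"]] * 6
```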

Plot of the Residuals (using ggplot)
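The code for this slide is not shown, so the following is only one possible version: a residuals-vs-fitted plot built with ggplot2 (assumed to be installed). The colors are borrowed from the Plotly slide, and the names `resid_df` and `resid_plot` are illustrative. Residuals are observed minus fitted values; with an intercept in the model they sum to zero, so a well-behaved fit shows points scattered evenly around the dashed zero line.

```r
library(ggplot2)

model <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Collect fitted values and residuals (observed - fitted) in one data frame
resid_df <- data.frame(
  fitted   = fitted(model),
  residual = residuals(model),
  Species  = iris$Species
)

# Residuals vs fitted values, with a dashed reference line at 0
resid_plot <- ggplot(resid_df, aes(x = fitted, y = residual, color = Species)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_color_manual(values = c("#8C1D40", "#1D8C39", "#1D398C")) +
  labs(x = "Fitted Petal Length", y = "Residual",
       title = "Residuals vs Fitted Values")

resid_plot
```

Coloring by species makes it easier to see whether the residuals cluster by group, which a single-predictor model cannot account for.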

Final Thoughts

  • Simple linear regression models the linear relationship between one independent variable and a dependent variable.
  • From the fitted model, we can judge how strong that linear relationship is.
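  • In this case, the iris data show a strong relationship between sepal length and petal length, based on the R-squared statistic.
  • The multiple R-squared is 0.76, so about 76% of the variation in petal length is explained by sepal length. The correlation coefficient is \( r = \sqrt{0.76} \approx 0.87 \), positive because the slope is positive, which indicates a strong positive linear correlation.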
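In simple linear regression, the squared Pearson correlation equals the model's R-squared, and the sign of the correlation matches the sign of the slope. This sketch checks both facts on the iris data; `cor()` and `summary()` are base R, and the variable names are illustrative.

```r
x <- iris$Sepal.Length
y <- iris$Petal.Length

r     <- cor(x, y)                                   # Pearson correlation coefficient
model <- lm(Petal.Length ~ Sepal.Length, data = iris)
r_sq  <- summary(model)$r.squared                    # "Multiple R-squared" from summary()

r                                                    # about 0.87
all.equal(r^2, r_sq)                                 # TRUE: r^2 equals R-squared in SLR
sign(r) == sign(coef(model)[["Sepal.Length"]])       # TRUE: same sign as the slope
```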