
What is Simple Linear Regression?

  • A statistical method for modeling the relationship between two variables, so that one variable can be predicted from the other.
  • For example, linear regression can help us predict a planet’s temperature based on its distance from the Sun.

Variables:

  • Dependent Variable: What we’re trying to predict (planet’s temperature).
  • Independent Variable: What we use for prediction (distance from the sun).

Simple Linear Regression Formula

Simple linear regression is a method to model the relationship between a dependent variable (y) and an independent variable (x).

The formula for simple linear regression is: \[ y = \beta_0 + \beta_1 x + \epsilon \]

Where \(y\) is the dependent variable, \(x\) is the independent variable, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the error term.
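
To make the roles of these terms concrete, here is a minimal sketch that simulates data from the formula and then recovers the coefficients with `lm()`. The values of `beta0`, `beta1`, the noise level, and the sample size are made up for illustration and do not come from this deck.

```r
# Simulate y = beta0 + beta1 * x + epsilon with hypothetical parameter values
set.seed(42)                       # make the random draws reproducible
n     <- 500                       # illustrative sample size
beta0 <- 2                         # assumed "true" intercept
beta1 <- 0.5                       # assumed "true" slope
x       <- runif(n, min = 0, max = 10)
epsilon <- rnorm(n, mean = 0, sd = 1)   # error term
y       <- beta0 + beta1 * x + epsilon

# Fit the simple linear regression and compare the estimates
# with the assumed values above
sim_fit   <- lm(y ~ x)
estimates <- coef(sim_fit)
print(round(estimates, 3))
```

The estimates should land close to the assumed intercept and slope, with the gap shrinking as `n` grows or the noise level falls.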

Assumptions of Simple Linear Regression

  1. Linearity: The relationship between \(x\) and \(y\) is linear.
  2. Independence: The residuals (errors) are independent.
  3. Homoscedasticity: The residuals have constant variance.
  4. Normality: The residuals are normally distributed.
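
Some of these assumptions can be checked informally from a fitted model. The sketch below uses base R only and the built-in iris data with the same two variables this deck analyzes; it is one possible set of checks, not a complete diagnostic procedure. Independence usually has to be judged from how the data were collected rather than from a plot.

```r
# Fit the model on the built-in iris data (base R only)
fit <- lm(Petal.Length ~ Petal.Width, data = iris)
res <- residuals(fit)

# Normality: Q-Q plot of the residuals against a normal distribution
qqnorm(res)
qqline(res)

# Normality: Shapiro-Wilk test (a small p-value suggests non-normal residuals)
sw <- shapiro.test(res)
print(sw)

# Homoscedasticity: scale-location plot (which = 3) from plot.lm;
# a roughly flat trend is consistent with constant variance
plot(fit, which = 3)
```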

Using Iris Dataset

# Load the iris dataset
data(iris)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Summary of the Dataset

# Show summary statistics of the iris dataset
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 

The Concept Visualized

## `geom_smooth()` using formula = 'y ~ x'

Observation: As Petal Width increases, Petal Length also increases (positive relationship).
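
The plot itself is not reproduced here. The reference to `geom_smooth()` in this section suggests it was drawn with ggplot2. A sketch that would produce a comparable figure, assuming the ggplot2 package is installed and that the fitted line used `method = "lm"` (neither is stated in the source), might look like this:

```r
library(ggplot2)

# Scatter plot of petal length against petal width with a least-squares line
p <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +   # assumed: straight-line fit with band
  labs(x = "Petal Width (cm)", y = "Petal Length (cm)",
       title = "Petal Length vs. Petal Width")
print(p)
```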

The Mathematical Model

The linear regression equation is:

\[ \text{Petal Length} = \beta_0 + \beta_1 \times \text{Petal Width} + \epsilon \]

Based on our data, the regression equation is:

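\[ \text{Petal Length} = 1.08 + 2.23 \times \text{Petal Width} \]

Interpretation: Each additional centimeter of petal width is associated with an increase of approximately 2.23 cm in petal length. The intercept, 1.08 cm, is the predicted petal length at a petal width of zero.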
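
To see what the slope means in practice, the fitted model can be used to predict petal length at chosen petal widths. The widths 1.0, 1.5, and 2.0 cm below are arbitrary illustrative values; predictions one unit of width apart differ by exactly the slope.

```r
# Fit the model on the built-in iris data
model <- lm(Petal.Length ~ Petal.Width, data = iris)

# Hypothetical petal widths (cm), chosen only for illustration
new_flowers <- data.frame(Petal.Width = c(1.0, 1.5, 2.0))
pred <- predict(model, newdata = new_flowers)
print(round(pred, 2))

# The difference between the predictions at 2.0 and 1.0 equals the slope
slope <- coef(model)[["Petal.Width"]]
print(all.equal(unname(pred[3] - pred[1]), slope))
```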

Implementing Linear Regression

model <- lm(Petal.Length ~ Petal.Width, data = iris)
summary(model)$coefficients
##             Estimate Std. Error  t value     Pr(>|t|)
## (Intercept) 1.083558 0.07296696 14.84998 4.043318e-31
## Petal.Width 2.229940 0.05139623 43.38724 4.675004e-86
cat("R-squared:", summary(model)$r.squared, "\n")
## R-squared: 0.9271098

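Coefficients: The estimated intercept (about 1.08) and slope (about 2.23).

P-values: Indicate whether each coefficient differs significantly from zero. Here the slope’s p-value (about 4.7e-86) is far below 0.05, so the relationship is statistically significant.

R-squared: The proportion of the variation in Petal Length that is explained by Petal Width, here about 0.927 (92.7%).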
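
These quantities can also be pulled out of the fitted model one at a time. The sketch below additionally computes a 95% confidence interval for each coefficient with `confint()` (not shown in the original output) and checks that, with a single predictor, R-squared equals the squared correlation between the two variables.

```r
model <- lm(Petal.Length ~ Petal.Width, data = iris)
coefs <- summary(model)$coefficients

# p-value for the slope (column name as printed by summary())
slope_p <- coefs["Petal.Width", "Pr(>|t|)"]
print(slope_p < 0.05)

# 95% confidence intervals for the intercept and slope
ci <- confint(model, level = 0.95)
print(ci)

# With one predictor, R-squared is the squared Pearson correlation
r2      <- summary(model)$r.squared
r2_corr <- cor(iris$Petal.Width, iris$Petal.Length)^2
print(all.equal(r2, r2_corr))
```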

Residuals and OLS Method

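Ordinary least squares (OLS) chooses the intercept and slope that minimize the sum of squared residuals, where a residual is the difference between an observed value and the value fitted by the model.

Observation: The residuals appear randomly scattered around zero, with no clear pattern. This suggests that the linearity and homoscedasticity assumptions are reasonably met and that a straight line is an appropriate fit for these data.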
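
The residual plot is not reproduced here. As a sketch of what this section describes, the code below computes the OLS estimates from their closed-form expressions, compares them with `lm()`, and draws a residuals-versus-fitted plot in base R. The plotting choices are assumptions, not a reconstruction of the original figure.

```r
x <- iris$Petal.Width
y <- iris$Petal.Length

# Closed-form OLS estimates for simple linear regression
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                      # intercept

# They should agree with the coefficients from lm()
model <- lm(Petal.Length ~ Petal.Width, data = iris)
print(all.equal(c(b0, b1), unname(coef(model))))

# Residuals = observed - fitted; with an intercept they sum to (numerically) zero
res <- y - (b0 + b1 * x)
print(abs(sum(res)) < 1e-8)

# Residuals against fitted values, with a dashed reference line at zero
plot(fitted(model), res,
     xlab = "Fitted Petal Length (cm)", ylab = "Residual (cm)",
     main = "Residuals vs. Fitted")
abline(h = 0, lty = 2)
```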

Interactive 3D Plot: Petal Length, Petal Width, and Sepal Length

Observation: The 3D visualization shows how Petal Length, Petal Width, and Sepal Length vary together.
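
The interactive plot is not reproduced here, and the source does not say which library produced it. One way to build a comparable figure, assuming the plotly package is installed, is sketched below; the axis assignment and the coloring by species are illustrative choices.

```r
library(plotly)

# Interactive 3D scatter plot of the three measurements (all in cm)
fig <- plot_ly(
  data = iris,
  x = ~Petal.Width, y = ~Petal.Length, z = ~Sepal.Length,
  color = ~Species,                 # assumed: color points by species
  type = "scatter3d", mode = "markers"
)

# Display the widget only when running interactively
if (interactive()) print(fig)
```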

Conclusion

  • Linear regression helps us understand the relationship between variables (e.g., petal length and petal width).
  • For the iris dataset:
    • Petal Width is a significant predictor of Petal Length.
    • Each additional unit of petal width is associated with an increase of approximately 2.23 units in petal length.
    • The model explains 92.7% of the variation in petal length.

Key Takeaways:

  • Linear regression is a powerful tool for prediction and analysis.
