2025-03-16

Introduction

Simple linear regression is a statistical method that models the relationship between two variables, a dependent variable and a single independent variable, by fitting a straight line to the data.

Equation

The equation of a simple linear regression model is:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

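where:
- \(Y\) is the dependent (response) variable
- \(X\) is the independent (predictor) variable
- \(\beta_0\) is the intercept, the expected value of \(Y\) when \(X = 0\)
- \(\beta_1\) is the slope, the expected change in \(Y\) for a one-unit increase in \(X\)
- \(\epsilon\) is the random error term, usually assumed to have mean zero and constant variance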
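The coefficients are usually estimated by ordinary least squares, which has a closed form in the one-predictor case: \(\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). The R sketch below implements those two formulas; `ols_fit` is a hypothetical helper written only for illustration, and the small datasets are made up.

```r
# Closed-form least-squares estimates for simple linear regression.
# ols_fit is a hypothetical helper, not part of base R.
ols_fit <- function(x, y) {
  x_bar <- mean(x)
  y_bar <- mean(y)
  # Slope: sum of cross-deviations divided by sum of squared x-deviations
  beta1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
  # Intercept: the fitted line passes through the point (x_bar, y_bar)
  beta0 <- y_bar - beta1 * x_bar
  c(intercept = beta0, slope = beta1)
}

# Points lying exactly on Y = 2 + 3X recover the coefficients exactly
x <- c(1, 2, 3, 4, 5)
y <- 2 + 3 * x
ols_fit(x, y)  # intercept = 2, slope = 3

# On noisy data the formulas should agree with lm() up to rounding error
set.seed(1)
x2 <- 1:20
y2 <- 1 + 0.5 * x2 + rnorm(20)
all.equal(unname(ols_fit(x2, y2)), unname(coef(lm(y2 ~ x2))))
```

`lm()` itself solves the same least-squares problem through a QR decomposition, which generalizes to several predictors; the closed form above applies only to the single-predictor case.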

Example Dataset

library(ggplot2)
library(plotly)
# Generate example data from a known model:
# Y = 5 + 0.5 * X + epsilon, with epsilon ~ N(0, 5^2)
set.seed(42)
data <- data.frame(
  X = 1:100,
  Y = 5 + 0.5 * (1:100) + rnorm(100, mean = 0, sd = 5)
)

head(data)  # Show first few rows
##   X         Y
## 1 1 12.354792
## 2 2  3.176509
## 3 3  8.315642
## 4 4 10.164313
## 5 5  9.521342
## 6 6  7.469377
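Because the data are simulated from the model equation with \(\beta_0 = 5\), \(\beta_1 = 0.5\) and an error standard deviation of 5, the true error term for each observation can be recovered directly, something that is never possible with real data. The sketch below rebuilds the same dataset (same seed and formula) so that it runs on its own.

```r
# Rebuild the example data exactly as above
set.seed(42)
data <- data.frame(
  X = 1:100,
  Y = 5 + 0.5 * (1:100) + rnorm(100, mean = 0, sd = 5)
)

# True parameters used in the simulation (known only because the data are synthetic)
beta0_true <- 5
beta1_true <- 0.5

# Recover each error term: epsilon_i = Y_i - (beta0 + beta1 * X_i)
epsilon <- data$Y - (beta0_true + beta1_true * data$X)
mean(epsilon)  # should be near 0
sd(epsilon)    # should be near the simulated sd of 5
```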

Scatter Plot with Regression Line

ggplot(data, aes(x=X, y=Y)) +
  geom_point(color="blue") +
  # Least-squares line; the grey band is a 95% confidence interval (se = TRUE by default)
  geom_smooth(method="lm", formula=y ~ x, color="red") +
  labs(title="Scatter Plot with Regression Line", x="X", y="Y")
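`geom_smooth(method = "lm")` fits the same least-squares line that `lm()` produces and shades a pointwise confidence band for the mean of \(Y\). The sketch below recomputes that band with base R's `predict()`, assuming ggplot2's documented defaults (`se = TRUE`, `level = 0.95`, and an evaluation grid of `n = 80` points); it rebuilds the example data so that it runs without ggplot2.

```r
# Rebuild the example data and fit the model
set.seed(42)
data <- data.frame(
  X = 1:100,
  Y = 5 + 0.5 * (1:100) + rnorm(100, mean = 0, sd = 5)
)
model <- lm(Y ~ X, data = data)

# Evenly spaced grid over the observed X range (80 points, like geom_smooth's default)
grid <- data.frame(X = seq(min(data$X), max(data$X), length.out = 80))

# Fitted line plus lower/upper 95% confidence limits for the mean response
band <- predict(model, newdata = grid, interval = "confidence", level = 0.95)
head(cbind(grid, band))
```

The band is narrowest near the mean of \(X\) and widens toward the ends of the range, because the fitted line is least certain far from the center of the data.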

Residuals Plot

model <- lm(Y ~ X, data=data)      # fit the model by ordinary least squares
data$residuals <- resid(model)     # residual = observed Y minus fitted Y

ggplot(data, aes(x=X, y=residuals)) +
  geom_point(color="purple") +
  geom_hline(yintercept=0, linetype="dashed", color="red") +
  labs(title="Residuals vs X", x="X", y="Residuals")
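Residuals are the vertical distances between the observed points and the fitted line. The sketch below (which rebuilds the example data so that it runs on its own) checks two algebraic properties of least-squares residuals when the model includes an intercept: they sum to zero and they are orthogonal to \(X\). These properties are why the residuals in the plot are centered on the dashed line at zero; a curved trend or a funnel shape around that line would instead suggest that the straight-line or constant-variance assumption does not hold.

```r
# Rebuild the example data and fit the model
set.seed(42)
data <- data.frame(
  X = 1:100,
  Y = 5 + 0.5 * (1:100) + rnorm(100, mean = 0, sd = 5)
)
model <- lm(Y ~ X, data = data)

# A residual is the observed value minus the fitted value
manual_resid <- data$Y - fitted(model)
all.equal(unname(manual_resid), unname(resid(model)))  # TRUE

# With an intercept, OLS residuals sum to (numerically) zero ...
sum(resid(model))
# ... and are orthogonal to the predictor
sum(resid(model) * data$X)
```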

3D Visualization (Plotly)

Model Summary

summary(model)
## 
## Call:
## lm(formula = Y ~ X, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.0974  -3.3091   0.4045   3.2635  11.1318 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.34467    1.05434   5.069 1.89e-06 ***
## X            0.49639    0.01813  27.386  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.232 on 98 degrees of freedom
## Multiple R-squared:  0.8844, Adjusted R-squared:  0.8833 
## F-statistic:   750 on 1 and 98 DF,  p-value: < 2.2e-16
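Several numbers in this summary are linked. Each t value is the estimate divided by its standard error (for the slope, 0.49639 / 0.01813 is about 27.38), and with a single predictor the overall F-statistic equals the slope's t value squared (27.386 squared is about 750). The sketch below checks those relations and also computes 95% confidence intervals and a prediction; the new value X = 50 is a hypothetical input chosen inside the observed range to avoid extrapolation.

```r
# Rebuild the example data and fit the model
set.seed(42)
data <- data.frame(
  X = 1:100,
  Y = 5 + 0.5 * (1:100) + rnorm(100, mean = 0, sd = 5)
)
model <- lm(Y ~ X, data = data)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(model))

# t value = Estimate / Std. Error
t_slope <- coefs["X", "Estimate"] / coefs["X", "Std. Error"]

# With one predictor, the F-statistic equals the slope's t value squared
f_stat <- unname(summary(model)$fstatistic["value"])

# 95% confidence intervals for beta0 and beta1
ci <- confint(model, level = 0.95)
ci

# Predicted Y and 95% prediction interval for a new observation at X = 50
predict(model, newdata = data.frame(X = 50), interval = "prediction")
```

Both confidence intervals contain the values used to simulate the data (5 for the intercept and 0.5 for the slope). A prediction interval is wider than the confidence band around the line because it also accounts for the error term of the new observation.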

Conclusion

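Simple linear regression is a foundational statistical tool for modeling the relationship between two variables. Once fitted, the model can predict Y for new values of X and quantifies how much Y changes per unit increase in X. In this example, the estimated intercept (5.34) and slope (0.496) are close to the values used to simulate the data (5 and 0.5), and X explains about 88% of the variation in Y (Multiple R-squared = 0.8844).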

Thank You!

Any Questions?