2025-02-09

Introduction to Simple Linear Regression

  • Simple linear regression models a linear (straight-line) relationship between two variables.
  • One variable is the predictor (independent variable), and the other is the response (dependent variable).

Equation of Simple Linear Regression

The model is represented as:

\[ Y = \beta_0 + \beta_1X + \epsilon \]

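where:

  • \(Y\) = dependent (response) variable
  • \(X\) = independent (predictor) variable
  • \(\beta_0\) = intercept: the expected value of \(Y\) when \(X = 0\)
  • \(\beta_1\) = slope: the change in the expected value of \(Y\) for a one-unit increase in \(X\)
  • \(\epsilon\) = error term: the random variation in \(Y\) not explained by \(X\)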
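The least-squares estimates of \(\beta_0\) and \(\beta_1\) are the values that minimise the sum of squared residuals, and they are the values lm() reports. They have a closed form. With \(n\) observations, sample means \(\bar{x}\) and \(\bar{y}\), and fitted values \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\):

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

Two parameters are estimated, so the error variance is estimated with \(n - 2\) degrees of freedom:

\[ \hat{\sigma}^2 = \frac{1}{n - 2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]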

Generating Sample Data

library(ggplot2)
library(plotly)
library(htmlwidgets)  
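set.seed(42)                          # make the simulation reproducible
x <- rnorm(100, mean = 50, sd = 10)   # predictor: 100 draws from N(50, 10^2)
y <- 5 + 2*x + rnorm(100, sd = 5)     # response: true beta0 = 5, beta1 = 2, noise sd = 5
data <- data.frame(x, y)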
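The same simulation can be wrapped in a small function so the true parameters are easy to vary. This is a sketch. simulate_slr() is a hypothetical helper, not part of the original code. Its defaults are chosen so that it reproduces exactly the draws above, with the same seed and the same order of rnorm() calls.

```r
# Hypothetical helper (not part of the original code): draws n points from
# Y = beta0 + beta1 * X + epsilon, with X ~ N(x_mean, x_sd^2) and
# epsilon ~ N(0, sigma^2), and returns them as a data frame.
simulate_slr <- function(n = 100, beta0 = 5, beta1 = 2,
                         x_mean = 50, x_sd = 10, sigma = 5, seed = 42) {
  set.seed(seed)
  x <- rnorm(n, mean = x_mean, sd = x_sd)
  y <- beta0 + beta1 * x + rnorm(n, sd = sigma)
  data.frame(x = x, y = y)
}

sim <- simulate_slr()   # defaults match the data generated above
str(sim)
```

For example, simulate_slr(beta1 = -1, sigma = 20) gives a noisier, downward-sloping cloud to compare against the fit.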

Scatter Plot of Data

ggplot(data, aes(x = x, y = y)) +
  geom_point(color = "blue", alpha = 0.6) +
  ggtitle("Scatter Plot of X vs Y") +
  theme_minimal()
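Before fitting, the Pearson correlation r summarises how strong the linear association in the scatter plot is. The minimal sketch below regenerates the data so it runs on its own. In simple linear regression with an intercept, r squared equals the model's R-squared, so it should match the "Multiple R-squared" that summary() reports for the fitted model.

```r
# Regenerate the simulated data (same seed and draws as above)
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)
y <- 5 + 2*x + rnorm(100, sd = 5)
data <- data.frame(x, y)

# Pearson correlation between predictor and response
r <- cor(data$x, data$y)
r

# Identity for simple linear regression with an intercept: R-squared = r^2
r_squared <- summary(lm(y ~ x, data = data))$r.squared
all.equal(r^2, r_squared)
```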

Performing Linear Regression

model <- lm(y ~ x, data = data)
summary(model)
## 
## Call:
## lm(formula = y ~ x, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.4421 -2.5332  0.0612  2.7053 14.3120 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.87919    2.25215   1.722   0.0881 .  
## x            2.01358    0.04383  45.938   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.542 on 98 degrees of freedom
## Multiple R-squared:  0.9556, Adjusted R-squared:  0.9552 
## F-statistic:  2110 on 1 and 98 DF,  p-value: < 2.2e-16
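The coefficient table can be checked against the closed-form least-squares formulas. The residual standard error can likewise be checked against \(\sqrt{\text{RSS}/(n-2)}\); the 98 degrees of freedom in the output are \(n - 2 = 100 - 2\). The sketch below is self-contained and regenerates the data with the same seed. beta0_hat, beta1_hat and rse are illustrative names, not part of the original code.

```r
# Regenerate the data and refit the model
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)
y <- 5 + 2*x + rnorm(100, sd = 5)
data <- data.frame(x, y)
model <- lm(y ~ x, data = data)

# Closed-form least-squares estimates of slope and intercept
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Residual standard error: sqrt(RSS / (n - 2))
n <- length(y)
rse <- sqrt(sum(residuals(model)^2) / (n - 2))

all.equal(unname(coef(model)), c(beta0_hat, beta1_hat))  # compare with lm()
all.equal(rse, summary(model)$sigma)                      # compare with summary()
```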

Regression Line Visualization

ggplot(data, aes(x = x, y = y)) +
  geom_point(color = "blue", alpha = 0.6) +
  geom_smooth(method = "lm", color = "red") +
  ggtitle("Regression Line on Scatter Plot") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
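With method = "lm", geom_smooth() draws the fitted line. By default (se = TRUE, level = 0.95) it also draws a 95% pointwise confidence band for the mean response. The sketch below computes that band directly with predict(). The 80-point grid mirrors geom_smooth()'s default evaluation grid. grid and band are illustrative names.

```r
# Regenerate the data and refit the model
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)
y <- 5 + 2*x + rnorm(100, sd = 5)
data <- data.frame(x, y)
model <- lm(y ~ x, data = data)

# Grid of x values spanning the observed range
grid <- data.frame(x = seq(min(data$x), max(data$x), length.out = 80))

# Fitted mean response with a 95% pointwise confidence interval;
# predict() returns a matrix with columns fit, lwr and upr
band <- cbind(grid, predict(model, newdata = grid,
                            interval = "confidence", level = 0.95))
head(band)
```

The fit, lwr and upr columns can then be layered onto the scatter plot with geom_line() and geom_ribbon() if the band is needed outside geom_smooth().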

3D Visualization with Plotly

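z <- 5 + 1.5*x + rnorm(100, sd = 5)   # third variable, generated from x only (true slope 1.5)
data$z <- z                            # keep all three variables in one data frame
plot_ly(data, x = ~x, y = ~y, z = ~z, type = "scatter3d", mode = "markers")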
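z is generated from x alone. The 3D points should therefore form an elongated cloud around a line rather than spreading over a plane, and z's dependence on x can be fitted as another simple linear regression. The sketch below assumes that no other random numbers are drawn between set.seed(42) and the z line, so the draws match the code above. fit_z is an illustrative name.

```r
# Regenerate the simulated data (same seed and draw order as above)
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)
y <- 5 + 2*x + rnorm(100, sd = 5)
z <- 5 + 1.5*x + rnorm(100, sd = 5)

# Simple linear regression of z on x; the slope estimate should be close to 1.5
fit_z <- lm(z ~ x)
coef(fit_z)

# y and z are correlated only through their shared dependence on x
cor(y, z)
```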