2026-04-13

Introduction

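Linear regression models a response variable \(y\) as a linear function of a predictor \(x\) plus random error. This report fits a simple linear regression in R to simulated data, visualizes the fit with ggplot2 and Plotly, and interprets the output of lm().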

Mathematical Model

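The model assumes each response is a linear function of the predictor plus random error:

\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \dots, n \]

where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon_i\) is a random error with mean zero.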

Objective Function

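Ordinary least squares chooses the coefficients that minimize the sum of squared residuals:

\[ \min_{\beta_0,\, \beta_1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = \beta_0 + \beta_1 x_i \]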
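Writing the objective as \(S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2\) and setting both partial derivatives to zero gives the normal equations, whose solution is the standard closed form for a single predictor:

\[ \frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0, \qquad \frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0 \]

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

Here \(\bar{x}\) and \(\bar{y}\) are the sample means. The intercept formula shows that the fitted line always passes through the point \((\bar{x}, \bar{y})\).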

Generate Data

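The data are simulated from the model above with \(\beta_0 = 0\), \(\beta_1 = 2\), and normal errors with standard deviation 2, so the true parameters are known.

set.seed(1)                  # make the random noise reproducible
x <- 1:10                    # predictor values 1, 2, ..., 10
y <- 2*x + rnorm(10, 0, 2)   # true intercept 0, true slope 2, noise sd 2
data <- data.frame(x, y)
data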
##     x          y
## 1   1  0.7470924
## 2   2  4.3672866
## 3   3  4.3287428
## 4   4 11.1905616
## 5   5 10.6590155
## 6   6 10.3590632
## 7   7 14.9748581
## 8   8 17.4766494
## 9   9 19.1515627
## 10 10 19.3892232
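With the data in hand, the least-squares objective can be evaluated and minimized directly. The following is a minimal base-R sketch: sse() and sse_grad() are hypothetical helpers written for this illustration, and optim() with method = "BFGS" is a swapped-in numerical approach, not how lm() works (lm() solves the problem with a QR decomposition). It should land on essentially the same coefficients as lm(), with a smaller sum of squared residuals than the true parameters (0, 2) achieve on this sample.

```r
# Sketch: minimize the least-squares objective numerically and compare with lm().
# sse() and sse_grad() are hypothetical helpers for illustration only.
set.seed(1)
x <- 1:10
y <- 2*x + rnorm(10, 0, 2)

# Sum of squared residuals for candidate coefficients b = c(beta0, beta1)
sse <- function(b) sum((y - b[1] - b[2] * x)^2)

# Analytic gradient of sse() with respect to (beta0, beta1)
sse_grad <- function(b) {
  r <- y - b[1] - b[2] * x
  c(-2 * sum(r), -2 * sum(x * r))
}

# Numerically minimize, starting from beta0 = 0, beta1 = 0
fit_optim <- optim(par = c(0, 0), fn = sse, gr = sse_grad, method = "BFGS")

fit_lm <- coef(lm(y ~ x))

# The two solutions should agree up to the optimizer's tolerance
rbind(optim = fit_optim$par, lm = unname(fit_lm))

# SSE at the fitted coefficients vs. at the true parameters (0, 2)
c(fitted = sse(fit_optim$par), true = sse(c(0, 2)))
```

The true parameters do not minimize the sample SSE: least squares fits the particular noise realization, which is why the estimates differ slightly from 0 and 2.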

Scatter Plot (ggplot)

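library(ggplot2)  # provides ggplot(), aes(), geom_point(), geom_smooth()

# Scatter plot of the simulated observations
ggplot(data, aes(x, y)) +
  geom_point()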

Regression Line (ggplot)

# Add the least-squares line; the shaded band is a 95% confidence interval for the mean response
ggplot(data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "blue")
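geom_smooth(method = "lm") fits the same least-squares line and, by default (se = TRUE, level = 0.95), shades a 95% confidence interval for the mean response. A base-R sketch of the values behind that band, assuming the same simulated data: predict() with interval = "confidence" on an 80-point grid, which mirrors geom_smooth()'s default of n = 80 evaluation points but is not ggplot2's internal code.

```r
# Sketch: reproduce the fitted line and 95% confidence band values
# that geom_smooth(method = "lm") draws, using base R only.
set.seed(1)
x <- 1:10
y <- 2*x + rnorm(10, 0, 2)
data <- data.frame(x, y)
model <- lm(y ~ x, data = data)

# Grid of x values spanning the data range (80 points, like geom_smooth's default)
grid <- data.frame(x = seq(min(data$x), max(data$x), length.out = 80))

# Fitted mean response with lower and upper 95% confidence limits
band <- predict(model, newdata = grid, interval = "confidence", level = 0.95)
head(cbind(grid, band))
```

The band is narrowest near the mean of x and widens toward the ends of the range, because uncertainty in the slope matters more far from \(\bar{x}\).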

Interactive Plot (Plotly)

library(plotly)  # interactive plots; hovering over a point shows its x and y values

plot_ly(data, x = ~x, y = ~y, type = "scatter", mode = "markers")

Linear Model

# Fit y = beta0 + beta1 * x by ordinary least squares and print the full summary
model <- lm(y ~ x, data = data)
summary(model)
## 
## Call:
## lm(formula = y ~ x, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9601 -1.2820  0.4677  0.5357  3.0903 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.3376     1.1054  -0.305    0.768    
## x             2.1095     0.1782  11.841 2.37e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.618 on 8 degrees of freedom
## Multiple R-squared:  0.946,  Adjusted R-squared:  0.9393 
## F-statistic: 140.2 on 1 and 8 DF,  p-value: 2.373e-06
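The estimated slope of 2.1095 is close to the true slope of 2 used to simulate the data, and the intercept estimate of -0.3376 (p = 0.768) gives no evidence against its true value of 0. These estimates are the closed-form least-squares solutions, so a minimal sketch that recomputes them by hand, along with R-squared, should match the Estimate column and the Multiple R-squared reported above.

```r
# Sketch: recompute the least-squares estimates and R-squared from their closed forms
set.seed(1)
x <- 1:10
y <- 2*x + rnorm(10, 0, 2)
model <- lm(y ~ x)

# Slope: cross-deviation sum divided by the sum of squared deviations of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: the fitted line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# R-squared: 1 - residual sum of squares / total sum of squares
rss <- sum((y - b0 - b1 * x)^2)
tss <- sum((y - mean(y))^2)
r2 <- 1 - rss / tss

round(c(intercept = b0, slope = b1, r.squared = r2), 4)
```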

Conclusion

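Linear regression supports both prediction and the interpretation of relationships between variables. In this example, lm() recovered a slope of 2.11 (true value 2), and the fitted line explains about 95% of the variance in y (R-squared = 0.946).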
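To use the fitted model for prediction, predict() can be applied to new predictor values. A minimal sketch, assuming the same simulated data; the values in new_x are hypothetical and chosen for illustration, with x = 12 deliberately outside the observed range to show extrapolation.

```r
# Sketch: predict y at new x values with 95% prediction intervals
set.seed(1)
x <- 1:10
y <- 2*x + rnorm(10, 0, 2)
data <- data.frame(x, y)
model <- lm(y ~ x, data = data)

# Hypothetical new predictor values: the mean of x (5.5) and an extrapolated value (12)
new_x <- data.frame(x = c(5.5, 12))

# Point predictions with 95% prediction intervals for individual new observations
pred <- predict(model, newdata = new_x, interval = "prediction", level = 0.95)
cbind(new_x, pred)
```

At x = 5.5 the prediction equals the mean of y, since the least-squares line passes through the point of means. Prediction intervals are wider than the confidence band drawn by geom_smooth() because they also include the error variance of a single new observation, and the interval at x = 12 assumes the linear relationship continues beyond the observed range of 1 to 10.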