Simple Linear Regression

Simple linear regression models the relationship between a single predictor variable and a response variable as a straight line plus random error.

Mathematical Model

The model:

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

Where:
- \(Y\): response variable
- \(X\): predictor variable
- \(\beta_0\): intercept
- \(\beta_1\): slope
- \(\varepsilon\): random error term, usually assumed to have mean zero and constant variance

Coefficient Estimation (Math Slide 2)

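To estimate the slope and intercept, the least squares method chooses the values that minimize the sum of squared vertical distances between the observed points and the line. This gives closed-form estimates: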

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})} {\sum_{i=1}^n (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
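Both formulas follow from minimizing the residual sum of squares \(S(\beta_0, \beta_1)\) and setting its partial derivatives to zero. A brief sketch of the derivation:

\[ S(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \]

\[ \frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0 \quad\Rightarrow\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

\[ \frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^n x_i (y_i - \beta_0 - \beta_1 x_i) = 0 \]

Substituting \(\hat{\beta}_0\) into the second equation and using \(\sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n (y_i - \bar{y}) = 0\) gives

\[ \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^n (x_i - \bar{x})^2 \]

which rearranges to the slope formula above.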

Simulating Data in R

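set.seed(1)                             # make the simulation reproducible
x <- rnorm(100, mean = 50, sd = 10)     # 100 predictor values drawn from N(50, 10^2)
y <- 5 + 0.8 * x + rnorm(100, sd = 5)   # true beta_0 = 5, beta_1 = 0.8, error sd = 5
data <- data.frame(x, y)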

Scatter Plot (ggplot2 #1)

library(ggplot2)
ggplot(data, aes(x = x, y = y)) +
  geom_point(color = "darkblue") +
  labs(title = "Scatterplot of X and Y")

Fitting the Linear Model (R Code)

model <- lm(y ~ x, data = data)   # least-squares fit of y on x
summary(model)                    # estimates, standard errors, t-tests, R-squared
## 
## Call:
## lm(formula = y ~ x, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3842 -3.0688 -0.6975  2.6970 11.7309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.83805    2.79361   1.732   0.0865 .  
## x            0.79947    0.05386  14.843   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.814 on 98 degrees of freedom
## Multiple R-squared:  0.6921, Adjusted R-squared:  0.689 
## F-statistic: 220.3 on 1 and 98 DF,  p-value: < 2.2e-16
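As a check on the output above, the sketch below recomputes the main quantities directly from the least-squares formulas, using the standard results \(\hat{\sigma} = \sqrt{\mathrm{RSS}/(n-2)}\), \(\mathrm{SE}(\hat{\beta}_1) = \hat{\sigma} / \sqrt{\sum (x_i - \bar{x})^2}\), and, for a single predictor, \(R^2 = r_{xy}^2\). It re-simulates the data so it runs on its own; the helper variable names (`beta1_hat`, `Sxx`, and so on) are illustrative and not part of any package.

```r
# Recreate the simulated data and fitted model from the earlier slides
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)
y <- 5 + 0.8 * x + rnorm(100, sd = 5)
data <- data.frame(x, y)
model <- lm(y ~ x, data = data)

# Closed-form least-squares estimates
x_bar <- mean(x)
y_bar <- mean(y)
Sxx <- sum((x - x_bar)^2)
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / Sxx
beta0_hat <- y_bar - beta1_hat * x_bar

# Residual standard error: sqrt(RSS / (n - 2)), with n - 2 degrees of freedom
n <- length(y)
res <- y - (beta0_hat + beta1_hat * x)
sigma_hat <- sqrt(sum(res^2) / (n - 2))

# Standard error and t statistic for the slope
se_beta1 <- sigma_hat / sqrt(Sxx)
t_beta1 <- beta1_hat / se_beta1

# With one predictor, R-squared equals the squared sample correlation
r_squared <- cor(x, y)^2

# Compare the hand computations with lm(); each call should return TRUE
fit <- summary(model)
all.equal(unname(coef(model)), c(beta0_hat, beta1_hat))
all.equal(sigma_hat, fit$sigma)
all.equal(se_beta1, fit$coefficients["x", "Std. Error"])
all.equal(t_beta1, fit$coefficients["x", "t value"])
all.equal(r_squared, fit$r.squared)
```

`all.equal()` is used instead of `==` because `lm()` solves the problem through a QR decomposition, so the two routes can differ in the last few floating-point digits.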

Regression Line (ggplot2 #2)

ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  labs(title = "Linear Regression Fit")
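The red line is the fitted equation \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\). A minimal sketch of evaluating it at new predictor values with `predict()`; the values 40, 50, and 60 are arbitrary choices for illustration, and the data are re-simulated so the block runs on its own.

```r
# Rebuild the data and model from the earlier slides
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)
y <- 5 + 0.8 * x + rnorm(100, sd = 5)
data <- data.frame(x, y)
model <- lm(y ~ x, data = data)

# Hypothetical new predictor values at which to evaluate the fitted line
new_x <- data.frame(x = c(40, 50, 60))

# Point predictions from the fitted line
pred <- predict(model, newdata = new_x)

# The same values computed directly from the estimated coefficients
b <- coef(model)
manual <- b[["(Intercept)"]] + b[["x"]] * new_x$x
all.equal(unname(pred), manual)   # should be TRUE

# 95% confidence intervals for the mean response at each new x
predict(model, newdata = new_x, interval = "confidence", level = 0.95)
```

With `interval = "confidence"`, `predict()` returns a matrix with columns `fit`, `lwr`, and `upr`; `interval = "prediction"` gives the wider intervals for a single new observation instead.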

Interactive Scatter Plot (plotly)

library(plotly)
plot_ly(data, x = ~x, y = ~y, type = "scatter", mode = "markers")