2025-03-17

Introduction

Simple linear regression estimates the relationship between two quantitative variables, a predictor X and a response Y, by fitting a straight line with intercept a and slope b:

\[ Y = a + bX \]

Estimating Parameters

The parameters of a linear regression model are most commonly estimated by least squares, which chooses the intercept and slope that minimize the sum of squared residuals:

Slope:

\[ \hat{b} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \]

Intercept:

\[ \hat{a} = \bar{Y} - \hat{b} \bar{X} \]
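As a quick check on these formulas, the estimates can be computed by hand and compared with `lm()`. This is a minimal sketch using the built-in `mtcars` data, with `wt` as X and `mpg` as Y; the names `x`, `y`, `b_hat`, `a_hat`, `fit`, and `manual` are illustrative.

```r
# Predictor and response from the built-in mtcars data
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared deviations in x
b_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: makes the fitted line pass through (mean(x), mean(y))
a_hat <- mean(y) - b_hat * mean(x)

# Compare the hand-computed values with the coefficients from lm()
fit <- lm(mpg ~ wt, data = mtcars)
manual <- c(a_hat, b_hat)
print(manual)
print(unname(coef(fit)))
all.equal(manual, unname(coef(fit)))
```

Both sets of values should agree with the estimates reported in the model summary (37.2851 for the intercept and -5.3445 for the slope).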

Data Visualization Code

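# Load ggplot2 and the built-in mtcars data
library(ggplot2)
data(mtcars)

# Scatterplot of fuel economy against weight with a fitted least squares line
ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    geom_smooth(method = 'lm', col = 'red') +
    labs(title = 'Scatterplot with Regression Line', x = 'Weight (1000 lbs)', y = 'MPG')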

Data Visualization

Model Diagnostics
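The diagnostic output for this section is not reproduced here. As a hedged sketch, the standard residual plots for a model like the one fitted in this document can be drawn with base R's `plot()` method for `lm` objects. The object names `diag_model` and `res` are illustrative, and this is not necessarily the code the author used.

```r
# Fit the same model as in the document (mpg regressed on wt)
diag_model <- lm(mpg ~ wt, data = mtcars)

# Arrange the four standard diagnostic plots in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
plot(diag_model)  # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(old_par)      # restore the previous graphics settings

# Residuals from a least squares fit with an intercept sum to (numerically) zero
res <- residuals(diag_model)
sum(res)
```

A curved pattern in the residuals-vs-fitted panel would suggest the straight-line form is inadequate, and strong departures from the line in the Q-Q panel would suggest non-normal errors.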

3D Visualization (I used some AI help on this)

R Code for Model

lm_model <- lm(mpg ~ wt, data=mtcars)
summary(lm_model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
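To work with these numbers programmatically rather than reading them off the printout, the pieces of the summary can be extracted directly. This is a minimal sketch; the object names `fit`, `s`, `coefs`, `t_wt`, `r2`, `ci`, `new_car`, and `pred`, and the example weight of 3 (that is, 3000 lbs), are illustrative assumptions.

```r
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(s)

# The t value is the estimate divided by its standard error
t_wt <- coefs["wt", "Estimate"] / coefs["wt", "Std. Error"]

# Proportion of the variance in mpg explained by wt
r2 <- s$r.squared

# 95% confidence intervals for the intercept and slope
ci <- confint(fit, level = 0.95)

# Predicted mpg, with a prediction interval, for a hypothetical 3000 lb car
new_car <- data.frame(wt = 3)
pred <- predict(fit, newdata = new_car, interval = "prediction")

print(t_wt)
print(r2)
print(ci)
print(pred)
```

Here `t_wt` should reproduce the t value of -9.559 for `wt`, and `r2` the Multiple R-squared of 0.7528 shown in the printout.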

Conclusion

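Simple linear regression summarizes the relationship between two quantitative variables with a single fitted line. In the mtcars example, each additional 1000 lbs of weight is associated with about 5.3 fewer miles per gallon, and weight alone accounts for roughly 75% of the variation in mpg. Diagnostics help assess whether the model's assumptions hold, and visualizations make the fit easier to interpret.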