2025-03-15

Example Data

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

What is Simple Linear Regression?

  • A technique to model the relationship between two variables by fitting a linear equation.
  • Formula:
    \[ Y = \beta_0 + \beta_1X + \epsilon \]
    where:
  • \(Y\) = dependent variable
  • \(X\) = independent variable
  • \(\beta_0\) = intercept
  • \(\beta_1\) = slope
  • \(\epsilon\) = error term
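
As a rough illustration of the formula above, the least-squares estimates of \(\beta_0\) and \(\beta_1\) can be computed directly and compared with what lm() returns. This sketch assumes the built-in mtcars data with wt as \(X\) and mpg as \(Y\), the pairing used in the later slides; the names x, y, b0, b1 and fit are illustrative only.

```r
# Closed-form least-squares estimates for Y = b0 + b1 * X
x <- mtcars$wt   # independent variable: weight (1000 lbs)
y <- mtcars$mpg  # dependent variable: miles per gallon

b1 <- cov(x, y) / var(x)       # slope estimate
b0 <- mean(y) - b1 * mean(x)   # intercept estimate

# Compare with the estimates produced by lm()
fit <- lm(mpg ~ wt, data = mtcars)
print(c(intercept = b0, slope = b1))
print(coef(fit))
```

Both approaches should give the same two numbers, since lm() minimizes the same sum of squared errors.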

Hypothesis Testing for Slope

  • Null Hypothesis:
    \[ H_0 : \beta_1 = 0 \]
  • Alternative Hypothesis:
    \[ H_A : \beta_1 \neq 0 \]
  • Test statistic:
    \[ t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} \]
    where \(SE(\hat{\beta}_1)\) is the standard error of the estimated slope.
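  • p-value: the probability, under \(H_0\), of a \(|t|\) at least as large as the one observed, computed from a \(t\) distribution with \(n - 2\) degrees of freedom. A small p-value (commonly below 0.05) is evidence that the slope differs from zero.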
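
The test statistic above can be reproduced by hand as a sanity check. The sketch below assumes the mpg ~ wt model on mtcars and the usual simple-regression standard error, \(SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2 / \sum (x_i - \bar{x})^2}\); the variable names are illustrative.

```r
# Manual t-test for the slope of mpg ~ wt
fit <- lm(mpg ~ wt, data = mtcars)
x <- mtcars$wt
n <- nrow(mtcars)

b1 <- unname(coef(fit)["wt"])                 # estimated slope
sigma2 <- sum(resid(fit)^2) / (n - 2)         # residual variance estimate
se_b1 <- sqrt(sigma2 / sum((x - mean(x))^2))  # standard error of the slope

t_stat <- (b1 - 0) / se_b1                    # test statistic under H0: beta1 = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value

print(c(t = t_stat, p = p_value))
print(summary(fit)$coefficients["wt", ])      # same values as reported by summary()
```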

3D Plot with Plotly

library(plotly)

# 3D scatter of weight, fuel economy, and displacement
plot_ly(mtcars, x = ~wt, y = ~mpg, z = ~disp, type = "scatter3d",
        mode = "markers")

Scatter Plot with ggplot

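library(ggplot2)

# Scatter plot of mpg against weight with the fitted least-squares line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", col = "blue") +
  labs(title = "Regression Line of MPG vs Weight", x = "Weight",
       y = "Miles per Gallon")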

Residual Plot

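model <- lm(mpg ~ wt, data = mtcars)
# Store the residuals next to the predictor so aes() maps columns of one
# data frame, and avoid a variable named residuals, which masks stats::residuals()
resid_df <- data.frame(wt = mtcars$wt, residual = resid(model))
ggplot(resid_df, aes(x = wt, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, col = "red") +
  labs(title = "Residual Plot", x = "Weight", y = "Residuals")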

R Code for Linear Regression

model <- lm(mpg ~ wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
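
The numbers in the printed summary can also be pulled out programmatically, which is convenient for reporting. This is a minimal sketch that assumes the same mpg ~ wt model; the 95% level passed to confint() is an illustrative choice, not something stated on the slide.

```r
# Extract key quantities from the fitted model instead of reading the printout
model <- lm(mpg ~ wt, data = mtcars)
s <- summary(model)

coef_table <- s$coefficients          # estimates, std. errors, t values, p-values
r_squared <- s$r.squared              # multiple R-squared
ci <- confint(model, level = 0.95)    # confidence intervals for both coefficients

print(coef_table)
print(r_squared)
print(ci)
```

A confidence interval for wt that excludes zero tells the same story as the small p-value on the slope.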

Slide with R Output

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00