2025-10-26

Slide 1 - What is Simple Linear Regression?

We model a numeric response \(Y\) using one predictor \(X\): \[ Y_i \;=\; \beta_0 \;+\; \beta_1 X_i \;+\; \varepsilon_i,\qquad \varepsilon_i \stackrel{iid}{\sim} N(0,\sigma^2). \]

Goals

- Estimate \(\beta_0\) (intercept) and \(\beta_1\) (slope)
- Test whether \(\beta_1 = 0\) (no linear association)
- Make predictions with uncertainty (confidence intervals for the mean response vs prediction intervals for a new observation)
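The difference between the two kinds of interval in the last goal can be seen with base R's `predict()`. The following is a minimal sketch on simulated data; the seed, sample size, coefficients, and the names `dat`, `fit`, and `new_x` are illustrative assumptions, not values from these slides.

```r
# Hypothetical simulated data; the seed, sample size, and coefficients are
# illustrative assumptions, not values taken from the slides.
set.seed(1)
n <- 100
x <- runif(n, min = 0, max = 5)
y <- 3 + 1.5 * x + rnorm(n, mean = 0, sd = 3)
dat <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = dat)

# Points at which to predict
new_x <- data.frame(x = c(1, 2.5, 4))

# 95% confidence interval for the mean response E[Y | X = x]
ci <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)

# 95% prediction interval for a single new observation Y at X = x
pi_new <- predict(fit, newdata = new_x, interval = "prediction", level = 0.95)

# Compare interval widths at each prediction point
widths <- data.frame(
  x        = new_x$x,
  ci_width = ci[, "upr"] - ci[, "lwr"],
  pi_width = pi_new[, "upr"] - pi_new[, "lwr"]
)
print(widths)
```

Both calls return the same fitted values. The prediction interval is wider because it adds the variance \(\sigma^2\) of a new error term to the uncertainty in the estimated mean.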

Slide 2 - Estimation & Inference (Math)

To estimate the regression line, we use Ordinary Least Squares (OLS):

\[ \hat\beta_1 = \frac{\sum_{i=1}^n (X_i-\bar X)(Y_i-\bar Y)} {\sum_{i=1}^n (X_i-\bar X)^2}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X. \]
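As a quick check, the closed-form estimates above can be computed directly and compared with `lm()`. This is a sketch on simulated data; the seed, sample size, and true coefficients are assumptions made for illustration only.

```r
# Hypothetical data; the values below are illustrative assumptions.
set.seed(42)
n <- 50
x <- runif(n, min = 0, max = 5)
y <- 2 + 1.2 * x + rnorm(n, mean = 0, sd = 2)

# Closed-form OLS estimates
x_bar <- mean(x)
y_bar <- mean(y)
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)  # slope
b0 <- y_bar - b1 * x_bar                                   # intercept

# The same estimates from lm()
fit <- lm(y ~ x)

by_hand <- c(b0, b1)
from_lm <- unname(coef(fit))
print(rbind(by_hand, from_lm))
```

The two rows should agree up to floating-point error, since `lm()` solves the same least-squares problem.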

To test whether the predictor \(X\) matters, we perform a hypothesis test:

\[ H_0:\beta_1 = 0 \quad \text{vs} \quad H_1:\beta_1 \neq 0, \qquad t = \frac{\hat\beta_1}{\text{SE}(\hat\beta_1)} \sim t_{n-2} \ \text{under } H_0. \]
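The standard error in the test statistic is not spelled out here. Under the model from Slide 1, the usual expression is the following, where \(\hat\sigma\) denotes the residual standard error (notation introduced here for illustration):

\[ \text{SE}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{\sum_{i=1}^n (X_i-\bar X)^2}}, \qquad \hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^n \bigl(Y_i - \hat\beta_0 - \hat\beta_1 X_i\bigr)^2. \]

The divisor \(n-2\) reflects the two estimated coefficients, which is also why the reference distribution is \(t_{n-2}\).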

Interpretation

- If \(\hat\beta_1\) is close to 0 relative to its standard error → no evidence of a linear relationship
- If the p-value is small → evidence of a linear relationship between \(X\) and \(Y\)

Slide 3 - Creating the Dataset using R

##      xside              yside       
##  Min.   :0.001195   Min.   :-3.135  
##  1st Qu.:1.440053   1st Qu.: 4.380  
##  Median :2.883242   Median : 7.607  
##  Mean   :2.687641   Mean   : 7.552  
##  3rd Qu.:3.864972   3rd Qu.:10.492  
##  Max.   :4.944459   Max.   :18.541
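A dataset with this shape could be simulated along the following lines. This is only a sketch: the column names `xside` and `yside` come from the summary above, but the seed, sample size, true coefficients, noise level, and the name `sim_data` are assumptions, so the printed summary will not match the numbers shown.

```r
# Hypothetical reconstruction: the seed, sample size, true coefficients, and
# noise level below are assumptions chosen for illustration only.
set.seed(123)
n <- 150

# Predictor drawn uniformly on [0, 5]
xside <- runif(n, min = 0, max = 5)

# Response follows the simple linear regression model with normal errors
yside <- 3 + 1.5 * xside + rnorm(n, mean = 0, sd = 3.5)

sim_data <- data.frame(xside = xside, yside = yside)
print(summary(sim_data))
```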

Slide 4 - ggplot Scatterplot
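A scatterplot of `yside` against `xside` with a fitted least-squares line could be drawn with ggplot2 roughly as follows. This is a sketch under assumptions: the data are simulated here with made-up parameters, and `sim_data` and `p_scatter` are hypothetical names.

```r
library(ggplot2)

# Hypothetical data; the seed and parameters are illustrative assumptions.
set.seed(123)
sim_data <- data.frame(xside = runif(150, min = 0, max = 5))
sim_data$yside <- 3 + 1.5 * sim_data$xside + rnorm(150, mean = 0, sd = 3.5)

# Scatterplot with the OLS line and its 95% confidence band
p_scatter <- ggplot(sim_data, aes(x = xside, y = yside)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "yside vs xside with fitted OLS line",
       x = "xside", y = "yside")

print(p_scatter)
```

The shaded band from `geom_smooth()` is a confidence band for the mean response, not a prediction band for new observations.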

Slide 5 - Model Output

## $coefficients
## # A tibble: 2 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)     3.31     0.629      5.26 4.92e- 7     2.07      4.55
## 2 xside           1.58     0.206      7.68 2.01e-12     1.17      1.98
## 
## $quality
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.285         0.280  3.69      59.0 2.01e-12     1  -408.  821.  830.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
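The printed values can be cross-checked by hand. Each `statistic` in the coefficient table is the estimate divided by its standard error, and with a single predictor the overall \(F\) statistic is the square of the slope's \(t\) statistic:

\[ t_{\text{slope}} \approx \frac{1.58}{0.206} \approx 7.67, \qquad t_{\text{intercept}} \approx \frac{3.31}{0.629} \approx 5.26, \qquad F = t_{\text{slope}}^2 \approx 7.68^2 \approx 59.0. \]

The small gap between 7.67 and the reported 7.68 comes from dividing values already rounded for display. Because \(F = t^2\) here, the slope test and the overall model test share the same p-value (2.01e-12). The reported \(R^2 = 0.285\) means that about 28.5% of the variance in `yside` is explained by the linear relationship with `xside`.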

Slide 6 - ggplot Residuals vs Fitted
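A residuals-versus-fitted plot is a standard check of the linearity and constant-variance assumptions. The sketch below shows one way to build it with ggplot2. The data are simulated with assumed parameters, and `sim_data`, `diag_data`, and `p_resid` are hypothetical names.

```r
library(ggplot2)

# Hypothetical data; the seed and parameters are illustrative assumptions.
set.seed(123)
sim_data <- data.frame(xside = runif(150, min = 0, max = 5))
sim_data$yside <- 3 + 1.5 * sim_data$xside + rnorm(150, mean = 0, sd = 3.5)

fit <- lm(yside ~ xside, data = sim_data)

# Collect fitted values and residuals for plotting
diag_data <- data.frame(fitted = fitted(fit), residual = resid(fit))

p_resid <- ggplot(diag_data, aes(x = fitted, y = residual)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed") +            # reference line
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +  # trend in residuals
  labs(title = "Residuals vs Fitted",
       x = "Fitted values", y = "Residuals")

print(p_resid)
```

If the model assumptions hold, the points should scatter evenly around the dashed zero line with no visible curve or funnel shape.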

Slide 7 - 3D Plotly

Slide 8 - Summary (LaTeX)

Simple Linear Regression models the relationship between a response variable \(Y\) and a predictor \(X\) using the equation \(Y = \beta_0 + \beta_1 X + \varepsilon\), where \(\varepsilon \sim N(0,\sigma^2)\). The goal is to estimate the slope and intercept, test whether a linear relationship exists, and make predictions. Model fit and assumptions are evaluated using residual diagnostics, and the strength of the relationship is often summarized by \(R^2\).