What is Simple Linear Regression?

  • Models the linear relationship between two variables
  • Independent variable (X) : predictor
  • Dependent variable (Y) : response

\[Y = \beta_0 + \beta_1 X + \epsilon\]

  • \(\beta_0\) : intercept, \(\beta_1\) : slope, \(\epsilon\) : random error

Example: Height vs. Weight

Research question: Can we predict a person’s weight based on their height?

  • X = Height (inches)
  • Y = Weight (pounds)

Load and Explore the Data

##     Height   Weight
## 1 64.75810 295.2112
## 2 66.07929 296.9286
## 3 73.23483 328.9137
## 4 67.28203 323.2982
## 5 67.51715 300.4406
## 6 73.86026 355.1182
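The code that produced this output is not shown. A minimal sketch of how such a data frame might be simulated and inspected in R follows. The printed heights appear consistent with `set.seed(123)` and `rnorm(n, mean = 67, sd = 4)`; the sample size of 50 and the weight equation are assumptions chosen for illustration, so the weights will not match the output above.

```r
# Hypothetical simulation: n and the weight equation are illustrative
# assumptions, not the values behind the output shown in the slides.
set.seed(123)
n <- 50
height <- rnorm(n, mean = 67, sd = 4)            # heights in inches
weight <- 10 + 4.5 * height + rnorm(n, sd = 15)  # weights in pounds (assumed model)
dat <- data.frame(Height = height, Weight = weight)

head(dat)     # first six rows
summary(dat)  # minimum, quartiles, mean, and maximum of each column
```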

Scatter Plot (ggplot)
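The plot itself is not reproduced here. As a rough sketch, a scatter plot with a fitted line could be drawn with ggplot2 as below. The data are simulated on the spot, and the seed, sample size, and weight equation are assumptions rather than the original values.

```r
library(ggplot2)

# Hypothetical data: parameters are illustrative assumptions.
set.seed(123)
dat <- data.frame(Height = rnorm(50, mean = 67, sd = 4))
dat$Weight <- 10 + 4.5 * dat$Height + rnorm(50, sd = 15)

# Scatter plot with the least-squares line and its confidence band.
p <- ggplot(dat, aes(x = Height, y = Weight)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Height (inches)", y = "Weight (pounds)",
       title = "Weight vs. Height")
print(p)
```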

Interactive 3D Plot (plotly)

Fitting the Regression Model

We use a simple linear regression model:

\[ \widehat{\text{Weight}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Height} \]

This model describes a linear relationship between height and weight.

The slope \(\hat{\beta}_1\) represents the expected change in weight for each additional inch in height.
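A minimal sketch of the fit using `lm()` is given here. The data are simulated for illustration, so the seed, sample size, and coefficients are assumptions and the estimates will differ from those reported in these slides.

```r
# Hypothetical data: parameters are illustrative assumptions.
set.seed(123)
dat <- data.frame(Height = rnorm(50, mean = 67, sd = 4))
dat$Weight <- 10 + 4.5 * dat$Height + rnorm(50, sd = 15)

# Least-squares fit of Weight on Height.
fit <- lm(Weight ~ Height, data = dat)

coef(fit)     # estimated intercept (beta_0 hat) and slope (beta_1 hat)
summary(fit)  # standard errors, t-tests, R-squared
```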

Interpreting the Results

From the model summary:

  • Slope (\(\hat{\beta}_1\)) : For each 1-inch increase in height, predicted weight increases by about 4.37 pounds
  • Hypothesis test: \(H_0: \beta_1 = 0\) vs \(H_a: \beta_1 \neq 0\)

\[t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}\]
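The test statistic can be recomputed by hand from the coefficient table, as in the sketch below. It uses hypothetical simulated data (the seed, sample size, and coefficients are assumptions) and compares the manual values with what `summary()` reports.

```r
# Hypothetical data: parameters are illustrative assumptions.
set.seed(123)
dat <- data.frame(Height = rnorm(50, mean = 67, sd = 4))
dat$Weight <- 10 + 4.5 * dat$Height + rnorm(50, sd = 15)
fit <- lm(Weight ~ Height, data = dat)

coefs <- summary(fit)$coefficients

# t = estimate / standard error, with n - 2 residual degrees of freedom.
t_stat <- coefs["Height", "Estimate"] / coefs["Height", "Std. Error"]
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

c(t = t_stat, p = p_val)
coefs["Height", c("t value", "Pr(>|t|)")]  # should agree with the line above
```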

Confidence Interval for Slope

##                  2.5 %    97.5 %
## (Intercept) -60.486075 82.541533
## Height        3.304857  5.432053

\[CI = \hat{\beta}_1 \pm t_{\alpha/2, n-2} \times SE(\hat{\beta}_1)\]

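We are 95% confident that the true slope lies between 3.30 and 5.43 pounds per inch. Because this interval excludes 0, it agrees with rejecting \(H_0: \beta_1 = 0\) at the 5% level.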
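The interval formula can be checked against `confint()`. The sketch below does so on hypothetical simulated data (the seed, sample size, and coefficients are assumptions), so its numbers will not match the interval above.

```r
# Hypothetical data: parameters are illustrative assumptions.
set.seed(123)
dat <- data.frame(Height = rnorm(50, mean = 67, sd = 4))
dat$Weight <- 10 + 4.5 * dat$Height + rnorm(50, sd = 15)
fit <- lm(Weight ~ Height, data = dat)

est    <- unname(coef(fit)["Height"])
se     <- summary(fit)$coefficients["Height", "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))  # t_{alpha/2, n-2} for alpha = 0.05

# Estimate plus or minus critical value times standard error.
ci_manual <- c(lower = est - t_crit * se, upper = est + t_crit * se)

ci_manual
confint(fit, "Height", level = 0.95)  # built-in equivalent
```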

Residual Diagnostics (ggplot)
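The diagnostic plots are not reproduced here. One common pair, sketched below with ggplot2, is residuals versus fitted values (to check linearity and constant variance) and a normal Q-Q plot (to check normality of the residuals). The data are hypothetical, and the choice of these two plots is an assumption about what the slide showed.

```r
library(ggplot2)

# Hypothetical data: parameters are illustrative assumptions.
set.seed(123)
dat <- data.frame(Height = rnorm(50, mean = 67, sd = 4))
dat$Weight <- 10 + 4.5 * dat$Height + rnorm(50, sd = 15)
fit <- lm(Weight ~ Height, data = dat)

diag_df <- data.frame(fitted = fitted(fit), resid = resid(fit))

# Residuals vs. fitted: look for curvature or a funnel shape.
p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted values", y = "Residuals", title = "Residuals vs. Fitted")

# Normal Q-Q plot: points near the line suggest roughly normal residuals.
p_qq <- ggplot(diag_df, aes(sample = resid)) +
  stat_qq() +
  stat_qq_line() +
  labs(x = "Theoretical quantiles", y = "Sample quantiles",
       title = "Normal Q-Q")

print(p_resid)
print(p_qq)
```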

Making Predictions

##        fit      lwr     upr
## 1 316.8196 288.8062 344.833

For a person who is 70 inches tall, the predicted weight is about 317 pounds, and the reported interval runs from about 288.8 to 344.8 pounds.
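A sketch of how such a prediction might be obtained with `predict()` follows. The data are hypothetical (the seed, sample size, and coefficients are assumptions). The use of `interval = "prediction"` is also an assumption, since the slides do not state which interval type was requested.

```r
# Hypothetical data: parameters are illustrative assumptions.
set.seed(123)
dat <- data.frame(Height = rnorm(50, mean = 67, sd = 4))
dat$Weight <- 10 + 4.5 * dat$Height + rnorm(50, sd = 15)
fit <- lm(Weight ~ Height, data = dat)

new_person <- data.frame(Height = 70)

# Prediction interval: range for one new individual's weight.
pred_pi <- predict(fit, newdata = new_person,
                   interval = "prediction", level = 0.95)

# Confidence interval: range for the mean weight at this height (narrower).
pred_ci <- predict(fit, newdata = new_person,
                   interval = "confidence", level = 0.95)

pred_pi
pred_ci
```

The point estimate (`fit`) is the same in both calls. Only the interval width differs, because a prediction interval also accounts for the scatter of individuals around the regression line.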

Model Summary

The model shows a statistically significant positive relationship between height and weight (p < 0.001).

  • Slope = 4.37 : Predicted weight increases by about 4.37 pounds per additional inch of height
  • R-squared = 0.587 : Height explains 58.7% of the variation in weight
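R-squared can be read from the model summary or computed from sums of squares, as sketched below on hypothetical simulated data (the seed, sample size, and coefficients are assumptions). In simple linear regression it also equals the squared correlation between X and Y.

```r
# Hypothetical data: parameters are illustrative assumptions.
set.seed(123)
dat <- data.frame(Height = rnorm(50, mean = 67, sd = 4))
dat$Weight <- 10 + 4.5 * dat$Height + rnorm(50, sd = 15)
fit <- lm(Weight ~ Height, data = dat)

r2_summary <- summary(fit)$r.squared

# R^2 = 1 - SS_residual / SS_total
ss_res <- sum(resid(fit)^2)
ss_tot <- sum((dat$Weight - mean(dat$Weight))^2)
r2_manual <- 1 - ss_res / ss_tot

# With a single predictor, R^2 is the squared Pearson correlation.
r2_cor <- cor(dat$Height, dat$Weight)^2

c(summary = r2_summary, manual = r2_manual, correlation = r2_cor)
```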

Conclusion

  • Simple linear regression models the relationship between X and Y
  • The fitted model shows a statistically significant positive linear relationship in the simulated data (R-squared = 0.587)
  • Slope indicates how much Y changes when X increases by 1 unit
  • Diagnostics help check model assumptions