Slide 1: Introduction

Simple linear regression models the relationship between a single independent variable \(X\) and a dependent variable \(Y\).

  • Used for prediction and inference
  • We will use the diamonds dataset from ggplot2

Slide 2: Model Definition

The simple linear regression model is:

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

Where:
- \(Y\): dependent variable (price)
- \(X\): independent variable (carat)
- \(\beta_0, \beta_1\): intercept and slope (the model parameters)
- \(\varepsilon\): error term
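
To make the model concrete, the sketch below simulates data from \(Y = \beta_0 + \beta_1 X + \varepsilon\) and then fits a line to it. The parameter values, sample size, noise level, and seed are illustrative assumptions, not values taken from the diamonds data.

```r
# Simulate data from Y = beta_0 + beta_1 * X + epsilon.
# All numeric settings here are assumptions chosen for illustration.
set.seed(42)
n      <- 200
beta_0 <- 1.5
beta_1 <- 2.0

x       <- runif(n, min = 0, max = 10)    # independent variable
epsilon <- rnorm(n, mean = 0, sd = 1)     # error term
y       <- beta_0 + beta_1 * x + epsilon  # dependent variable

# Fit the model; the estimates should land close to beta_0 and beta_1.
sim_fit <- lm(y ~ x)
print(coef(sim_fit))
```

Because the errors are random, the estimates will be near, but not exactly equal to, the values used to generate the data.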

Slide 3: Load the Diamonds Dataset

library(ggplot2)  # the diamonds dataset ships with ggplot2
data(diamonds)    # load it into the workspace

The dataset contains:
- price of diamonds
- carat weight
- cut, color, clarity
- depth and table percentages
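
One way to check the columns listed above is to inspect the data frame directly. This is a minimal sketch that assumes ggplot2 is installed; the choice of inspection functions is ours.

```r
library(ggplot2)  # provides the diamonds dataset
data(diamonds)

# Columns named on this slide
cols <- c("price", "carat", "cut", "color", "clarity", "depth", "table")

dim(diamonds)                             # number of rows and columns
str(diamonds[, cols])                     # type of each listed column
summary(diamonds[, c("price", "carat")])  # range of the two model variables
```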

Slide 4: ggplot Scatterplot
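
The code behind this slide is not shown. A scatterplot of price against carat could be drawn roughly as follows; the transparency setting, labels, and the object name `scatter` are assumptions.

```r
library(ggplot2)
data(diamonds)

# Price against carat; alpha reduces overplotting in this large dataset.
scatter <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +
  labs(x = "Carat", y = "Price (US dollars)",
       title = "Diamond price vs. carat")
print(scatter)
```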

Slide 5: Regression Line (ggplot)

## `geom_smooth()` using formula = 'y ~ x'
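
The message above is what `geom_smooth()` prints when no formula is supplied. A plot of this kind could be produced with a sketch like the one below; the styling choices and the object name `reg_plot` are assumptions, since the original plotting code is not shown.

```r
library(ggplot2)
data(diamonds)

# Scatterplot with a least-squares line added by geom_smooth(method = "lm").
# Leaving out the formula argument triggers the "using formula" message.
reg_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE, colour = "red") +
  labs(x = "Carat", y = "Price (US dollars)")
print(reg_plot)
```

Passing `formula = y ~ x` explicitly gives the same line and suppresses the message.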

Slide 6: Parameter Estimation

Least Squares Estimators:

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

These estimators give the line that minimizes the sum of squared residuals \(\sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2\).
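
As a check on the formulas, the estimators can be computed by hand and compared with what `lm()` returns. This is a sketch; the variable names are ours.

```r
library(ggplot2)
data(diamonds)

x <- diamonds$carat
y <- diamonds$price

# Least squares estimators, written exactly as in the formulas on this slide
beta_1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta_0_hat <- mean(y) - beta_1_hat * mean(x)

# Compare with the coefficients reported by lm()
manual  <- c(beta_0_hat, beta_1_hat)
from_lm <- unname(coef(lm(price ~ carat, data = diamonds)))
print(all.equal(manual, from_lm))
```

The two results should agree up to floating-point rounding, since `lm()` solves the same least squares problem numerically.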

Slide 7: 3D Plotly Plot
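
The variables shown in the original 3D plot are not recorded here. The sketch below assumes carat, depth, and price on the three axes and a random subsample of 2,000 rows to keep the plot responsive. It also assumes the plotly package is installed.

```r
library(ggplot2)  # diamonds dataset
library(plotly)

data(diamonds)

# Subsample for speed; the size and seed are arbitrary assumptions.
set.seed(1)
sub <- diamonds[sample(nrow(diamonds), 2000), ]

# 3D scatterplot; the choice of axes is an assumption.
fig <- plot_ly(sub, x = ~carat, y = ~depth, z = ~price,
               type = "scatter3d", mode = "markers",
               marker = list(size = 2))
fig  # printing the object renders the interactive plot
```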

Slide 8: Fit Regression in R

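# Regress price on carat by ordinary least squares
model <- lm(price ~ carat, data = diamonds)
# Print coefficients, standard errors, p-values, and fit statistics
summary(model)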

The summary output reports:
- estimated coefficients
- p-values
- model fit statistics (R²)
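
The quantities listed above can also be pulled out of the fitted object individually instead of being read off the printed summary. This is a sketch using base R accessors; the carat values passed to `predict()` are hypothetical.

```r
library(ggplot2)
data(diamonds)

model       <- lm(price ~ carat, data = diamonds)
fit_summary <- summary(model)

coef(model)               # estimated intercept and slope
fit_summary$coefficients  # estimates, standard errors, t values, p-values
fit_summary$r.squared     # R-squared
confint(model)            # 95% confidence intervals for both parameters

# Predicted price for hypothetical diamonds of 0.5 and 1 carat
predict(model, newdata = data.frame(carat = c(0.5, 1.0)))
```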

Slide 9: Conclusion

  • Simple linear regression is easy to fit and interpret: the slope estimates the expected change in \(Y\) for a one-unit increase in \(X\)
  • The diamonds dataset shows a strong positive linear relationship between price and carat