Simple Linear Regression models the relationship between an independent variable \(X\) and a dependent variable \(Y\) as a straight line.
- Used for prediction and inference
- We will use the diamonds dataset from ggplot2
The simple linear regression model is:
\[ Y = \beta_0 + \beta_1 X + \varepsilon \]
Where:
- \(Y\): dependent variable (price)
- \(X\): independent variable (carat)
- \(\beta_0, \beta_1\): model parameters (intercept and slope)
- \(\varepsilon\): random error term (the part of \(Y\) not explained by \(X\))
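To make the model concrete, the sketch below generates data directly from \(Y = \beta_0 + \beta_1 X + \varepsilon\) and then fits it with `lm()`. The parameter values, the sample size, and the choice of independent normal errors are all assumptions made for illustration; they are not estimates from the diamonds data.

```r
# Simulate n observations from Y = beta0 + beta1 * X + epsilon.
# All parameter values below are hypothetical, chosen only for illustration.
set.seed(42)
n     <- 200
beta0 <- 500      # hypothetical intercept
beta1 <- 4000     # hypothetical slope
sigma <- 300      # hypothetical error standard deviation

x   <- runif(n, min = 0.2, max = 2)      # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)    # assumed independent normal errors
y   <- beta0 + beta1 * x + eps           # response generated by the model

sim_fit <- lm(y ~ x)
coef(sim_fit)   # estimates should land close to beta0 and beta1
```

Because the true parameters are known here, comparing `coef(sim_fit)` with `beta0` and `beta1` shows how closely least squares recovers them from noisy data.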
```r
library(ggplot2)   # the diamonds data ships with ggplot2
data(diamonds)
```
The dataset contains 53,940 diamonds, with:
- price (in US dollars)
- carat weight
- cut, color, clarity
- depth and table percentages
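A quick way to confirm these columns before modeling is to inspect the data frame. This is a minimal sketch using base R functions on the `diamonds` data from ggplot2; the choice to summarize only `price` and `carat` is illustrative.

```r
library(ggplot2)   # provides the diamonds data frame

dim(diamonds)      # number of rows and columns
str(diamonds)      # column names and types
summary(diamonds[, c("price", "carat")])   # ranges of the two modeling variables
```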
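A scatter plot of price against carat with a least-squares line overlaid is a simple way to see the relationship before fitting the model. The sketch below uses ggplot2's `geom_smooth(method = "lm")`; the transparency, point size, and axis labels are illustrative choices, not requirements.

```r
library(ggplot2)

# Scatter plot of price against carat with a least-squares line overlaid.
# method = "lm" with formula = y ~ x fits the same simple linear model used here.
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1, size = 0.5) +                      # transparency reduces overplotting
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +   # fitted line with confidence band
  labs(x = "Carat", y = "Price (USD)")
print(p)
```

Passing `formula = y ~ x` explicitly also stops `geom_smooth()` from printing its "using formula" message when the document is knitted.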
Least Squares Estimators:
\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
These formulas give the values of \(\beta_0\) and \(\beta_1\) that minimize the sum of squared errors \(\sum (y_i - \beta_0 - \beta_1 x_i)^2\).
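The closed-form estimators can be checked numerically. This sketch computes \(\hat{\beta}_1\) and \(\hat{\beta}_0\) directly from the formulas above on the diamonds data, compares them with `lm()`, and verifies that moving away from the estimates increases the sum of squared errors. The helper `sse()` is a hypothetical name introduced only for this example.

```r
library(ggplot2)   # for the diamonds data

x <- diamonds$carat
y <- diamonds$price

# Least squares estimates from the closed-form formulas
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Hypothetical helper: sum of squared errors for a candidate intercept b0 and slope b1
sse <- function(b0, b1) sum((y - b0 - b1 * x)^2)

c(intercept = beta0_hat, slope = beta1_hat)
coef(lm(price ~ carat, data = diamonds))   # should agree with the manual estimates

# Nudging either parameter away from its estimate increases the SSE
sse(beta0_hat, beta1_hat) < sse(beta0_hat, beta1_hat + 10)
sse(beta0_hat, beta1_hat) < sse(beta0_hat + 10, beta1_hat)
```

Since the SSE is a convex quadratic in \(\beta_0\) and \(\beta_1\), any perturbation of the estimates, in either direction, gives a larger value.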
```r
model <- lm(price ~ carat, data = diamonds)   # regress price on carat
summary(model)
```
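The output shows:
- estimated coefficients \(\hat{\beta}_0\) and \(\hat{\beta}_1\), with their standard errors and t-statistics
- p-values for testing whether each coefficient is zero
- model fit statistics such as R² and the residual standard error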
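The same quantities can be pulled out of the fitted model programmatically, which covers both uses of regression listed at the start: inference (confidence intervals for the coefficients) and prediction. This is a sketch; the 1-carat diamond in `new_diamond` is a hypothetical input, and the 95% level is a conventional choice.

```r
library(ggplot2)

model <- lm(price ~ carat, data = diamonds)
s <- summary(model)

coef(s)          # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
s$r.squared      # proportion of variance in price explained by carat
s$sigma          # residual standard error

# Inference: 95% confidence intervals for beta_0 and beta_1
confint(model, level = 0.95)

# Prediction for a hypothetical 1-carat diamond, with a 95% prediction interval
new_diamond <- data.frame(carat = 1)
pred <- predict(model, newdata = new_diamond, interval = "prediction", level = 0.95)
pred             # columns fit, lwr, upr
```

A prediction interval is wider than a confidence interval for the mean price because it also accounts for the error term \(\varepsilon\) of an individual diamond.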