2025-11-17

What is linear regression?

Linear regression is a statistical method for modeling the relationship between a continuous dependent (response) variable and one or more independent (predictor) variables. With a single predictor it is called simple linear regression.

Linear Regression Equation

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

y = the dependent variable you’re trying to predict
x = the independent variable
\(\beta_0\) = the intercept (the predicted value of y when x = 0)
\(\beta_1\) = the slope (the change in predicted y for a one-unit increase in x)
\(\varepsilon\) = the error term (the part of y the line does not explain)
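
As a minimal sketch of how the intercept and slope can be estimated, the R code below simulates data from the equation above and compares the closed-form least-squares estimates with those returned by lm(). The true values \(\beta_0 = 2\) and \(\beta_1 = 3\), the sample size, and the noise level are assumptions chosen only for illustration.

```r
set.seed(42)                      # make the simulated data reproducible
n   <- 200
x   <- runif(n, 0, 10)            # independent variable
eps <- rnorm(n, sd = 1)           # error term
y   <- 2 + 3 * x + eps            # assumed true beta_0 = 2, beta_1 = 3

# Closed-form least-squares estimates for one predictor
b1 <- cov(x, y) / var(x)          # slope
b0 <- mean(y) - b1 * mean(x)      # intercept

fit <- lm(y ~ x)                  # the same estimates via lm()
coef(fit)
c(intercept = b0, slope = b1)
```

Both approaches should agree up to floating-point error, since lm() minimizes the same sum of squared errors.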

Multiple Linear Regression Equation

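\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \]

Multiple linear regression is used when several explanatory variables \(x_1, \dots, x_p\) together predict the dependent variable. Each slope \(\beta_j\) is the change in predicted y for a one-unit increase in \(x_j\), holding the other variables fixed.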
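
A small sketch of the multiple-regression case, assuming two simulated predictors and made-up coefficients (1, 2, and -0.5) used purely for illustration. It fits the model with lm() and checks the result against the matrix form of the least-squares solution.

```r
set.seed(1)                       # make the simulated data reproducible
n  <- 300
x1 <- rnorm(n)                    # first explanatory variable
x2 <- rnorm(n)                    # second explanatory variable
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.5)  # assumed coefficients

fit <- lm(y ~ x1 + x2)            # one slope per explanatory variable
coef(fit)                         # estimates of beta_0, beta_1, beta_2

# Same estimates from the normal equations: (X'X) beta = X'y
X        <- cbind(1, x1, x2)      # column of 1s carries the intercept
beta_hat <- solve(t(X) %*% X, t(X) %*% y)
beta_hat
```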

The linear model for the diamonds dataset is:

\[ \text{price} = \beta_0 + \beta_1 \cdot \text{carat} + \varepsilon \]

Where:
- \(\text{price}\) = diamond price in USD
- \(\text{carat}\) = weight in carats
- \(\beta_0\) = intercept
- \(\beta_1\) = slope (change in predicted price, in USD, per one-carat increase)
- \(\varepsilon\) = error term
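
The model above can be fit along these lines. This is a sketch that assumes the diamonds data frame shipped with the ggplot2 package; the 1-carat diamond passed to predict() is a hypothetical input, not a value from the slides.

```r
library(ggplot2)                  # provides the diamonds dataset

# Fit price = beta_0 + beta_1 * carat + error by least squares
fit <- lm(price ~ carat, data = diamonds)
coef(fit)                         # estimated intercept and slope

# Predicted price (USD) for a hypothetical 1-carat diamond
predict(fit, newdata = data.frame(carat = 1))
```

Because no diamond weighs 0 carats, the estimated intercept is an extrapolation and need not be a meaningful price on its own.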

ggplot #1 - Scatterplot and Regression Line

ggplot #1 - Code

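library(ggplot2)  # provides ggplot() and the diamonds dataset

# Scatterplot of price against carat with a fitted least-squares line
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.5, color = "blue") +                # semi-transparent points
  geom_smooth(method = "lm", se = FALSE, color = "red") +  # regression line, no band
  labs(
    title = "Diamond Price vs Carat",
    x = "Carat",
    y = "Price (USD)"
  ) +
  theme_minimal()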

This scatterplot shows the relationship between diamond carat and price. The upward-sloping regression line shows that price tends to increase as carat increases.

The argument method = "lm" tells geom_smooth() to fit a linear regression line to the data, and se = FALSE hides the confidence band around that line.

ggplot #2 - Histogram

3D Relationship