Linear regression is a statistical method for modeling the relationship between two continuous variables: one independent (predictor) and one dependent (response).
2025-11-17
Simple linear regression describes the response \(y\) as a straight-line function of a single predictor \(x\), plus a random error:
\[
y = \beta_0 + \beta_1 x + \varepsilon
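\]
Where:
- \(y\) = the dependent variable you’re trying to predict
- \(x\) = the independent variable
- \(\beta_0\) = the intercept (the predicted value of \(y\) when \(x = 0\))
- \(\beta_1\) = the slope (the change in the predicted value of \(y\) for a one-unit increase in \(x\))
- \(\varepsilon\) = the error term (the part of \(y\) that the line does not explain)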
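The text above does not say how the coefficients are estimated. As a minimal sketch, assuming ordinary least squares (the method R's `lm()` uses), the example below simulates data from the model with known coefficients and recovers them. The data, the seed, and the names `x`, `y`, `eps`, `fit`, `b0`, and `b1` are illustrative assumptions, not part of the original analysis.

```r
# Simulated data for illustration only (not from the document).
set.seed(42)
x   <- runif(100, min = 0, max = 10)   # independent variable
eps <- rnorm(100, mean = 0, sd = 1)    # error term
y   <- 2 + 3 * x + eps                 # true beta_0 = 2, true beta_1 = 3

# Fit y = beta_0 + beta_1 * x + error by ordinary least squares.
fit <- lm(y ~ x)
coef(fit)                              # estimated intercept and slope

# For a single predictor, the least-squares estimates have a closed form.
b1 <- cov(x, y) / var(x)               # slope
b0 <- mean(y) - b1 * mean(x)           # intercept
c(intercept = b0, slope = b1)
```

The closed-form values should agree with `coef(fit)` up to floating-point rounding, and both should land close to the true values used in the simulation.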
Multiple linear regression is used when several explanatory variables are used to predict the dependent variable:
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \]
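A multiple regression is fitted in R by adding predictors to the right-hand side of the model formula. The sketch below assumes two simulated predictors; the names `x1`, `x2`, and `fit_multi` and the chosen coefficients are hypothetical and serve only to show the pattern.

```r
# Simulated data with two predictors (illustrative assumption).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.5)  # true betas: 1, 2, -0.5

# Each term after the ~ gets its own coefficient; the intercept is implicit.
fit_multi <- lm(y ~ x1 + x2)
coef(fit_multi)   # one intercept plus one slope per predictor
```

Each slope is read as the expected change in \(y\) for a one-unit increase in that predictor while the other predictors are held fixed.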
The linear model for the diamonds dataset is:
\[ \text{price} = \beta_0 + \beta_1 \cdot \text{carat} + \varepsilon \]
Where:
- \(\text{price}\) = diamond price in USD
- \(\text{carat}\) = weight in carats
- \(\beta_0\) = intercept
- \(\beta_1\) = slope (change in price per 1 carat)
- \(\varepsilon\) = error
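One way to estimate \(\beta_0\) and \(\beta_1\) for this model is with `lm()`. This is a sketch that assumes the `diamonds` data frame shipped with the ggplot2 package; the object name `fit_diamonds` and the 1-carat example are illustrative choices, not taken from the original analysis.

```r
library(ggplot2)  # provides the diamonds dataset

# Fit price = beta_0 + beta_1 * carat + error by ordinary least squares.
fit_diamonds <- lm(price ~ carat, data = diamonds)

# Estimated intercept (beta_0) and slope (beta_1).
coef(fit_diamonds)

# Predicted price in USD for a hypothetical 1-carat diamond.
predict(fit_diamonds, newdata = data.frame(carat = 1))
```

The intercept is the model's extrapolation to a carat weight of 0, which lies outside the observed data, so it need not be a meaningful price on its own.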
    library(ggplot2)  # provides ggplot() and the diamonds dataset

    # Scatterplot of price against carat with a fitted straight line
    ggplot(diamonds, aes(x = carat, y = price)) +
      geom_point(alpha = 0.5, color = "blue") +
      geom_smooth(method = "lm", se = FALSE, color = "red") +
      labs(
        title = "Diamond Price vs Carat",
        x = "Carat",
        y = "Price (USD)"
      ) +
      theme_minimal()
This scatterplot shows the relationship between diamond carat and price. The red regression line slopes upward: as carat increases, price increases. The argument method = "lm" tells geom_smooth() to fit a linear regression line to the data, and se = FALSE hides the confidence band around that line.