Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables.
2024-10-22
The model assumes that each observation lies on a straight line in the predictor, plus a random error:
\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i \]
Where:
- \(y_i\): Dependent variable
- \(x_i\): Independent variable
- \(\beta_0\): Intercept
- \(\beta_1\): Slope
- \(\epsilon_i\): Error term
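library(ggplot2)

# Simulate 50 observations with true intercept 0 and true slope 2
set.seed(42)
x <- rnorm(50, mean = 5, sd = 2)
y <- 2 * x + rnorm(50)

# Fit the regression and overlay the fitted line on the scatter plot
model <- lm(y ~ x)
ggplot(data = data.frame(x, y), aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()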
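library(plotly)  # also provides the %>% pipe

x_seq <- seq(0, 10, length.out = 50)
y_seq <- 2 * x_seq + rnorm(50)
# Residuals from a fit on the gridded data, so the z-axis matches its label
res_seq <- residuals(lm(y_seq ~ x_seq))
plot_ly(x = ~x_seq, y = ~y_seq, z = ~res_seq, type = "scatter3d", mode = "markers") %>%
  layout(title = "3D Plot of Simple Linear Regression",
         scene = list(xaxis = list(title = 'X'),
                      yaxis = list(title = 'Y'),
                      zaxis = list(title = 'Residuals')))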
The equation of the best fit line is:
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
Where:
- \(\hat{\beta}_0\) is the intercept estimate
- \(\hat{\beta}_1\) is the slope estimate
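To make the estimates concrete: assuming the line is fitted by ordinary least squares (the method `lm()` uses), the slope and intercept estimates have the closed forms

\[ \hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

The sketch below computes both by hand on the simulated data from the example above and compares them with `coef(model)`. The names `beta0_hat` and `beta1_hat` are illustrative and do not appear in the original code.

```r
# Simulated data, mirroring the earlier example (true intercept 0, true slope 2)
set.seed(42)
x <- rnorm(50, mean = 5, sd = 2)
y <- 2 * x + rnorm(50)

# Closed-form least-squares estimates (illustrative variable names)
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Estimates from lm() for comparison
model <- lm(y ~ x)
coef(model)

# The two approaches should agree up to floating-point error
all.equal(unname(coef(model)), c(beta0_hat, beta1_hat))
```

Because the data were simulated with slope 2 and no intercept, the estimates should land near those values, though not exactly on them given the noise term.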
# Plot residuals against x; a patternless scatter around zero supports the linear fit
residuals <- residuals(model)
ggplot(data = data.frame(x = x, residuals = residuals), aes(x = x, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  theme_minimal()