In this presentation, we explore Simple Linear Regression, a fundamental technique in statistics used to model relationships between two continuous variables.
2025-06-11
We model the relationship between a response variable \(y\) and a predictor variable \(x\) as:
\[ y = \beta_0 + \beta_1 x + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2) \]
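One way to build intuition for this model is to simulate data from it and check that a fitted line recovers the parameters. The sketch below is illustrative only: the true values (`beta_0_true = 2`, `beta_1_true = 3`, `sigma_true = 1`), the sample size, and the range of `x` are assumptions chosen for the example, not values from this presentation.

```r
# Simulate from y = beta_0 + beta_1 * x + epsilon, epsilon ~ N(0, sigma^2).
# All parameter values below are assumed for illustration.
set.seed(42)
n           <- 1000
beta_0_true <- 2
beta_1_true <- 3
sigma_true  <- 1

x   <- runif(n, min = 0, max = 10)           # predictor values
eps <- rnorm(n, mean = 0, sd = sigma_true)   # normal errors with constant variance
y   <- beta_0_true + beta_1_true * x + eps   # response generated by the model

# Fit the simple linear regression and inspect the estimates
sim_fit <- lm(y ~ x)
coef(sim_fit)            # estimates of beta_0 and beta_1
summary(sim_fit)$sigma   # estimate of sigma (residual standard error)
```

With a sample this large, the estimates should land close to the assumed true values, though they will not match them exactly.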
data(mtcars)
mod <- lm(mpg ~ wt, data = mtcars)
summary(mod)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "green") +
  labs(title = "MPG vs Weight", x = "Weight", y = "Miles Per Gallon")
Estimating Parameters: From the model \(y = \beta_0 + \beta_1 x + \varepsilon\), the least-squares estimates of the slope and intercept are:
\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
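The formulas above can be checked directly against `lm()`. The following is a minimal sketch using the same `mtcars` variables as the fitted model (`wt` as \(x\), `mpg` as \(y\)); the names `beta_0_hat`, `beta_1_hat`, `manual`, and `from_lm` are introduced here for illustration only.

```r
# Closed-form estimates for mpg ~ wt
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared deviations of x
beta_1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: forces the fitted line through the point (mean(x), mean(y))
beta_0_hat <- mean(y) - beta_1_hat * mean(x)

manual  <- c(beta_0_hat, beta_1_hat)
from_lm <- unname(coef(lm(mpg ~ wt, data = mtcars)))

manual                       # hand-computed intercept and slope
all.equal(manual, from_lm)   # TRUE if both approaches agree up to rounding
```

The hand-computed values should agree with the coefficient estimates in the summary output (37.2851 for the intercept and -5.3445 for `wt`).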
ggplot(mtcars, aes(x = wt, y = resid(mod))) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs Weight", y = "Residuals", x = "Weight")
library(plotly)
plot_ly(data=mtcars, x=~wt, y=~hp, z=~mpg,
type="scatter3d", mode="markers",
color=~factor(cyl)) %>%
layout(title="MPG vs Weight and Horsepower")