A basic statistical method for modeling relationships between two variables.
2025-11-15
We want to understand how a predictor \(x\) relates to a response variable \(y\), and we model this relationship with a straight line.
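Mathematically: \[ y = \beta_0 + \beta_1 x + \varepsilon \] where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon\) is a random error term.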
To estimate the best-fitting line, we choose the intercept and slope that minimize the sum of squared residuals.
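As a rough illustration of what "minimizing the sum of squared residuals" means, the sketch below minimizes that sum numerically and compares the result with `lm()`. It assumes the same kind of simulated data used in this document; the helper `rss()` and the starting point `c(0, 0)` are choices made here for illustration, not part of the original analysis.

```r
# Simulated data (assumed to mirror the simulation used in this document)
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2 * x + rnorm(80, sd = 3)

# Sum of squared residuals for a candidate line:
# b[1] is the intercept, b[2] is the slope (hypothetical helper)
rss <- function(b) sum((y - (b[1] + b[2] * x))^2)

# Minimize the sum of squared residuals numerically from an arbitrary start
fit_optim <- optim(c(0, 0), rss, method = "BFGS")

# Least-squares fit from lm() for comparison
fit_lm <- lm(y ~ x)

# The two rows should agree up to numerical tolerance
rbind(optim = fit_optim$par, lm = coef(fit_lm))
```

In practice there is no need for a numerical optimizer here, because the minimizer has a closed form; the comparison only shows that `lm()` is solving this minimization problem.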
Closed-form estimates: \[ \hat{\beta}_1 = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2} \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
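The closed-form estimates can be checked by computing them directly. The sketch below is a minimal example under the assumption that the data are simulated as elsewhere in this document; the names `x_bar`, `y_bar`, `b0`, and `b1` are introduced here for illustration.

```r
# Simulated data (assumed to mirror the simulation used in this document)
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2 * x + rnorm(80, sd = 3)

# Sample means of the predictor and the response
x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: forces the fitted line through the point (x_bar, y_bar)
b0 <- y_bar - b1 * x_bar

# Hand-computed estimates next to the ones returned by lm()
c(intercept = b0, slope = b1)
coef(lm(y ~ x))
```

The two sets of estimates should match up to floating-point error, since `lm()` solves the same least-squares problem.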
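# Simulate 80 observations from a known line: intercept 3, slope 2, noise sd 3
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2 * x + rnorm(80, sd = 3)
df <- data.frame(x, y)

# Fit the simple linear regression and print the summary
model <- lm(y ~ x, data = df)
summary(model)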
##
## Call:
## lm(formula = y ~ x, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.7721 -1.7142 -0.0674  1.8425  5.8331
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.41886    0.59929   5.705    2e-07 ***
## x            1.95822    0.08799  22.256   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.63 on 78 degrees of freedom
## Multiple R-squared:  0.864,  Adjusted R-squared:  0.8622
## F-statistic: 495.3 on 1 and 78 DF,  p-value: < 2.2e-16
The slope \(\hat{\beta}_1\):
- Represents the estimated change in \(y\) for each one-unit increase in \(x\).
- A positive slope suggests that \(y\) increases as \(x\) increases.
Example interpretation (based on the simulated data): a one-unit increase in \(x\) is associated with an increase of about 2 units in \(y\) (estimated slope 1.96).
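To attach some uncertainty to this interpretation, one option is to extract the slope and a confidence interval for it. The sketch below refits the model on data simulated as in this document; the 95% level and the three-unit change `k` are assumptions chosen here for illustration.

```r
# Refit the model on the simulated data (assumed to match this document)
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2 * x + rnorm(80, sd = 3)
df <- data.frame(x, y)
model <- lm(y ~ x, data = df)

# Point estimate of the slope
slope <- coef(model)["x"]

# Confidence interval for the slope (95% level is an assumed choice)
slope_ci <- confint(model, "x", level = 0.95)

# Estimated change in y for a k-unit increase in x is k times the slope
k <- 3  # hypothetical change in x
change_k <- k * slope

slope
slope_ci
change_k
```

Because the model is linear, the estimated change in \(y\) scales proportionally with the size of the change in \(x\).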
##         fit       lwr       upr
## 1  7.335288  6.430968  8.239608
## 2 15.168153 14.582674 15.753632
## 3 23.001018 22.079262 23.922774
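Each row gives the fitted value (fit) together with the lower and upper bounds (lwr, upr) of a confidence interval for the mean response at a new value of \(x\).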
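The code that produced this table is not shown, so the following is only a sketch of how such intervals are typically obtained with `predict()`. The new values `x = 2, 6, 10` are an assumption (they appear consistent with the fitted values above), as is the 95% level. The sketch also computes prediction intervals for comparison, since those describe a single new observation rather than the mean response.

```r
# Refit the model on the simulated data (assumed to match this document)
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2 * x + rnorm(80, sd = 3)
df <- data.frame(x, y)
model <- lm(y ~ x, data = df)

# New predictor values (assumed; not stated in the original)
new_x <- data.frame(x = c(2, 6, 10))

# Confidence intervals for the mean response at each new x
ci_mean <- predict(model, newdata = new_x,
                   interval = "confidence", level = 0.95)

# Prediction intervals for an individual new observation at each new x
pi_new <- predict(model, newdata = new_x,
                  interval = "prediction", level = 0.95)

ci_mean
pi_new
```

The prediction intervals are wider than the confidence intervals at every \(x\), because they include the variability of an individual observation around the line in addition to the uncertainty in the estimated line itself.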