Simple linear regression models the relationship between two variables using a straight line.
The model:
\[ Y = \beta_0 + \beta_1 X + \varepsilon \]
Where:
- \(Y\): response variable
- \(X\): predictor variable
- \(\beta_0\): intercept
- \(\beta_1\): slope
- \(\varepsilon\): error term
To estimate the slope and intercept of the regression line, we use the least squares method:
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})} {\sum_{i=1}^n (x_i - \bar{x})^2} \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
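These two formulas follow from minimizing the residual sum of squares. A brief sketch of the derivation, using the same symbols as above (the name \(S\) for the objective is introduced here only for illustration):
\[ S(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \]
Setting the partial derivative with respect to \(\beta_0\) to zero gives the intercept formula:
\[ \frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0 \quad \Rightarrow \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
Setting the partial derivative with respect to \(\beta_1\) to zero and substituting \(\hat{\beta}_0\) gives:
\[ \sum_{i=1}^n x_i \left[ (y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x}) \right] = 0 \]
Because \(\sum_{i=1}^n (y_i - \bar{y}) = 0\) and \(\sum_{i=1}^n (x_i - \bar{x}) = 0\), the factor \(x_i\) can be replaced by \(x_i - \bar{x}\) without changing the sum, so:
\[ \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^n (x_i - \bar{x})^2 \]
Dividing both sides by \(\sum_{i=1}^n (x_i - \bar{x})^2\) yields the slope formula.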
# Simulate 100 observations; the true intercept is 5 and the true slope is 0.8
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)
y <- 5 + 0.8 * x + rnorm(100, sd = 5)
data <- data.frame(x, y)
library(ggplot2)
ggplot(data, aes(x = x, y = y)) +
  geom_point(color = "darkblue") +
  labs(title = "Scatterplot of X and Y")
model <- lm(y ~ x, data = data)
summary(model)
##
## Call:
## lm(formula = y ~ x, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3842 -3.0688 -0.6975  2.6970 11.7309
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  4.83805    2.79361   1.732   0.0865 .
## x            0.79947    0.05386  14.843   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.814 on 98 degrees of freedom
## Multiple R-squared:  0.6921, Adjusted R-squared:  0.689
## F-statistic: 220.3 on 1 and 98 DF,  p-value: < 2.2e-16
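As a rough check on the least squares formulas, the estimates can also be computed by hand and compared with what `lm()` reports. The sketch below is self-contained and repeats the simulation with the same seed; the names `b0_hat`, `b1_hat`, `manual`, `fit`, and `r_squared` are illustrative and not part of the original analysis.

```r
# Recreate the simulated data used in this example
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)
y <- 5 + 0.8 * x + rnorm(100, sd = 5)

# Slope: sum of cross-deviations divided by the sum of squared deviations of x
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: chosen so the fitted line passes through (mean(x), mean(y))
b0_hat <- mean(y) - b1_hat * mean(x)

# Fit the same model with lm() and compare the two sets of estimates
fit <- lm(y ~ x)
manual <- c(b0_hat, b1_hat)
print(manual)
print(unname(coef(fit)))
print(all.equal(manual, unname(coef(fit))))

# In simple linear regression, R-squared equals the squared correlation of x and y
r_squared <- cor(x, y)^2
print(r_squared)
```

If the formulas are implemented correctly, the hand-computed values should agree with the `Estimate` column of the summary up to floating-point tolerance, and `r_squared` should agree with the reported Multiple R-squared.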
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Linear Regression Fit")
library(plotly)
plot_ly(data, x = ~x, y = ~y, type = "scatter", mode = "markers")