Linear regression models the relationship between a continuous dependent variable and one or more independent variables. It fits a straight line through the data to help predict outcomes and to show how each factor influences the target variable.
2026-06-07
The model for a single predictor is
\[ Y = \beta_0 + \beta_1X + \epsilon \]
We assume that the errors are normally distributed with mean zero and constant variance:
\[ \epsilon \sim N(0,\sigma^2) \]
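To make the model concrete, here is a minimal sketch of how the two coefficients can be estimated by ordinary least squares for a single predictor, using the built-in mtcars data. The variable names (x, y, b0, b1, by_hand, from_lm) are chosen here for illustration only. The closed-form estimates should agree with what lm() returns.

```r
# Closed-form least-squares estimates for one predictor (base R only).
x <- mtcars$wt   # predictor: weight in 1000 lbs
y <- mtcars$mpg  # response: miles per gallon

b1 <- cov(x, y) / var(x)      # slope estimate
b0 <- mean(y) - b1 * mean(x)  # intercept estimate

# Compare with the coefficients fitted by lm()
fit <- lm(mpg ~ wt, data = mtcars)
by_hand <- c(b0, b1)
from_lm <- unname(coef(fit))
print(rbind(by_hand, from_lm))
```

Both rows of the printed matrix should be identical up to floating-point rounding.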
library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
A residual plot shows how well the fitted line describes the data. In a good residual plot, the points scatter randomly around the zero line with no obvious pattern.
model <- lm(mpg ~ wt, data = mtcars)
ggplot(
  data.frame(fitted = fitted(model), residuals = resid(model)),
  aes(fitted, residuals)
) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") # dashed reference line at residual = 0
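As a numeric companion to the residual plot, the sketch below computes the residuals by hand (observed minus fitted) and checks two properties that hold for a least-squares fit with an intercept. The names res_by_hand, max_diff, mean_res and rse are illustrative, not part of any package.

```r
model <- lm(mpg ~ wt, data = mtcars)

# A residual is the observed value minus the fitted value.
res_by_hand <- mtcars$mpg - fitted(model)

# Largest difference from resid(model); should be essentially zero.
max_diff <- max(abs(res_by_hand - resid(model)))

# With an intercept in the model, the residuals average to zero.
mean_res <- mean(res_by_hand)

# Residual standard error: sqrt(sum of squared residuals / residual df).
rse <- sqrt(sum(res_by_hand^2) / df.residual(model))

print(c(max_diff = max_diff, mean_res = mean_res, rse = rse))
```

The rse value should match the residual standard error of 3.046 reported by summary(model).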
Here is an interactive plot made with Plotly to explore the data further. Points are colored by number of cylinders.
library(plotly)
plot_ly(
data = mtcars,
x = ~wt,
y = ~mpg,
type = "scatter",
mode = "markers",
color = ~factor(cyl)
) %>%
layout(
title = "Miles per Gallon vs Weight",
xaxis = list(title = "Weight"),
yaxis = list(title = "Miles per Gallon")
)
#I do not know why it doesn't show up in the html file
model <- lm(mpg ~ wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
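Since the fitted line is meant to help predict outcomes, here is a short sketch of how the model could be used for prediction. The car in new_car is hypothetical (wt = 3, i.e. 3000 lbs), and the names pred, ci and r2 are illustrative. The last step recomputes R-squared from its definition so the summary value can be checked by hand.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Hypothetical new car weighing 3000 lbs (wt is measured in 1000 lbs).
new_car <- data.frame(wt = 3)

# Point prediction with a 95% prediction interval (columns: fit, lwr, upr).
pred <- predict(model, newdata = new_car, interval = "prediction", level = 0.95)
print(pred)

# 95% confidence intervals for the intercept and slope.
ci <- confint(model)
print(ci)

# R-squared from its definition: 1 - (residual SS / total SS).
r2 <- 1 - sum(resid(model)^2) / sum((mtcars$mpg - mean(mtcars$mpg))^2)
print(r2)
```

With the coefficients above, the point prediction is 37.2851 - 5.3445 * 3, or about 21.25 mpg, and r2 should agree with the Multiple R-squared of 0.7528.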