- A linear regression is a statistical model used to make predictions or estimates
- It describes the relationship between an independent variable (X) and a dependent variable (Y)
- The model is linear, meaning each one-unit change in X changes the expected value of Y by the same fixed amount
October 16, 2025
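Here is the general regression equation, where \(\alpha\) is the intercept, \(\beta\) is the slope, and \(\epsilon\) is the random error term: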
\[ Y = \alpha + \beta X + \epsilon \]
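To make the roles of the intercept, slope, and error term concrete, here is a minimal sketch in R that simulates data from the equation and then recovers the coefficients. The values chosen for `alpha`, `beta`, the noise level, and the sample size are arbitrary assumptions for illustration only. The closed-form least-squares estimates (slope = cov(X, Y) / var(X), intercept = mean(Y) minus slope times mean(X)) are compared with what `lm()` returns.

```r
set.seed(1)                      # arbitrary seed so the sketch is repeatable

alpha <- 2                       # assumed true intercept
beta  <- 3                       # assumed true slope
x     <- runif(100, 0, 10)       # independent variable
eps   <- rnorm(100, sd = 1)      # random error term
y     <- alpha + beta * x + eps  # dependent variable built from the general equation

# Closed-form least-squares estimates
beta_hat  <- cov(x, y) / var(x)
alpha_hat <- mean(y) - beta_hat * mean(x)

# The same estimates from lm()
fit <- lm(y ~ x)
coef(fit)
c(alpha_hat, beta_hat)
```

The two sets of numbers should agree, and both should land close to the assumed values of 2 and 3 because the noise is small relative to the spread of X.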
We will fit a linear regression of height on weight using the women data set:
##
## Call:
## lm(formula = height ~ weight, data = women)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.83233 -0.26249  0.08314  0.34353  0.49790
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.723456   1.043746   24.64 2.68e-12 ***
## weight       0.287249   0.007588   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.44 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
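The summary above can be reproduced with a call along these lines. The object name `women_regression` is taken from the plotting code in this deck; the exact chunk that produced the output is not shown, so treat this as a sketch rather than the original code.

```r
# women ships with base R (datasets package): 15 observations of height and weight
women_regression <- lm(height ~ weight, data = women)

# Print the call, residual quantiles, coefficient table, and fit statistics
summary(women_regression)

# Pull individual pieces out of the fit
coef(women_regression)                # intercept and slope
summary(women_regression)$r.squared   # proportion of variance explained
```

Reading the coefficient table: the `weight` estimate of about 0.287 means each additional pound is associated with roughly 0.29 inches more height, and the multiple R-squared of 0.991 means the line accounts for about 99% of the variation in height in this sample.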
Equation obtained from the regression, where Y is height (inches) and X is weight (lbs):
\[ Y = 25.72 + 0.29X \]
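As a worked use of the fitted equation, the sketch below predicts height for two weights. The weights 120 and 150 lbs are hypothetical inputs picked to fall inside the range observed in `women`, and the object name `fit` is just a placeholder.

```r
fit <- lm(height ~ weight, data = women)

# Predict height (inches) for hypothetical weights (lbs) within the observed range
new_weights <- data.frame(weight = c(120, 150))
predict(fit, newdata = new_weights)

# The same calculation by hand from the rounded equation, for comparison
25.72 + 0.29 * c(120, 150)
```

The hand calculation differs from `predict()` by a few tenths of an inch because the slope was rounded from 0.287249 to 0.29, so the unrounded coefficients are the safer choice when making predictions.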
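library(plotly)

# Fit the model whose fitted values are drawn as the line of best fit
women_regression <- lm(height ~ weight, data = women)

women_plot <- plot_ly(
  women, x = ~weight, y = ~height,
  type = 'scatter', mode = 'markers',
  marker = list(size = 5, line = list(color = 'steelblue', width = 1)),
  name = 'Women\'s data',
  text = ~paste('Weight:', weight, '<br>Height:', height)
) |>
  # Overlay the fitted values as a line; inherit = FALSE keeps the
  # marker settings and hover text from plot_ly() off this trace
  add_trace(
    x = women$weight, y = fitted(women_regression),
    type = 'scatter', mode = 'lines',
    line = list(color = 'tomato', width = 2),
    name = 'Line of Best Fit', inherit = FALSE
  ) |>
  layout(
    title = 'Linear Regression of Height Vs Weight for Women',
    xaxis = list(title = 'Weight (lbs)'),
    yaxis = list(title = 'Height (Inches)')
  )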
##
## Call:
## lm(formula = Temp ~ Solar.R, data = clean_airqual)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -22.3787  -4.9572   0.8932   5.9111  18.4013
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.863012   1.693951  43.014  < 2e-16 ***
## Solar.R      0.028255   0.008205   3.444 0.000752 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.898 on 144 degrees of freedom
## Multiple R-squared:  0.07609,  Adjusted R-squared:  0.06967
## F-statistic: 11.86 on 1 and 144 DF,  p-value: 0.0007518
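The code that built `clean_airqual` is not shown. A plausible reconstruction, assuming it is the built-in `airquality` data with rows missing `Solar.R` removed, is sketched below; that assumption yields 146 rows, which matches the 144 residual degrees of freedom in the output. The object name `airqual_regression` is hypothetical.

```r
# Assumption: clean_airqual is airquality with rows missing Solar.R removed
clean_airqual <- airquality[!is.na(airquality$Solar.R), ]
nrow(clean_airqual)   # 146 rows, consistent with 144 residual degrees of freedom

# Regress temperature on solar radiation (hypothetical object name)
airqual_regression <- lm(Temp ~ Solar.R, data = clean_airqual)
summary(airqual_regression)
```

This fit is a useful contrast with the women example: the slope is statistically significant (p = 0.00075), yet the multiple R-squared of 0.076 shows that solar radiation alone explains under 8% of the variation in temperature.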
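library(dplyr)     # filter()
library(ggplot2)   # ggplot(), geom_point(), geom_smooth()
library(ggthemes)  # theme_solarized()
library(plotly)    # ggplotly()

# Drop penguins with missing bill measurements
clean_pen <- penguins |>
  filter(!is.na(bill_len) & !is.na(bill_dep))

# One regression line per species, converted to an interactive plot
pen_plot <- ggplotly(
  ggplot(clean_pen, aes(x = bill_dep, y = bill_len, color = species)) +
    geom_point(size = 2, alpha = .7) +
    geom_smooth(method = 'lm', se = FALSE) +
    labs(title = 'LR of Bill Length on Bill Depth by Species',
         x = 'Bill Depth (mm)', y = 'Bill Length (mm)') +
    theme_solarized()
)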
Hopefully you learned a little about simple linear regression and how to interpret and plot it using R.