2025-06-02

Simple Linear Regression

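The equation for a simple linear regression line is \[y=mx+b\] which is the slope-intercept form of the statistical model \[y=b_0+b_1x+\epsilon\] where:
- \(y\): dependent (response) variable
- \(x\): independent (predictor) variable
- \(b\) (written \(b_0\) in the model): intercept, the value of \(y\) when \(x=0\)
- \(m\) (written \(b_1\) in the model): slope, the change in \(y\) per one-unit change in \(x\)
- \(\epsilon\): random error term, the part of \(y\) that the line does not explain; its estimate for each observation is the residual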
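As a rough illustration of how the slope and intercept are estimated, the sketch below fits a line by least squares in two ways: with the closed-form formulas and with `lm()`. The data frame `foodwaste_head` is a hypothetical name introduced here; it holds only the six rows printed by `head(foodwaste)` in the food waste example, not the full dataset, so the estimates are illustrative only.

```r
# Only the six rows shown by head(foodwaste); NOT the full dataset (assumption)
foodwaste_head <- data.frame(
  Annual.Grocery.Cost = c(2500, 2300, 3499, 1203, 4458, 6690),
  Million.tons.of.food.waste = c(13.5, 12.4, 20.5, 43.8, 30.9, 23.4)
)

x <- foodwaste_head$Annual.Grocery.Cost
y <- foodwaste_head$Million.tons.of.food.waste

# Closed-form least-squares estimates
b1 <- cov(x, y) / var(x)       # slope (m)
b0 <- mean(y) - b1 * mean(x)   # intercept (b)

# The same fit using R's built-in lm()
fit <- lm(Million.tons.of.food.waste ~ Annual.Grocery.Cost,
          data = foodwaste_head)

print(c(intercept = b0, slope = b1))
print(coef(fit))
```

Both printouts should agree, since `lm()` with a single predictor solves the same least-squares problem.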

R squared coefficient

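R squared (the coefficient of determination) measures how well the line of best fit describes the points in a scatterplot: it is the proportion of the variance in the dependent variable that the regression explains.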

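The formula is \[R^2= 1-\frac{SS_{res}}{SS_{tot}}\] where \(SS_{res}=\sum_i (y_i-\hat{y}_i)^2\) is the residual sum of squares and \(SS_{tot}=\sum_i (y_i-\bar{y})^2\) is the total sum of squares. For a least-squares line with an intercept, \(R^2\) lies between 0 and 1, and values closer to 1 mean the line fits the data more closely. In practice \(R^2\) is computed from the fitted model rather than read off a graph; in R, summary() of an lm fit reports it as "Multiple R-squared".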
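As a hedged sketch, the formula can be evaluated by hand and compared with the value R reports. The data frame `foodwaste_head` is a hypothetical name used here for the six rows printed by `head(foodwaste)` in the food waste example; with so few rows the resulting number says little about the full dataset.

```r
# Only the six rows shown by head(foodwaste); NOT the full dataset (assumption)
foodwaste_head <- data.frame(
  Annual.Grocery.Cost = c(2500, 2300, 3499, 1203, 4458, 6690),
  Million.tons.of.food.waste = c(13.5, 12.4, 20.5, 43.8, 30.9, 23.4)
)

fit <- lm(Million.tons.of.food.waste ~ Annual.Grocery.Cost,
          data = foodwaste_head)
y <- foodwaste_head$Million.tons.of.food.waste

ss_res <- sum(residuals(fit)^2)   # residual sum of squares
ss_tot <- sum((y - mean(y))^2)    # total sum of squares
r_squared <- 1 - ss_res / ss_tot

print(r_squared)
print(summary(fit)$r.squared)     # value reported by R for the same fit
```

The two printed values should match, which shows that the reported R squared is the formula applied to the fitted model.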

Food Waste Example Data

print(head(foodwaste))
##   Annual.Grocery.Cost Million.tons.of.food.waste
## 1                2500                       13.5
## 2                2300                       12.4
## 3                3499                       20.5
## 4                1203                       43.8
## 5                4458                       30.9
## 6                6690                       23.4

Slide: R Code with Output (Food Waste Graph Example)

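library(ggplot2)

# Scatterplot of grocery cost vs food waste with a fitted regression line
ggplot(foodwaste, aes(x = Annual.Grocery.Cost,
                      y = Million.tons.of.food.waste)) +
  geom_point(color = "green") +
  # method = "lm" fits y ~ x by least squares; se = TRUE draws the confidence band
  geom_smooth(method = "lm", se = TRUE, color = "yellow") +
  labs(title = "Cost of Groceries vs Million Tons of Food Waste produced",
       x = "Annual Grocery Cost",
       y = "Million Tons of Food Waste")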

GGPlot2 Graph, Example: Grocery Cost vs Food Waste

## `geom_smooth()` using formula = 'y ~ x'

Example Dataset 2: GGPlot2 Whale Latitude vs Whale Depth

## `geom_smooth()` using formula = 'y ~ x'

Plotly Example: Food Waste