I wanted to use a real dataset that was not just cars or something super basic. The penguin data seemed more interesting because it has actual animal measurements, and I wanted to see if bigger flippers usually mean a heavier penguin.
2026-05-04
The two main variables I used were flipper length (mm) and body mass (g). I mostly wanted to see whether penguins with longer flippers also tend to weigh more. It’s a simple question, but it works well for regression because both variables are numerical.
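Simple linear regression fits a straight line through the data by least squares, which means it picks the line that makes the sum of squared residuals (the vertical gaps between each point and the line) as small as possible. For this example, the model is basically: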
\[ \text{body mass} = \text{intercept} + \text{slope} \times \text{flipper length} + \text{error} \]
So the model uses flipper length to predict body mass.
In the scatterplot the pattern is easy to see. Most of the penguins with shorter flippers have lower body mass, and the ones with longer flippers are usually heavier. It is not perfect because real data almost never lines up perfectly, but the general direction makes sense.
The fitted model has the form:
\[ \hat{y} = b_0 + b_1x \]
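Here, \(\hat{y}\) is the predicted body mass, \(x\) is flipper length, \(b_0\) is the estimated intercept, and \(b_1\) is the estimated slope. The slope tells us how much the predicted body mass (in grams) changes when flipper length goes up by 1 mm.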
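A minimal sketch of where \(b_0\) and \(b_1\) come from: with one predictor, the least-squares line has a closed form, \(b_1 = \mathrm{cov}(x, y)/\mathrm{var}(x)\) and \(b_0 = \bar{y} - b_1\bar{x}\). The helper `least_squares()` is a hypothetical name, and the `x`/`y` values are made up for illustration, not taken from the penguin data; the point is only that these formulas reproduce what `lm()` reports.

```r
# Hypothetical helper: closed-form least-squares estimates for one predictor
least_squares <- function(x, y) {
  b1 <- cov(x, y) / var(x)        # slope
  b0 <- mean(y) - b1 * mean(x)    # intercept: the line passes through (mean x, mean y)
  c(b0 = b0, b1 = b1)
}

# Made-up illustrative values, NOT the penguin data
x <- c(180, 190, 195, 205, 215, 220)
y <- c(3500, 3700, 3900, 4300, 4900, 5200)

est <- least_squares(x, y)
print(est)

# lm() should report the same intercept and slope
print(coef(lm(y ~ x)))
```

Assuming `penguins2` is the cleaned data used for `fit_1`, calling `least_squares(penguins2$flipper_length_mm, penguins2$body_mass_g)` should give the same two numbers as `coef(fit_1)`.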
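```r
library(palmerpenguins)  # provides the penguins data
library(ggplot2)

# Assumed: penguins2 keeps only complete rows (333 penguins,
# consistent with the 331 residual degrees of freedom in the summary)
penguins2 <- na.omit(penguins)

fit_1 <- lm(body_mass_g ~ flipper_length_mm, data = penguins2)

# Scatterplot with the fitted least-squares line
ggplot(penguins2, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(size = 2.5) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()

summary(fit_1)
predict(fit_1, newdata = data.frame(flipper_length_mm = 200))  # 200 mm flipper
```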
```
## 
## Call:
## lm(formula = body_mass_g ~ flipper_length_mm, data = penguins2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1057.33  -259.79   -12.24   242.97  1293.89 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -5872.09     310.29  -18.93   <2e-16 ***
## flipper_length_mm    50.15       1.54   32.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 393.3 on 331 degrees of freedom
## Multiple R-squared:  0.7621, Adjusted R-squared:  0.7614 
## F-statistic:  1060 on 1 and 331 DF,  p-value: < 2.2e-16
```
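A few quick checks on the summary output make the numbers more concrete. They use the rounded values printed in the table, so the last digit can differ slightly from R's internal values. The slope of 50.15 g per mm means a flipper 10 mm longer predicts a body mass about 500 g higher; the t value is the estimate divided by its standard error; and with a single predictor, the F-statistic is the square of the slope's t value:

\[
\begin{aligned}
10 \times b_1 &= 10 \times 50.15 = 501.5 \text{ g} \\
t &= \frac{50.15}{1.54} \approx 32.56 \\
F &= t^2 \approx 32.56^2 \approx 1060
\end{aligned}
\]

The multiple R-squared of 0.7621 means flipper length accounts for roughly 76% of the variation in body mass in this sample.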
```
##        1 
## 4158.561
```
For a penguin with a flipper length of 200 mm, the model’s estimated body mass is about 4,159 g. I would not treat it as an exact answer, but it is a reasonable estimate based on the pattern in the data.
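Plugging 200 mm into the fitted line by hand, using the rounded coefficients from the summary, gives nearly the same value. The small gap from 4158.561 comes from R using the unrounded coefficients:

\[
\hat{y} = -5872.09 + 50.15 \times 200 = -5872.09 + 10030 = 4157.91 \text{ g}
\]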
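Overall, the model shows a clear positive relationship between flipper length and body mass: each extra millimeter of flipper length adds about 50 g to the predicted body mass, and flipper length alone explains about 76% of the variation in body mass (R² ≈ 0.76). Longer flippers usually go with heavier penguins. This was a good example for regression because the plot already tells most of the story before you even look at the model output.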