2026-05-04

Why I Picked This

I wanted to use a real dataset that was not just cars or something super basic. The penguin data seemed more interesting because it has actual animal measurements, and I wanted to see if bigger flippers usually mean a heavier penguin.

What I Looked At

The main two variables I used were flipper length and body mass. I mostly wanted to see whether penguins with longer flippers also tend to weigh more. It’s a simple question, but it works well for regression because both variables are numerical.

Regression Idea

Simple linear regression tries to fit a straight line through data. For this example, the model is basically:

\[ \text{body\_mass} = \text{intercept} + \text{slope} \times \text{flipper\_length} + \text{error} \]

So the model uses flipper length to predict body mass, and the error term covers whatever the straight line does not explain.
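
To be a bit more specific about what "fit" means: lm() uses ordinary least squares, so it picks the intercept and slope that make the squared vertical distances between the points and the line as small as possible. Writing x_i for the flipper length and y_i for the body mass of penguin i (this notation is mine, it is not part of the dataset), the quantity being minimized is:

\[ \sum_{i=1}^{n} \left( y_i - \text{intercept} - \text{slope} \cdot x_i \right)^2 \]

For a single predictor, the minimizing values have a closed form, where \bar{x} and \bar{y} are the sample means:

\[ \text{slope} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \text{intercept} = \bar{y} - \text{slope} \cdot \bar{x} \]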

First Plot

What I Notice

The pattern is very easy to see. Most of the penguins with shorter flippers are lower in body mass, and the ones with longer flippers are usually heavier. It is not perfect because real data almost never lines up perfectly, but the general direction makes sense.

Looking by Species
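
The plot for this section is not reproduced here. Below is a minimal sketch of how the points could be colored by species. It assumes penguins2 is the penguins data from the palmerpenguins package with incomplete rows dropped; that definition is a guess, because penguins2 is never defined in this write-up.

```r
# Sketch only. Assumption: penguins2 = palmerpenguins::penguins without incomplete rows.
have_pkgs <- requireNamespace("ggplot2", quietly = TRUE) &&
  requireNamespace("palmerpenguins", quietly = TRUE)

if (have_pkgs) {
  library(ggplot2)
  penguins2 <- na.omit(palmerpenguins::penguins)  # assumed definition

  # Same two variables as before, with point color mapped to species
  p_species <- ggplot(penguins2,
                      aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
    geom_point(size = 2.5) +
    theme_minimal()
  print(p_species)
}
```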

Regression Line

Slope Meaning

The fitted model has the form:

\[ \hat{y} = b_0 + b_1x \]

Here, y-hat is the predicted body mass in grams, x is the flipper length in mm, b_0 is the intercept, and b_1 is the slope. The slope tells us how much the predicted body mass changes when flipper length goes up by 1 mm.
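
One way to see why the slope has this meaning is to compare the predictions at x and at x + 1:

\[ \hat{y}(x + 1) - \hat{y}(x) = \left( b_0 + b_1 (x + 1) \right) - \left( b_0 + b_1 x \right) = b_1 \]

The intercept cancels, so a 1 mm increase in flipper length always moves the prediction by b_1, no matter which flipper length you start from.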

Code I Used

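library(ggplot2)

# Fit a simple linear regression of body mass (g) on flipper length (mm)
fit_1 <- lm(body_mass_g ~ flipper_length_mm, data = penguins2)
summary(fit_1)

# Scatter plot with the fitted least-squares line (se = FALSE hides the confidence band)
ggplot(penguins2, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(size = 2.5) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()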
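
The code above uses penguins2, which is not defined anywhere in this write-up. One guess that fits the model output is the penguins data from the palmerpenguins package with incomplete rows dropped. That leaves 333 penguins, which matches the 331 residual degrees of freedom in the output (333 minus the 2 estimated coefficients). Treat the definition below as an assumption, not as the original code:

```r
# Assumption: penguins2 = palmerpenguins::penguins with incomplete rows removed.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  penguins2 <- na.omit(palmerpenguins::penguins)
  # 333 rows would be consistent with 331 residual degrees of freedom
  print(nrow(penguins2))
} else {
  message("Install palmerpenguins to run this sketch: install.packages(\"palmerpenguins\")")
}
```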

Model Output

## 
## Call:
## lm(formula = body_mass_g ~ flipper_length_mm, data = penguins2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1057.33  -259.79   -12.24   242.97  1293.89 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -5872.09     310.29  -18.93   <2e-16 ***
## flipper_length_mm    50.15       1.54   32.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 393.3 on 331 degrees of freedom
## Multiple R-squared:  0.7621, Adjusted R-squared:  0.7614 
## F-statistic:  1060 on 1 and 331 DF,  p-value: < 2.2e-16
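
A few numbers in this output are easier to read after a little arithmetic. The sketch below only uses values copied from the rounded table above, so the results can differ in the last digit from what R computes internally. The variable names are mine.

```r
# Values copied from the rounded summary output above
slope    <- 50.15   # g of body mass per mm of flipper length
slope_se <- 1.54    # standard error of the slope
r2       <- 0.7621  # multiple R-squared
df_resid <- 331     # residual degrees of freedom

# Predicted difference in body mass for flippers that differ by 10 mm
diff_10mm <- slope * 10

# t value = estimate / standard error
t_slope <- slope / slope_se

# Approximate 95% confidence interval for the slope
ci_slope <- slope + c(-1, 1) * qt(0.975, df = df_resid) * slope_se

# In simple regression the correlation is the square root of R-squared,
# taking the sign of the slope (positive here)
r <- sqrt(r2)

print(diff_10mm)  # 501.5
print(round(t_slope, 2))
print(round(ci_slope, 1))
print(round(r, 3))
```

So a 10 mm difference in flipper length goes with a predicted difference of about 500 g, the slope is estimated at roughly 47 to 53 g per mm, and the correlation between the two variables is about 0.87.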

Interactive Plot

Prediction Example

##        1 
## 4158.561

For a penguin with a flipper length of 200 mm, the model’s estimated body mass is about 4159 g. I would not treat it like an exact answer, but it is a reasonable guess from the pattern in the data.
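
The prediction can be checked by hand from the coefficient table. This sketch plugs the rounded intercept and slope into the fitted equation. The result lands within about 1 g of 4158.561, and a gap that small is what you would expect from using rounded coefficients. The predict() call at the end is my guess at how the output above was produced, since that code is not shown; it only runs if a fit_1 object already exists.

```r
# Coefficients copied from the rounded summary output
b0 <- -5872.09  # intercept (g)
b1 <- 50.15     # slope (g per mm)

flipper_mm <- 200
pred_by_hand <- b0 + b1 * flipper_mm
print(pred_by_hand)  # 4157.91

# With the fitted model available, predict() does the same calculation
# using the unrounded coefficients
if (exists("fit_1")) {
  print(predict(fit_1, newdata = data.frame(flipper_length_mm = flipper_mm)))
}
```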

Next Steps

  • It might be useful to compare each species separately.
  • I could also look at bill length and see if that changes the model much.
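
Both ideas could be tried with small changes to the model formula. This is only a sketch of what that might look like, not something I have run for this write-up. It assumes penguins2 is the palmerpenguins data with incomplete rows dropped, and the object names are made up for illustration.

```r
# Model formulas for the two ideas (column names as in palmerpenguins)
f_base    <- body_mass_g ~ flipper_length_mm
f_species <- body_mass_g ~ flipper_length_mm * species         # own intercept and slope per species
f_bill    <- body_mass_g ~ flipper_length_mm + bill_length_mm  # bill length as a second predictor

if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  penguins2 <- na.omit(palmerpenguins::penguins)  # assumed definition of penguins2

  fit_base    <- lm(f_base, data = penguins2)
  fit_species <- lm(f_species, data = penguins2)
  fit_bill    <- lm(f_bill, data = penguins2)

  # Compare adjusted R-squared across the three models
  print(sapply(list(base = fit_base, species = fit_species, bill = fit_bill),
               function(m) summary(m)$adj.r.squared))
}
```

Adjusted R-squared is used for the comparison because plain R-squared can never go down when a predictor is added.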

Final Thoughts

Overall, the model shows a clear positive relationship between flipper length and body mass. Longer flippers usually go with heavier penguins. This was a good example for regression because the plot already tells most of the story before even looking at the model output.