Exploring how penguin body mass relates to flipper length using the ‘palmerpenguins’ dataset.

Here’s a lil penguin:)

Simple Linear Regression

Using the penguins dataset, I will model the relationship between two continuous variables:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

Here,
- \(Y\): Body Mass (Dependent variable)
- \(X\): Flipper Length (Independent variable)
- \(\beta_0\): Intercept
- \(\beta_1\): Slope
- \(\epsilon\): Random error term
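
For reference, here is a sketch of how the intercept and slope are usually estimated. Ordinary least squares, the method behind R's lm(), picks the values that minimise the sum of squared errors. The hats (estimates), the subscripted \(x_i\) and \(y_i\) (one penguin's flipper length and body mass), and the bars (sample means) are notation introduced here for illustration.

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]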

Data Background

The data set includes information about three penguin species: Adelie, Chinstrap, and Gentoo. I will attempt to model how the length of their flippers relates to their body mass.

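library(palmerpenguins)  # provides the penguins dataset
library(ggplot2)         # ggplot() for the scatterplots
library(dplyr)           # filter() and the %>% pipe
library(plotly)          # plot_ly() for the 3D plot

head(penguins)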
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           NA            NA                  NA          NA
## 5 Adelie  Torgersen           36.7          19.3               193        3450
## 6 Adelie  Torgersen           39.3          20.6               190        3650
## # ℹ 2 more variables: sex <fct>, year <int>

Scatterplot: Flipper Length vs. Body Mass

ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point(size = 2) +
  labs(title = "Flipper Length vs. Body Mass by Species", x = "Flipper Length (mm)",
       y = "Body Mass (g)") +
  theme_minimal()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

Fitting the model

penguins_clean <- penguins %>%
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g))  # drop rows with missing flipper length or body mass

model <- lm(body_mass_g ~ flipper_length_mm, data = penguins_clean)

coef(model)
##       (Intercept) flipper_length_mm 
##       -5780.83136          49.68557

Here, the coefficient on flipper_length_mm is the slope, β₁ ≈ 49.69 g per mm, and (Intercept) is β₀ ≈ −5780.83 g.
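
To make the coefficients concrete, here is a minimal sketch that plugs them into the model equation by hand. The two coefficient values are copied from the coef(model) output above. The 200 mm flipper length is a hypothetical value chosen only for illustration, and the variable names are introduced here.

```r
# Coefficients copied from the coef(model) output above
beta_0 <- -5780.83136  # intercept, in grams
beta_1 <- 49.68557     # slope, in grams per mm of flipper length

# Hypothetical penguin with a 200 mm flipper (value chosen for illustration)
flipper_length_mm <- 200

# Point prediction from the fitted line: beta_0 + beta_1 * x
predicted_mass_g <- beta_0 + beta_1 * flipper_length_mm
predicted_mass_g  # roughly 4156 g
```

With the fitted model object available, predict(model, newdata = data.frame(flipper_length_mm = 200)) should give the same value up to rounding.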

Regression Line

Adding the fitted regression line to the scatterplot:

ggplot(penguins_clean, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(color = "#6666CC") +
  geom_smooth(method = "lm", se = TRUE, color = "thistle4") +
  labs(title = "Linear Regression: Body Mass vs Flipper Length",
       x = "Flipper Length (mm)",
       y = "Body Mass (g)") +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

3D Visualization

plot_ly(
  penguins_clean,
  x = ~flipper_length_mm,
  y = ~body_mass_g,
  z = ~as.numeric(species),
  color = ~species,
  type = "scatter3d",
  mode = "markers") %>%
  layout(
    scene = list(
      xaxis = list(title = "Flipper Length (mm)"),
      yaxis = list(title = "Body Mass (g)"),
      zaxis = list(title = "Species")
    )
  )

Regression Equation

\[ \text{Body Mass} = \beta_0 + \beta_1 \times \text{Flipper Length} + \epsilon \]

Looking at the equation, for each 1 mm increase in flipper length, expected body mass increases by β₁ grams (about 49.7 g in the fitted model).
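
As a worked illustration, the estimates reported by coef(model) can be substituted into the equation. The hat marks a fitted (predicted) value and the coefficients are rounded to two decimals. The 10 mm difference is a hypothetical example, not a figure from the analysis.

\[ \widehat{\text{Body Mass}} \approx -5780.83 + 49.69 \times \text{Flipper Length} \]

\[ \Delta \widehat{\text{Body Mass}} \approx 49.69 \times 10 \approx 497 \text{ g} \]

So two penguins whose flippers differ by 10 mm are expected to differ in body mass by roughly 497 g. The negative intercept is an extrapolation to a flipper length of 0 mm, far outside the observed data, so it has no biological meaning on its own.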

R Code for the Model

model <- lm(body_mass_g ~ flipper_length_mm, data = penguins_clean)
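
The one-line fit can be inspected further. The following is a minimal sketch, assuming the palmerpenguins package is installed. It repeats the cleaning step in base R so that it runs on its own. The names keep and fit_summary are introduced here for illustration.

```r
library(palmerpenguins)  # provides the `penguins` data frame

# Same cleaning step as earlier, written in base R:
# keep only rows where both variables are present
keep <- !is.na(penguins$flipper_length_mm) & !is.na(penguins$body_mass_g)
penguins_clean <- penguins[keep, ]

model <- lm(body_mass_g ~ flipper_length_mm, data = penguins_clean)

fit_summary <- summary(model)
fit_summary$coefficients  # estimates, standard errors, t values, p values
fit_summary$r.squared     # share of variance in body mass explained by the line
confint(model)            # 95% confidence intervals for intercept and slope
```

The R² value and the confidence interval for the slope give a sense of how well a straight line describes the data and how precisely β₁ is estimated.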

Summary and Insights

After modelling the relationship between flipper length and body mass across three species of penguins, it can be said that:
- Penguins with longer flippers tend to weigh more.
- The relationship is approximately linear and positive.
- The regression model can help us predict body mass from flipper length.
- This exploration is also a great example of how vital statistics is in zoology and in modelling different characteristics.