Here’s a lil penguin:)
Using the penguins dataset, I will model the relationship between two continuous variables:
\[
Y = \beta_0 + \beta_1 X + \epsilon
\]
Here,
- \(Y\): Body Mass (dependent variable)
- \(X\): Flipper Length (independent variable)
- \(\beta_0\): Intercept
- \(\beta_1\): Slope
- \(\epsilon\): Random error term
The data set includes information about three penguin species: Adelie, Chinstrap, and Gentoo. I will attempt to model how the length of their flippers relates to their body mass.
head(penguins)
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           NA            NA                  NA          NA
## 5 Adelie  Torgersen           36.7          19.3               193        3450
## 6 Adelie  Torgersen           39.3          20.6               190        3650
## # ℹ 2 more variables: sex <fct>, year <int>
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
geom_point(size = 2) +
labs(title = "Flipper Length vs. Body Mass by Species", x = "Flipper Length (mm)",
y = "Body Mass (g)") +
theme_minimal()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
penguins_clean <- penguins %>% filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) # tidying data: drop rows with missing values
model <- lm(body_mass_g ~ flipper_length_mm, data = penguins_clean)
coef(model)
##       (Intercept) flipper_length_mm
##       -5780.83136          49.68557
Here, the coefficient on flipper_length_mm (about 49.69 g per mm) is the slope, β₁, and the intercept, β₀, is about -5780.83 g.
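To make the coefficients concrete, here is a minimal sketch that plugs them into the model equation by hand. The two values are copied from the `coef(model)` output above; the 200 mm flipper length is a hypothetical input chosen only for illustration.

```r
# Coefficients copied from the coef(model) output above
b0 <- -5780.83136  # intercept, in grams
b1 <- 49.68557     # slope, in grams per mm of flipper length

# Hypothetical flipper length, chosen only for illustration
flipper_mm <- 200

# Point prediction from Y = b0 + b1 * X (the error term is ignored)
predicted_mass_g <- b0 + b1 * flipper_mm
predicted_mass_g
```

Under these assumptions the predicted body mass comes out at roughly 4156 g. The same number should come from `predict(model, newdata = data.frame(flipper_length_mm = 200))` when the fitted model object is available.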
Adding the regression line to the scatterplot:
ggplot(penguins_clean, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point(color = "#6666CC") +
geom_smooth(method = "lm", se = TRUE, color = "thistle4") +
labs(title = "Linear Regression: Body Mass vs Flipper Length",
x = "Flipper Length (mm)",
y = "Body Mass (g)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
plot_ly(
penguins_clean,
x = ~flipper_length_mm,
y = ~body_mass_g,
z = ~as.numeric(species),
color = ~species,
type = "scatter3d",
mode = "markers") %>%
layout(
scene = list(
xaxis = list(title = "Flipper Length (mm)"),
yaxis = list(title = "Body Mass (g)"),
zaxis = list(title = "Species")
)
)
\[ \text{Body Mass} = \beta_0 + \beta_1 \times \text{Flipper Length} + \epsilon \]
In this equation, β₁ is the expected change in body mass for each 1 mm increase in flipper length. With the fitted slope, that is an increase of approximately 49.7 grams per millimetre.
model <- lm(body_mass_g ~ flipper_length_mm, data = penguins_clean)
After modelling the relationship between flipper length and body mass across three species of penguins, it can be said that:
- Penguins with longer flippers tend to weigh more.
- The fitted relationship is linear and positive.
- The regression model can help us predict body mass from flipper length.
- This exploration is also a good example of how statistics is vital in zoology and in modelling different characteristics.
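These conclusions could be checked numerically with something like the sketch below. It assumes the data come from the `palmerpenguins` package (the column names match, but the source does not say so), and the three flipper lengths passed to `predict()` are hypothetical values for illustration. It refits the same model, then looks at R², a confidence interval for the slope, and prediction intervals.

```r
library(palmerpenguins)  # assumption: the penguins data come from this package

# Same cleaning step and model as in the analysis, written in base R
penguins_clean <- subset(penguins,
                         !is.na(flipper_length_mm) & !is.na(body_mass_g))
model <- lm(body_mass_g ~ flipper_length_mm, data = penguins_clean)

# Share of the variance in body mass explained by flipper length
r_squared <- summary(model)$r.squared

# 95% confidence interval for the slope; an interval above zero
# supports a positive relationship
slope_ci <- confint(model)["flipper_length_mm", ]

# Predicted body mass, with 95% prediction intervals, for
# hypothetical flipper lengths
new_flippers <- data.frame(flipper_length_mm = c(180, 200, 220))
predictions <- predict(model, newdata = new_flippers, interval = "prediction")

r_squared
slope_ci
predictions
```

The prediction intervals are a reminder that the line describes an average trend; individual penguins can sit well above or below it.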