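This presentation walks through a full data analysis of the Palmer Penguins dataset using R and Quarto. The sections below cover the dataset's variables, a two-sample t-test, a linear regression, and a logistic regression.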
The palmerpenguins dataset includes:
| Variable | Description |
|---|---|
| species | Penguin species (Adelie, Chinstrap, Gentoo) |
| island | Island name (Biscoe, Dream, Torgersen) |
| bill_length_mm | Bill length (mm) |
| bill_depth_mm | Bill depth (mm) |
| flipper_length_mm | Flipper length (mm) |
| body_mass_g | Body mass (g) |
| sex | Penguin sex (female, male) |
| year | Study year (2007–2009) |
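To check these columns yourself, the data frame can be loaded and inspected as follows. This is a minimal sketch, assuming the palmerpenguins package is installed and that its `penguins` data frame is the one used throughout.

```r
library(palmerpenguins)

# Column names and types
str(penguins)

# Rows with at least one missing value; lm() and glm() drop such rows by default
sum(!complete.cases(penguins))

# Number of penguins per species and island
table(penguins$species, penguins$island)
```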
Welch Two Sample t-test
data: body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
-840.5783 -526.2453
sample estimates:
mean in group female   mean in group male
            3862.273             4545.685
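We test whether mean body mass differs between male and female penguins. In the Welch two-sample t-test output above, males are heavier on average (4545.7 g vs. 3862.3 g), and the difference is statistically significant (p < 0.001).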
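The output above has the form produced by base R's `t.test()` when it is called with a formula. The following is a minimal sketch of a call that would generate it, assuming the `penguins` data frame from the palmerpenguins package; the object name `welch` is ours.

```r
library(palmerpenguins)

# Welch's t-test is the default (var.equal = FALSE); the formula interface
# drops rows where sex or body mass is missing
welch <- t.test(body_mass_g ~ sex, data = penguins)
welch

# Pull out individual pieces for reporting
welch$estimate   # group means
welch$conf.int   # 95% CI for the female minus male difference
```

Storing the result in an object makes it easy to quote the estimates and interval in slide text instead of copying numbers by hand.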
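library(tidymodels)
library(palmerpenguins)

# Reproducible 80/20 train/test split
set.seed(123)
split <- initial_split(penguins, prop = 0.8)
train <- training(split)
test <- testing(split)

# Linear regression of body mass on flipper length, bill length, and species
lm_model <- linear_reg() %>%
  set_engine("lm") %>%
  fit(body_mass_g ~ flipper_length_mm + bill_length_mm + species, data = train)
summary(lm_model$fit)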
Call:
stats::lm(formula = body_mass_g ~ flipper_length_mm + bill_length_mm +
species, data = data)
Residuals:
    Min      1Q  Median      3Q     Max
-797.77 -228.95  -41.52  204.95 1051.60
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       -3716.670    579.993  -6.408 6.84e-10 ***
flipper_length_mm    25.972      3.458   7.511 9.32e-13 ***
bill_length_mm       63.726      8.040   7.926 6.60e-14 ***
speciesChinstrap   -760.859     92.314  -8.242 8.29e-15 ***
speciesGentoo       125.330     99.393   1.261    0.208
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 339.4 on 261 degrees of freedom
Multiple R-squared: 0.8275, Adjusted R-squared: 0.8249
F-statistic: 313.1 on 4 and 261 DF, p-value: < 2.2e-16
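Read as a prediction rule, the coefficient table gives the fitted equation below, with Adelie as the reference species and 0/1 indicator terms for the other two. The example penguin in the second line (an Adelie with a 190 mm flipper and a 39 mm bill) is hypothetical, chosen only to illustrate the arithmetic.

```latex
\begin{aligned}
\widehat{\text{body mass (g)}} &= -3716.670
  + 25.972 \cdot \text{flipper length (mm)}
  + 63.726 \cdot \text{bill length (mm)} \\
  &\quad - 760.859 \cdot [\text{Chinstrap}]
  + 125.330 \cdot [\text{Gentoo}] \\[4pt]
\text{Adelie, } 190\ \text{mm flipper, } 39\ \text{mm bill:}\quad
  &-3716.670 + 25.972 \times 190 + 63.726 \times 39 = 3703.324\ \text{g}
\end{aligned}
```

The Gentoo term is not statistically distinguishable from zero here (p = 0.208), so after adjusting for flipper and bill length the model finds no clear mass difference between Gentoo and Adelie penguins.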
# Logistic regression of sex on bill length, flipper length, and body mass
log_model <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(sex ~ bill_length_mm + flipper_length_mm + body_mass_g, data = train)
tidy(log_model)
# A tibble: 4 × 5
  term              estimate std.error statistic       p.value
  <chr>                <dbl>     <dbl>     <dbl>         <dbl>
1 (Intercept)        6.46     2.95          2.19 0.0287
2 bill_length_mm     0.134    0.0370        3.63 0.000280
3 flipper_length_mm -0.117    0.0247       -4.74 0.00000210
4 body_mass_g        0.00264  0.000438      6.04 0.00000000159
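Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios. The sketch below does this for the slope estimates printed above, copied in by hand. It assumes the model predicts the second factor level, `male`, which is how `glm` treats a two-level factor outcome. The names `log_odds`, `odds_ratios`, and `or_mass_100g` are ours.

```r
# Slope estimates copied from the tidy() output above (log-odds of male)
log_odds <- c(
  bill_length_mm    = 0.134,
  flipper_length_mm = -0.117,
  body_mass_g       = 0.00264
)

# Odds ratio for a one-unit increase in each predictor
odds_ratios <- exp(log_odds)
round(odds_ratios, 3)

# A 1 g change is tiny, so rescale body mass to a 100 g increase
or_mass_100g <- exp(100 * log_odds[["body_mass_g"]])
round(or_mass_100g, 2)
```

Under these estimates, each additional 100 g of body mass multiplies the odds of a penguin being male by about 1.3, holding bill and flipper length fixed.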
# A tibble: 1 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 roc_auc binary 0.144
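An AUC of 0.144 is well below 0.5, which usually means the scores were ranked against the wrong class rather than that the model is worse than chance. The code that produced this metric is not shown, so the following is only a sketch of one likely cause: yardstick treats the first factor level (`female`) as the event by default, so passing `.pred_male` without `event_level = "second"` yields one minus the intended AUC. The names `test_probs`, `auc_default`, and `auc_male` are ours.

```r
library(tidymodels)
library(palmerpenguins)

# Same split and model as in the slides
set.seed(123)
split <- initial_split(penguins, prop = 0.8)
train <- training(split)
test <- testing(split)

log_model <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(sex ~ bill_length_mm + flipper_length_mm + body_mass_g, data = train)

# Class probabilities on the held-out rows; drop rows that cannot be scored
test_probs <- augment(log_model, new_data = test) %>%
  filter(!is.na(sex), !is.na(.pred_male))

# Default: the first level ("female") is the event, so .pred_male is scored backwards
auc_default <- roc_auc(test_probs, truth = sex, .pred_male)

# Tell yardstick that the second level ("male") is the event
auc_male <- roc_auc(test_probs, truth = sex, .pred_male, event_level = "second")

auc_default
auc_male
```

If this is what happened, the intended value would be 1 - 0.144 = 0.856. Passing `.pred_female` with the default event level would be an equivalent fix.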