Purpose & Data Overview

The purpose of this project is to explore the Palmer Penguins dataset using graphs and statistical analysis.

The dataset comes from the palmerpenguins R package and contains observations of three penguin species (Adelie, Chinstrap, and Gentoo) from the Palmer Archipelago, Antarctica. Variables include:

  • Species
  • Island
  • Bill length and depth
  • Flipper length
  • Body mass

First Look at the Data

## # A tibble: 6 × 4
##   species island    body_mass_g flipper_length_mm
##   <fct>   <fct>           <int>             <int>
## 1 Adelie  Torgersen        3750               181
## 2 Adelie  Torgersen        3800               186
## 3 Adelie  Torgersen        3250               195
## 4 Adelie  Torgersen        3450               193
## 5 Adelie  Torgersen        3650               190
## 6 Adelie  Torgersen        3625               181

This shows the first six rows of the cleaned data for four of the variables.
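The code that loaded and cleaned the data is not shown. A minimal sketch follows, assuming the palmerpenguins package is installed and that "cleaned" means dropping every row with a missing value. That assumption is consistent with the 333 penguins implied by the ANOVA degrees of freedom further down (2 + 330 + 1).

```r
# Assumes install.packages("palmerpenguins") has already been run
library(palmerpenguins)

# Assumption: "cleaned" = complete cases only (any row with an NA is dropped)
penguins_df <- na.omit(penguins)

# First six rows of the four columns shown in the table above
head(penguins_df[, c("species", "island", "body_mass_g", "flipper_length_mm")])
```

If the original used the tidyverse, `tidyr::drop_na()` with no column arguments keeps the same rows.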

GGPlot 1: Body Mass by Species

library(ggplot2)

# penguins_df: the cleaned data from the previous section
ggplot(penguins_df, aes(x = species, y = body_mass_g, fill = species)) +
  geom_boxplot() +
  labs(
    title = "Body Mass by Species",
    x = "Species",
    y = "Body Mass (g)"
  )

Gentoo penguins tend to be heavier than Adelie and Chinstrap penguins, whose body mass distributions are similar to each other.

GGPlot 2: Bill Length vs Bill Depth

The three species form distinct clusters, which suggests that bill length and bill depth together help separate the penguins by species.
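The code for this plot is not shown. A sketch of one way to draw it with ggplot2, assuming a scatterplot coloured by species on the same complete-case data; the transparency and axis labels are illustrative choices, not taken from the original.

```r
library(ggplot2)
library(palmerpenguins)

# Assumption: same cleaning as above (complete cases only)
penguins_df <- na.omit(penguins)

# One point per penguin: bill length on x, bill depth on y, coloured by species
p_bill <- ggplot(penguins_df,
                 aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Bill Length vs Bill Depth",
    x = "Bill Length (mm)",
    y = "Bill Depth (mm)",
    colour = "Species"
  )

p_bill
```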

Plotly 1: Average Body Mass
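The interactive chart does not survive in this version of the report. A sketch of how an average-body-mass chart could be built with the plotly R package; the bar chart type and the use of base R's `aggregate()` for the means are assumptions, not the original code.

```r
library(palmerpenguins)
library(plotly)

# Assumption: same cleaning as above (complete cases only)
penguins_df <- na.omit(penguins)

# Mean body mass for each species
avg_mass <- aggregate(body_mass_g ~ species, data = penguins_df, FUN = mean)

# Interactive bar chart of the species means
fig_avg <- plot_ly(avg_mass, x = ~species, y = ~body_mass_g, type = "bar") |>
  layout(
    title = "Average Body Mass by Species",
    xaxis = list(title = "Species"),
    yaxis = list(title = "Mean Body Mass (g)")
  )

fig_avg
```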

Plotly 2: 3D Plot of Bill Length, Bill Depth, and Flipper Length
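This interactive plot is also missing here. A sketch using plotly's `scatter3d` trace type, assuming one point per penguin coloured by species on the same complete-case data; the axis titles are illustrative.

```r
library(palmerpenguins)
library(plotly)

# Assumption: same cleaning as above (complete cases only)
penguins_df <- na.omit(penguins)

# 3D scatterplot: each penguin placed by its three measurements
fig_3d <- plot_ly(
  penguins_df,
  x = ~bill_length_mm,
  y = ~bill_depth_mm,
  z = ~flipper_length_mm,
  color = ~species,
  type = "scatter3d",
  mode = "markers"
) |>
  layout(scene = list(
    xaxis = list(title = "Bill Length (mm)"),
    yaxis = list(title = "Bill Depth (mm)"),
    zaxis = list(title = "Flipper Length (mm)")
  ))

fig_3d
```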

Descriptive Statistics

## # A tibble: 3 × 6
##   species   mean_body_mass median_body_mass sd_body_mass mean_flipper
##   <fct>              <dbl>            <dbl>        <dbl>        <dbl>
## 1 Adelie             3706.             3700         459.         190.
## 2 Chinstrap          3733.             3700         384.         196.
## 3 Gentoo             5092.             5050         501.         217.
## # ℹ 1 more variable: mean_bill_length <dbl>

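Gentoo penguins have the highest average body mass (about 5,092 g, compared with about 3,706 g for Adelie and 3,733 g for Chinstrap) and the longest average flippers (about 217 mm).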
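The summary table's column names suggest a dplyr `group_by()` / `summarise()` pipeline. A sketch that produces columns with the same names, assuming the same complete-case data; the exact original code is not shown.

```r
library(palmerpenguins)
library(dplyr)

# Assumption: same cleaning as above (complete cases only)
penguins_df <- na.omit(penguins)

# One row per species with centre and spread of body mass,
# plus mean flipper and bill length
species_stats <- penguins_df |>
  group_by(species) |>
  summarise(
    mean_body_mass   = mean(body_mass_g),
    median_body_mass = median(body_mass_g),
    sd_body_mass     = sd(body_mass_g),
    mean_flipper     = mean(flipper_length_mm),
    mean_bill_length = mean(bill_length_mm)
  )

species_stats
```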

ANOVA Analysis

##              Df    Sum Sq  Mean Sq F value Pr(>F)    
## species       2 145190219 72595110   341.9 <2e-16 ***
## Residuals   330  70069447   212332                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

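The p-value is far below 0.05 (F = 341.9 on 2 and 330 degrees of freedom, p < 2e-16), so mean body mass differs significantly by species. ANOVA alone does not identify which species differ; the descriptive statistics suggest the difference comes mainly from the heavier Gentoo penguins.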
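The fitting code is not shown. A minimal sketch with base R's `aov()`, assuming the same complete-case data. As a check on the table, F is the ratio of the two mean squares: 72,595,110 / 212,332 ≈ 341.9. The block also runs a Tukey HSD post-hoc test, which is not part of the original analysis; it is one standard way to see which pairs of species differ.

```r
library(palmerpenguins)

# Assumption: same cleaning as above (complete cases only)
penguins_df <- na.omit(penguins)

# One-way ANOVA: does mean body mass differ across the three species?
fit_aov <- aov(body_mass_g ~ species, data = penguins_df)
summary(fit_aov)

# F = species mean square / residual mean square
aov_tab <- summary(fit_aov)[[1]]
f_check <- aov_tab[["Mean Sq"]][1] / aov_tab[["Mean Sq"]][2]

# Added follow-up (not in the original): pairwise Tukey HSD comparisons
tukey <- TukeyHSD(fit_aov)
tukey$species
```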

Regression Analysis

##                       Estimate Std. Error    t value     Pr(>|t|)
## (Intercept)       -5836.298732 312.603503 -18.669972 1.341791e-53
## flipper_length_mm    48.889692   2.034204  24.033815 1.737931e-74
## bill_length_mm        4.958601   5.213505   0.951107 3.422461e-01

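This model examines whether longer flippers and longer bills are associated with greater body mass. Holding bill length fixed, each additional millimeter of flipper length is associated with about 48.9 g more body mass (p < 0.001). Bill length is not a significant predictor once flipper length is in the model (about 5.0 g per mm, p = 0.34).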
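A sketch of how this model could be fitted with base R's `lm()`, assuming the same complete-case data; the coefficient table above has the layout of `coef(summary(fit))`. The second part plugs the reported estimates into the fitted equation for a hypothetical penguin with 200 mm flippers and a 45 mm bill (values chosen only for illustration): -5836.30 + 48.89 × 200 + 4.96 × 45 ≈ 4,165 g.

```r
library(palmerpenguins)

# Assumption: same cleaning as above (complete cases only)
penguins_df <- na.omit(penguins)

# Multiple linear regression of body mass on flipper length and bill length
fit_lm <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm, data = penguins_df)
coef(summary(fit_lm))

# Worked prediction from the estimates reported above
# (hypothetical penguin: 200 mm flippers, 45 mm bill)
b <- c(intercept = -5836.298732, flipper = 48.889692, bill = 4.958601)
pred_g <- b[["intercept"]] + b[["flipper"]] * 200 + b[["bill"]] * 45
pred_g  # about 4165 g
```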

Main takeaways:

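  • Gentoo penguins tend to be heavier than Adelie and Chinstrap penguins
  • Species differ in bill length and depth, forming distinct clusters
  • Body mass increases with flipper length (about 49 g per mm); bill length adds little once flipper length is in the model (p ≈ 0.34)
  • Both the plots and the statistical tests point to meaningful species-level differences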