“The Association Between Bill Size and Body Mass of Penguins: With differing effects depending on the Species, Diet, and Island”
This is my research to explore how bill size of penguins influences body mass in relation to their species, diet, and island. I hypothesize that bill size is positively associated with the body mass of penguins, with varying effects depending on species, diet, and island. The results of the statistical analysis support my hypothesis that bill size is positively associated with body mass, with varying effects depending on species, diet, and island. For example, according to the linear regression model, there is approximately 70.27% of the variability in body mass (Adjusted R-squared = 0.7027). The overall model was highly significant, F-statistic is 208.8, with 39 and 3390 degrees of freedom, and the result was highly significant at p-value < 2.2e-16. This indicates that the combination of predictor variables effectively accounted for a substantial proportion of the variance in body mass.The ANOVA results also confirmed the significance of those variables, including bill size (F = 1752.61, p < 2.2e-16), species (F = 43.50, p < 2.2e-16), and diet (F = 542.27, p < 2.2e-16), as well as interactions such as bill size:species (F = 5.30, p = 0.005) and bill size:diet (F = 9.50, p < 0.001).
# Load the penguins data
penguins <- read.csv("palmerpenguins.csv", header = TRUE)
# Check the data to ensure it was loaded correctly
# View the first and last rows of the dataset
head(penguins) # first 6 rows
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## 1 Adelie Biscoe 53.4 17.8 219 5687
## 2 Adelie Biscoe 49.3 18.1 245 6811
## 3 Adelie Biscoe 55.7 16.6 226 5388
## 4 Adelie Biscoe 38.0 15.6 221 6262
## 5 Adelie Biscoe 60.7 17.9 177 4811
## 6 Adelie Biscoe 35.7 16.8 194 5266
## sex diet life_stage health_metrics year
## 1 female fish adult overweight 2021
## 2 female fish adult overweight 2021
## 3 female fish adult overweight 2021
## 4 female fish adult overweight 2021
## 5 female fish juvenile overweight 2021
## 6 female fish juvenile overweight 2021
tail(penguins) # last 6 rows
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## 3425 Gentoo Biscoe 39.6 21.7 265 7230
## 3426 Gentoo Biscoe 44.0 20.4 252 6447
## 3427 Gentoo Biscoe 54.5 25.2 245 6872
## 3428 Gentoo Biscoe 51.4 20.4 258 7409
## 3429 Gentoo Biscoe 55.9 20.5 247 6491
## 3430 Gentoo Biscoe 43.9 22.9 206 6835
## sex diet life_stage health_metrics year
## 3425 male squid adult overweight 2025
## 3426 male squid adult healthy 2025
## 3427 male squid adult healthy 2025
## 3428 male squid adult overweight 2025
## 3429 male squid adult healthy 2025
## 3430 male squid adult healthy 2025
colnames(penguins) # view column names
## [1] "species" "island" "bill_length_mm"
## [4] "bill_depth_mm" "flipper_length_mm" "body_mass_g"
## [7] "sex" "diet" "life_stage"
## [10] "health_metrics" "year"
# Check the structure of the data
str(penguins)# structure, giving a concise summary of this information, showing data types and first few values of each variable
## 'data.frame': 3430 obs. of 11 variables:
## $ species : chr "Adelie" "Adelie" "Adelie" "Adelie" ...
## $ island : chr "Biscoe" "Biscoe" "Biscoe" "Biscoe" ...
## $ bill_length_mm : num 53.4 49.3 55.7 38 60.7 35.7 61 66.1 61.4 54.9 ...
## $ bill_depth_mm : num 17.8 18.1 16.6 15.6 17.9 16.8 20.8 20.8 19.9 22.3 ...
## $ flipper_length_mm: num 219 245 226 221 177 194 211 246 270 230 ...
## $ body_mass_g : num 5687 6811 5388 6262 4811 ...
## $ sex : chr "female" "female" "female" "female" ...
## $ diet : chr "fish" "fish" "fish" "fish" ...
## $ life_stage : chr "adult" "adult" "adult" "adult" ...
## $ health_metrics : chr "overweight" "overweight" "overweight" "overweight" ...
## $ year : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
# Load dataset
penguins <- read.csv("palmerpenguins.csv", header = TRUE)
# Step 1: Check the structure of the data ####
str(penguins) # This will show the structure of the dataset, including variable names and data types
## 'data.frame': 3430 obs. of 11 variables:
## $ species : chr "Adelie" "Adelie" "Adelie" "Adelie" ...
## $ island : chr "Biscoe" "Biscoe" "Biscoe" "Biscoe" ...
## $ bill_length_mm : num 53.4 49.3 55.7 38 60.7 35.7 61 66.1 61.4 54.9 ...
## $ bill_depth_mm : num 17.8 18.1 16.6 15.6 17.9 16.8 20.8 20.8 19.9 22.3 ...
## $ flipper_length_mm: num 219 245 226 221 177 194 211 246 270 230 ...
## $ body_mass_g : num 5687 6811 5388 6262 4811 ...
## $ sex : chr "female" "female" "female" "female" ...
## $ diet : chr "fish" "fish" "fish" "fish" ...
## $ life_stage : chr "adult" "adult" "adult" "adult" ...
## $ health_metrics : chr "overweight" "overweight" "overweight" "overweight" ...
## $ year : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
# Step 2: Count the number of variables ####
num_variables <- ncol(penguins) # Count the number of columns (variables)
cat("The dataset has", num_variables, "variables.\n")
## The dataset has 11 variables.
# Step 3: Identify categorical variables
# Categorical variables are stored as characters or factors
categorical.vars <- names(penguins)[sapply(penguins, is.character) | sapply(penguins, is.factor)]
cat("Categorical variables:", categorical.vars, "\n")
## Categorical variables: species island sex diet life_stage health_metrics
# Step 4: Find the levels within each categorical variable
# Use the 'unique()' function to display levels (categories) for each categorical variable
cat("\nLevels within each categorical variable:\n")
##
## Levels within each categorical variable:
cat("---------------------------\n")
## ---------------------------
# Print the variable name
for (var in categorical.vars) {cat("Variable:", var, "\n")
print(unique(penguins[[var]]))
# Show unique levels
cat("---------------------------\n")}
## Variable: species
## [1] "Adelie" "Chinstrap" "Gentoo"
## ---------------------------
## Variable: island
## [1] "Biscoe" "Dream" "Torgensen"
## ---------------------------
## Variable: sex
## [1] "female" "male"
## ---------------------------
## Variable: diet
## [1] "fish" "krill" "parental" "squid"
## ---------------------------
## Variable: life_stage
## [1] "adult" "juvenile" "chick"
## ---------------------------
## Variable: health_metrics
## [1] "overweight" "healthy" "underweight"
## ---------------------------
# Summary statistics for each variable
summary(penguins)
## species island bill_length_mm bill_depth_mm
## Length:3430 Length:3430 Min. :13.60 Min. : 9.10
## Class :character Class :character 1st Qu.:28.90 1st Qu.:16.60
## Mode :character Mode :character Median :34.50 Median :18.40
## Mean :38.53 Mean :18.45
## 3rd Qu.:46.60 3rd Qu.:20.30
## Max. :88.20 Max. :27.90
## flipper_length_mm body_mass_g sex diet
## Min. :140 Min. : 2477 Length:3430 Length:3430
## 1st Qu.:185 1st Qu.: 3844 Class :character Class :character
## Median :203 Median : 4634 Mode :character Mode :character
## Mean :207 Mean : 4835
## 3rd Qu.:226 3rd Qu.: 5622
## Max. :308 Max. :10549
## life_stage health_metrics year
## Length:3430 Length:3430 Min. :2021
## Class :character Class :character 1st Qu.:2022
## Mode :character Mode :character Median :2024
## Mean :2023
## 3rd Qu.:2024
## Max. :2025
# Check the levels of each factor variable
sapply(penguins, function(x) if (is.factor(x)) levels(x)) # lists levels for categorical variables
## $species
## NULL
##
## $island
## NULL
##
## $bill_length_mm
## NULL
##
## $bill_depth_mm
## NULL
##
## $flipper_length_mm
## NULL
##
## $body_mass_g
## NULL
##
## $sex
## NULL
##
## $diet
## NULL
##
## $life_stage
## NULL
##
## $health_metrics
## NULL
##
## $year
## NULL
# Data summaries
# How many observations per group? to check if the number of observations is evenly distributed across different groups in the categorical variables
table(penguins$species) # counts of species
##
## Adelie Chinstrap Gentoo
## 1560 623 1247
table(penguins$island) # counts of islands
##
## Biscoe Dream Torgensen
## 1785 1133 512
table(penguins$sex)
##
## female male
## 1726 1704
table(penguins$diet)
##
## fish krill parental squid
## 958 1419 860 193
table(penguins$life_stage)
##
## adult chick juvenile
## 1029 860 1541
table(penguins$health_metrics)
##
## healthy overweight underweight
## 1550 1167 713
table(penguins$year)
##
## 2021 2022 2023 2024 2025
## 356 658 695 877 844
library(dplyr) # Load the package
## Warning: package 'dplyr' was built under R version 4.4.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Calculate mean difference between numerical variables across groups
# Select the 4 numerical variables
numerical_data <- penguins %>%
select(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g)
# Calculate mean and standard deviation for each variable
summary_stats <- numerical_data %>%
summarise(
mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
sd_bill_length = sd(bill_length_mm, na.rm = TRUE),
mean_bill_depth = mean(bill_depth_mm, na.rm = TRUE),
sd_bill_depth = sd(bill_depth_mm, na.rm = TRUE),
mean_flipper_length = mean(flipper_length_mm, na.rm = TRUE),
sd_flipper_length = sd(flipper_length_mm, na.rm = TRUE),
mean_body_mass = mean(body_mass_g, na.rm = TRUE),
sd_body_mass = sd(body_mass_g, na.rm = TRUE)
)
# View the calculated means and spreads
print(summary_stats)
## mean_bill_length sd_bill_length mean_bill_depth sd_bill_depth
## 1 38.52983 13.17517 18.44714 2.774428
## mean_flipper_length sd_flipper_length mean_body_mass sd_body_mass
## 1 207.0289 28.94476 4834.71 1311.091
# Check for normal distribution
# Body Mass
m_body_mass <- mean(penguins$body_mass_g, na.rm=TRUE) # Calculate mean
std_body_mass <- sd(penguins$body_mass_g, na.rm=TRUE) # Calculate standard deviation
hist(penguins$body_mass_g,
density=20,
breaks=20,
prob=TRUE,
xlab="Body Mass (g)",
ylim=c(0, 0.0005),
main="Normal Curve over Body Mass Histogram")
curve(dnorm(x, mean=m_body_mass, sd=std_body_mass),
col="red",
lwd=2,
add=TRUE,
yaxt="n")
# Repeat for Bill Length
m_bill_length <- mean(penguins$bill_length_mm, na.rm=TRUE)
std_bill_length <- sd(penguins$bill_length_mm, na.rm=TRUE)
hist(penguins$bill_length_mm,
density=20,
breaks=20,
prob=TRUE,
xlab="Bill Length (mm)",
ylim=c(0, 0.06),
main="Normal Curve over Bill Length Histogram")
curve(dnorm(x, mean=m_bill_length, sd=std_bill_length),
col="red",
lwd=2,
add=TRUE,
yaxt="n")
# Repeat for Bill Depth
m_bill_depth <- mean(penguins$bill_depth_mm, na.rm=TRUE)
std_bill_depth <- sd(penguins$bill_depth_mm, na.rm=TRUE)
hist(penguins$bill_depth_mm,
density=20,
breaks=20,
prob=TRUE,
xlab="Bill Depth (mm)",
ylim=c(0, 0.12),
main="Normal Curve over Bill Depth Histogram")
curve(dnorm(x, mean=m_bill_depth, sd=std_bill_depth),
col="red",
lwd=2,
add=TRUE,
yaxt="n")
# Repeat for Flipper Length
m_flipper_length <- mean(penguins$flipper_length_mm, na.rm=TRUE)
std_flipper_length <- sd(penguins$flipper_length_mm, na.rm=TRUE)
hist(penguins$flipper_length_mm,
density=20,
breaks=20,
prob=TRUE,
xlab="Flipper Length (mm)",
ylim=c(0, 0.04),
main="Normal Curve over Flipper Length Histogram")
curve(dnorm(x, mean=m_flipper_length, sd=std_flipper_length),
col="red",
lwd=2,
add=TRUE,
yaxt="n")
# Quantile-quantile (QQ) plot for normality check
# Body Mass
qqnorm(penguins$body_mass_g, ylab="Standardized Residuals",
xlab="Normal Scores", main="QQ Plot for Body Mass", pch=16)
qqline(penguins$body_mass_g, col="red")
# Bill Length
qqnorm(penguins$bill_length_mm, ylab="Standardized Residuals",
xlab="Normal Scores", main="QQ Plot for Bill Length", pch=16)
qqline(penguins$bill_length_mm, col="red")
# Bill Depth
qqnorm(penguins$bill_depth_mm, ylab="Standardized Residuals",
xlab="Normal Scores", main="QQ Plot for Bill Depth", pch=16)
qqline(penguins$bill_depth_mm, col="red")
# Flipper Length
qqnorm(penguins$flipper_length_mm, ylab="Standardized Residuals",
xlab="Normal Scores", main="QQ Plot for Flipper Length", pch=16)
qqline(penguins$flipper_length_mm, col="red")
# create two simple figures based on bill dimensions and body mass
# Load necessary library
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.3
citation("ggplot2")
## To cite ggplot2 in publications, please use
##
## H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
## Springer-Verlag New York, 2016.
##
## A BibTeX entry for LaTeX users is
##
## @Book{,
## author = {Hadley Wickham},
## title = {ggplot2: Elegant Graphics for Data Analysis},
## publisher = {Springer-Verlag New York},
## year = {2016},
## isbn = {978-3-319-24277-4},
## url = {https://ggplot2.tidyverse.org},
## }
citation("lme4")
## To cite lme4 in publications use:
##
## Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015).
## Fitting Linear Mixed-Effects Models Using lme4. Journal of
## Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {Fitting Linear Mixed-Effects Models Using {lme4}},
## author = {Douglas Bates and Martin M{\"a}chler and Ben Bolker and Steve Walker},
## journal = {Journal of Statistical Software},
## year = {2015},
## volume = {67},
## number = {1},
## pages = {1--48},
## doi = {10.18637/jss.v067.i01},
## }
# Scatterplot for bill length vs body mass
ggplot(penguins, aes(x = bill_length_mm, y = body_mass_g)) +
geom_point(size = 1, alpha = 0.7, color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red") + # Adds linear regression line
labs(title = "Bill Length vs Body Mass",
x = "Bill Length (mm)",
y = "Body Mass (g)") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#Scatterplot for bill depth vs body mass
ggplot(penguins, aes(x = bill_depth_mm, y = body_mass_g)) +
geom_point(size = 1, alpha = 0.7, color = "pink") +
geom_smooth(method = "lm", se = TRUE, color = "orange") + # Adds linear regression line
labs(title = "Bill Depth vs Body Mass",
x = "Bill Depth (mm)",
y = "Body Mass (g)") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#Hypothesis: The bill size is positively correlated to the body mass of the penguins with differing effects depending on the species, diet and island.
#Dependent variable(y): Body mass (g)
#Independent variable(x): Bill size (calculated as bill length × bill dept)
#Control variables: Species, Diet, and Island
# Load necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ✔ readr 2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(car)
## Warning: package 'car' was built under R version 4.4.3
## Loading required package: carData
##
## Attaching package: 'car'
##
## The following object is masked from 'package:purrr':
##
## some
##
## The following object is masked from 'package:dplyr':
##
## recode
# Step 1: Create a new variable for bill size (bill length x bill depth)
penguins$bill_size <- penguins$bill_length_mm * penguins$bill_depth_mm
# Step 2: Remove missing values
penguins <- na.omit(penguins)
penguins$species <- as.factor(penguins$species)
penguins$diet <- as.factor(penguins$diet)
penguins$island <- as.factor(penguins$island)
# Step 3: Fit a linear regression model
# Include bill size, species, diet, and island, with interactions
model <- lm(body_mass_g ~ bill_size * species * diet * island, data = penguins)
# View the summary of the model
summary(model)
##
## Call:
## lm(formula = body_mass_g ~ bill_size * species * diet * island,
## data = penguins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2273.2 -445.0 -48.7 410.4 4180.2
##
## Coefficients: (32 not defined because of singularities)
## Estimate Std. Error
## (Intercept) 4.199e+03 2.403e+02
## bill_size 1.557e+00 2.447e-01
## speciesChinstrap -7.282e+02 5.048e+02
## speciesGentoo -2.290e+02 2.840e+02
## dietkrill -1.320e+03 3.288e+02
## dietparental -1.247e+03 3.747e+02
## dietsquid 1.834e+02 7.322e+02
## islandDream 2.232e+02 4.599e+02
## islandTorgensen -2.079e+02 3.557e+02
## bill_size:speciesChinstrap 9.402e-01 4.835e-01
## bill_size:speciesGentoo 6.290e-01 2.730e-01
## bill_size:dietkrill 1.504e+00 4.966e-01
## bill_size:dietparental -8.474e-01 6.404e-01
## bill_size:dietsquid -6.342e-01 8.627e-01
## speciesChinstrap:dietkrill 6.316e+02 5.659e+02
## speciesGentoo:dietkrill 5.931e+02 3.980e+02
## speciesChinstrap:dietparental 2.862e+02 6.791e+02
## speciesGentoo:dietparental 3.528e+02 4.598e+02
## speciesChinstrap:dietsquid 7.892e+02 9.695e+02
## speciesGentoo:dietsquid -3.529e+02 8.417e+02
## bill_size:islandDream -3.528e-01 4.612e-01
## bill_size:islandTorgensen 1.459e-01 3.661e-01
## speciesChinstrap:islandDream NA NA
## speciesGentoo:islandDream NA NA
## speciesChinstrap:islandTorgensen NA NA
## speciesGentoo:islandTorgensen NA NA
## dietkrill:islandDream 4.807e+02 5.395e+02
## dietparental:islandDream 7.796e+01 6.263e+02
## dietsquid:islandDream -1.156e+03 1.042e+03
## dietkrill:islandTorgensen 8.048e+02 4.788e+02
## dietparental:islandTorgensen 1.292e+02 5.402e+02
## dietsquid:islandTorgensen -7.821e+02 9.702e+02
## bill_size:speciesChinstrap:dietkrill -4.918e-01 6.776e-01
## bill_size:speciesGentoo:dietkrill -8.465e-01 5.707e-01
## bill_size:speciesChinstrap:dietparental 1.674e-01 9.925e-01
## bill_size:speciesGentoo:dietparental -1.102e-01 7.313e-01
## bill_size:speciesChinstrap:dietsquid -7.323e-01 1.130e+00
## bill_size:speciesGentoo:dietsquid 8.255e-01 9.612e-01
## bill_size:speciesChinstrap:islandDream NA NA
## bill_size:speciesGentoo:islandDream NA NA
## bill_size:speciesChinstrap:islandTorgensen NA NA
## bill_size:speciesGentoo:islandTorgensen NA NA
## bill_size:dietkrill:islandDream -1.168e+00 7.153e-01
## bill_size:dietparental:islandDream -2.074e-01 9.747e-01
## bill_size:dietsquid:islandDream 1.562e+00 1.258e+00
## bill_size:dietkrill:islandTorgensen -1.363e+00 7.130e-01
## bill_size:dietparental:islandTorgensen 1.613e-02 8.937e-01
## bill_size:dietsquid:islandTorgensen 1.219e+00 1.179e+00
## speciesChinstrap:dietkrill:islandDream NA NA
## speciesGentoo:dietkrill:islandDream NA NA
## speciesChinstrap:dietparental:islandDream NA NA
## speciesGentoo:dietparental:islandDream NA NA
## speciesChinstrap:dietsquid:islandDream NA NA
## speciesGentoo:dietsquid:islandDream NA NA
## speciesChinstrap:dietkrill:islandTorgensen NA NA
## speciesGentoo:dietkrill:islandTorgensen NA NA
## speciesChinstrap:dietparental:islandTorgensen NA NA
## speciesGentoo:dietparental:islandTorgensen NA NA
## speciesChinstrap:dietsquid:islandTorgensen NA NA
## speciesGentoo:dietsquid:islandTorgensen NA NA
## bill_size:speciesChinstrap:dietkrill:islandDream NA NA
## bill_size:speciesGentoo:dietkrill:islandDream NA NA
## bill_size:speciesChinstrap:dietparental:islandDream NA NA
## bill_size:speciesGentoo:dietparental:islandDream NA NA
## bill_size:speciesChinstrap:dietsquid:islandDream NA NA
## bill_size:speciesGentoo:dietsquid:islandDream NA NA
## bill_size:speciesChinstrap:dietkrill:islandTorgensen NA NA
## bill_size:speciesGentoo:dietkrill:islandTorgensen NA NA
## bill_size:speciesChinstrap:dietparental:islandTorgensen NA NA
## bill_size:speciesGentoo:dietparental:islandTorgensen NA NA
## bill_size:speciesChinstrap:dietsquid:islandTorgensen NA NA
## bill_size:speciesGentoo:dietsquid:islandTorgensen NA NA
## t value Pr(>|t|)
## (Intercept) 17.473 < 2e-16 ***
## bill_size 6.365 2.22e-10 ***
## speciesChinstrap -1.443 0.149206
## speciesGentoo -0.807 0.419950
## dietkrill -4.013 6.12e-05 ***
## dietparental -3.328 0.000883 ***
## dietsquid 0.250 0.802234
## islandDream 0.485 0.627373
## islandTorgensen -0.584 0.558942
## bill_size:speciesChinstrap 1.945 0.051888 .
## bill_size:speciesGentoo 2.304 0.021274 *
## bill_size:dietkrill 3.028 0.002480 **
## bill_size:dietparental -1.323 0.185800
## bill_size:dietsquid -0.735 0.462315
## speciesChinstrap:dietkrill 1.116 0.264466
## speciesGentoo:dietkrill 1.490 0.136198
## speciesChinstrap:dietparental 0.421 0.673511
## speciesGentoo:dietparental 0.767 0.442952
## speciesChinstrap:dietsquid 0.814 0.415674
## speciesGentoo:dietsquid -0.419 0.675077
## bill_size:islandDream -0.765 0.444416
## bill_size:islandTorgensen 0.399 0.690168
## speciesChinstrap:islandDream NA NA
## speciesGentoo:islandDream NA NA
## speciesChinstrap:islandTorgensen NA NA
## speciesGentoo:islandTorgensen NA NA
## dietkrill:islandDream 0.891 0.373004
## dietparental:islandDream 0.124 0.900941
## dietsquid:islandDream -1.109 0.267479
## dietkrill:islandTorgensen 1.681 0.092871 .
## dietparental:islandTorgensen 0.239 0.810911
## dietsquid:islandTorgensen -0.806 0.420263
## bill_size:speciesChinstrap:dietkrill -0.726 0.467970
## bill_size:speciesGentoo:dietkrill -1.483 0.138108
## bill_size:speciesChinstrap:dietparental 0.169 0.866040
## bill_size:speciesGentoo:dietparental -0.151 0.880187
## bill_size:speciesChinstrap:dietsquid -0.648 0.517006
## bill_size:speciesGentoo:dietsquid 0.859 0.390527
## bill_size:speciesChinstrap:islandDream NA NA
## bill_size:speciesGentoo:islandDream NA NA
## bill_size:speciesChinstrap:islandTorgensen NA NA
## bill_size:speciesGentoo:islandTorgensen NA NA
## bill_size:dietkrill:islandDream -1.633 0.102492
## bill_size:dietparental:islandDream -0.213 0.831537
## bill_size:dietsquid:islandDream 1.242 0.214261
## bill_size:dietkrill:islandTorgensen -1.912 0.055972 .
## bill_size:dietparental:islandTorgensen 0.018 0.985604
## bill_size:dietsquid:islandTorgensen 1.034 0.301170
## speciesChinstrap:dietkrill:islandDream NA NA
## speciesGentoo:dietkrill:islandDream NA NA
## speciesChinstrap:dietparental:islandDream NA NA
## speciesGentoo:dietparental:islandDream NA NA
## speciesChinstrap:dietsquid:islandDream NA NA
## speciesGentoo:dietsquid:islandDream NA NA
## speciesChinstrap:dietkrill:islandTorgensen NA NA
## speciesGentoo:dietkrill:islandTorgensen NA NA
## speciesChinstrap:dietparental:islandTorgensen NA NA
## speciesGentoo:dietparental:islandTorgensen NA NA
## speciesChinstrap:dietsquid:islandTorgensen NA NA
## speciesGentoo:dietsquid:islandTorgensen NA NA
## bill_size:speciesChinstrap:dietkrill:islandDream NA NA
## bill_size:speciesGentoo:dietkrill:islandDream NA NA
## bill_size:speciesChinstrap:dietparental:islandDream NA NA
## bill_size:speciesGentoo:dietparental:islandDream NA NA
## bill_size:speciesChinstrap:dietsquid:islandDream NA NA
## bill_size:speciesGentoo:dietsquid:islandDream NA NA
## bill_size:speciesChinstrap:dietkrill:islandTorgensen NA NA
## bill_size:speciesGentoo:dietkrill:islandTorgensen NA NA
## bill_size:speciesChinstrap:dietparental:islandTorgensen NA NA
## bill_size:speciesGentoo:dietparental:islandTorgensen NA NA
## bill_size:speciesChinstrap:dietsquid:islandTorgensen NA NA
## bill_size:speciesGentoo:dietsquid:islandTorgensen NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 714.9 on 3390 degrees of freedom
## Multiple R-squared: 0.7061, Adjusted R-squared: 0.7027
## F-statistic: 208.8 on 39 and 3390 DF, p-value: < 2.2e-16
# Step 5: Diagnostic plots for regression
par(mfrow = c(2, 2)) # Split plot into 2x2 grid
plot(model) # Check residuals, normality, homoscedasticity
# Step 6: Visualize the relationship
# Scatter plot of bill size vs body mass, colored by species and diet, faceted by island
ggplot(penguins, aes(x = bill_size, y = body_mass_g, color = species, shape = diet)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~island) + # Separate plots by island
labs(title = "Relationship Between Bill Size and Body Mass Across Islands",
x = "Bill Size (Bill Length x Bill Depth)",
y = "Body Mass (g)") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
library(dplyr)
# Check the relationship between species, bill length, bill depth, and body mass
library(dplyr)
# Check the relationship between species, bill length, bill depth, and body mass
penguins %>%
group_by(species) %>%
summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
mean_bill_depth = mean(bill_depth_mm, na.rm = TRUE),
mean_body_mass = mean(body_mass_g, na.rm = TRUE))
## # A tibble: 3 × 4
## species mean_bill_length mean_bill_depth mean_body_mass
## <fct> <dbl> <dbl> <dbl>
## 1 Adelie 35.4 17.5 4445.
## 2 Chinstrap 35.0 18.4 4603.
## 3 Gentoo 44.2 19.7 5438.
# Step 7: Hypothesis testing with the model
library(car)
Anova(model)
## Note: model has aliased coefficients
## sums of squares computed by model comparison
## Anova Table (Type II tests)
##
## Response: body_mass_g
## Sum Sq Df F value Pr(>F)
## bill_size 384645096 1 752.6134 < 2.2e-16 ***
## species 44468270 2 43.5043 < 2.2e-16 ***
## diet 831423576 3 542.2666 < 2.2e-16 ***
## island 508055 2 0.4970 0.608372
## bill_size:species 5414758 2 5.2974 0.005046 **
## bill_size:diet 14559888 3 9.4962 3.032e-06 ***
## species:diet 3416891 6 1.1143 0.351161
## bill_size:island 2166814 2 2.1198 0.120210
## species:island 0
## diet:island 1425614 6 0.4649 0.834715
## bill_size:species:diet 2211472 6 0.7212 0.632536
## bill_size:species:island 0
## bill_size:diet:island 3766734 6 1.2284 0.288279
## species:diet:island 0
## bill_size:species:diet:island 0
## Residuals 1732558656 3390
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1