BIOL5404 EDA - Impact of Sex on Mudpuppy Biology in a Creek Environment

Author

Douglas Strick

Published

April 10, 2026

Introduction

Background Information: Mudpuppies 101

The mudpuppy (Necturus maculosus) is a nocturnal, cold-adaptive, fully aquatic salamander (McDaniel et. al., 2009) and is the largest salamander species in Canada (Petranka, 2010). The species has 2 distinct populations within Canada; a western population in the watersheds of Lake Winnipeg and Lake Manitoba in the province of Manitoba and an eastern population in the watersheds of the Great Lakes and the St. Lawrence River, in Ontario and western Quebec (COSEWIC, 2023). Both populations have been declining in recent years, with COSEWIC listing the Manitoba population as Threatened and the Great Lakes/St. Lawrence population as Special Concern in 2023 as a result. Very little is known about the ecology of the species (COSEWIC, 2023), and what little studies have been performed focus on river or lake environments, leaving basic questions unanswered for mudpuppies that make their homes in smaller bodies of water such as creeks or streams.

The mudpuppy is an ecologically and scientifically important species within Canada’s freshwater ecosystems. The species acts as a key predator and/or prey species for many forms of freshwater invertebrates, fish, herps, and mammals, acting as biological control for invasive species like the rusty crayfish (Orconectes rusticus), round goby (Neogobius melanostomus), and zebra/quagga mussel (Dreissena polymorpha) (Beattie et. al., 2017). The species are the only known host of the Endangered salamander mussel (Simpsonaias ambigua) (Morris, 2011). As an aquatic amphibian, mudpuppies are among the more sensitive species to ecosystem changes and are monitored as an indicator species for the overall health of their ecosystems, and the effectiveness of restoration efforts (Sutherland et. al., 2020). The species is also a frequent subject of genetics research due to their cold adaptation and large amounts of DNA per cell.

The purpose of this script is to analyse mudpuppy capture and measurement data collected from a known population in Kemptville Creek in Oxford Mills, Ontario, in order to fill basic knowledge gaps on the the species’ dynamics in creek and stream environments. This data was collected over the course of 3 winter capture periods (2023/24, 2024/25, and 2025/26), creating a different data set for each year. The data will first be cleaned and combined, then analysed and visualized to explore trends such as sexual dimorphism and sex bias within the population, using both total length and body length (otherwise known as snout ventral length).

Research Questions

1: Is there a significant difference in the number of male and female mudpuppies present in Kemptville Creek?

2: Is there a significant difference between the Total Length body length measurement of male and female mudpuppies in Kemptville Creek?

3: Is there a significant difference between the Snout Ventral Length body length measurement of male and female mudpuppies in Kemptville Creek?

Importing, Cleaning, and Combining Data

The first step is to load the necessary packages and libraries for the script being performed:

Code
library(tidyverse)
library(dslabs)
library(dplyr)
library(knitr)

Next we will load in the 3 data sets being used:

Code
Year1<-read_csv("Winter 2023 (Matt).csv")
Year2<-read_csv("Winter 2024 (Matt).csv")
Year3<-read_csv("Winter 2025.csv")

The 3 csv files all contain the same data, but are organized differently. We are going to format them to have the same rows containing the same type of data before we combine them. This will lead to some repetition in the cleaning process, but is simpler and will lead to less errors than combining the files and then attempting to fix different parts of the same column at the same time.

After cleanup, the files should contain data for: Observation ID, Capture Date, PIT tag prefix, PIT tag suffix, Sex, Recap (TRUE or FALSE), Total Length (Head to tail length), and Snout Ventral Length (Body length not including tail)

Cleaning Year 1

We will start by cleaning up Year 1.

Code
kable(head(Year1), format = "html")
obs_id date prefix suffix sex recap tl svl svl.short svl.long vie notes problem ...14 ...15 Field Description
1 2024-01-12 3D6.15344 69AB4 NA FALSE NA NA NA NA NA NA FALSE NA NA obs_id observation ID
2 2024-01-12 3D6.15344 69A93 NA FALSE NA NA NA NA NA NA FALSE NA NA date year-month-day
3 2024-01-23 3D6.15344 69983 F FALSE NA NA NA NA NA NA FALSE NA NA prefix Starting digits of PITT number (all the same)
4 2024-01-23 3D6.15344 6994B F FALSE 190 NA NA NA NA NA FALSE NA NA suffix Last 5 PITT digits (hex format). Note that a small number of individuals were photographed but not PITT tagged. Currently applied conditional formatting highlights PITT tag duplicates, which indicates there are previous captures or subsequent recaptures
5 2024-01-23 3D6.15344 69970 F FALSE 200 130 NA NA NA NA FALSE NA NA sex observed sex. Not recorded for recaptured individuals that were scanned and released without being processed.
6 2024-01-23 3D6.15344 6995D F FALSE 200 145 NA NA NA NA FALSE NA NA recap true/false. Individuals captured earlier in the season released at the end of survey. New individuals or individuals last captured in the previous season were retained for processing (measuring, sexing, photographs)

The SVL measurement is represented by the ‘svl’ column here, so the ‘svl.short’ and ‘svl.long’ columns are unnecessary.

Code
Year1<-Year1 %>% 
  select(-svl.short, -svl.long)

The ‘vie’ column is unnecessary as implanting Visual Implant Elastomer tags (sub-dermal glowing tags for recapture ID) was a procedure not done in any other year as PIT tagging makes it redundant, and thus provides no additional information for our purposes with this script.

Code
Year1<-Year1 %>% 
  select(-vie)

The columns ‘…14’ and ‘…15’ are placeholders and can be removed.

Code
Year1<-Year1 %>% 
  select(-...14, -...15)

The ‘problem’ column can also be removed, as any issues would be denoted in the ‘notes’ column.

Code
Year1<-Year1 %>% 
  select(-problem)

The descriptions for all of the above removed columns can also be removed, as they describe columns that are no longer being used.

Code
Year1 <- Year1 %>%
  mutate(
    Field = ifelse(row_number() %in% c(9, 10, 11, 13), NA, Field),
    Description = ifelse(row_number() %in% c(9, 10, 11, 13), NA, Description)
  )

This data set is now ready to be combined with the other 2, so we will move on to Year 2!

Cleaning Year 2

Code
kable(head(Year2), format = "html")
obs_id date prefix suffix sex recap tl svl svl.short svl.long vie notes problem ...14 ...15 Field Description
52 2024-10-18 3D6.15344 69A8B F FALSE 225 158 NA NA NA NA FALSE NA NA obs_id observation ID
53 2024-10-18 3D6.15344 69AAD M FALSE 201 145 NA NA NA NA FALSE NA NA date year-month-day
54 2024-10-25 3D6.15344 69A59 M FALSE 265 NA 184 192 NA NA FALSE NA NA prefix Starting digits of PITT number (all the same)
55 2024-10-25 3D6.15344 69A89 M FALSE 234 NA 170 178 NA NA FALSE NA NA suffix Last 5 PITT digits (hex format). Note that a small number of individuals were photographed but not PITT tagged. Currently applied conditional formatting highlights PITT tag duplicates, which indicates there are previous captures or subsequent recaptures
56 2024-10-25 3D6.15344 69A57 F FALSE 224 NA 155 161 NA NA FALSE NA NA sex observed sex. Not recorded for recaptured individuals that were scanned and released without being processed.
57 2024-11-01 3D6.15344 69A6F F FALSE 237 NA 160 166 NA NA FALSE NA NA recap true/false. Individuals captured earlier in the season released at the end of survey. New individuals or individuals last captured in the previous season were retained for processing (measuring, sexing, photographs)

Just like Year 1, the ‘vie’, ‘problem’, ‘…14’, and ‘…15’ columns can be removed.

Code
Year2<-Year2 %>% 
  select(-vie, -problem, -...14, -...15)

For SVL measurements, Year 2 records both the length from the tip of the head to the back of the cloaca (referred to as ‘svl’ or ‘svl.long’) and the length from the tip of the head to the front of the cloaca. For our data set we are only worried about the SVL (head to back) measurement. This means we can remove ‘svl.short’, and will need to combine ‘svl’ and ‘svl.long’ into one column, as this measurement is recorded as ‘svl’ for the first 2 individuals and ‘svl.long’ for the rest.

Code
Year2<-Year2 %>% 
  select(-svl.short)
Year2$svl <- ifelse(!is.na(Year2$svl), Year2$svl, Year2$svl.long)

Now that all of the data is in the ‘svl’ column, we can remove the now redundant ‘svl.long’.

Code
Year2<-Year2 %>% 
  select(-svl.long)

We can also remove the ‘Field’ and ‘Description’ columns from this data set entirely, as it will be combined with Year1, which has identical columns.

Code
Year2<-Year2 %>% 
  select(-Field, -Description)

This dataset is now ready to be combined with the others, so we can move on to Year 3!

Cleaning Year 3

Code
kable(head(Year3), format = "html")
obs_id date prefix suffix sex recap tl (mm) svl.long(mm) notes problem ...11 ...12 Field Description
190 2025-10-24 N/A 6994B Not Recorded N 205 150 NA FALSE NA NA obs_id observation ID
191 2025-11-03 N/A B19A2 M N 236 171 NA FALSE NA NA date year-month-day
192 2025-11-03 N/A B19AE F N 248 177 NA FALSE NA NA prefix Starting digits of PITT number (all the same)
193 2025-11-07 N/A 697A6 F Y 235 153 A recap for Matt but new to us FALSE NA NA suffix Last 5 PITT digits (hex format). Note that a small number of individuals were photographed but not PITT tagged. Currently applied conditional formatting highlights PITT tag duplicates, which indicates there are previous captures or subsequent recaptures
194 2025-11-17 N/A 6995F F Y 263 172 A recap for Matt but new to us FALSE NA NA sex observed sex. Not recorded for recaptured individuals that were scanned and released without being processed.
195 2025-11-21 N/A 69A87 F Y 240 175 A recap for Matt but new to us FALSE NA NA recap true/false. Individuals captured earlier in the season released at the end of survey. New individuals or individuals last captured in the previous season were retained for processing (measuring, sexing, photographs)

We can start by once again removing unnecessary columns, in this case ‘…11’, ‘…12’, ‘Field’, and ‘Description’.

Code
Year3<-Year3 %>% 
  select(-...11, -...12, -Field, -Description)

In this data set, the ‘problem’ and ‘notes’ columns are being used to record the same thing, however both are being used in different circumstances. We can therefore combine them in the same way we combined ‘svl’ and ‘svl.long’ for Year 2.

First we will make sure there are no rows with data in both columns. This first requires changing FALSE values to NAs for the problem column.

Code
Year3$problem <- ifelse(Year3$problem == FALSE, NA, Year3$problem)
count <- sum(!is.na(Year3$problem) & !is.na(Year3$notes))
count
[1] 1

There is 1 overlapping row. This row shows “Photo ID: No tag jan 16” in the ‘notes’ column and “too small” in the ‘problems’ column. The “too small” statement in this case is redundant, as we can determine this using the recorded TL measurement. We can therefore combine the 2 columns in a way that prioritizes data from the ‘notes’ column.

Code
Year3$notes <- ifelse(!is.na(Year3$notes), Year3$notes, Year3$problem)

We can now remove the ‘problem’ column as it is redundant.

Code
Year3<-Year3 %>% 
  select(-problem)

We must also change the values in the ‘recap’ column from Y or N to TRUE or FALSE, to match the other 2 data sets.

Code
Year3$recap <- ifelse(Year3$recap == "Y", TRUE, FALSE)

Finally, we will rename 2 of the columns to match Year 1 and Year 2, and change the ‘svl’ column to numeric data.

Code
Year3 <- Year3 %>%
  rename(tl = `tl (mm)`, svl = `svl.long(mm)`)

Year3 <- Year3 %>%
  mutate(svl = as.numeric(svl))

Now let’s make sure that all 3 data sets show the same columns with the same data.

Code
kable(head(Year1), format = "html")
obs_id date prefix suffix sex recap tl svl notes Field Description
1 2024-01-12 3D6.15344 69AB4 NA FALSE NA NA NA obs_id observation ID
2 2024-01-12 3D6.15344 69A93 NA FALSE NA NA NA date year-month-day
3 2024-01-23 3D6.15344 69983 F FALSE NA NA NA prefix Starting digits of PITT number (all the same)
4 2024-01-23 3D6.15344 6994B F FALSE 190 NA NA suffix Last 5 PITT digits (hex format). Note that a small number of individuals were photographed but not PITT tagged. Currently applied conditional formatting highlights PITT tag duplicates, which indicates there are previous captures or subsequent recaptures
5 2024-01-23 3D6.15344 69970 F FALSE 200 130 NA sex observed sex. Not recorded for recaptured individuals that were scanned and released without being processed.
6 2024-01-23 3D6.15344 6995D F FALSE 200 145 NA recap true/false. Individuals captured earlier in the season released at the end of survey. New individuals or individuals last captured in the previous season were retained for processing (measuring, sexing, photographs)
Code
kable(head(Year2), format = "html")
obs_id date prefix suffix sex recap tl svl notes
52 2024-10-18 3D6.15344 69A8B F FALSE 225 158 NA
53 2024-10-18 3D6.15344 69AAD M FALSE 201 145 NA
54 2024-10-25 3D6.15344 69A59 M FALSE 265 192 NA
55 2024-10-25 3D6.15344 69A89 M FALSE 234 178 NA
56 2024-10-25 3D6.15344 69A57 F FALSE 224 161 NA
57 2024-11-01 3D6.15344 69A6F F FALSE 237 166 NA
Code
kable(head(Year3), format = "html")
obs_id date prefix suffix sex recap tl svl notes
190 2025-10-24 N/A 6994B Not Recorded FALSE 205 150 NA
191 2025-11-03 N/A B19A2 M FALSE 236 171 NA
192 2025-11-03 N/A B19AE F FALSE 248 177 NA
193 2025-11-07 N/A 697A6 F TRUE 235 153 A recap for Matt but new to us
194 2025-11-17 N/A 6995F F TRUE 263 172 A recap for Matt but new to us
195 2025-11-21 N/A 69A87 F TRUE 240 175 A recap for Matt but new to us

All 3 data sets now have the same columns with the same data (with Year 1 also having the Field and Description columns) and are ready to be combined.

Combining the Data (With Some Finishing Touches)

First we will combine the data into a new data set.

Code
Mudpuppies <- bind_rows(Year1, Year2, Year3)

Now we will finish cleaning up this new data set.

We can fix the ‘Field’ and ‘Description’ columns by renaming the TL ‘Field’ data point from ’’’ to ‘tl’, and by shifting the ‘notes’ data point up to the rest of the ‘Field’/‘Description’ data. We will also tidy up the descriptions a bit.

Code
Mudpuppies[7, "Field"] <- "tl"

Mudpuppies[9, "Field"] <- "notes"
Mudpuppies[9, "Description"] <- "any problems or other remarks"
Mudpuppies[12, "Field"] <- NA
Mudpuppies[12, "Description"] <- NA

Mudpuppies[8, "Description"] <- "snout vent length (tip of head to back of cloaca) (mm)"

Now let’s check the data columns we will be focusing on for our analysis.

Code
unique(Mudpuppies$recap)
[1] FALSE  TRUE

The ‘recap’ column is properly formatted!

Code
unique(Mudpuppies$sex)
[1] NA             "F"            "M"            "Not Recorded" "unk"         

There are 5 unique values for sex, let’s clean that up to “Female” for female, “Male” for male, and NA for unknown/unrecorded values.

Code
Mudpuppies <- Mudpuppies %>% 
  mutate(
    sex = case_when(
      sex == NA | sex == "unk" | sex == "Not Recorded" ~ NA,
      sex == "M" ~ "Male",
      sex == "F" ~ "Female"
    )
  )
unique(Mudpuppies$sex)
[1] NA       "Female" "Male"  

The data is now ready to be analysed!!!

Analysis

I want to investigate the impact of sex on the population, looking at both male to female ratios and sexual dimorphism.

First we will create a subset of the dataset without recaptures, as we don’t want individuals appearing in the dataset more than once. We will also filter out NA values for sex.

Code
Unique<-Mudpuppies %>% 
  subset(Mudpuppies$recap == "FALSE") %>% 
  filter(!is.na(sex))

Sex Ratios

First let’s examine sex ratios using a binomial test.

Male <- 89 Female <- 175 Total <- Male + Female

Code
Male<-sum(Unique$sex == "Male")
Female<-sum(Unique$sex == "Female")
Total <- Male + Female

print(Male)
[1] 89
Code
print(Female)
[1] 175
Code
print(Total)
[1] 264

There are 89 males to 175 females captured, or ~1.97 female mudpuppies found for every 1 male mudpuppy in Kemptville Creek, out of a total of 264 unique sex-ID’ed captures.

Let’s run the binomial test to check for statistical significance.

Code
ratio_result <- binom.test(x = Male, n = Total, p = 0.5)
print(ratio_result$p.value)
[1] 1.315655e-07

The binomial test returned a p-value of 1.32E-07, which indicates a statistically significant difference in the likelihood of finding male and female mudpuppies. In other words, there is a statistically significant difference in the ratio between male and female mudpuppies captured in Kemptville Creek!

Let’s visualize this ratio.

Code
RatioPlot<-Unique %>%
  ggplot(aes(x=sex, fill = sex))+
  geom_bar()+
  geom_text(stat = 'count', aes(label = after_stat(count)), vjust = 1) +
  labs(title = "Oxford Mills Mudpuppy Sex Ratio (n=264)")+
  theme_classic()+
  theme(legend.position = "none")
RatioPlot

Sexual Dimorphism (Total Length)

Let’s analyse sexual dimorphism using a 2-sample t-test to compare the mean total lengths of males and females. We will be doing this for both Total Length and Snout Ventral Length (including and excluding the tail length), starting by looking at Total Length.

First we will examine the distribution of Total Length data, including the tail.

Code
Unique %>% 
  ggplot(aes(x=tl, colour=sex))+
  geom_histogram()+
  theme_classic()

Both males and females exhibit normal distribution patterns, meaning a t-test can be used for this analysis.

Code
TLTest<-t.test(tl~sex, data=Unique)
print(TLTest)

    Welch Two Sample t-test

data:  tl by sex
t = -3.1724, df = 186.79, p-value = 0.001768
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 -18.524447  -4.319154
sample estimates:
mean in group Female   mean in group Male 
            225.2299             236.6517 

The Two Sample t-test returned an average Male length of 236.65mm, an average Female length of 225.23mm, and a p-value of 0.0018. This means that from the mudpuppies we caught over 3 years, males had a significantly longer total length than females on average.

Let’s visualize this! I will be making both a mean and whisker plot and a violin plot, as both are efective displays of this test with pros and cons to each.

Code
TLBoxplot<-Unique %>% 
  ggplot(aes(x=sex, y=tl, fill = sex))+
  geom_boxplot(outlier.shape = NA)+
  stat_boxplot(geom = "errorbar", width = 0.5)+
  labs(x = "Sex", y = "Total Length (mm)", title = "Sexual Dimorphism in Kemptville Creek Mudpuppies (n=264)")+
  scale_x_discrete(labels = c("Female" = "Female (n=175)", "Male" = "Male (n=89)"))+
  geom_jitter(width = 0.2, alpha = 0.4, size = 1.5, color="black")+
  theme_classic()+
  theme(legend.position = "none")
TLBoxplot

Code
TLViolin<-Unique %>% 
  ggplot(aes(x=sex, y=tl, fill = sex))+
  geom_violin()+
  stat_summary(fun = mean, geom = "point", size = 4) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2)+
  labs(x = "Sex", y = "Total Length (mm)", title = "Sexual Dimorphism in Kemptville Creek Mudpuppies (n=264)")+
  scale_x_discrete(labels = c("Female" = "Female (n=175)", "Male" = "Male (n=89)"))+
  theme_classic()+
  theme(legend.position = "none")
TLViolin

Sexual Dimorphism (Snout Ventral Length)

Next we will examine sexual dimorphism based on the Snout Ventral Length body length measurement, not including the tail length. This is a commonly used measurement in amphibian analysis as tails can be partially bitten or torn off, leading to Total Length measurements being inaccurate to what the length of the animal should be.

First we will examine the distribution of Snout Ventral Length.

Code
Unique %>% 
  ggplot(aes(x=svl, colour=sex))+
  geom_histogram()+
  theme_classic()

Like Total Length, both males and females show normal distribution patterns, meaning a t-test can be used for this analysis.

Code
SVLTest<-t.test(svl~sex, data=Unique)
print(SVLTest)

    Welch Two Sample t-test

data:  svl by sex
t = -3.5768, df = 182.35, p-value = 0.0004454
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 -14.35519  -4.14829
sample estimates:
mean in group Female   mean in group Male 
            156.8382             166.0899 

The Two Sample t-test yielded an average male body length of 166.09mm, an average female length of 156.84mm, and a p-value of 0.0004. This means that from the mudpuppies we caught over 3 years, males had a significantly longer body length than females on average, matching the trend seen in total length.

Let’s visualize this, in the same ways as we did for Total Length.

Code
SVLBoxplot<-Unique %>% 
  ggplot(aes(x=sex, y=svl, fill = sex))+
  geom_boxplot(outlier.shape = NA)+
  stat_boxplot(geom = "errorbar", width = 0.5)+
  labs(x = "Sex", y = "Snout Ventral Length (mm)", title = "Sexual Dimorphism in Kemptville Creek Mudpuppies (n=264)")+
  scale_x_discrete(labels = c("Female" = "Female (n=175)", "Male" = "Male (n=89)"))+
  geom_jitter(width = 0.2, alpha = 0.4, size = 1.5, color="black")+
  theme_classic()+
  theme(legend.position = "none")
SVLBoxplot

Code
SVLViolin<-Unique %>% 
  ggplot(aes(x=sex, y=svl, fill = sex))+
  geom_violin()+
  stat_summary(fun = mean, geom = "point", size = 4) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2)+
  labs(x = "Sex", y = "Snout Ventral Length (mm)", title = "Sexual Dimorphism in Kemptville Creek Mudpuppies (n=264)")+
  scale_x_discrete(labels = c("Female" = "Female (n=175)", "Male" = "Male (n=89)"))+
  theme_classic()+
  theme(legend.position = "none")
SVLViolin

Conclusion

Our analysis today has shown that across 264 unique mudpuppies captured over a span of 3 winter seasons in Kemptville Creek, there was a statistically significant ~1:1.97 male to female ratio (Research Question 1), and males were on average statistically longer in both snout ventral length (Research Question2) and total length (Research Question 3) measurements.

Citations

Beattie, A., M. Whiles, and P. Willink. 2017. Diets, population structure, and seasonal activity patterns of mudpuppies (Necturus maculosus) in an urban, Great Lakes coastal habitat. Journal of Great Lakes Research 43:132-143.

Committee on the Status of Endangered Wildlife in Canada, issuing body. (2023). COSEWIC assessment and status report on the mudpuppy, Necturus maculosus, Manitoba population, Great Lakes / St. Lawrence population, in Canada. Committee on the Status of Endangered Wildlife in Canada = Comité sur la situation des espèces en péril au Canada. https://publications.gc.ca/site/eng/9.938345/publication.html

McDaniel, T.V., P.A. Martin, G.C. Barrett, K. Hughes, A., Gendron, L. Shirose, and C. Bishop. 2009. Relative abundance, age structure, and body size in mudpuppy populations in southwestern Ontario. Journal of Great Lakes Research 3:182-189.

Morris, T. J. 2011. COSEWIC Status appraisal summary on the Salamander mussel, simpsonaias ambigua in Canada. https://wildlife-species.canada.ca/species-risk-registry/virtual_sara/files/cosewic/salamander_mussel_sse_0911_eng.pdf

Petranka, J.W. 2010. Salamanders of the United States and Canada (2nd ed.). Smithsonian Books, Washington, DC. Petranka, J.W., E.M. Harp, C.T. Holbrook

Sutherland, J., Mifsud, D., Stapleton, M., Spear, S. F., & Greenwald, K. 2020. Environmental DNA Assessment Reveals Restoration Success for Mudpuppies (Necturus maculosus). Herpetologica, 76(4). https://doi.org/10.1655/0018-0831-76.4.366

All photographs were taken by the author (Douglas Strick)

I would like to extend a special thanks to Neal the Seal for his moral support throughout this project.