Examining Cyprinodon variegatus pupfish and their relationship to the Latitudinal Diversity Gradient

Note: A group of students and I collected the data and ran the statistical analysis for this project in IB177, an ichthyology class taught by Chris Martin. I want to use this final project as an extension of that study: telling the story of why and how we collected the data, and sharing our results.

“Cyprinodon” is a genus of pupfish found across North, Central, and South America. Some species in the genus are highly threatened or even extinct, while others occur across wide ranges and are of least concern. They are small fish; the largest individuals reach about 4 inches (10 cm). They are diverse in their ranges, habitats, and behaviors, and two species are excellent examples of ecological novelty in feeding: one has evolved a scale-eating mechanism and another a snail-eating mechanism.

The Latitudinal Diversity Gradient (LDG) is the pattern of decreasing species diversity as one moves away from the equator. Many hypotheses have been proposed to explain it, and none has been proven. One holds that higher net primary productivity near the equator supports greater species diversity. Another suggests that species near the equator have an older evolutionary history, and so have had more time to diversify.

Because of its wide latitudinal range, “Cyprinodon” is a well-suited system for studying the Latitudinal Diversity Gradient.

Before we dive deeper, let’s take a broader look at the genus Cyprinodon using the package “rfishbase” to access the FishBase database.

Some species in the genus Cyprinodon

I. FishBase Database

Here, I access the FishBase database and filter the species table to my target genus, Cyprinodon.

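library(rfishbase)
library(dplyr)
# Pull FishBase's species table and keep only the genus Cyprinodon
Cyprinodon_FB <- fb_tbl("species") %>%
  filter(Genus == "Cyprinodon")
Cyprinodon_FB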

To compare body size across species, we can plot the maximum recorded length of each one.

library(ggplot2)

# Maximum recorded total length (TL) for each species
Cyprinodon_FB %>%
  filter(LTypeMaxM == "TL") %>%
  ggplot(aes(x = Length, y = Species, color = Length)) +
  ggtitle("Size by Cyprinodon Species") +
  xlab("Length in cm") +
  ylab("Species") +
  geom_point()

It looks like the largest species is C. maya at 10 cm TL, while the smallest is C. diabolis at 3 cm.

Let’s look at the years in which the species were scientifically described. This may tell us how much taxonomic attention the genus has received. First, we must extract the four-digit year from the “Author” column so that we can plot the years of description.

library(stringr)

# Pull the four-digit year out of the "Author" field (e.g. "..., 1999").
# parse_number() failed on 7 hyphenated author strings such as
# "Lozano-Vilano & Contreras-Balderas, 1999", so use a regular expression.
Cyprinodon_FB %>%
  select(Genus, Species, Author) %>%
  mutate(description_year = as.integer(str_extract(Author, "\\d{4}"))) %>%
  ggplot(aes(x = description_year)) +
  ggtitle("Count of Cyprinodon Species Described by Year") +
  xlab("Taxonomic Description Year") +
  ylab("Count of Species Described") +
  geom_bar(fill = "steelblue")

The first species in the genus were described in the early 1800s, but much of the taxonomic work on the genus dates from 1975 through the 2000s. This is particularly interesting to me because one of my current professors, Chris Martin, has been active in taxonomic research on this genus.
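The year-extraction step is easy to check on its own. Below is a minimal base-R sketch that pulls the first four-digit number out of each author string and returns NA when there is none, so hyphenated author names do not cause parsing failures. The helper name extract_year and the example strings are hypothetical, not part of the original analysis.

```r
# Hypothetical helper: return the first four-digit number in each string
# as an integer, or NA when a string contains no four-digit run.
extract_year <- function(author) {
  match_pos <- regexpr("[0-9]{4}", author)
  years <- rep(NA_integer_, length(author))
  # regmatches() returns only the matched substrings, in order,
  # so assign them back to the positions that matched.
  years[match_pos > 0] <- as.integer(regmatches(author, match_pos))
  years
}

extract_year(c("Lozano-Vilano & Contreras-Balderas, 1999",
               "(Jordan & Evermann, 1898)",
               "Author unknown"))
# returns c(1999L, 1898L, NA)
```

Applied to the Author column with mutate(), this would give one integer year per species, with NA only for entries that genuinely lack a year.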

II. Loading in and Understanding the Layout of the Morphological Data

Cyprinodon variegatus, the focus of our study

In our study, we wanted to understand intraspecific diversity (the diversity within a single species) in Cyprinodon variegatus. To measure it, we examined several populations of C. variegatus across a wide latitudinal range, from as far north as Massachusetts to as far south as Venezuela. We measured morphological traits (physical traits such as eye diameter, jaw length, total length, and body depth, among others) and compared the variance in these traits among populations at different latitudes.

As a first step, let’s load the population metadata and the morphological data for the C. variegatus specimens we measured in IB177.

population_metadata <- read.csv("population_metadata.csv")
population_metadata

morphological_data <- read.csv("morphological_data.csv")
morphological_data

The population metadata contain the species name, population number, population location, location notes, collection dates, number of individuals collected, and the latitude/longitude coordinates of each population.

The morphological data contain a unique ID, individual number, population number, collection date, and a large set of body measurements for more than 100 individuals of C. variegatus.

Let’s plot the locations where the study populations were collected, using the sf and rnaturalearth packages with ggplot2.

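library(sf)
library(rnaturalearth)  # ne_countries(); scale = "medium" also needs rnaturalearthdata
library(ggplot2)

# Country outlines as an sf object
world <- ne_countries(scale = "medium", returnclass = "sf")

# Convert the population table to point features (WGS84, EPSG:4326)
sites <- st_as_sf(population_metadata, coords = c("Long", "Lat"),
                  crs = 4326, agr = "constant")

ggplot(data = world) +
  geom_sf() +
  geom_sf(data = sites, size = 2, shape = 21, fill = "red") +
  coord_sf(xlim = c(-90, -60), ylim = c(8, 45), expand = FALSE) +
  ggtitle("Cyprinodon variegatus Population Sites") +
  xlab("Longitude") +
  ylab("Latitude")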

III. Conducting the Analysis of Variance for Each Morphological Trait

The question we hope to answer is: in C. variegatus, does intraspecific variation change with latitude? To answer it, we estimate the variance of each size-corrected morphological trait within each population and test whether that variance changes with latitude.

Let’s dive into it.

The first step is to split the morphological data by population and save each subset as its own object. I wrote a function called “filter_by_pop” to do this.

# Return the rows of morphological_data that belong to one population
filter_by_pop <- function(pop_num) {
  morphological_data %>% filter(Population_Number == pop_num)
}

filter_by_pop(1)

There are 14 populations. Let’s do this for all 14.

pop_1 <- filter_by_pop(1)
pop_2 <- filter_by_pop(2)
pop_3 <- filter_by_pop(3)
pop_4 <- filter_by_pop(4)
pop_5 <- filter_by_pop(5)
pop_6 <- filter_by_pop(6)
pop_7 <- filter_by_pop(7)
pop_8 <- filter_by_pop(8)
pop_9 <- filter_by_pop(9)
pop_10 <- filter_by_pop(10)
pop_11 <- filter_by_pop(11)
pop_12 <- filter_by_pop(12)
pop_13 <- filter_by_pop(13)
pop_14 <- filter_by_pop(14)
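Creating fourteen numbered objects works, but the same subsets can also be kept together in one named list with base R's split(), which makes later steps easy to loop over. A minimal sketch, using a small made-up data frame in place of morphological_data (only the Population_Number column name comes from the original data):

```r
# Toy stand-in for morphological_data: 3 populations, made-up values.
toy_morph <- data.frame(
  Population_Number = c(1, 1, 2, 2, 2, 3),
  Orbital_Diameter  = c(2.1, 2.3, 1.9, 2.0, 2.2, 2.4)
)

# split() returns one data frame per population, named by population number.
pops <- split(toy_morph, toy_morph$Population_Number)

names(pops)        # "1" "2" "3"
nrow(pops[["2"]])  # 3 fish in population 2
```

With the real data, split(morphological_data, morphological_data$Population_Number) would hold the same subsets as pop_1 through pop_14, with pops[["1"]] playing the role of pop_1.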

Example - Orbital Diameter:

Our example trait is orbital diameter. Before calculating variance, we size-correct each morphological trait with a generalized linear model. Larger fish tend to have larger eyes, jaws, and heads, so the raw variance of a trait would partly reflect differences in body size among individuals; size correction removes that effect so the remaining variance reflects differences in shape.

Let’s first create a function to get the size-corrected residuals from our generalized linear model. We will use the morphological trait “Body_Depth” to size correct.

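# Regress log(trait) on log(Body_Depth) within one population and return the
# residuals, i.e. the part of the trait not explained by body size.
# (A gaussian glm() with the default identity link is ordinary least squares.)
size_corrected_residual <- function(morph_trait, filtered_pop) {
  fit <- glm(log(morph_trait) ~ log(Body_Depth), family = gaussian, data = filtered_pop)
  residuals(fit)
}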

size_corrected_residual(pop_1$Orbital_Diameter,pop_1)
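To see what the size correction does, the sketch below simulates fish in which eye size scales with body depth, fits log(trait) on log(body depth), and compares the variance of the raw log trait with the variance of the residuals. The data are simulated and the log-log slope of 0.8 is an arbitrary assumption; lm() is used because it fits the same least-squares model as the glm(..., family = gaussian) call above.

```r
set.seed(42)
# Simulated fish: body depth varies, and orbital diameter scales with it
# (slope 0.8 on the log-log scale, an arbitrary choice) plus shape noise.
body_depth <- exp(rnorm(50, mean = log(10), sd = 0.3))
orbital_diameter <- exp(0.8 * log(body_depth) + rnorm(50, sd = 0.05))

fit <- lm(log(orbital_diameter) ~ log(body_depth))
size_corrected <- residuals(fit)

var(log(orbital_diameter))            # raw variance, dominated by body size
var(size_corrected)                   # variance left after removing size
cor(size_corrected, log(body_depth))  # essentially zero by construction
```

Because least-squares residuals are uncorrelated with the predictor, differences in their variance between populations reflect differences in shape rather than in size.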

Let’s do this for all 14 populations.

res_orbitaldiameter_pop1 <- size_corrected_residual(pop_1$Orbital_Diameter,pop_1)
res_orbitaldiameter_pop2 <- size_corrected_residual(pop_2$Orbital_Diameter,pop_2)
res_orbitaldiameter_pop3 <- size_corrected_residual(pop_3$Orbital_Diameter,pop_3)
res_orbitaldiameter_pop4 <- size_corrected_residual(pop_4$Orbital_Diameter,pop_4)
res_orbitaldiameter_pop5 <- size_corrected_residual(pop_5$Orbital_Diameter,pop_5)
res_orbitaldiameter_pop6 <- size_corrected_residual(pop_6$Orbital_Diameter,pop_6)
res_orbitaldiameter_pop7 <- size_corrected_residual(pop_7$Orbital_Diameter,pop_7)
res_orbitaldiameter_pop8 <- size_corrected_residual(pop_8$Orbital_Diameter,pop_8)
res_orbitaldiameter_pop9 <- size_corrected_residual(pop_9$Orbital_Diameter,pop_9)
res_orbitaldiameter_pop10 <- size_corrected_residual(pop_10$Orbital_Diameter,pop_10)
res_orbitaldiameter_pop11 <- size_corrected_residual(pop_11$Orbital_Diameter,pop_11)
res_orbitaldiameter_pop12 <- size_corrected_residual(pop_12$Orbital_Diameter,pop_12)
res_orbitaldiameter_pop13 <- size_corrected_residual(pop_13$Orbital_Diameter,pop_13)
res_orbitaldiameter_pop14 <- size_corrected_residual(pop_14$Orbital_Diameter,pop_14)

Next, we calculate the variance of the residuals within each population. Population 9 was excluded at this step (its variance is set to NA) because it contained an outlier.

var_orbitaldiameter_pop1 = var(res_orbitaldiameter_pop1)
var_orbitaldiameter_pop2 = var(res_orbitaldiameter_pop2)
var_orbitaldiameter_pop3 = var(res_orbitaldiameter_pop3)
var_orbitaldiameter_pop4 = var(res_orbitaldiameter_pop4)
var_orbitaldiameter_pop5 = var(res_orbitaldiameter_pop5)
var_orbitaldiameter_pop6 = var(res_orbitaldiameter_pop6)
var_orbitaldiameter_pop7 = var(res_orbitaldiameter_pop7)
var_orbitaldiameter_pop8 = var(res_orbitaldiameter_pop8)
var_orbitaldiameter_pop9 = NA
var_orbitaldiameter_pop10 = var(res_orbitaldiameter_pop10)
var_orbitaldiameter_pop11 = var(res_orbitaldiameter_pop11)
var_orbitaldiameter_pop12 = var(res_orbitaldiameter_pop12)
var_orbitaldiameter_pop13 = var(res_orbitaldiameter_pop13)
var_orbitaldiameter_pop14 = var(res_orbitaldiameter_pop14)
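The per-population variances can also be computed in one step with sapply() once the residuals are stored in a list. A minimal sketch, using randomly generated residuals as a stand-in for the real ones and, as in the analysis above, setting population 9 to NA; the object names are hypothetical.

```r
set.seed(1)
# Stand-in for the size-corrected residuals: one numeric vector per population.
residual_list <- lapply(1:14, function(i) rnorm(8, sd = 0.05))
names(residual_list) <- 1:14

# Variance of the residuals within each population.
pop_variances <- sapply(residual_list, var)

# Exclude population 9, mirroring var_orbitaldiameter_pop9 = NA.
pop_variances["9"] <- NA

variance_table <- data.frame(
  Population.Number = as.integer(names(pop_variances)),
  variance = unname(pop_variances)
)
```

variance_table has the same two columns as the orbital-diameter variance table, so it could be merged with population_metadata by Population.Number in the same way.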

Next, let’s create a table with variances in orbital diameter by population.

# Collect the 14 variances into one table, one row per population
var_orbitaldiameter <- tibble::tibble(
  Population.Number = 1:14,
  variance = unname(unlist(mget(paste0("var_orbitaldiameter_pop", 1:14))))
)

We’re almost there! Next, we add each population’s latitude from the metadata to the variance table, so we can plot variance against latitude.

var_orbitaldiameter <- merge(var_orbitaldiameter, population_metadata, by="Population.Number") %>% 
  select("Population.Number", "variance", "Lat")

Finally, let’s plot the variance in orbital diameter against latitude.

orbitaldiameterplot1  <- var_orbitaldiameter %>% 
  ggplot(aes(x=Lat,y=variance)) +
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE) +
  geom_point(aes(color=as.factor(Population.Number))) + 
  ggtitle("Variance in Orbital Diameter by Population and Latitude") +
  xlab("Latitude") +
  ylab("Variance") +
  scale_colour_discrete(name="Population Number")
orbitaldiameterplot1

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 1 rows containing non-finite values (stat_smooth).

## Warning: Removed 1 rows containing missing values (geom_point).

The plot includes a least-squares regression line of variance in orbital diameter against latitude. To test whether its slope differs significantly from zero, we fit the same linear model and look at its summary.

summary(lm(formula = variance ~ Lat, data = var_orbitaldiameter))

## 
## Call:
## lm(formula = variance ~ Lat, data = var_orbitaldiameter)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.011097 -0.004803 -0.002066  0.004837  0.010822 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.0167784  0.0066595   2.519   0.0285 *
## Lat         -0.0002200  0.0002035  -1.081   0.3028  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007089 on 11 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.09604,    Adjusted R-squared:  0.01386 
## F-statistic: 1.169 on 1 and 11 DF,  p-value: 0.3028

Because the p-value for the latitude slope (0.30) is greater than 0.05, we cannot reject the null hypothesis that the slope is zero. In other words, we found no evidence that intraspecific variation in orbital diameter changes with latitude.
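Rather than reading the slope and its p-value off the printed summary, they can be pulled from the coefficient table that summary() returns. A minimal sketch with made-up latitudes and variances for 13 populations (the same number used in the model above, but not the real values):

```r
set.seed(7)
# Made-up stand-in for the variance table: 13 populations.
toy <- data.frame(Lat = seq(10, 42, length.out = 13),
                  variance = rnorm(13, mean = 0.012, sd = 0.007))

fit <- lm(variance ~ Lat, data = toy)
coefs <- coef(summary(fit))

slope   <- coefs["Lat", "Estimate"]
p_slope <- coefs["Lat", "Pr(>|t|)"]

# With a single predictor, the slope t-test and the overall F-test agree.
f <- summary(fit)$fstatistic
p_overall <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
```

This is why the summary above reports the same p-value (0.3028) for the Lat coefficient and for the F-statistic.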

Given the time and scope of this ESPM 157 project, I won’t repeat the analysis here for the other traits, such as head height, lower jaw length, and caudal peduncle height. I can, however, share the results from our IB177 analysis: as with orbital diameter, we found little to no evidence that variance changes with latitude for these traits.
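The orbital-diameter workflow generalises to any trait. The sketch below wraps the whole pipeline (size correction within each population, residual variance, regression of variance on latitude) in one hypothetical function, latitude_test(), and applies it to two traits in a simulated data set. It is not the code used in the IB177 analysis; Head_Height is a made-up column name, and the latitudes and measurements are simulated.

```r
set.seed(177)
# Simulated stand-in data: 14 populations of 10 fish, made-up latitudes.
n_pop <- 14
meta <- data.frame(Population.Number = 1:n_pop,
                   Lat = seq(10, 42, length.out = n_pop))
fish <- data.frame(Population_Number = rep(1:n_pop, each = 10))
fish$Body_Depth <- exp(rnorm(nrow(fish), log(8), 0.2))
fish$Orbital_Diameter <- exp(0.8 * log(fish$Body_Depth) + rnorm(nrow(fish), sd = 0.1))
fish$Head_Height <- exp(0.9 * log(fish$Body_Depth) + rnorm(nrow(fish), sd = 0.1))

# For one trait: size-correct within each population, take the residual
# variance, then regress variance on latitude; return slope and p-value.
latitude_test <- function(trait, data, meta, exclude = integer(0)) {
  pops <- setdiff(sort(unique(data$Population_Number)), exclude)
  variances <- sapply(pops, function(p) {
    d <- data[data$Population_Number == p, ]
    var(residuals(lm(log(d[[trait]]) ~ log(d$Body_Depth))))
  })
  tab <- merge(data.frame(Population.Number = pops, variance = variances),
               meta, by = "Population.Number")
  fit <- lm(variance ~ Lat, data = tab)
  coef(summary(fit))["Lat", c("Estimate", "Pr(>|t|)")]
}

# One row per trait; population 9 excluded, as in the orbital-diameter example.
results <- t(sapply(c("Orbital_Diameter", "Head_Height"), latitude_test,
                    data = fish, meta = meta, exclude = 9))
results
```

Adding a trait is then a matter of adding its column name to the vector passed to sapply().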

Conclusion:

What do the results of the study mean?

Essentially, we found no relationship between intraspecific variation and latitude. This could be the case for a host of reasons. For one, the sample was small: we measured around 150 individuals from 14 populations, and the orbital-diameter regression rested on just 13 population variances. With so few data points, the test has little statistical power to detect a modest latitudinal trend.

Similarly, some populations had fewer than five or six individuals, which makes their variance estimates unreliable and complicated our ability to test for statistical significance. In a future study, it would make sense to include only populations in which more than 10 individuals were measured.
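The sample-size rule suggested above is simple to apply before any modelling. A minimal sketch, using made-up counts and a hypothetical helper name, that keeps only populations with more than 10 measured fish:

```r
# Made-up population labels: population 1 has 12 fish, 2 has 4, 3 has 11.
toy_morph <- data.frame(Population_Number = rep(c(1, 2, 3), times = c(12, 4, 11)))

# Hypothetical helper: population numbers with more than min_n individuals.
well_sampled_pops <- function(data, min_n = 10) {
  counts <- table(data$Population_Number)
  as.numeric(names(counts)[counts > min_n])
}

keep <- well_sampled_pops(toy_morph)  # populations 1 and 3
filtered <- toy_morph[toy_morph$Population_Number %in% keep, , drop = FALSE]
```

The filtered data could then go through the same size-correction and variance steps, with under-sampled populations dropped up front rather than handled case by case.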

There are a few directions we could take this study going forward. For one, we could measure more populations of C. variegatus to see whether more data would reveal a significant pattern. Another approach would be to measure genetic diversity, taking tissue samples and sequencing DNA to see whether differences in diversity among populations are more visible at the genetic level.

Acknowledgements:

I wanted to say thank you to my team members Saron, Elise, and Fernando for making this project a lot of fun.

I also wanted to thank Dr. Martin and Jackie from IB177 for motivating students to be scientists and guiding us through the entire project.

I also want to thank Professor Boettiger and Maggie. Both of you were instrumental in teaching me about scientific communication via R.

P.S. I noticed that Professor Boettiger was behind a lot of the R-functionality for FishBase - a tool I’ve been using on and off for many years as an aquarium hobbyist. That was the inspiration behind this project! Hope to stay in touch.

Citations:

Allen, Andrew P., and James F. Gillooly. “Assessing latitudinal gradients in speciation rates and biodiversity at the global scale.” Ecology Letters 9.8 (2006): 947-954.

Gaston, Kevin J. “Global patterns in biodiversity.” Nature 405.6783 (2000): 220-227.

Hanly, Patrick J., Gary G. Mittelbach, and Douglas W. Schemske. “Speciation and the latitudinal diversity gradient: Insights from the global distribution of endemic fish.” The American Naturalist 189.6 (2017): 604-615.

Hillebrand, Helmut. “On the generality of the latitudinal diversity gradient.” The American Naturalist 163.2 (2004): 192-211.

Lawrence, Elizabeth R., and Dylan J. Fraser. “Latitudinal biodiversity gradients at three levels: linking species richness, population richness and genetic diversity.” Global Ecology and Biogeography 29.5 (2020): 770-788.

Martin, Christopher H. “The cryptic origins of evolutionary novelty: 1000‐fold faster trophic diversification rates without increased ecological opportunity or hybrid swarm - Supplement.” Evolution 70.11 (2016): 2504-2519.

Martin, Christopher H. “The cryptic origins of evolutionary novelty: 1000‐fold faster trophic diversification rates without increased ecological opportunity or hybrid swarm.” Evolution 70.11 (2016): 2504-2519.

Miller, Elizabeth Christina, and Cristian Román‐Palacios. “Evolutionary time best explains the latitudinal diversity gradient of living freshwater fish diversity.” Global Ecology and Biogeography 30.3 (2021): 749-763.

Miller, Elizabeth Christina, et al. “Explaining the ocean’s richest biodiversity hotspot and global patterns of fish diversity.” Proceedings of the Royal Society B 285.1888 (2018): 20181314.

Rohde, Klaus. “Latitudinal gradients in species diversity: the search for the primary cause.” Oikos (1992): 514-527.