Question 1: (16% - 6 marks)

Paradise tree snakes (Chrysopelea paradisi) leap into the air from trees, and by generating lift they glide downward and away rather than plummet. An airborne snake flattens its body everywhere except for the heart region. It forms a horizontal “S” shape and undulates from side to side. By orienting the head and anterior part of the body, a snake can change direction, reach a preferred landing site, and even chase aerial prey.

To better understand lift and stability of glides, Socha (2002, Nature 418: 603-604) videotaped eight snakes leaping from a 10 metre tower. One measurement taken was the rate of side-to-side undulation. Undulation rates of the eight snakes, measured in Hertz (cycles per second), are in the file Snakes.csv in the Assignment1-Project folder.

Based on previous studies, the researchers were interested in whether the mean undulation rate differed significantly from the theoretical rate of 1 cycle per second (which was based on the musculature of the organism).

Present a histogram of the data.
What is the mean and standard deviation of undulation rate?
Calculate the standard error of the mean undulation rate.
State the null hypothesis (H₀) that the researchers wish to test, and make your inference regarding this hypothesis using the information calculated in the previous steps. What is the biological interpretation of your analysis?

Insert your work below the asterisks

Q1a: Histogram

Figure 1: Histogram of the Distribution of Undulation Rates in Gliding Chrysopelea paradisi

hist(df$undulationRate, main = "Histogram of Snake Undulation Rates", xlab = "Undulation Rate (Hz)",
    col = "blue", border = "black")

Q1b: Mean and Standard Deviation of Undulation Rate

mean(df$undulationRate)  # Mean undulation rate

## [1] 1.375

sd(df$undulationRate)  # Standard deviation

## [1] 0.324037

Q1c Standard Error

# Sample size
n <- length(df$undulationRate)

# Standard deviation
s <- sd(df$undulationRate)

# Standard error of the mean
sem <- s/sqrt(n)
sem

## [1] 0.1145644

Q1d

t.test(df$undulationRate, mu = 1)

## 
##  One Sample t-test
## 
## data:  df$undulationRate
## t = 3.2733, df = 7, p-value = 0.01361
## alternative hypothesis: true mean is not equal to 1
## 95 percent confidence interval:
##  1.104098 1.645902
## sample estimates:
## mean of x 
##     1.375

The null hypothesis (H₀) is that the true mean undulation rate of the snakes is equal to 1 Hz (H₀: μ = 1). Using the values from the previous steps, t = (1.375 - 1) / 0.1146 = 3.27 with 7 degrees of freedom, which gives a two-sided p-value of 0.0136 and a 95% confidence interval of 1.10 to 1.65 Hz. Because p < 0.05 and the interval excludes 1 Hz, H₀ is rejected.

The mean undulation rate (1.375 Hz) was significantly higher than the theoretical 1 Hz predicted from the musculature, so gliding snakes undulate faster than expected. This may enhance lift, stability, and directional control in flight, which is consistent with undulation being an active, adaptive behaviour rather than a purely passive one.
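The inference above can be reproduced by hand from the summary statistics in Q1b and Q1c. The following is a minimal sketch, assuming the rounded values printed in the earlier output (mean 1.375, SEM 0.1145644, n = 8); the variable names are illustrative only.

```r
# Summary statistics copied from the output above (rounded as printed)
x_bar <- 1.375      # sample mean (Hz)
sem   <- 0.1145644  # standard error of the mean
n     <- 8          # number of snakes
mu_0  <- 1          # theoretical rate under H0 (Hz)

# t-statistic: how many standard errors the sample mean lies from mu_0
t_stat <- (x_bar - mu_0) / sem

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the true mean
ci <- x_bar + c(-1, 1) * qt(0.975, df = n - 1) * sem

round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

These values should agree with the t.test() output to within rounding.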

Question 2: (42% - 15 marks)

Does the light environment have an influence on the development of color vision? A study measured the relative abilities of bluefin killifish from two wild populations to detect short wavelengths of light (blue light in our own visible colour spectrum). One population was from a swamp, whose muddy water filters out blue wavelengths, whereas the other population was from a clear-water spring.

Fish were crossed and raised in the lab under two light conditions simulating those in the wild: clear and muddy. Sensitivity to blue light was measured as the relative expression of the SWS1 opsin gene in the eyes of the fish (where SWS indicates shortwave sensitive opsin proteins).

The data are taken from a single individual in each of 33 families raised in a common lab environment. The data are available in the file OpsinExpression.csv. “Population” differences are likely to be genetically based, whereas differences between fish under different water clarity conditions are environmentally induced.

How many factors are included in the experiment? Identify them.
What type of experimental design was used?
Draw a plot showing the relationships in the data.
Provide an “in words” description of the full linear model you would fit to the data.
Carry out the appropriate analysis and summarise the results in a Table.
Investigate whether the assumptions of the model you fitted are met. Explain your interpretations and take action if the assumptions are not met.
Explain, in words, the results of the analysis. Explain whether the genetic and environmentally induced effects on SWS1 opsin expression appear to be in the same direction.

Insert your work below the asterisks

Q2(a)

There are two factors: population and light environment.

Q2(b)

The experiment is manipulative, because the light environment (clear vs. muddy) was actively manipulated in the laboratory under controlled conditions. It is also a factorial (fully crossed) design, in which every level of one factor occurs in combination with every level of the other. The two factors are:

Population (origin): Fish from two different wild populations (swamp and clear-water spring). This is an observed, not manipulated, categorical grouping factor with two levels.
Lab Light Environment (water clarity): Two manipulated light conditions in the lab (clear and muddy). This is a manipulated categorical factor with two levels.

Q2(c)

## Rows: 33
## Columns: 3
## $ population                  <chr> "Spring", "Spring", "Spring", "Spring", "S…
## $ water_clarity               <chr> "Clear", "Clear", "Clear", "Clear", "Clear…
## $ relative_expression_of_sws1 <dbl> 0.16, 0.11, 0.12, 0.11, 0.08, 0.09, 0.14, …

ggplot(opsin, aes(x = population, y = relative_expression_of_sws1, fill = water_clarity)) +
    # Boxplots per population, dodged by water clarity, with the raw points overlaid
    geom_boxplot(position = position_dodge(0.8)) +
    geom_jitter(position = position_jitterdodge(jitter.width = 0.1, dodge.width = 0.8),
        shape = 21, alpha = 0.6, color = "black") +
    labs(title = "SWS1 Opsin Expression by Population and Water Clarity",
        x = "Population (Genetic Background)", y = "Relative Expression of SWS1 Opsin",
        fill = "Water Clarity") +
    theme_minimal()

SWS1 opsin expression by population and water clarity

Q2(d)

The full linear model describes the relative expression of the SWS1 opsin gene as a function of population origin (swamp or spring), the laboratory light environment (clear or muddy), and the interaction between these two factors, reflecting the factorial design of the experiment. It could be written as:

SWS1 Expression = Intercept + Population + Water Clarity + (Population × Water Clarity) + Error
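To make the "in words" model concrete, the sketch below shows how R expands the formula into one column per model term. It uses a toy data frame with one row per factor combination, not the real opsin data, and the object names `toy` and `X` are illustrative.

```r
# Toy data: one row per combination of the two factors (not the real data)
toy <- expand.grid(
  population    = c("Spring", "Swamp"),
  water_clarity = c("Clear", "Muddy")
)

# Expand the formula into its design matrix; "Spring" and "Clear" become the
# reference levels because factor levels follow the order given above
X <- model.matrix(~ population * water_clarity, data = toy)

# Show each factor combination next to its row of the design matrix
cbind(toy, X)
```

Only the Swamp and Muddy combination has a 1 in the interaction column, so the interaction coefficient measures how far that group departs from the sum of the two separate effects.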

Q2(e)

model <- lm(relative_expression_of_sws1 ~ population * water_clarity, data = opsin)
summary(model)

## 
## Call:
## lm(formula = relative_expression_of_sws1 ~ population * water_clarity, 
##     data = opsin)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.04250 -0.02250 -0.00300  0.01875  0.03875 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         1.212e-01  9.653e-03  12.561 2.96e-13 ***
## populationSwamp                    -2.875e-02  1.365e-02  -2.106   0.0440 *  
## water_clarityMuddy                 -2.825e-02  1.295e-02  -2.181   0.0374 *  
## populationSwamp:water_clarityMuddy  3.571e-05  1.917e-02   0.002   0.9985    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0273 on 29 degrees of freedom
## Multiple R-squared:  0.3597, Adjusted R-squared:  0.2935 
## F-statistic:  5.43 on 3 and 29 DF,  p-value: 0.004343

A linear model was fitted to assess the effects of population, water clarity, and their interaction on SWS1 opsin gene expression. The results are summarised in Table 1, below:

Table 1. Linear Model Summary: SWS1 Opsin Expression
Term	Estimate	Std. Error	t-value	p-value
Intercept (Spring, Clear)	0.121	0.010	12.561	<0.001
Population: Swamp vs Spring	-0.029	0.014	-2.106	0.044
Water Clarity: Muddy vs Clear	-0.028	0.013	-2.181	0.037
Interaction: Swamp × Muddy	0.000	0.019	0.002	0.999

Question 2f: Model Assumptions

Diagnostic plots indicated that model assumptions were met. Residuals appeared approximately normally distributed with constant variance, and no clear patterns or outliers were observed.
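The checks described above could be carried out along the following lines. This is a sketch on simulated data with a similar 2 x 2 structure, not the real opsin data; the effect sizes, sample size per cell, and object names are assumptions chosen only for illustration.

```r
set.seed(1)  # reproducible toy data

# Hypothetical stand-in for the opsin data: 2 x 2 design, 8 fish per cell
toy <- expand.grid(
  population    = c("Spring", "Swamp"),
  water_clarity = c("Clear", "Muddy"),
  replicate     = 1:8
)
toy$expression <- 0.12 -
  0.03 * (toy$population == "Swamp") -
  0.03 * (toy$water_clarity == "Muddy") +
  rnorm(nrow(toy), sd = 0.03)

fit <- lm(expression ~ population * water_clarity, data = toy)

# Visual checks: residuals vs fitted, Q-Q, scale-location, leverage
if (interactive()) {
  par(mfrow = c(2, 2))
  plot(fit)
}

# Numerical complements to the plots
cell <- interaction(toy$population, toy$water_clarity)
normality <- shapiro.test(residuals(fit))         # H0: residuals are normal
equal_var <- bartlett.test(residuals(fit), cell)  # H0: equal variance across cells

c(shapiro_p = normality$p.value, bartlett_p = equal_var$p.value)
```

Large p-values from these tests would be consistent with the visual impression of normal residuals and constant variance, although with small samples the plots remain the primary evidence.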

Question 2g: Interpretation of Results and Direction of Effects

The linear model showed that both population and rearing light environment had statistically significant effects on the expression of the SWS1 opsin gene. Fish from the swamp population had lower SWS1 expression than fish from the spring population (estimate = -0.029, p = 0.044), and fish raised in muddy light conditions had lower expression than those raised in clear conditions (estimate = -0.028, p = 0.037).

There was no evidence of an interaction between population and water clarity (estimate ≈ 0, p = 0.999), indicating that the effect of the light environment on gene expression was similar in both populations.

The genetic effect (swamp origin) and the environmentally induced effect (muddy rearing conditions) both reduce SWS1 expression, and by a similar amount, so they act in the same direction. Biologically, this suggests that both genetic background and light exposure during development shape short-wavelength sensitivity, with reduced blue-light sensitivity plausibly reflecting adaptation to low-blue environments such as swamps.
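The direction of the two effects can be seen by converting the coefficients into predicted means for the four groups. The sketch below copies the rounded estimates from the model summary; the vector and matrix names are illustrative.

```r
# Coefficients copied from the model summary above (treatment contrasts,
# reference levels: Spring population, Clear water)
b <- c(intercept = 0.1212, swamp = -0.02875, muddy = -0.02825,
       swamp_x_muddy = 3.571e-05)

# Predicted mean expression for each of the four treatment combinations
cell_means <- matrix(
  unname(c(b["intercept"],               # Spring, Clear
           b["intercept"] + b["swamp"],  # Swamp,  Clear
           b["intercept"] + b["muddy"],  # Spring, Muddy
           sum(b))),                     # Swamp,  Muddy
  nrow = 2,
  dimnames = list(population = c("Spring", "Swamp"),
                  water_clarity = c("Clear", "Muddy"))
)

round(cell_means, 3)
```

Expression is highest for spring fish in clear water and lowest for swamp fish in muddy water, with the two single-factor changes producing drops of nearly equal size.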

Question 3: (42% - 15 marks)

Are the brains of modern humans larger than the brains of Neanderthals? Estimates of cranial capacity from fossils indicate that Neanderthals had large brains, but they also had a large body size. The data in the file NeanderthalBrainSize.csv are estimated log-transformed brain and body sizes of Neanderthal specimens and early modern humans.

The goal of the analysis is to determine, using all of the available data, whether humans and Neanderthals have different brain sizes.

Plot the data in an appropriate way to allow the relationships among the variables to be visualised.
Carry out an appropriate analysis of the data and summarise the main results in a Table.
Investigate whether the assumptions of the model you fitted are met. Explain your interpretations and take action if the assumptions are not met.
What do the model terms in the results Table represent? What statistical hypotheses are tested by the test statistics associated with each model term? And what can you conclude from these tests?
In view of the goals of the study, what would be the next step in your analysis? Explain your results after undertaking this next step.

Insert your work below the asterisks

Question 3a:

A plot was generated to visually compare brain–body scaling across species.

library(tidyverse)
library(janitor)  # provides clean_names()

# Read the data, standardise the column names, and relabel "recent" specimens as "human"
brain_data <- read_csv("NeanderthalBrainSize.csv", show_col_types = FALSE) |>
    clean_names() |>
    mutate(species = recode(species, recent = "human"))

# Scatterplot with a separate least-squares line for each species
ggplot(brain_data, aes(x = ln_mass, y = ln_brain, color = species)) +
    geom_point(size = 3, alpha = 0.8) +
    geom_smooth(method = "lm", se = FALSE) +
    labs(title = "Log-Transformed Brain Size as a Function of Body Size in Neanderthals and Modern Humans",
        x = "Log Body Size (lnMass)", y = "Log Brain Size (lnBrain)", color = "Species") +
    theme_minimal()

Log brain size vs log body size for Neanderthals and modern humans

Q3 (b)

An analysis of covariance (ANCOVA) is most appropriate to compare brain size between species while controlling for body size.

library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)

brain_data <- read_csv("NeanderthalBrainSize.csv") |>
    mutate(species = factor(species))


model <- lm(lnBrain ~ lnMass + species, data = brain_data)

model_table <- tidy(model) |>
    # Factor levels are alphabetical, so "neanderthal" is the reference level and the
    # species coefficient is the offset for "recent" (modern human) specimens
    mutate(term = case_when(term == "(Intercept)" ~ "Intercept (Neanderthal)", term ==
        "lnMass" ~ "Log Body Mass", term == "speciesrecent" ~ "Species: Recent (modern human)",
        TRUE ~ term))

kable(model_table, col.names = c("Model Term", "Estimate", "Std. Error", "t-value",
    "p-value"), caption = "Table 2: ANCOVA Results for Predicting Log Brain Size from Log Body Mass and Species",
    align = "lcccc", booktabs = TRUE) |>
    kable_styling(full_width = FALSE, position = "center") |>
    row_spec(0, bold = TRUE)

Table 2: ANCOVA Results for Predicting Log Brain Size from Log Body Mass and Species
Model Term	Estimate	Std. Error	t-value	p-value
Intercept (Neanderthal)	5.1880712	0.3952570	13.125817	0.0000000
Log Body Mass	0.4963161	0.0917276	5.410764	0.0000043
Species: Recent (modern human)	0.0702784	0.0282152	2.490801	0.0174947

Q3 (c)

# Diagnostic plots for linear model
par(mfrow = c(2, 2))  # layout: 2x2 grid
plot(model)  # Base R diagnostic plots

Model assumptions were assessed using standard residual diagnostic plots. The Residuals vs Fitted and Scale-Location plots showed no strong patterns, indicating that the assumptions of linearity and homoscedasticity are satisfied. The Normal Q-Q plot showed slight deviations at extremes, but overall the residuals followed a normal distribution. The Residuals vs Leverage plot indicated no influential outliers. Therefore, assumptions of the linear model are reasonably met, and no corrective actions were necessary.

Question 3d: Interpretation of Model Terms and Hypotheses

The ANCOVA model includes three key terms:

Intercept (Neanderthal): The estimated log brain size for a Neanderthal (the reference level of species) with a log body mass of zero. The estimate was 5.188, which differs significantly from zero (p < 0.001), but because a log body mass of zero lies far outside the observed range, the intercept itself cannot be interpreted biologically.
Log Body Mass: The change in log brain size per unit increase in log body mass, assumed to be the same for both species. The estimate was 0.496 (p < 0.001), indicating a strong, positive allometric relationship: brain size increases significantly with body size.
Species: Recent (modern human): The estimated difference in intercept between modern humans and Neanderthals (human minus Neanderthal). The estimate was 0.070 (p = 0.017), indicating that modern humans have larger brains than Neanderthals after controlling for body size.

Statistical Hypotheses:

Each term in the model is tested with the null hypothesis that its coefficient is equal to zero:

H₀ (Intercept): The log brain size of a Neanderthal at a log body mass of zero is zero. This test is not of biological interest.
H₀ (Log Body Mass): The slope is zero, i.e. body size has no effect on brain size.
H₀ (Species): The species coefficient is zero, i.e. there is no difference in brain size between Neanderthals and modern humans after adjusting for body size.

Both null hypotheses of interest were rejected. The slope for log body mass was significantly greater than zero (p < 0.001), indicating a strong positive relationship between body and brain size. The species effect was also significant (p = 0.017), indicating that brain size differs between Neanderthals and modern humans after accounting for body size.

Together, these results show that both body size and species identity contribute to variation in brain size: for a given body mass, modern humans have significantly larger brains than Neanderthals.
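Because both variables appear to be natural-log transformed (an assumption based on the column names lnBrain and lnMass), the species coefficient can be back-transformed into a ratio of brain sizes at a given body mass. The sketch below uses the estimate and standard error from the ANCOVA table; the residual degrees of freedom (36) are an assumption inferred from the 35 reported for the four-coefficient interaction model fitted to the same data.

```r
# Species coefficient (recent human minus Neanderthal) and its standard error,
# copied from the ANCOVA table
est <- 0.0702784
se  <- 0.0282152
df_resid <- 36  # assumption: 39 specimens minus 3 estimated coefficients

# Back-transform from the log scale to a multiplicative effect on brain size
ratio <- exp(est)

# 95% confidence interval, computed on the log scale and then back-transformed
ci <- exp(est + c(-1, 1) * qt(0.975, df = df_resid) * se)

round(c(ratio = ratio, lower = ci[1], upper = ci[2]), 3)
```

A ratio of about 1.07 corresponds to modern human brains being roughly 7% larger than Neanderthal brains at the same body mass.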

Q3(e)

A logical next step is to test for an interaction between log body mass and species, that is, to fit a model that checks whether the slope of the brain–body size relationship differs between Neanderthals and modern humans. For example:

## 
## Call:
## lm(formula = lnBrain ~ lnMass * species, data = brain_data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.090341 -0.056928 -0.002495  0.034044  0.183930 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.2535     0.9769   4.354 0.000111 ***
## lnMass                 0.7135     0.2270   3.143 0.003395 ** 
## speciesrecent          1.1809     1.0623   1.112 0.273862    
## lnMass:speciesrecent  -0.2595     0.2481  -1.046 0.302790    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06655 on 35 degrees of freedom
## Multiple R-squared:  0.4653, Adjusted R-squared:  0.4195 
## F-statistic: 10.15 on 3 and 35 DF,  p-value: 5.908e-05

The interaction term in this extended model was not statistically significant (estimate = -0.260, p = 0.303), so there is no evidence that the slope of the brain–body size relationship differs between species. The common-slope ANCOVA is therefore adequate, and the observed difference in brain size between humans and Neanderthals is best described as a shift in intercept rather than a change in scaling. Modern humans have larger brains for their body size, but the rate at which brain size increases with body size does not differ significantly between species.
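The species-specific regression lines implied by the interaction model, and the test of equal slopes, can be recovered directly from the printed coefficients. This is a sketch using the rounded values from the output above; the object names are illustrative.

```r
# Coefficients copied from the interaction model output above
# (reference level: Neanderthal)
b <- c(intercept = 4.2535, lnMass = 0.7135,
       species_recent = 1.1809, lnMass_x_recent = -0.2595)

# Separate regression line for each species implied by the model
lines_by_species <- rbind(
  neanderthal = c(intercept = unname(b["intercept"]),
                  slope     = unname(b["lnMass"])),
  recent      = c(intercept = unname(b["intercept"] + b["species_recent"]),
                  slope     = unname(b["lnMass"] + b["lnMass_x_recent"]))
)

# Test of equal slopes: t = estimate / SE on 35 residual degrees of freedom
t_int <- -0.2595 / 0.2481
p_int <- 2 * pt(-abs(t_int), df = 35)

lines_by_species
round(c(t = t_int, p = p_int), 3)
```

The fitted slopes differ numerically (about 0.71 for Neanderthals and 0.45 for recent humans), but the standard error of the difference is large enough that the data cannot distinguish them.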

A further step to make the study more robust would be to obtain more data points, particularly for Neanderthals. Biologically relevant variables such as the sex and age of specimens could also be included as covariates in the model to help control for variation unrelated to species identity.

My title

Maeve Plouffe

16 May 2025
