BS2077 Tutorial 5

13 March 2025

Setting up

Export the data as .CSV

Put the .XLSX file into a fresh, clean folder (Goose Analysis or similar).
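Open the file in Excel and make your column headings R-friendly (no spaces or brackets):
- e.g. time spent vigilant (s) becomes t.vig_s
- the code below assumes the headings flock.size, clip.length, time.feeding, time.vigilant and n.headups
- delete any columns with standardised versions of the variables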
Save your .XLSX as a .CSV into the same folder:
- CSV (Comma delimited) (*.csv)
- not the UTF, Mac or MS-DOS version of CSV.
Remember the folder location
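Why bother renaming? read.csv() silently converts headings that are not valid R names, so a heading with spaces and brackets comes out as something awkward to type. A minimal check using base R's make.names(), which applies the same conversion that read.csv() uses by default:

```r
# The heading as it might be typed in Excel (example from the list above)
make.names("time spent vigilant (s)")
# -> "time.spent.vigilant..s."  (spaces and brackets replaced by dots)

# An R-friendly heading is left unchanged
make.names("t.vig_s")
# -> "t.vig_s"
```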

Make an RStudio Project

Open RStudio
File > New Project... > then choose Existing Directory
Navigate to your project folder (Goose Analysis) and select it
Open a new blank script: File > New File > R Script
Save it (as my_analysis.R or whatever)
Read in the .CSV file by typing this in the script:

Geese <- read.csv(file = "my data.csv")  # replace with your own file name

You are in the script, not the console! Press Ctrl+Return (Cmd+Return on Mac) to run the current line of code.
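It is worth checking straight away that the data came in as intended. A minimal sketch: since your spreadsheet is not available here, it writes a small made-up data set (hypothetical values, using the column names the code below relies on) to a temporary file, then reads it back and inspects it with str() and names():

```r
# Hypothetical stand-in for the goose data (made-up numbers)
toy <- data.frame(flock.size    = c(1, 4, 12),
                  clip.length   = c(120, 95, 110),   # seconds
                  time.feeding  = c(40, 60, 85),     # seconds
                  time.vigilant = c(70, 25, 12),     # seconds
                  n.headups     = c(9, 7, 6))
csv_path <- file.path(tempdir(), "my data.csv")
write.csv(toy, csv_path, row.names = FALSE)

Geese <- read.csv(file = csv_path)
str(Geese)     # every column should be numeric (num or int), not chr
names(Geese)   # the headings R will use in the code below
```

If a column shows up as chr, there is usually a stray text entry (a unit, a comment) somewhere in that column of the spreadsheet.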

Calculate time fractions and frequencies

To account for the fact that the individual video segments (observation times) vary somewhat in length, we

convert durations to time fractions (between 0 and 1, i.e. a proportion rather than a percentage);
convert head-up counts to frequencies (events per minute; clip.length is in seconds, hence the * 60 below).

Geese$feed.tf <- Geese$time.feeding / Geese$clip.length    # fraction of clip spent feeding
Geese$vig.tf <- Geese$time.vigilant / Geese$clip.length    # fraction of clip spent vigilant
Geese$hu.freq <- Geese$n.headups / Geese$clip.length * 60  # head-ups per minute
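A quick sanity check of these conversions on two hypothetical clips of different length (made-up numbers; clip.length assumed to be in seconds, as the * 60 implies). The raw durations and counts differ, but the time fractions and per-minute frequencies come out the same, which is exactly why we convert:

```r
# Two hypothetical clips: the second is half as long as the first
toy <- data.frame(clip.length   = c(120, 60),
                  time.feeding  = c(90, 45),
                  time.vigilant = c(30, 15),
                  n.headups     = c(6, 3))

toy$feed.tf <- toy$time.feeding / toy$clip.length     # 0.75 in both clips
toy$vig.tf  <- toy$time.vigilant / toy$clip.length    # 0.25 in both clips
toy$hu.freq <- toy$n.headups / toy$clip.length * 60   # 3 head-ups per minute in both clips
toy
```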

Does time invested in feeding depend on group size?

\(H_1:\) There is a correlation between time spent feeding and group size.

\(H_0:\) There is no correlation between time spent feeding and group size.

Always plot the data first! (1)

We’ll use package ggplot2. Simples:

library(ggplot2)  # load the "ggplot2" plotting package

feeding <- ggplot(Geese, aes(x=flock.size, y=feed.tf))  # set up plot
feeding + geom_point()  # make scatter plot

Always plot the data first! (2)

Straight line fit (1)

You may feel the urge to add a regression line, which is easy enough:

feeding + geom_point() + 
  geom_smooth(method = "lm", se = FALSE, linetype = 2)

Straight line fit (2)

Is it sensible?

Look at the data: it is pretty clear that if there is a relationship between feeding time and flock size, it is not a linear one.

There is no reason to assume a priori that the relationship would be linear:

Hardly any relationships in Biology are linear;
Would you expect any net benefits of group living to increase linearly forever as group size increases? Think ‘diminishing returns curves’ from the lectures!

Nonlinear fit (1)

Fit a smooth (local) regression by using method = "loess" instead of "lm".

The span = 1 argument sets how bendy the line is allowed to be: a lower span means a more wobbly line (try it!).

feeding + geom_point() + 
  geom_smooth(method = "loess", span = 1, se = FALSE)
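Behind the scenes, geom_smooth(method = "loess") fits a local regression with stats::loess(), so you can fit the same kind of curve directly and compare spans. A minimal sketch on simulated, hypothetical diminishing-returns data (not the goose data):

```r
set.seed(1)
x <- 1:30
y <- log(x) + rnorm(30, sd = 0.2)       # hypothetical saturating relationship plus noise

fit_stiff <- loess(y ~ x, span = 1)     # each local fit uses all points (distance-weighted): smooth curve
fit_bendy <- loess(y ~ x, span = 0.3)   # each local fit uses ~30% of the points: wigglier curve

# Compare the two fitted curves at a few x values
new_x <- data.frame(x = c(5, 15, 25))
predict(fit_stiff, new_x)
predict(fit_bendy, new_x)
```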

Nonlinear fit (2)

Hypothesis test (1)

Testing for correlation between feed.tf and flock.size.
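By default, cor.test carries out Pearson’s correlation test, which assumes a linear relationship!
Since the relationship is clearly non-linear, use Spearman’s rank correlation test instead: it works on the ranks of the data and only assumes a monotonic relationship (one that consistently increases or decreases).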
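To see why the choice matters, compare the two coefficients on a noise-free, hypothetical saturating curve: Pearson’s r is below 1 because a straight line does not fit perfectly, while Spearman’s rho is exactly 1 because y always increases with x. The last line shows that Spearman’s rho is simply Pearson’s r computed on the ranks:

```r
x <- 1:20
y <- 1 - exp(-x / 4)               # hypothetical "diminishing returns" curve, no noise

cor(x, y, method = "pearson")      # below 1: the relationship is not a straight line
cor(x, y, method = "spearman")     # 1: y increases every time x increases
cor(rank(x), rank(y))              # same value as Spearman's rho
```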

cor.test( ~ feed.tf + flock.size, Geese, method="spearman")

Hypothesis test (2)

cor.test( ~ feed.tf + flock.size, Geese, method="spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  feed.tf and flock.size
## S = 297.62, p-value = 0.006139
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.6352753

Strict Frequentist: reject \(H_0\) at \(\alpha = 0.05\) (‘significant’ correlation); or
Strength of Evidence: the data provide good evidence that time investment in feeding increases with group size.
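The S in the output is the statistic R reports for Spearman’s test; it is tied to rho by S = (n^3 - n)(1 - rho)/6. A quick arithmetic check, assuming n = 17 observations (our inference from df = 15 in the Pearson test near the end of this tutorial, since df = n - 2):

```r
rho <- 0.6352753   # from the cor.test output above
n   <- 17          # assumed sample size (df = 15 in the later Pearson test, df = n - 2)

S <- (n^3 - n) * (1 - rho) / 6
round(S, 2)        # 297.62, matching the output

p_value <- 0.006139
p_value < 0.05     # TRUE: 'reject H0' under the strict frequentist reading
```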

Does time invested in vigilance depend on group size?

\(H_1:\) There is a correlation between vigilance time and group size.

\(H_0:\) There is no correlation between vigilance time and group size.

Plot the data (1)

vigil <- ggplot(Geese, aes(x=flock.size, y=vig.tf)) # set up plot
vigil + geom_point()  # make scatter plot

Plot the data (2)

Plot the data (3)

Let’s add both linear and smooth regression lines:

vigil + geom_point() + 
  geom_smooth(method = "lm", se = FALSE, linetype = 2) +
  geom_smooth(method = "loess", span = 2, se = FALSE)

Plot the data (4)

Hypothesis test

cor.test( ~ vig.tf + flock.size, Geese, method="spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  vig.tf and flock.size
## S = 1268.6, p-value = 0.02085
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5546251

Strict Frequentist: reject \(H_0\) at \(\alpha = 0.05\) (‘significant’ correlation); or
Strength of Evidence: the data provide weak evidence that time investment in vigilance decreases with group size.

Does the frequency of head-up events depend on group size?

\(H_1:\) There is a correlation between head-up frequency and group size.

\(H_0:\) There is no correlation between head-up frequency and group size.

Plot the data (1)

Let’s add the regression lines straight away:

ggplot(Geese, aes(x=flock.size, y=hu.freq)) + 
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, linetype = 2) +
  geom_smooth(method = "loess", span = 1, se = FALSE)

Plot the data (2)

If anything, there is a very weak trend for head-ups to be more frequent in larger groups. Maybe they check more often, but only very briefly?

Hypothesis test

cor.test( ~ hu.freq + flock.size, Geese, method="spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  hu.freq and flock.size
## S = 621.37, p-value = 0.3566
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.2385211

Strict Frequentist: accept \(H_0\) at \(\alpha = 0.05\) (no ‘significant’ correlation); or
Strength of Evidence: the data provide no evidence that the frequency of head-up events correlates with group size.

Either way, you have not proven the absence of a correlation!!

You can never prove the absence of a correlation with a \(P\)-value.
Put formally, accepting \(H_0\) does not reject \(H_1\).
But with a sufficiently large \(N\) you can obtain compelling evidence that the correlation is very weak.
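A sketch of that last point using simulated data with no true correlation (hypothetical, made-up values). It uses Pearson’s test only because cor.test() reports a 95% confidence interval for Pearson’s r but not for Spearman’s rho; the interval around zero shrinks as N grows:

```r
set.seed(42)

# Width of the 95% confidence interval for r when x and y are unrelated
ci_width <- function(n) {
  x <- rnorm(n)
  y <- rnorm(n)                      # independent of x, so the true correlation is 0
  ci <- cor.test(x, y)$conf.int      # Pearson is the default method
  ci[2] - ci[1]
}

w_small <- ci_width(17)     # a small sample: wide interval, weak correlations not ruled out
w_large <- ci_width(2000)   # a large sample: narrow interval close to 0
c(w_small, w_large)
```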

Are the vigilance bouts longer in smaller groups?

We can easily calculate the mean bout length, because we know both

the total time the head was up and
the number of bouts.

Geese$mean.vbout.length <- with(Geese, time.vigilant / n.headups)  # seconds per head-up bout
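One thing to watch: a clip with no head-ups gives 0 / 0 (NaN) or x / 0 (Inf) when dividing total vigilant time by the number of bouts. A minimal sketch, on hypothetical numbers, of setting such rows to NA so they are easy to spot and are left out of the plot and the test:

```r
# Hypothetical clips: vigilant time (s) and number of head-up bouts
toy <- data.frame(time.vigilant = c(30, 12, 0),
                  n.headups     = c(6, 4, 0))

# Mean bout length in seconds; NA where there were no bouts to average over
toy$mean.vbout.length <- with(toy, ifelse(n.headups > 0, time.vigilant / n.headups, NA))
toy$mean.vbout.length   # 5, 3, NA
```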

Plot the data (1)

ggplot(Geese, aes(x=flock.size, y=mean.vbout.length)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE, linetype = 2) +
  geom_smooth(method = "loess", span = 1.5, se = FALSE)

Plot the data (2)

Hypothesis test

cor.test( ~ mean.vbout.length + flock.size, Geese, method="spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  mean.vbout.length and flock.size
## S = 1146.9, p-value = 0.1064
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.4054838

Curiously, this doesn’t come out significant at \(\alpha = 0.05\), even though we have good evidence that total time vigilant is higher in small groups and found no support for the claim that the frequency of vigilance bouts depends on group size. Taken together, those two results imply that individual bouts should be longer in small groups.

Log-transform \(x\) and \(y\) (1)

But we are power-limited: our \(N\) is low, and rank-based tests have less power than parametric tests when the latter’s assumptions are met.

We could try to linearise the relationship (here by log-transforming both \(x\) and \(y\)), which would then permit us to use a parametric test (Pearson’s correlation) instead.

ggplot(Geese, aes(x=log(flock.size), y=log(mean.vbout.length))) + 
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, linetype = 2) +
  geom_smooth(method = "loess", span = 1.5, se = FALSE)
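Why logs? If the relationship is a power law, y = a * x^b (one common shape for curved relationships), then log(y) = log(a) + b * log(x), which is a straight line on the log-log scale. A noise-free, hypothetical check with a = 20 and b = -0.5:

```r
x <- c(1, 2, 4, 8, 16, 32)
y <- 20 * x^-0.5                  # hypothetical power law with a = 20, b = -0.5

fit <- lm(log(y) ~ log(x))        # straight-line fit on the log-log scale
coef(fit)                         # intercept = log(20), slope = -0.5
```

Note that log() needs strictly positive values: a zero becomes -Inf, and an NA stays NA.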

But be mindful that trying different tests until one comes out ‘significant’ is wrong-headed at best (if done naively), and fraudulent at worst.

Either way, it is known as p-hacking (google it…).
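A small simulation makes the problem concrete. Here x and y are generated independently (hypothetical lognormal data, n = 17), so \(H_0\) is true and every ‘significant’ result is a false positive. Running one pre-chosen test gives false positives at about the nominal 5% rate; trying four tests and keeping whichever one ‘works’ can only do as well or worse:

```r
set.seed(123)
n_sims <- 1000
n <- 17
single_hit <- logical(n_sims)   # was the pre-chosen Spearman test significant?
any_hit    <- logical(n_sims)   # was ANY of the four tests significant?

for (i in seq_len(n_sims)) {
  x <- rlnorm(n)                # hypothetical data: x and y unrelated, so H0 is true
  y <- rlnorm(n)
  p <- c(spearman = cor.test(x, y, method = "spearman")$p.value,
         pearson  = cor.test(x, y, method = "pearson")$p.value,
         loglog   = cor.test(log(x), log(y), method = "pearson")$p.value,
         kendall  = cor.test(x, y, method = "kendall")$p.value)
  single_hit[i] <- p["spearman"] < 0.05
  any_hit[i]    <- any(p < 0.05)
}

mean(single_hit)   # false-positive rate with one pre-chosen test (near 0.05)
mean(any_hit)      # false-positive rate when picking the 'best' of four tests
```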

Log-transform \(x\) and \(y\) (2)

Try again…

cor.test( ~ log(mean.vbout.length) + log(flock.size),
          Geese, method="pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  log(mean.vbout.length) and log(flock.size)
## t = -2.9325, df = 15, p-value = 0.01029
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8404458 -0.1732788
## sample estimates:
##        cor 
## -0.6036483

This suggests that individual vigilance bouts are indeed shorter in larger groups.

Engagement Marks

Do the Quiz on Top Hat
Deadline: Monday 17 March 2025 @ 10:00 AM