LA identity

rm(list = ls())
setwd("C:/Users/User/Desktop")


# Libs & data
library(readxl)
library(tidyverse)


dt <- read_excel("./liakm2_datause.xlsx")

Ania’s ideas/questions:

vocal individuality: PCA, then this used as input for pDFA for ID (controlling for calltype and sex)?

(1a) stability over time: we have vocalisations from 2019 and 2020; not all individuals were recorded in both years. Do you have any suggestions how to test the individual vocal signatures’ stability between the years?

sex differences: PCA, then this used as input for pDFA for sex (controlling for calltype and ID, or just calltype?)?
effect of size: LMs for total head lenght - but how to choose the variables?

(4)? couple similarity: Kasia - can this be done, or is Marion doing something similar? We have the birds assigned to a nest, so we could possibly try to see whether birds in one nest, i.e. partners, are more similar than non-partners? Not sure whether and how to approach this. Kasia - perhaps the tests you used for assortative mating would fit here?

Kasia’s comments

Two issues (to think of before running analyses):

call type - to compare individuals/sexes/partners perhaps we should focus on a single call type (the one that makes the most sense given social context? or the one which is the most variable across the populatiion? ….), otherwise there is two much noise in the data, and the set may be also biased. For example, one individual/sex/partner/ may be more represented with the classic call while the other with low trill - these two signals tell different story, and are of different acoustic characteristic., not taking it into account would be like comparing apples with oranges
acoustic parameters - we have many and probably the best approach would be to use just PC1/2 here as the target trait to be used in the analyses. Alternatively, for given analysis we can consider specific acoustic parameter if that makes sense (for example we may primarily expect sexes to differ in F0).

In all the analyses below I focused on the classic call (as it is pretty well represented in data and may be also considered to be socially important) and Max F0 (Hz) as the acoustic parameter. Whatever, we would decide, it can be easily replaced (while the codes/results should be still valid).

PCA/DF

WE have same issue as everythere, the call type should be taken into account before running any PCA (otherwise data are noisy and potentially biased)

Here is pretty nice package for the PCA and DF, with nice tutorial library(adegenet)dt_long_rep https://adegenet.r-forge.r-project.org/files/tutorial-dapc.pdf

Signal within-individual repeatabiltity

Again, Max F0 values of individuals for the classic call type are used. Repeatability analysis performed with Gaussian error distribution and 100 iterations only (both for bootstrap and permutation), but final results should be based on let’s say n = 1000 iterations? Sex is added to the model (and seems to improve it a little).

library(rptR)

# Same issues as for partners similarity, to analyze repeatablity of the signal we must homogenize the data set 
# both in terms of the calltype and the bioacoustic parameter 

# prep data
dt_long_rep <- dt %>% 
  filter(calltype == "classic") %>%
  select(nest, ring_no, sex, `Max F0 (Hz)`) %>% 
  rename(Value = `Max F0 (Hz)`)

# to see the samplsize
# dt_long_rep %>% group_by(sex, ring_no) %>% summarise(n = n()) %>% View()

# model
set.seed(100)
rpt_res <- rpt(Value ~ (1 | ring_no) + sex, 
               grname = "ring_no", data = dt_long_rep,
               datatype = "Gaussian", 
               nboot = 100, npermut = 100)

## Bootstrap Progress:
## Permutation Progress for ring_no :

# print(rpt_res)
summary(rpt_res)

## 
## Repeatability estimation using the lmm method
## 
## Call = rpt(formula = Value ~ (1 | ring_no) + sex, grname = "ring_no", data = dt_long_rep, datatype = "Gaussian", nboot = 100, npermut = 100)
## 
## Data: 159 observations
## ----------------------------------------
## 
## ring_no (24 groups)
## 
## Repeatability estimation overview: 
##       R     SE   2.5%  97.5% P_permut  LRT_P
##   0.457  0.126  0.201  0.653     0.01      0
## 
## Bootstrapping and Permutation test: 
##             N   Mean Median   2.5%  97.5%
## boot      100 0.4541   0.48  0.201  0.653
## permut    100 0.0206   0.00  0.000  0.102
## 
## Likelihood ratio test: 
## logLik full model = -962.5042
## logLik red. model = -984.4827
## D  = 44, df = 1, P = 1.68e-11
## 
## ----------------------------------------

# plot(rpt_res, type = "boot", cex.main = 1)
# plot(rpt_res, type = "permut", cex.main = 1)
# summary(rpt_res$mod)

# if all works above, the analysis should be performed with 1000 iterations, so you can run this to update it.
# rpt_res_update <- rpt(Value ~ (1 | ring_no) + sex, 
#                       grname = "ring_no", data = dt_long_rep,
#                       datatype = "Gaussian",
#                       nboot = 900, npermut = 900,
#                       update = TRUE, rptObj = rpt_res)
# 
# 
# summary(rpt_res_update) 
#

CONCLUSIONS Pretty high R = 0.4566304, and significantly different from zero. Cool :)

Sex differences

Just to look how particual parameters look for the two sexes. But PCs would be probably more accurate here. ThenI would analyse the effect of sex with regular modelling.

dt_long_sx <- dt %>% 
  filter(calltype == "classic") 
dt_long_sx <- dt_long_sx[,c(2:16, 24, 27)] %>% 
  pivot_longer(names_to = "parameter", values_to = "value", -c(sex, ring_no))

# if (!require(devtools)) {
#   install.packages('devtools')
# }
# devtools::install_github('erocoar/gghalves')
library(gghalves)
ggplot(dt_long_sx, aes(x = sex, y = value)) + 
  facet_wrap(~parameter, scales = "free") +
  geom_half_violin(fill = "tomato") + geom_half_boxplot() + theme_bw()

CONCLUSIONS Not much sex differences. But maybe we should not actually expect it to be quite significant, if analysed parameters are primarily linked with morphology, and the two sexes are so much similar to each other.

Partners similarity

Values (Max F0) of male and female parnters analysed with correlation analysis. The first plot shows observed pattern, the second one outcome of the randomization procedure, to test significance of the observed value of correlation coeficient.

dt_long <- dt %>% 
  filter(calltype == "classic") %>%
  select(nest, sex, `Max F0 (Hz)`) %>%
  group_by(nest, sex) %>% 
  summarise(seasmean_meanF0 = mean(`Max F0 (Hz)`, rm.na = TRUE)) %>%
  ungroup() %>% 
  pivot_wider(names_from = sex, values_from = seasmean_meanF0) %>% 
  filter(!is.na(f+m))

# plot for the observed relationship
ggplot(dt_long, aes(x = f, y = m)) + geom_point() + geom_smooth(method = "lm")

# OBSERVED value for the strength of the relationship
obs_val <- cor(dt_long$f, dt_long$m)

# testing significance of the observed value
set.seed(154153)
N <- 1000

rand_val <- numeric(N)

for(i in 1:N) {
  
  rand_f <- sample(x = dt_long$f, size = length(dt_long$f),  replace = TRUE)
  rand_val[i] <- cor(rand_f, dt_long$m)
  
}

p.val <- sum(obs_val >= rand_val)/N


# plot for the significance of the observed cocoef
rand_val <- as.data.frame(rand_val)
ggplot(rand_val, aes(x =rand_val )) + 
  geom_density(fill = "grey") +
  geom_vline(aes(xintercept =   obs_val), size = 2) + 
  theme_bw()

CONLCUSIONS: It seems there is some kind of relationship between the partners but it is not really significant (P = 0.054), and the sample size here (n = 10 pais) is really low, so it is quite promising, we can sell it as an extra thing in the ms but the relationship (including its biologicial significance) is to be properly tested in a future study (on much bigger samples size)

Body size effect

Same two issues as for any other analyses (focus on given call type, decide about acousting parameter). Becaue we have multiple acoustic measurments per individual but a single body size measurment only, to account for the pseudoreplication here I averaged the values per individual. Perhaps more elegant approach would be to run mixed model (with bird ID as a random factor) but given sample size, and also results of regular linear model (little variance explained), it is not a good idea.

dt_long_size <- dt %>% 
  filter(calltype == "classic") %>%
  group_by(ring_no) %>% 
  mutate(mean_thl = mean(THL),
         mean_val = mean(`Max F0 (Hz)`)) %>% 
  select(ring_no, sex, mean_thl, mean_val) %>% 
  unique()

# plot
ggplot(dt_long_size, aes(x = mean_thl, y = mean_val)) + 
  geom_point() + 
  geom_smooth(method = "lm") + 
  theme_bw()

model <- lm(mean_val ~ mean_thl + sex, data = dt_long_size)
# plot(model,2)
summary(model)

## 
## Call:
## lm(formula = mean_val ~ mean_thl + sex, data = dt_long_size)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -245.213  -44.452   -2.746   65.417  214.241 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1716.66    1132.11   1.516    0.144
## mean_thl      -10.34      21.26  -0.487    0.632
## sexm           62.63      57.45   1.090    0.288
## 
## Residual standard error: 106.6 on 21 degrees of freedom
## Multiple R-squared:  0.05716,    Adjusted R-squared:  -0.03263 
## F-statistic: 0.6366 on 2 and 21 DF,  p-value: 0.539

CONLCUSIONS: Very little variance explained by the model. I am sceptical about this analysis