R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Cardinals pigments analysis

Log file

Nati calculated the pigment content based on the analysis of spectra of the several body regions of the birds.

So we have a pigment composition for every region that she measured.

Example for Habia fuscicauda (females above; males bellow).

The resume of that data is in file Pigments reviewed.

However, we did not have a proper analysis for two questions we had in the manuscript in relation to the presence of carotenoid pigments in the birds’ plumage. We concentrated in carotenoids because there is a lot of literature pointing at them as sexual selected signals, possibly indicator signals.

It would be interesting to also analyse the structural colours, but there is much less to base on for hypothesis testing.

So, one hypothesis that follows from the Hill and McGraw proposal (see Weaver 2018) is that carotenoids are indicator signals, as they are related with metabolic function and other costly cellular processes that cannot be faked. Weaver et al. (2018) tested the idea that this is probably related with chemical transformation of carotenoids, so that, the signal only works for transformed carotenoids, not for untransformed carotenoids deposited in the integument of the feathers. Thus, when sexually selected for colouration, males should more often present transformed carotenoids (that is, reds, oranges, and a few yellows as in cardueline finches), that untransformed (yellows). They should also differ from females.

So we have a number of questions 1. Is the Red colouration more common in males that in females in cardinals? 2. Are these pigments more present in the front of the body on a face-to-face interaction (crown, throat, breast)?

A related question is about the evolution of those pigments. The red colouration could evolve directly or could depend on the initial evolution of yellow pigments that do not require transformation, which is a more complicated process if it is to evolve together with pigment expression. 3. Is the presence of red pigments in males and in females, separately, a derived condition from yellow colouration? Or did they evolve largely independently from a previous state that could even be no carotenoids?

Here we have four conditions possible:

No pigments / Yellow / Yellow and Red / Red

We can test the transition probabilities between states as Delhey et al. did (2023).

For questions 1 and 2 –

it is necessary to use a multivariate phylogenetic statistic that can deal with binary dependent variable (presence : absence of a certain pigment).

This can be done with BRMS (Bayesian Phylogenetic Logistic Mixed Models).

phylogenetic covariance matrix

First step is to calculate a phylogenetic covariance matrix from the consensus phylogenetic tree, that will account for evolutionary proximity between species (there is correlation between data points - they are not totally independent).

This is done with ape::vcv.

First we need to read and incorporate the tree, which will be called phylo_tree.

library(ape)
phylo_tree <-read.nexus("Consensus_RAxML_bipartitions.result copy.tre")

This is the command to calculate the phylogenetic covariance matrix.

# Sp <- ape::vcv(phylo, model = "Brownian", corr = FALSE)
vcv_matrix <- ape::vcv.phylo(phylo_tree)

Analysis Structure

The structure of the analysis is:

Red_car – binomial variable with presence : absence (1:0) of red pigment in any body region.

Red_car = Sex with Species as a random factor since it will appear twice, one per each sex (It is equivalent to a repeated measures design), and the phylogenetic covariance matrix as factor.

The same is done with yellow.car

The expected outcome is that there will be a difference between the sexes in red but not in yellow.

Question 2 requires we do the same analysis but just for the facing body regions.

I started with the Pigments Reviewed file. Then I estimated 0:1 for red and for yellow in both males and females. This was organized in the Cardinals male_female_carotenoid_pigments_global.xlsx.

Red.car and yellow.car have two possible values each: 0 or 1.

During the process I checked the Delhey (2023) data and found one mis attribution in our data and a couple of errors in theirs.

I also calculated the presence of red or of yellow pigments for all body regions: Cardinals male_female_carotenoid_pigments_per_body_region_Mar2025.xlsx.

Question 1 Data structure

Variables: dependent – Red.car;

predictors: sex, species (random factor), phylogeny #phylogenetic covariance matrix.

Formula

It is possible (and, in some cases, necessary) to incorporate additional information about the response variable. For that, one can add one or more terms of the form | fun(variable). fun may be one of a few functions defined internally in brms and variable corresponds to a variable in the data set supplied by the user.

In binomial models, it is necessary to use the command trials

From Buerkner (2017) (page 8): “In functions such as glm or glmer, the binomial response is typically passed as cbind(success, failure). In brms, the equivalent syntax is success | trials(success + failure).

However, our distribution is a Bernouli distribution because the cases are either zero or 1. For Bernouli distribution the syntax is just: family = bernoulli(link = "logit")

Phylogenetic covariance Matrix The inclusion of the phylogenetic covariance matrix in the formula involves the inclusion of an element which is not a variable, but a matrix. It has the form: (1|gr(scinam, cov = vcv_matrix)). BRMS recognizes it as a matrix. But, ir appears it is also necessary to introduce that other type of data, which is defined by the command data2.

The phylogenetic covariance matrix must have the same number of species (and names written in the same form) as the number of lines in the data matrix. They must match.

Defining Priors

For a Bayesian analysis, it is essential to define the priors for each predictor. The priors are critical to obtain the posteriors and can affect the result, giving rise to very odd results. Usually, in BRMS, we define flat priors, unless we have good reasons to assume certain values. Details are found in: https://paulbuerkner.com/brms/reference/set_prior.html

set_prior is used to define prior distributions for parameters in brms models. The functions prior, prior_, and prior_string are aliases of set_prior each allowing for a different kind of argument specification.

To combine multiple priors, use c(…)

Stan will check their syntactical correctness when the model is parsed to C++ and returns an error if they are not. This, however, does not imply that priors are always meaningful if they are accepted by Stan.

To get a full list of parameters and parameter classes for which priors can be specified (depending on the model) use function default_prior.

  1. Population-level (‘fixed’) effects

The default prior for population-level effects (including monotonic and category specific effects) is an improper flat prior over the reals. Other common options are normal priors or student-t priors. If we want to have a normal prior with mean 0 and standard deviation 5 for x1, and a unit student-t prior with 10 degrees of freedom for x2, we can specify this via set_prior("normal(0,5)", class = "b", coef = "x1") and set_prior("student_t(10, 0, 1)", class = "b", coef = "x2").

  1. Random Effects

For random effects associated with repeated measures (e.g., subjects, time points), you typically want to specify priors for the standard deviations of the random effects. Common choices include:

Normal Distribution: This is often used for the random intercepts. However, since standard deviations must be positive, it's more common to use a half-normal or half-Cauchy distribution.

set_prior("cauchy(0, 2)", class = "sd", group = "your_random_effect")

The phylogenetic covariance matrix also needs priors. I copyed the priors in the example of Buerkner: set_prior(student_t(3,0,20), "sd").

Syntax:

library(ape)
phylo_tree <-read.nexus("Consensus_RAxML_bipartitions.result copy.tre")
vcv_matrix <- ape::vcv.phylo(phylo_tree)
library("brms")
## Loading required package: Rcpp
## Loading 'brms' package (version 2.22.0). Useful instructions
## can be found by typing help('brms'). A more detailed introduction
## to the package is available through vignette('brms_overview').
## 
## Attaching package: 'brms'
## The following object is masked from 'package:stats':
## 
##     ar
#my_data <- read.table("Cardinals male_female_carotenoid_pigments_global_2.txt")
#my_data <- data.frame(read.csv("Cardinals male_female_carotenoid_pigments_global_2.csv"))")

my_data <- data.frame(read.csv("Cardinals male_female_carotenoid_pigments_global_3.csv"))

#New model just with one Random effect from the covariance matrix, since it is not possible to have two random effects with the same name. Thus I removed the random factor and kept the Cov_matrix.

fit_simple_model <- brm(
  formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)),
  # formula = red_car ~ sex + (1|scinam),  # Model wo species covariance
  data = my_data,  
  family = bernoulli(link = "logit"),  # logit is default, so the argument can be simple
  data2 = list(vcv_matrix = vcv_matrix), # it is necessary to introduce the covariance matrix which is    #done using the data2 argument.     
  prior = c(
    prior_string("normal(0,2)", class = "b"), # sets priors for fixed effects that are factors
    prior(student_t(3,0,20), "sd")
      #set_prior ("student_t(3,0,5)", class = "sd", group = "(1|scinam)"),  #sets a Student_t distribution for the random effects
     #set_prior ("student_t(3,0,5)", class = "sd", group = "vcv_matrix") #sets a Student_t distribution for the covariance matrix.
  ),
  warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95),
      # trials = my_data$trials
)
## Compiling Stan program...
## Start sampling
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 9.2e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.92 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
## Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
## Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
## Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.541 seconds (Warm-up)
## Chain 1:                0.377 seconds (Sampling)
## Chain 1:                0.918 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 2.7e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.27 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 2: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 2: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 2: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 2: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 2: Iteration: 500 / 1000 [ 50%]  (Warmup)
## Chain 2: Iteration: 501 / 1000 [ 50%]  (Sampling)
## Chain 2: Iteration: 600 / 1000 [ 60%]  (Sampling)
## Chain 2: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 2: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 2: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 2: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.5 seconds (Warm-up)
## Chain 2:                0.397 seconds (Sampling)
## Chain 2:                0.897 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 2.7e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.27 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 3: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 3: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 3: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 3: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 3: Iteration: 500 / 1000 [ 50%]  (Warmup)
## Chain 3: Iteration: 501 / 1000 [ 50%]  (Sampling)
## Chain 3: Iteration: 600 / 1000 [ 60%]  (Sampling)
## Chain 3: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 3: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 3: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 3: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.601 seconds (Warm-up)
## Chain 3:                0.707 seconds (Sampling)
## Chain 3:                1.308 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 2.7e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.27 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 4: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 4: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 4: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 4: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 4: Iteration: 500 / 1000 [ 50%]  (Warmup)
## Chain 4: Iteration: 501 / 1000 [ 50%]  (Sampling)
## Chain 4: Iteration: 600 / 1000 [ 60%]  (Sampling)
## Chain 4: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 4: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 4: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 4: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.581 seconds (Warm-up)
## Chain 4:                0.42 seconds (Sampling)
## Chain 4:                1.001 seconds (Total)
## Chain 4:
summary(fit_simple_model)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: red_car ~ sex + (1 | gr(scinam, cov = vcv_matrix)) 
##    Data: my_data (Number of observations: 94) 
##   Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
##          total post-warmup draws = 2000
## 
## Multilevel Hyperparameters:
## ~scinam (Number of levels: 47) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     8.23      2.78     3.77    14.62 1.00      616     1175
## 
## Regression Coefficients:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -4.46      1.86    -8.61    -1.41 1.00     1110     1252
## sexmale       4.49      1.10     2.64     7.01 1.00     1510     1399
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Test code

#my_data <- read.csv("Cardinals male_female_carotenoid_pigments_global.csv")
#my_data <- data.frame(read.csv("Cardinals male_female_carotenoid_pigments_global_2.csv"))
#data("my_data", package = "brms")
# head(filename)
head(my_data)

# define some priors - example from Buerkner
#bprior <- c(prior_string("normal(0,10)", class = "b"),
#            prior(normal(1,2), class = b, coef = treat),
#            prior_(~cauchy(0,2), class = ~sd,
#                   group = ~subject, coef = ~Intercept))

fit1 <- brm(
   # formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)) + (1|species),
  formula = red_car ~ sex + (1|species),  # Model wo species covariance
  #formula = red_car ~ sex,
  data = my_data,  
  family = binomial,  # logit is default, so the argument can be simple
  prior = c(
    prior_string("normal(0,10)", class = "b"), # sets priors for fixed effects that are factors
    # set_prior(~cauchy(0,2), class = ~sd, group = "species")
    set_prior("cauchy(0,2)", class = "sd", group = "Cardinals male_female_carotenoid_pigments_global.csv")
    #set_prior ("student_t(3,0,5)", class = "sd", group = "(1|scinam)"),  #sets a Student_t distribution for the random effects
  ),
  warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95)
)

# Simpler model

my_data$trials <- 1  # Add a new column 'trials' with a value of 1 for each observation


fit_simple_model <- brm(
   # formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)) + (1|species),
  # formula = red_car ~ sex + (1|scinam),  # Model wo species covariance
  #formula = red_car | trials(0 + 1) ~ sex,
  formula = red_car ~ sex,
    data = my_data,  
  family = bernoulli(link = "logit"),  # logit is default, so the argument can be simple
  prior = c(
    prior_string("normal(0,2)", class = "b") # sets priors for fixed effects that are factors
    # set_prior(~cauchy(0,2), class = ~sd, group = "scinam")
    # set_prior("cauchy(0,2)", class = "sd", group = "scinam")
    #set_prior ("student_t(3,0,5)", class = "sd", group = "(1|species)"),  #sets a Student_t distribution for the random effects
  ),
  warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95),
  # trials = my_data$trials
)
summary(fit_simple_model)

#sligthly more complicated model with repeated measures
#this model runs normally.
fit_simple_model <- brm(
   # formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)) + (1|scinam),
  formula = red_car ~ sex + (1|scinam),  # Model wo species covariance
  #formula = red_car | trials(0 + 1) ~ sex,
  #formula = red_car ~ sex,
    data = my_data,  
  family = bernoulli(link = "logit"),  # logit is default, so the argument can be simple
  prior = c(
    prior_string("normal(0,2)", class = "b"), # sets priors for fixed effects that are factors
    # set_prior(~cauchy(0,2), class = ~sd, group = "scinam")
    set_prior("cauchy(0,2)", class = "sd", group = "scinam")
    #set_prior ("student_t(3,0,5)", class = "sd", group = "(1|species)"),  #sets a Student_t distribution for the random effects
  ),
  warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95),
  # trials = my_data$trials
)
summary(fit_simple_model)

# even more complicated model with random factor and covariance matrix
fit_simple_model <- brm(
  formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)) + (1|scinam),
  formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)),
  # formula = red_car ~ sex + (1|scinam),  # Model wo species covariance
  #formula = red_car ~ sex,
  data = my_data,  
  family = bernoulli(link = "logit"),  # logit is default, so the argument can be simple
  data2 = list(vcv_matrix = vcv_matrix), # it is necessary to introduce the covariance matrix which is    #done using the data2 argument.     
  prior = c(
    prior_string("normal(0,2)", class = "b"), # sets priors for fixed effects that are factors
       # set_prior(~cauchy(0,2), class = ~sd, group = "scinam")
    set_prior("cauchy(0,2)", class = "sd", group = "scinam"), #sets prior for random effect of species names
    set_prior(student_t(3,0,20), "sd")
      #set_prior ("student_t(3,0,5)", class = "sd", group = "(1|scinam)"),  #sets a Student_t distribution for the random effects
      #set_prior ("student_t(3,0,5)", class = "sd", group = "vcv_matrix") #sets a Student_t distribution for the covariance matrix.
  ),
  warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95),
      # trials = my_data$trials
)
summary(fit_simple_model)


#New model just with one Random effect from the covariance matrix, since it is not possible to have two random effects with the same name. Thus I removed the random factor and kept the Cov_matrix.

fit_simple_model <- brm(
  formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)),

  # formula = red_car ~ sex + (1|scinam),  # Model wo species covariance
  #formula = red_car | trials(0 + 1) ~ sex,
  #formula = red_car ~ sex,
  data = my_data,  
  family = bernoulli(link = "logit"),  # logit is default, so the argument can be simple
  data2 = list(vcv_matrix = vcv_matrix), # it is necessary to introduce the covariance matrix which is    #done using the data2 argument.     
  prior = c(
    prior_string("normal(0,2)", class = "b"), # sets priors for fixed effects that are factors
       # set_prior(~cauchy(0,2), class = ~sd, group = "scinam")
    #set_prior("cauchy(0,2)", class = "sd", group = "species"), #sets prior for random effect of species names
    prior(student_t(3,0,20), "sd")
      #set_prior ("student_t(3,0,5)", class = "sd", group = "(1|scinam)"),  #sets a Student_t distribution for the random effects
     #set_prior ("student_t(3,0,5)", class = "sd", group = "vcv_matrix") #sets a Student_t distribution for the covariance matrix.
  ),
  warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95),
      # trials = my_data$trials
)
summary(fit_simple_model)

Reference page on priors: https://paulbuerkner.com/brms/reference/set_prior.html

Code for debugging with simpler model

# Simpler model
# Simplify the model
formula = red_car ~ sex
#get_prior(formula, data = my_data) # this command will inform about the parameter classes when factors (sex) are used as predictors

#To combine multiple priors, use c(...) 

#prior_res <- default_prior(formula, data=my_data)
#fit1 <- brm(
  formula,
  data = my_data,
  family = binomial(),
  prior = default_prior(formula, data=my_data),
  # prior= get_prior(formula, data=my_data),
  warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95)
)

# Summarize the model
summary(fit1)

Simple code from post. This code goes further but gets stuck failing to use the g++ compiler (working on the Rtools setup at the moment).

data_simple <- read.table(
  "https://paul-buerkner.github.io/data/data_simple.txt",
  header = TRUE
)
model_simple <- brm(
  phen ~ cofactor,
  data = data_simple,
  family = gaussian(),
  prior = c(
    prior(normal(0, 10), "b"),
    prior(normal(0, 50), "Intercept")
  )
)

Remaining Problems (13/3)

  1. Set priors correctly
  2. Configure setup to find g++ from Rtools. (Manual: https://blog.thecoatlessprofessor.com/programming/cpp/installing-rtools-for-compiled-code-via-rcpp/index.html)
  3. Use bernouli instead as the distribution
  4. Use more complex model with species covariance.

Question 2 -

Question 3 -

We will need to run BayesTraits from the BTW package (Pagel & Meade, 2006).

It is used for state reconstruction, comparing most likely ancestral states for a given one, throughout the phylogenetic tree.