This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
Nati calculated the pigment content based on the analysis of spectra of the several body regions of the birds.
So we have a pigment composition for every region that she measured.
Example for Habia fuscicauda (females above; males bellow).
The resume of that data is in file Pigments reviewed.
However, we did not have a proper analysis for two questions we had in the manuscript in relation to the presence of carotenoid pigments in the birds’ plumage. We concentrated in carotenoids because there is a lot of literature pointing at them as sexual selected signals, possibly indicator signals.
It would be interesting to also analyse the structural colours, but there is much less to base on for hypothesis testing.
So, one hypothesis that follows from the Hill and McGraw proposal (see Weaver 2018) is that carotenoids are indicator signals, as they are related with metabolic function and other costly cellular processes that cannot be faked. Weaver et al. (2018) tested the idea that this is probably related with chemical transformation of carotenoids, so that, the signal only works for transformed carotenoids, not for untransformed carotenoids deposited in the integument of the feathers. Thus, when sexually selected for colouration, males should more often present transformed carotenoids (that is, reds, oranges, and a few yellows as in cardueline finches), that untransformed (yellows). They should also differ from females.
So we have a number of questions 1. Is the Red colouration more common in males that in females in cardinals? 2. Are these pigments more present in the front of the body on a face-to-face interaction (crown, throat, breast)?
A related question is about the evolution of those pigments. The red colouration could evolve directly or could depend on the initial evolution of yellow pigments that do not require transformation, which is a more complicated process if it is to evolve together with pigment expression. 3. Is the presence of red pigments in males and in females, separately, a derived condition from yellow colouration? Or did they evolve largely independently from a previous state that could even be no carotenoids?
Here we have four conditions possible:
No pigments / Yellow / Yellow and Red / Red
We can test the transition probabilities between states as Delhey et al. did (2023).
For questions 1 and 2 –
it is necessary to use a multivariate phylogenetic statistic that can deal with binary dependent variable (presence : absence of a certain pigment).
This can be done with BRMS (Bayesian Phylogenetic Logistic Mixed Models).
https://cran.r-project.org/web/packages/brms/vignettes/brms_phylogenetics.html
First step is to calculate a phylogenetic covariance matrix from the consensus phylogenetic tree, that will account for evolutionary proximity between species (there is correlation between data points - they are not totally independent).
This is done with ape::vcv.
First we need to read and incorporate the tree, which will be called
phylo_tree.
library(ape)
phylo_tree <-read.nexus("Consensus_RAxML_bipartitions.result copy.tre")
This is the command to calculate the phylogenetic covariance matrix.
# Sp <- ape::vcv(phylo, model = "Brownian", corr = FALSE)
vcv_matrix <- ape::vcv.phylo(phylo_tree)
The structure of the analysis is:
Red_car – binomial variable with presence : absence
(1:0) of red pigment in any body region.
Red_car = Sex with Species as a random factor since it
will appear twice, one per each sex (It is equivalent to a repeated
measures design), and the phylogenetic covariance matrix as factor.
The same is done with yellow.car
The expected outcome is that there will be a difference between the sexes in red but not in yellow.
Question 2 requires we do the same analysis but just for the facing body regions.
I started with the Pigments Reviewed file. Then I estimated 0:1 for red and for yellow in both males and females. This was organized in the Cardinals male_female_carotenoid_pigments_global.xlsx.
Red.car and yellow.car have two possible values each: 0 or 1.
During the process I checked the Delhey (2023) data and found one mis attribution in our data and a couple of errors in theirs.
I also calculated the presence of red or of yellow pigments for all body regions: Cardinals male_female_carotenoid_pigments_per_body_region_Mar2025.xlsx.
Variables: dependent – Red.car;
predictors: sex, species (random factor), phylogeny #phylogenetic covariance matrix.
It is possible (and, in some cases, necessary) to incorporate additional information about the response variable. For that, one can add one or more terms of the form | fun(variable). fun may be one of a few functions defined internally in brms and variable corresponds to a variable in the data set supplied by the user.
In binomial models, it is necessary to use the command
trials
From Buerkner (2017) (page 8): “In functions such as glm or glmer,
the binomial response is typically passed as cbind(success, failure). In
brms, the equivalent syntax is
success | trials(success + failure).”
However, our distribution is a Bernouli distribution because the
cases are either zero or 1. For Bernouli distribution the syntax is
just: family = bernoulli(link = "logit")
Phylogenetic covariance Matrix The inclusion of the phylogenetic
covariance matrix in the formula involves the inclusion of an element
which is not a variable, but a matrix. It has the form:
(1|gr(scinam, cov = vcv_matrix)). BRMS recognizes it as a
matrix. But, ir appears it is also necessary to introduce that other
type of data, which is defined by the command data2.
The phylogenetic covariance matrix must have the same number of species (and names written in the same form) as the number of lines in the data matrix. They must match.
For a Bayesian analysis, it is essential to define the priors for each predictor. The priors are critical to obtain the posteriors and can affect the result, giving rise to very odd results. Usually, in BRMS, we define flat priors, unless we have good reasons to assume certain values. Details are found in: https://paulbuerkner.com/brms/reference/set_prior.html
set_prior is used to define prior distributions for parameters in brms models. The functions prior, prior_, and prior_string are aliases of set_prior each allowing for a different kind of argument specification.
To combine multiple priors, use c(…)
Stan will check their syntactical correctness when the model is parsed to C++ and returns an error if they are not. This, however, does not imply that priors are always meaningful if they are accepted by Stan.
To get a full list of parameters and parameter classes for which
priors can be specified (depending on the model) use function
default_prior.
The default prior for population-level effects (including monotonic
and category specific effects) is an improper flat prior over the reals.
Other common options are normal priors or student-t priors. If we want
to have a normal prior with mean 0 and standard deviation 5 for x1, and
a unit student-t prior with 10 degrees of freedom for x2, we can specify
this via set_prior("normal(0,5)", class = "b", coef = "x1")
and
set_prior("student_t(10, 0, 1)", class = "b", coef = "x2").
For random effects associated with repeated measures (e.g., subjects, time points), you typically want to specify priors for the standard deviations of the random effects. Common choices include:
Normal Distribution: This is often used for the random intercepts. However, since standard deviations must be positive, it's more common to use a half-normal or half-Cauchy distribution.
set_prior("cauchy(0, 2)", class = "sd", group = "your_random_effect")
The phylogenetic covariance matrix also needs priors. I copyed the
priors in the example of Buerkner:
set_prior(student_t(3,0,20), "sd").
Syntax:
library(ape)
phylo_tree <-read.nexus("Consensus_RAxML_bipartitions.result copy.tre")
vcv_matrix <- ape::vcv.phylo(phylo_tree)
library("brms")
## Loading required package: Rcpp
## Loading 'brms' package (version 2.22.0). Useful instructions
## can be found by typing help('brms'). A more detailed introduction
## to the package is available through vignette('brms_overview').
##
## Attaching package: 'brms'
## The following object is masked from 'package:stats':
##
## ar
#my_data <- read.table("Cardinals male_female_carotenoid_pigments_global_2.txt")
#my_data <- data.frame(read.csv("Cardinals male_female_carotenoid_pigments_global_2.csv"))")
my_data <- data.frame(read.csv("Cardinals male_female_carotenoid_pigments_global_3.csv"))
#New model just with one Random effect from the covariance matrix, since it is not possible to have two random effects with the same name. Thus I removed the random factor and kept the Cov_matrix.
fit_simple_model <- brm(
formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)),
# formula = red_car ~ sex + (1|scinam), # Model wo species covariance
data = my_data,
family = bernoulli(link = "logit"), # logit is default, so the argument can be simple
data2 = list(vcv_matrix = vcv_matrix), # it is necessary to introduce the covariance matrix which is #done using the data2 argument.
prior = c(
prior_string("normal(0,2)", class = "b"), # sets priors for fixed effects that are factors
prior(student_t(3,0,20), "sd")
#set_prior ("student_t(3,0,5)", class = "sd", group = "(1|scinam)"), #sets a Student_t distribution for the random effects
#set_prior ("student_t(3,0,5)", class = "sd", group = "vcv_matrix") #sets a Student_t distribution for the covariance matrix.
),
warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95),
# trials = my_data$trials
)
## Compiling Stan program...
## Start sampling
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 9.2e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.92 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.541 seconds (Warm-up)
## Chain 1: 0.377 seconds (Sampling)
## Chain 1: 0.918 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 2.7e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.27 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 2: Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 2: Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 2: Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 2: Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 2: Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 2: Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 2: Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 2: Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 2: Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 2: Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 2: Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.5 seconds (Warm-up)
## Chain 2: 0.397 seconds (Sampling)
## Chain 2: 0.897 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 2.7e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.27 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 3: Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 3: Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 3: Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 3: Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 3: Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 3: Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 3: Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 3: Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 3: Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 3: Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 3: Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.601 seconds (Warm-up)
## Chain 3: 0.707 seconds (Sampling)
## Chain 3: 1.308 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 2.7e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.27 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 4: Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 4: Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 4: Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 4: Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 4: Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 4: Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 4: Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 4: Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 4: Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 4: Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 4: Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.581 seconds (Warm-up)
## Chain 4: 0.42 seconds (Sampling)
## Chain 4: 1.001 seconds (Total)
## Chain 4:
summary(fit_simple_model)
## Family: bernoulli
## Links: mu = logit
## Formula: red_car ~ sex + (1 | gr(scinam, cov = vcv_matrix))
## Data: my_data (Number of observations: 94)
## Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
## total post-warmup draws = 2000
##
## Multilevel Hyperparameters:
## ~scinam (Number of levels: 47)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 8.23 2.78 3.77 14.62 1.00 616 1175
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -4.46 1.86 -8.61 -1.41 1.00 1110 1252
## sexmale 4.49 1.10 2.64 7.01 1.00 1510 1399
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Test code
#my_data <- read.csv("Cardinals male_female_carotenoid_pigments_global.csv")
#my_data <- data.frame(read.csv("Cardinals male_female_carotenoid_pigments_global_2.csv"))
#data("my_data", package = "brms")
# head(filename)
head(my_data)
# define some priors - example from Buerkner
#bprior <- c(prior_string("normal(0,10)", class = "b"),
# prior(normal(1,2), class = b, coef = treat),
# prior_(~cauchy(0,2), class = ~sd,
# group = ~subject, coef = ~Intercept))
fit1 <- brm(
# formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)) + (1|species),
formula = red_car ~ sex + (1|species), # Model wo species covariance
#formula = red_car ~ sex,
data = my_data,
family = binomial, # logit is default, so the argument can be simple
prior = c(
prior_string("normal(0,10)", class = "b"), # sets priors for fixed effects that are factors
# set_prior(~cauchy(0,2), class = ~sd, group = "species")
set_prior("cauchy(0,2)", class = "sd", group = "Cardinals male_female_carotenoid_pigments_global.csv")
#set_prior ("student_t(3,0,5)", class = "sd", group = "(1|scinam)"), #sets a Student_t distribution for the random effects
),
warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95)
)
# Simpler model
my_data$trials <- 1 # Add a new column 'trials' with a value of 1 for each observation
fit_simple_model <- brm(
# formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)) + (1|species),
# formula = red_car ~ sex + (1|scinam), # Model wo species covariance
#formula = red_car | trials(0 + 1) ~ sex,
formula = red_car ~ sex,
data = my_data,
family = bernoulli(link = "logit"), # logit is default, so the argument can be simple
prior = c(
prior_string("normal(0,2)", class = "b") # sets priors for fixed effects that are factors
# set_prior(~cauchy(0,2), class = ~sd, group = "scinam")
# set_prior("cauchy(0,2)", class = "sd", group = "scinam")
#set_prior ("student_t(3,0,5)", class = "sd", group = "(1|species)"), #sets a Student_t distribution for the random effects
),
warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95),
# trials = my_data$trials
)
summary(fit_simple_model)
#sligthly more complicated model with repeated measures
#this model runs normally.
fit_simple_model <- brm(
# formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)) + (1|scinam),
formula = red_car ~ sex + (1|scinam), # Model wo species covariance
#formula = red_car | trials(0 + 1) ~ sex,
#formula = red_car ~ sex,
data = my_data,
family = bernoulli(link = "logit"), # logit is default, so the argument can be simple
prior = c(
prior_string("normal(0,2)", class = "b"), # sets priors for fixed effects that are factors
# set_prior(~cauchy(0,2), class = ~sd, group = "scinam")
set_prior("cauchy(0,2)", class = "sd", group = "scinam")
#set_prior ("student_t(3,0,5)", class = "sd", group = "(1|species)"), #sets a Student_t distribution for the random effects
),
warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95),
# trials = my_data$trials
)
summary(fit_simple_model)
# even more complicated model with random factor and covariance matrix
fit_simple_model <- brm(
formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)) + (1|scinam),
formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)),
# formula = red_car ~ sex + (1|scinam), # Model wo species covariance
#formula = red_car ~ sex,
data = my_data,
family = bernoulli(link = "logit"), # logit is default, so the argument can be simple
data2 = list(vcv_matrix = vcv_matrix), # it is necessary to introduce the covariance matrix which is #done using the data2 argument.
prior = c(
prior_string("normal(0,2)", class = "b"), # sets priors for fixed effects that are factors
# set_prior(~cauchy(0,2), class = ~sd, group = "scinam")
set_prior("cauchy(0,2)", class = "sd", group = "scinam"), #sets prior for random effect of species names
set_prior(student_t(3,0,20), "sd")
#set_prior ("student_t(3,0,5)", class = "sd", group = "(1|scinam)"), #sets a Student_t distribution for the random effects
#set_prior ("student_t(3,0,5)", class = "sd", group = "vcv_matrix") #sets a Student_t distribution for the covariance matrix.
),
warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95),
# trials = my_data$trials
)
summary(fit_simple_model)
#New model just with one Random effect from the covariance matrix, since it is not possible to have two random effects with the same name. Thus I removed the random factor and kept the Cov_matrix.
fit_simple_model <- brm(
formula = red_car ~ sex + (1|gr(scinam, cov = vcv_matrix)),
# formula = red_car ~ sex + (1|scinam), # Model wo species covariance
#formula = red_car | trials(0 + 1) ~ sex,
#formula = red_car ~ sex,
data = my_data,
family = bernoulli(link = "logit"), # logit is default, so the argument can be simple
data2 = list(vcv_matrix = vcv_matrix), # it is necessary to introduce the covariance matrix which is #done using the data2 argument.
prior = c(
prior_string("normal(0,2)", class = "b"), # sets priors for fixed effects that are factors
# set_prior(~cauchy(0,2), class = ~sd, group = "scinam")
#set_prior("cauchy(0,2)", class = "sd", group = "species"), #sets prior for random effect of species names
prior(student_t(3,0,20), "sd")
#set_prior ("student_t(3,0,5)", class = "sd", group = "(1|scinam)"), #sets a Student_t distribution for the random effects
#set_prior ("student_t(3,0,5)", class = "sd", group = "vcv_matrix") #sets a Student_t distribution for the covariance matrix.
),
warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95),
# trials = my_data$trials
)
summary(fit_simple_model)
Reference page on priors: https://paulbuerkner.com/brms/reference/set_prior.html
Code for debugging with simpler model
# Simpler model
# Simplify the model
formula = red_car ~ sex
#get_prior(formula, data = my_data) # this command will inform about the parameter classes when factors (sex) are used as predictors
#To combine multiple priors, use c(...)
#prior_res <- default_prior(formula, data=my_data)
#fit1 <- brm(
formula,
data = my_data,
family = binomial(),
prior = default_prior(formula, data=my_data),
# prior= get_prior(formula, data=my_data),
warmup = 500, iter = 1000, chains = 4, control = list(adapt_delta = 0.95)
)
# Summarize the model
summary(fit1)
Simple code from post. This code goes further but gets stuck failing to use the g++ compiler (working on the Rtools setup at the moment).
data_simple <- read.table(
"https://paul-buerkner.github.io/data/data_simple.txt",
header = TRUE
)
model_simple <- brm(
phen ~ cofactor,
data = data_simple,
family = gaussian(),
prior = c(
prior(normal(0, 10), "b"),
prior(normal(0, 50), "Intercept")
)
)
We will need to run BayesTraits from the BTW package (Pagel & Meade, 2006).
It is used for state reconstruction, comparing most likely ancestral states for a given one, throughout the phylogenetic tree.