Set up

# Load packages
library(readxl)
library(brms)
library(dplyr)
library(tidybayes)
library(bayesplot)

# Read in data
DF1 <- read_excel("Qry_Example.xlsx")

Examine the data set and do some basic manipulation
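
A quick way to check what types the columns were imported as (read_excel often reads mixed or messy columns in as character) is dplyr's glimpse(). A minimal sketch, using the data frame created above:

# Inspect column names and types before any conversion (output omitted here)
glimpse(DF1)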

### Convert key variables to right type
# Convert character to numeric 
DF1$numberOccupants <- as.numeric(DF1$numberOccupants)
DF1$UnitConsumption <- as.numeric(DF1$UnitConsumption)
DF1$income <- as.numeric(DF1$income)

# Convert character to factor 
DF1$speciesFK <- as.factor(DF1$speciesFK)
DF1$householdCharPK <- as.factor(DF1$householdCharPK)
  
### Convert '-9' to NA 
DF1[DF1 == -9] <- NA
### Divide the units consumed by the household number 
DF1$Con_per_cap <- DF1$UnitConsumption/DF1$numberOccupants

### Convert from RWF to USD and rescale
DF1$income_USD <- DF1$income*0.00100   # approximate RWF-to-USD conversion
DF1$income_USD <- DF1$income_USD/100   # rescale so income is expressed in hundreds of USD
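
As an aside, the blanket recode of -9 above can also be done in a more targeted way that touches only the numeric columns. A minimal dplyr sketch, equivalent here provided the missing-value code occurs only in numeric columns:

# Alternative: recode the -9 sentinel to NA only in numeric columns
DF1 <- DF1 %>%
  mutate(across(where(is.numeric), ~ na_if(.x, -9)))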

### Subset to complete cases
# Subset variables for use in dummy analysis 
keep <- c("Con_per_cap", "householdCharPK", "speciesFK", "income_USD")
DF2 <- DF1[,keep]

# Number of original observations 
nrow(DF2)
## [1] 5610
# Subset to complete cases 
DF2 %>%
  filter(complete.cases(DF2)) -> DF2  
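
An equivalent one-liner, if tidyr is available (it is not loaded above, so the namespace is given explicitly):

# Equivalent: drop any row containing an NA
DF2 <- tidyr::drop_na(DF2)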

# Drop species with fewer than 10 observations 
DF2 %>% 
  group_by(speciesFK) %>% 
  filter(n() >= 10) -> DF3

# Drop unused levels
DF3$speciesFK <- droplevels(DF3$speciesFK)

# How many observations per species?
table(DF3$speciesFK)
## 
##  115  176  233  405  410  413  414  416  417  424 
##   29   13   22  200   12  857 1195 2222   69   16
# Number of complete observations
nrow(DF3)
## [1] 4635

Implement a hypothetical Bayesian mixed-effects model

In this hypothetical analysis, per-capita consumption (Con_per_cap) is modelled as a function of household income (income_USD) and species (speciesFK), with a random intercept for household (householdCharPK) and a Gaussian likelihood.

The hyperparameters are the brms defaults, except for the prior on the income coefficient, which is set to normal(0.25, 0.50).

The model will be run with 4 chains of 2000 iterations each, the first 1000 of which are warmup.

### Set priors 
# Get default priors
prior <- get_prior(Con_per_cap ~ income_USD + speciesFK + (1|householdCharPK),
                    data = DF3, family = gaussian())

# Change the prior describing the expected association between income and consumption,
# selecting the row by coefficient name rather than by position
prior$prior[prior$class == "b" & prior$coef == "income_USD"] <- "normal(0.25, 0.50)"
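
An equivalent and arguably more explicit route is brms::set_prior(), which targets the coefficient by name rather than through the prior table. A minimal sketch:

# Equivalent: specify the prior for the income coefficient by name
prior_income <- set_prior("normal(0.25, 0.50)", class = "b", coef = "income_USD")
# prior_income could then be passed to brm() through its prior argument;
# all other parameters would keep their brms defaults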


# Implement the analysis 
Dum_model <- brm(formula = Con_per_cap ~ income_USD + speciesFK + (1|householdCharPK),
                 data   = DF3,
                 prior  = prior,
                 family = gaussian(),
                 chains = 4,
                 iter   = 2000,
                 warmup = 1000,
                 seed   = 123)
## 
## SAMPLING FOR MODEL '59a4e9b3d39e19b573c66e073f3efdf5' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.001 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 30.65 seconds (Warm-up)
## Chain 1:                12.995 seconds (Sampling)
## Chain 1:                43.645 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '59a4e9b3d39e19b573c66e073f3efdf5' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 29.682 seconds (Warm-up)
## Chain 2:                12.503 seconds (Sampling)
## Chain 2:                42.185 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL '59a4e9b3d39e19b573c66e073f3efdf5' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 26.673 seconds (Warm-up)
## Chain 3:                11.888 seconds (Sampling)
## Chain 3:                38.561 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL '59a4e9b3d39e19b573c66e073f3efdf5' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 0 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 27.994 seconds (Warm-up)
## Chain 4:                12.796 seconds (Sampling)
## Chain 4:                40.79 seconds (Total)
## Chain 4:

Check for convergence

Normally I’d work through all the steps of the WAMBS checklist (When to worry and how to Avoid the Misuse of Bayesian Statistics), but here I’ll just look at the trace plots.

# Examine the trace plots 
mcmc_plot(Dum_model, type = "trace")
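
Beyond eyeballing the trace plots, the numerical diagnostics are cheap to check; summary(Dum_model) reports Rhat and effective sample sizes, and the same quantities can be plotted. A sketch using the brms accessors and their bayesplot displays:

# Numerical convergence checks
mcmc_rhat(rhat(Dum_model))        # potential scale reduction: values should sit close to 1
mcmc_neff(neff_ratio(Dum_model))  # ratio of effective to total posterior sample size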

Examine the posterior distribution

# The fixed effects
pars_view <- c("b_Intercept", "b_income_USD", "b_speciesFK176", "b_speciesFK233",
               "b_speciesFK405", "b_speciesFK410", "b_speciesFK413", "b_speciesFK414",
               "b_speciesFK416", "b_speciesFK417", "b_speciesFK424")

# Plot the marginal posterior densities with 90% and 95% credible intervals
bayesplot::mcmc_areas(
  Dum_model, 
  pars = pars_view,
  prob = 0.90, 
  prob_outer = 0.95, 
  point_est = "mean"
)
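
tidybayes (loaded at the top but not used so far) offers a tidy way to pull out and summarise the draws. A minimal sketch for the income coefficient, with 90% and 95% intervals to match the plot above:

# Tidy posterior summary of the income effect
Dum_model %>%
  spread_draws(b_income_USD) %>%
  median_qi(.width = c(0.90, 0.95))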