Chapter 11 - God Spiked the Integers

This chapter described some of the most common generalized linear models, those used to model counts. It is important to never convert counts to proportions before analysis, because doing so destroys information about sample size. A fundamental difficulty with these models is that parameters are on a different scale, typically log-odds (for binomial) or log-rate (for Poisson), than the outcome variable they describe. Therefore computing implied predictions is even more important than before.
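A minimal sketch of why proportions discard sample size. The two data sets (1 success in 2 trials and 50 in 100) are assumed purely for illustration, and the helper name `beta_post_sd` is made up for this example. Both have the proportion 0.5, yet under a flat Beta(1, 1) prior the posterior uncertainty about the probability differs a great deal.

```r
# Posterior sd of p under a flat Beta(1, 1) prior after k successes in n trials.
# The posterior is Beta(k + 1, n - k + 1); this is the standard Beta sd formula.
beta_post_sd <- function(k, n) {
  a <- k + 1
  b <- n - k + 1
  sqrt(a * b / ((a + b)^2 * (a + b + 1)))
}

sd_small <- beta_post_sd(1, 2)     # 1 success in 2 trials (assumed data)
sd_large <- beta_post_sd(50, 100)  # 50 successes in 100 trials (assumed data)
print(c(small_sample = sd_small, large_sample = sd_large))

# The same proportion expressed on the log-odds scale used by the logit link
p <- 0.5
log_odds <- qlogis(p)  # log(p / (1 - p))
```

The proportion alone cannot distinguish the two cases; the counts can.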

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them.

Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.

Questions

11-1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?

# In aggregated form, the binomial likelihood of k successes in n trials includes the
# multiplicity term choose(n, k), which counts the orderings that give the same total.
# In disaggregated (0/1) form each row is a single Bernoulli trial, so that term is absent
# and the likelihood of the data is smaller. The term does not depend on p, so it only adds
# a constant to the log-likelihood and inference about p is unchanged.
# This equivalence holds for ordinary binomial and Poisson models, which can be aggregated
# and disaggregated across rows in the data without changing any causal assumptions.
# The same is not true of beta-binomial and gamma-Poisson models, because those likelihoods
# apply an unobserved parameter to each row in the data. When we then calculate
# log-likelihoods, how the data are structured determines how the beta-distributed or
# gamma-distributed variation enters the model.
# For example, a beta-binomial model of the admissions data has counts on each row. The rows
# are departments, and all of the applications for each department are assumed to have the
# same unknown baseline probability of acceptance. If we instead treat each application as an
# observation and calculate WAIC over applications, we lose the fact that the beta-binomial
# model implies the same latent probability for every application on the same row.
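The role of the multiplicity term in question 11-1 can be checked numerically. This is a small sketch with assumed values (6 successes in 9 trials, evaluated at two arbitrary probabilities); it compares the aggregated binomial likelihood with the product of the individual Bernoulli likelihoods.

```r
k <- 6    # successes (assumed for illustration)
n <- 9    # trials (assumed for illustration)
outcomes <- c(rep(1, k), rep(0, n - k))  # the same data in 0/1 form

lik_ratio <- function(p) {
  lik_agg <- dbinom(k, size = n, prob = p)               # aggregated form
  lik_dis <- prod(dbinom(outcomes, size = 1, prob = p))  # disaggregated form
  lik_agg / lik_dis
}

# The ratio is the number of orderings, whatever value p takes
ratio_a <- lik_ratio(0.4)
ratio_b <- lik_ratio(0.7)
print(c(ratio_a, ratio_b, choose(n, k)))
```

Because the ratio is the same for every p, the two likelihoods differ only by a constant factor and lead to the same posterior.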

11-2. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Plot and compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0,10). Re-estimate the posterior using both ulam and quap. Plot and compare the posterior distributions. Do the differences increase or decrease? Why?

library(rethinking)
data("chimpanzees")
d <- chimpanzees
d$recipient <- NULL

# quap fit with the relaxed Normal(0, 10) priors
q2 <- quap(alist(
  pulled_left ~ dbinom( 1 , p ) ,
  logit(p) <- a[actor] + (bp + bpC*condition)*prosoc_left ,
  a[actor] ~ dnorm(0,10),
  bp ~ dnorm(0,10),
  bpC ~ dnorm(0,10)
) ,
data=d)
pairs(q2)
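One way to see where a quadratic approximation and MCMC could disagree is a grid approximation for a single intercept. The sketch below assumes a hypothetical actor who pulled left on all 18 of 18 trials (numbers chosen for illustration, not taken from the data) and compares a Normal(0, 10) prior with a Normal(0, 1.5) prior on the log-odds scale. The function name `grid_summary` is made up for this example. A Gaussian approximation is centered on the mode, so a large gap between the posterior mode and mean suggests it would miss the skew.

```r
n <- 18   # trials for the hypothetical actor (assumed)
k <- 18   # pulls to the left (assumed)
a_grid <- seq(-10, 40, length.out = 5001)  # intercept on the log-odds scale

grid_summary <- function(prior_sd) {
  log_post <- dbinom(k, size = n, prob = plogis(a_grid), log = TRUE) +
    dnorm(a_grid, mean = 0, sd = prior_sd, log = TRUE)
  post <- exp(log_post - max(log_post))  # subtract the max for numerical stability
  post <- post / sum(post)               # normalize over the grid
  c(mode = a_grid[which.max(post)], mean = sum(a_grid * post))
}

wide  <- grid_summary(10)   # relaxed prior
tight <- grid_summary(1.5)  # regularizing prior
print(rbind(wide, tight))

gap_wide  <- unname(wide["mean"] - wide["mode"])
gap_tight <- unname(tight["mean"] - tight["mode"])
```

Under these assumptions the gap between mean and mode is larger with the wider prior, which is the direction one would expect the quap and ulam posteriors to diverge when the prior is relaxed.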

11-3. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?

data('Kline')
d <- Kline
d$P <- scale( log(d$population) )
d$id <- ifelse( d$contact=="high" , 2 , 1)
d
##       culture population contact total_tools mean_TU            P id
## 1    Malekula       1100     low          13     3.2 -1.291473310  1
## 2     Tikopia       1500     low          22     4.7 -1.088550750  1
## 3  Santa Cruz       3600     low          24     4.0 -0.515764892  1
## 4         Yap       4791    high          43     5.0 -0.328773359  2
## 5    Lau Fiji       7400    high          33     5.0 -0.044338980  2
## 6   Trobriand       8000    high          19     4.0  0.006668287  2
## 7       Chuuk       9200    high          40     3.8  0.098109204  2
## 8       Manus      13000     low          28     6.6  0.324317564  1
## 9       Tonga      17500    high          55     5.4  0.518797917  2
## 10     Hawaii     275000     low          71     6.6  2.321008320  1
# Now drop Hawaii and re-standardize log population

d <- subset(d, d$culture != "Hawaii")
d$P <- scale( log(d$population) )
d$id <- ifelse( d$contact=="high" , 2 , 1 )
d
##      culture population contact total_tools mean_TU          P id
## 1   Malekula       1100     low          13     3.2 -1.6838108  1
## 2    Tikopia       1500     low          22     4.7 -1.3532297  1
## 3 Santa Cruz       3600     low          24     4.0 -0.4201043  1
## 4        Yap       4791    high          43     5.0 -0.1154764  2
## 5   Lau Fiji       7400    high          33     5.0  0.3478956  2
## 6  Trobriand       8000    high          19     4.0  0.4309916  2
## 7      Chuuk       9200    high          40     3.8  0.5799580  2
## 8      Manus      13000     low          28     6.6  0.9484740  1
## 9      Tonga      17500    high          55     5.4  1.2653019  2
# The slope looks about the same; Hawaii seems to be the only outlier.
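A quick way to check how the slope on log population moves when Hawaii is dropped is an ordinary maximum-likelihood Poisson regression. This is a simplified stand-in, not the quap or ulam models from the chapter: it ignores contact rate and uses no priors. The data frame `kline` below is typed in from the table printed above.

```r
# Values copied from the printed Kline table
kline <- data.frame(
  culture = c("Malekula", "Tikopia", "Santa Cruz", "Yap", "Lau Fiji",
              "Trobriand", "Chuuk", "Manus", "Tonga", "Hawaii"),
  population = c(1100, 1500, 3600, 4791, 7400, 8000, 9200, 13000, 17500, 275000),
  total_tools = c(13, 22, 24, 43, 33, 19, 40, 28, 55, 71)
)

# Poisson regression with a log link, fit by maximum likelihood
fit_all <- glm(total_tools ~ log(population), family = poisson, data = kline)
fit_no_hawaii <- glm(total_tools ~ log(population), family = poisson,
                     data = subset(kline, culture != "Hawaii"))

# Slopes on log population, with and without Hawaii
slopes <- c(all = unname(coef(fit_all)[2]),
            no_hawaii = unname(coef(fit_no_hawaii)[2]))
print(slopes)
```

Comparing the two slopes, and their standard errors from `summary()`, gives a rough indication of how much Hawaii alone drives the population effect.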

11-4. Use WAIC or PSIS to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.

data('chimpanzees')

d <- chimpanzees

d$treatment <- 1 + d$prosoc_left + 2 * d$condition

dat_list <- list(
  pulled_left = d$pulled_left,
  actor = d$actor,
  treatment = as.integer(d$treatment)
)

# Intercept-only model

m11.1 <- quap(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a,
    a ~ dnorm(0, 10)
  ),
  data = d
)

# Intercept and Treatment 


m11.3 <- quap(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a + b[treatment],
    a ~ dnorm(0, 1.5),
    b[treatment] ~ dnorm(0, 0.5)
  ),
  data = d
)

# Individual Intercept and Treatment

m11.4 <- quap(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + b[treatment],
    a[actor] ~ dnorm(0, 1.5),
    b[treatment] ~ dnorm(0, 0.5)
  ),
  data = dat_list
)

(comp <- compare(m11.1, m11.3, m11.4))
##           WAIC        SE    dWAIC      dSE    pWAIC       weight
## m11.4 532.3109 18.553594   0.0000       NA 8.027727 1.000000e+00
## m11.3 682.8030  9.229227 150.4922 18.07886 3.799772 2.094297e-33
## m11.1 688.0992  7.144819 155.7884 18.59873 1.079532 1.482472e-34
plot(comp)

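# The WAIC comparison strongly favors m11.4, the model with individual intercepts as well as
# treatment effects. It is about 150 WAIC units better than m11.3, with a standard error of the
# difference near 18, and it takes essentially all of the model weight. m11.3 and m11.1 differ
# by only about 5 units, so differences among actors account for far more of the variation in
# pulled_left than treatment does.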
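The weight column in the comparison table can be reproduced from the dWAIC column alone. The sketch below uses the dWAIC and dSE values printed above and the usual Akaike-weight formula, exp(-dWAIC/2) normalized to sum to one.

```r
# dWAIC values copied from the compare() output above
dWAIC <- c(m11.4 = 0, m11.3 = 150.4922, m11.1 = 155.7884)

# Akaike weights: relative support for each model
w <- exp(-0.5 * dWAIC)
w <- w / sum(w)
print(signif(w, 4))

# Difference between m11.3 and m11.4 in units of its standard error (dSE from the table)
z_m11.3 <- unname(dWAIC["m11.3"]) / 18.07886
print(z_m11.3)
```

A difference of more than eight standard errors leaves little doubt about the ranking of m11.4 over m11.3.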

11-5. The data contained in data(salamanders) are counts of salamanders (Plethodon elongatus) from 47 different 49-m2 plots in northern California. The column SALAMAN is the count in each plot, and the columns PCTCOVER and FORESTAGE are percent of ground cover and age of trees in the plot, respectively. You will model SALAMAN as a Poisson variable. (a) Model the relationship between density and percent cover, using a log-link (same as the example in the book and lecture). Use weakly informative priors of your choosing. Check the quadratic approximation again, by comparing quap to ulam. Then plot the expected counts and their 89% interval against percent cover. In which ways does the model do a good job? A bad job? (b) Can you improve the model by using the other predictor, FORESTAGE? Try any models you think useful. Can you explain why FORESTAGE helps or does not help with prediction?

#loading the data
data('salamanders')
d <- salamanders
d$C <- standardize(d$PCTCOVER)
d$A <- standardize(d$FORESTAGE)


##poisson model
f <- alist(
  SALAMAN ~ dpois(lambda),
  log(lambda) <- a + bC * C,
  a ~ dnorm(0, 1),
  bC ~ dnorm(0, 1)
)

## prior predictive simulation
N <- 50 # 50 samples from prior
a <- rnorm(N, 0, 1)
bC <- rnorm(N, 0, 1)
C_seq <- seq(from = -2, to = 2, length.out = 30)
plot(NULL,
  xlim = c(-2, 2), ylim = c(0, 20),
  xlab = "cover (standardized)", ylab = "salamanders"
)
for (i in 1:N) {
  lines(C_seq, exp(a[i] + bC[i] * C_seq), col = grau(), lwd = 1.5)
}

# let's make the prior a bit more informative

bC <- rnorm(N, 0, 0.5)
plot(NULL,
  xlim = c(-2, 2), ylim = c(0, 20),
  xlab = "cover (standardized)", ylab = "salamanders"
)
for (i in 1:N) {
  lines(C_seq, exp(a[i] + bC[i] * C_seq), col = grau(), lwd = 1.5)
}

###updating specification and running

f <- alist(
  SALAMAN ~ dpois(lambda),
  log(lambda) <- a + bC * C,
  a ~ dnorm(0, 1),
  bC ~ dnorm(0, 0.5)
)

##ulam
# The ulam fit was skipped because R crashed when running it:
# mH4a <- ulam(f, data = d, chains = 4)

##quap
m5quap <- quap(f, data = d)

precis(m5quap)
##         mean        sd      5.5%     94.5%
## a  0.5080573 0.1384130 0.2868465 0.7292681
## bC 1.0317302 0.1640859 0.7694893 1.2939711
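Because the parameters are on the log-rate scale, the precis output is easier to read after exponentiating. The sketch below plugs in the posterior means printed above and ignores posterior uncertainty, so it gives only point predictions, not the 89% interval the question asks for.

```r
# Posterior means copied from the precis() output above
a_hat  <- 0.5080573
bC_hat <- 1.0317302

# Expected salamander count at average percent cover (C = 0)
lambda_avg <- exp(a_hat)

# Multiplicative change in the expected count per one standard deviation of cover
rate_ratio <- exp(bC_hat)

# Expected counts across a range of standardized cover values
C_seq <- seq(from = -2, to = 2, length.out = 9)
lambda_seq <- exp(a_hat + bC_hat * C_seq)
print(round(data.frame(C = C_seq, expected_count = lambda_seq), 2))
```

On this reading, plots with low cover are expected to hold close to zero salamanders, and the expected count rises steeply once cover is above average.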