Chapter 11 - God Spiked the Integers

This chapter described some of the most common generalized linear models, those used to model counts. It is important to never convert counts to proportions before analysis, because doing so destroys information about sample size. A fundamental difficulty with these models is that parameters are on a different scale, typically log-odds (for binomial) or log-rate (for Poisson), than the outcome variable they describe. Therefore computing implied predictions is even more important than before.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard(H).

Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.

Questions

11E1. If an event has probability 0.35, what are the log-odds of this event?

log(0.35/(1-0.35))
## [1] -0.6190392

11E2. If an event has log-odds 3.2, what is the probability of this event?

logistic(3.2)
## [1] 0.9608343

11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?

# It imply that log ods will change in exp(1.7) times
exp(1.7)
## [1] 5.473947

11E4. Why do Poisson regressions sometimes require the use of an offset? Provide an example.

# Outcome variable can be measured on a different scale for each observation. So, to make all observations have same scale we use offset. For example from the chapter, where a number of manuscripts produced by the monastery is measured on the daily or weekly basis. The offset parameter is used to convert all measurements to the daily basis.  
# Number of animals in the area is another possible example where offset is helpful. Square of the area can be treated as an offset in this case.

11M1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?

# Binomial likelihood in the aggregated form contains C(n,m) multiplier and this multiplier is converted to an additional constant at the log-scale of the likelihood. 
# Non-aggregated format, each event is modeled independently as a number of heads in the single drop of the coin.
# likelihood in the aggregated format: C(n,m)*p^m*(1-p)^n-m
# likelihood in the non-aggregated format: p^m*(1-p)^n-m

11M2. If a coefficient in a Poisson regression has value 1.7, what does this imply about the change in the outcome?

# It implies that the change of the predictor by one unit increases the lambda parameter of the Poisson distribution in exp(1.7)=5.4739 times. Thus an average number of events per interval increases in 5.47 times.

11M3. Explain why the logit link is appropriate for a binomial generalized linear model.

# Binomial likelihood is parametrised by parameter p i.e. probability of an event but we are interested in modeling it with linear combination of the predictors and p should fall in the [0, 1] range. So, we model f(p) as linear relation a + b*x, f is selected in a such way that
# p = f^-1(a+b*x) is in [0,1] scale
# f=logit=log(p/1-p), f^-1=logistic=exp(a+b*x)/(1+exp(a+b*x))
# logit function ensures required contraint.

11M4. Explain why the log link is appropriate for a Poisson generalized linear model.

# Like the previous question, we are interested in modeling lambda i.e. parameter of the Poisson distribution with linear model. 
# Lambda should be in [0, +inf) range. An exponential function is an appropriate inverse of the link that maps (-inf, +inf) range of linear combination of parameters into (0,+inf). Then the link function should be logarithm as the inverse of exp.

11M5. What would it imply to use a logit link for the mean of a Poisson generalized linear model? Can you think of a real research problem for which this would make sense?

# Using logit link implies that a lambda parameter of the Poisson likelihood always falls in the (0,1) range.
# We set such a constraint for some real process.
prop.table(table(rpois(1e+5, .1)))
## 
##       0       1       2       3 
## 0.90371 0.09148 0.00467 0.00014
# One of possible cases to model is a process with the high number of "zero" events and with the absence of exponential growth of the predicted rate. 

11M6. State the constraints for which the binomial and Poisson distributions have maximum entropy. Are the constraints different at all for binomial and Poisson? Why or why not?

# Both Binominal and Poisson distributions at maximum entropy have same constraints. Poisson distriutions have higher constraints than binominal

11M7. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Plot and compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0,10). Re-estimate the posterior using both ulam and quap. Plot and compare the posterior distributions. Do the differences increase or decrease? Why?

data(chimpanzees)
d <- chimpanzees
d$treatment <- 1 + d$prosoc_left + 2*d$condition

m11.1 <- quap(
    alist(
        pulled_left ~ dbinom( 1 , p ) ,
        logit(p) <- a ,
        a ~ dnorm( 0 , 10 )
) , data=d )

m11.2 <- quap(
    alist(
        pulled_left ~ dbinom( 1 , p ) ,
        logit(p) <- a + b[treatment] ,
        a ~ dnorm( 0 , 1.5 ),
        b[treatment] ~ dnorm( 0 , 10 )
    ) , data=d )
set.seed(1999)

11M8. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?

data(Kline)
d <- Kline
d$P <- scale( log(d$population) )
d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )

11H1. Use WAIC or PSIS to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.

data("chimpanzees")

d <- chimpanzees

m1 <- map(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a ,
    a ~ dnorm(0,10)
  ),
  data=d )

m2 <- map(
  alist(
    pulled_left ~ dbinom(1, p) ,
    logit(p) <- a + bp*prosoc_left ,
    a ~ dnorm(0,10) ,
    bp ~ dnorm(0,10)
  ),
  data=d )

m3 <- map(
  alist(
    pulled_left ~ dbinom(1, p) ,
    logit(p) <- a + (bp + bpC*condition)*prosoc_left ,
    a ~ dnorm(0,10) ,
    bp ~ dnorm(0,10) ,
    bpC ~ dnorm(0,10)
  ), data=d )

m4 <- map(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + (bp + bpC*condition)*prosoc_left,
    a[actor] ~ dnorm(0, 10),
    bp ~ dnorm(0, 10),
    bpC ~ dnorm(0, 10)
  ),
  data = d)

compare(m1, m2, m3, m4)
##        WAIC        SE    dWAIC      dSE      pWAIC       weight
## m4 545.3981 18.835083   0.0000       NA 13.2449983 1.000000e+00
## m2 680.4400  9.339840 135.0419 18.23978  1.9721212 4.742632e-30
## m3 682.3484  9.324054 136.9503 18.17123  3.0026928 1.826481e-30
## m1 687.8226  7.209368 142.4246 19.06513  0.9403586 1.182761e-31
#  a model that has largest WAIC is the worst and gains no score in per model weights distribution.