11E1. If an event has probability 0.35, what are the log-odds of this event?

p <- 0.35
log(p/(1-p))  # log-odds; equivalent to qlogis(p)
## [1] -0.6190392

11E2. If an event has log-odds 3.2, what is the probability of this event?

lo <- 3.2
exp(lo)/(1+exp(lo))  # inverse logit; equivalent to plogis(lo)
## [1] 0.9608343

11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?

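#Each one-unit increase in the predictor adds 1.7 to the log-odds, so the odds of the outcome are multiplied by exp(1.7), about 5.47 (an increase of roughly 447%).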
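A quick numeric check in base R, assuming a hypothetical baseline probability (any value in (0, 1) works): adding 1.7 to the log-odds multiplies the odds by exp(1.7) whatever the starting point, while the change in probability depends on the baseline.

```r
odds <- function(p) p / (1 - p)

# Hypothetical baseline probability
p0 <- 0.5
p1 <- plogis(qlogis(p0) + 1.7)   # add 1.7 to the log-odds, convert back to a probability
odds_ratio <- odds(p1) / odds(p0)
odds_ratio   # about 5.47, i.e. exp(1.7)
p1           # about 0.85

# The odds ratio is the same for other baselines; the probability change is not
ratios <- sapply(c(0.1, 0.5, 0.9), function(p) odds(plogis(qlogis(p) + 1.7)) / odds(p))
ratios
```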

11E4. Why do Poisson regressions sometimes require the use of an offset? Provide an example.

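#An offset is needed when the exposure (time, distance, area, etc.) differs across observations. The log of the exposure is added to the linear model with its coefficient fixed at 1, so all counts are modeled on the same per-unit-of-exposure scale.
#For example, if one monastery records the number of manuscripts it produces each day and another records them each week, adding log(days) as an offset lets the model estimate both production rates on a daily basis.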
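A minimal simulation sketch of the monastery example, using base R's `glm()` rather than the models from the chapter. The daily rates (1.5 and 0.5 manuscripts per day), the sample sizes, and the names `dat`, `days`, and `monastery` are all hypothetical.

```r
set.seed(11)
# Hypothetical daily rates: 1.5 manuscripts/day at the old monastery,
# 0.5/day at a new one that only reports weekly totals
n_days  <- 1000
n_weeks <- 150
y_old <- rpois(n_days, lambda = 1.5)        # one record per day
y_new <- rpois(n_weeks, lambda = 0.5 * 7)   # one record per week

dat <- data.frame(
  y         = c(y_old, y_new),
  days      = c(rep(1, n_days), rep(7, n_weeks)),   # exposure of each record
  monastery = factor(c(rep("old", n_days), rep("new", n_weeks)),
                     levels = c("old", "new"))
)

# log(exposure) enters the linear model with its coefficient fixed at 1,
# so the estimated coefficients are on the per-day scale
fit <- glm(y ~ monastery + offset(log(days)), family = poisson, data = dat)
rate_old <- exp(coef(fit)[["(Intercept)"]])
rate_new <- exp(coef(fit)[["(Intercept)"]] + coef(fit)[["monasterynew"]])
c(old = rate_old, new = rate_new)   # both close to the simulated 1.5 and 0.5
```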

11M1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?

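#The aggregated binomial likelihood contains the multiplicity term C(n,k), the number of orderings in which k successes can occur among n trials. On the log scale this term becomes an additive constant that does not depend on p, so it shifts the log-likelihood (and therefore deviance and WAIC) but leaves the posterior unchanged. In the disaggregated form, each trial is modeled independently as a single Bernoulli outcome (one coin toss), so no C(n,k) term appears.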
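The constant can be checked directly with `dbinom()`, assuming hypothetical data of 6 successes in 9 trials: the gap between the aggregated and disaggregated log-likelihoods equals `lchoose(9, 6)` for every value of p, which is why inference about p is unaffected.

```r
# Hypothetical data: 6 successes in 9 trials
k <- 6
n <- 9
y <- c(rep(1, k), rep(0, n - k))   # the same data in disaggregated (0/1) form

loglik_gap <- function(p) {
  ll_agg <- dbinom(k, size = n, prob = p, log = TRUE)       # aggregated binomial
  ll_dis <- sum(dbinom(y, size = 1, prob = p, log = TRUE))  # one Bernoulli per trial
  ll_agg - ll_dis
}

gaps <- sapply(c(0.2, 0.5, 0.8), loglik_gap)
gaps            # the same value for every p
lchoose(n, k)   # log(choose(9, 6)) = log(84), about 4.43
```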

11M2. If a coefficient in a Poisson regression has value 1.7, what does this imply about the change in the outcome?

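#The log of the expected count (lambda) increases by 1.7 for each one-unit increase in the predictor, so the expected count itself is multiplied by exp(1.7), about 5.47.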
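A short check, assuming a hypothetical intercept `a = 0.3` and predictor value `x = 2`: the change is additive (1.7) on the log scale and multiplicative (exp(1.7)) on the count scale, no matter what `a` and `x` are.

```r
a <- 0.3   # hypothetical intercept
b <- 1.7   # the coefficient from the question
lambda <- function(x) exp(a + b * x)   # log link: log(lambda) = a + b * x

x <- 2     # hypothetical predictor value
log_diff <- log(lambda(x + 1)) - log(lambda(x))   # change on the log scale
ratio    <- lambda(x + 1) / lambda(x)             # change on the count scale
c(log_diff = log_diff, ratio = ratio)             # 1.7 and about 5.47
```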

11M3. Explain why the logit link is appropriate for a binomial generalized linear model.

#A link function f maps the parameter of the outcome distribution onto the scale of the linear model, e.g. f(p_i) = alpha + beta * (x_i - x_bar), and its inverse maps the unbounded linear predictor back into the parameter's allowed range.
#In a binomial GLM the parameter p is a probability and must lie between zero and one. The logit, logit(p) = log(p/(1-p)), maps (0, 1) onto the whole real line, so every value of the linear predictor corresponds to a valid probability.
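A small base R illustration: `plogis()` is the inverse logit and `qlogis()` is the logit; the linear-predictor values below are arbitrary.

```r
eta <- c(-10, -2, 0, 2, 10)   # arbitrary linear-predictor values
p <- plogis(eta)              # inverse logit: exp(eta) / (1 + exp(eta))
p                             # all strictly between 0 and 1

probs <- c(0.1, 0.5, 0.9)
qlogis(probs)                 # logit: probabilities mapped onto the real line
plogis(qlogis(probs))         # round trip recovers the probabilities
```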

11M4. Explain why the log link is appropriate for a Poisson generalized linear model.

#The log link maps a parameter defined only over positive real values, the Poisson mean lambda, onto the whole real line of the linear model. Its inverse, exp, turns any value of the linear predictor into a positive lambda, which is what a Poisson distribution of non-negative counts requires.
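A sketch contrasting an identity link with the log link for a Poisson mean, using arbitrary linear-predictor values: `dpois()` returns `NaN` (with a warning, suppressed here) when lambda is negative, whereas `exp()` always yields a valid mean.

```r
eta <- c(-2, -0.5, 0, 1.5)   # arbitrary linear-predictor values

# Identity link: lambda = eta, which is negative for some values
lambda_identity <- eta
p0_identity <- suppressWarnings(dpois(0, lambda_identity))
p0_identity   # NaN where lambda < 0

# Log link: lambda = exp(eta), always positive
lambda_log <- exp(eta)
p0_log <- dpois(0, lambda_log)
p0_log        # valid probabilities of observing a zero count
```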

11M5. What would it imply to use a logit link for the mean of a Poisson generalized linear model? Can you think of a real research problem for which this would make sense?

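#A logit link would force the Poisson mean lambda to lie between 0 and 1, so the expected count could never exceed one event per unit of exposure. More generally, a scaled version, lambda = lambda_max * logistic(alpha + beta * x), implies that the expected count saturates at a ceiling lambda_max instead of growing exponentially with the predictor.
#This could make sense when there is a hard upper limit on the rate, for example the number of patients a clinic with fixed capacity can treat per day.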
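To see what the scaled version implies, the sketch below compares the expected count under a log link with lambda_max * logistic(eta), assuming a hypothetical ceiling `lambda_max = 20`. With `lambda_max = 1` it reduces to the plain logit link, and the mean stays below 1.

```r
lambda_max <- 20                          # hypothetical capacity (e.g. patients per day)
eta <- seq(-4, 6, by = 2)                 # arbitrary linear-predictor values
mean_log   <- exp(eta)                    # log link: grows without bound
mean_logit <- lambda_max * plogis(eta)    # scaled logit link: saturates at lambda_max
round(cbind(eta, mean_log, mean_logit), 2)
```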

11M6. State the constraints for which the binomial and Poisson distributions have maximum entropy. Are the constraints different at all for binomial and Poisson? Why or why not?

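#The binomial has maximum entropy among distributions in which each trial can result in only two unordered outcomes and the expected value is constant.
#The Poisson is a special case of the binomial (the limit of many trials with a small probability of success, with n*p = lambda held fixed), so it has maximum entropy under exactly the same constraints. The constraints therefore do not differ.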
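The limiting relationship can be checked numerically, assuming an arbitrary lambda = 2: the largest absolute gap between binomial(n, lambda/n) and Poisson(lambda) probabilities shrinks as n grows.

```r
lambda <- 2
k <- 0:15
# Largest absolute difference between the two probability mass functions over k
max_gap <- function(n) max(abs(dbinom(k, size = n, prob = lambda / n) - dpois(k, lambda)))
gaps <- sapply(c(10, 100, 1000, 1e5), max_gap)
gaps   # shrinks toward 0 as n grows with n * p fixed at lambda
```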

11M7. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0,10). Re-estimate the posterior using both ulam and quap. Do the differences increase or decrease? Why?

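library(rethinking)
data(chimpanzees)
d <- chimpanzees
# index variable for the four treatment combinations (prosoc_left x condition)
d$treatment <- 1 + d$prosoc_left + 2*d$condition

# intercept-only model with a wide Normal(0, 10) prior
m11.1 <- quap(
    alist(
        pulled_left ~ dbinom( 1 , p ) ,
        logit(p) <- a ,
        a ~ dnorm( 0 , 10 )
    ) , data=d )

# intercept plus a separate effect for each of the four treatments
m11.2 <- quap(
    alist(
        pulled_left ~ dbinom( 1 , p ) ,
        logit(p) <- a + b[treatment] ,
        a ~ dnorm( 0 , 1.5 ),
        b[treatment] ~ dnorm( 0 , 10 )
    ) , data=d )
set.seed(1999)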
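One reason quap and MCMC can disagree here is that the intercept of an actor who almost always pulls left has a strongly skewed posterior, while quap's Gaussian is centered at the posterior mode. The grid sketch below assumes a hypothetical actor who pulled left in all 72 of their trials and compares the posterior mode with the posterior mean under Normal(0, 1.5) and Normal(0, 10) priors; it is a single-intercept illustration, not m11.4 itself.

```r
# Hypothetical actor who pulled left in all 72 trials
n_trials <- 72
a_grid <- seq(-10, 60, length.out = 14001)   # grid for the intercept (logit scale)

grid_summary <- function(prior_sd) {
  # log posterior: all-success Bernoulli likelihood plus Normal(0, prior_sd) prior
  log_post <- n_trials * plogis(a_grid, log.p = TRUE) +
    dnorm(a_grid, mean = 0, sd = prior_sd, log = TRUE)
  post <- exp(log_post - max(log_post))
  post <- post / sum(post)
  c(mode = a_grid[which.max(post)],   # where a quadratic approximation is centered
    mean = sum(a_grid * post))        # what posterior samples would average to
}

narrow <- grid_summary(1.5)
wide   <- grid_summary(10)
rbind(narrow, wide)
```

Under the wider prior the mean should sit well above the mode, so a Gaussian centered at the mode would misplace much of the posterior mass. That is consistent with the quap and MCMC results drifting further apart when the actor prior is relaxed to Normal(0, 10).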

11M8. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?

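library(rethinking)
data(Kline)
d <- Kline
# drop Hawaii before standardizing, so the scaling uses only the remaining islands
d <- d[d$culture != "Hawaii", ]
d$P <- scale( log(d$population) )
d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )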

11H1. Use WAIC or PSIS to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.

library(rethinking)
data("chimpanzees")

d <- chimpanzees
# single intercept, with a prosocial-option effect and its interaction with condition
m11.3 <- map(
  alist(
    pulled_left ~ dbinom(1, p) ,
    logit(p) <- a + (bp + bpC*condition)*prosoc_left ,
    a ~ dnorm(0,10) ,
    bp ~ dnorm(0,10) ,
    bpC ~ dnorm(0,10)
  ), data=d )

# single intercept and a prosocial-option effect, with no condition term
m11.2 <- map(
  alist(
    pulled_left ~ dbinom(1, p) ,
    logit(p) <- a + bp*prosoc_left ,
    a ~ dnorm(0,10) ,
    bp ~ dnorm(0,10)
  ),
  data=d )

# intercept-only model: no prosocial-option or condition terms
m11.1 <- map(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a ,
    a ~ dnorm(0,10)
  ),
  data=d )

# compare
compare(m11.1,m11.2,m11.3)
##           WAIC       SE    dWAIC      dSE     pWAIC     weight
## m11.2 680.4441 9.344763 0.000000       NA 1.9741680 0.70795632
## m11.3 682.3403 9.354362 1.896185 0.752352 2.9991969 0.27431847
## m11.1 687.8189 7.206162 7.374788 6.179243 0.9385454 0.01772521
#m11.1 (intercept only) has the highest WAIC and almost none of the model weight (0.018). Its WAIC is 7.4 higher than the best model's, but the standard error of that difference (6.2) is nearly as large, so the evidence that the prosocial option improves prediction is modest.
#m11.2 (prosocial option only) has the lowest WAIC and most of the weight (0.71).
#m11.3 adds the prosoc_left x condition interaction. Its WAIC is only 1.9 higher than m11.2's (dSE 0.75), and it uses about one more effective parameter (pWAIC 3.0 vs 2.0), so the condition term adds complexity without improving expected out-of-sample prediction. It still keeps a weight of 0.27, so it cannot be ruled out.
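For reference, WAIC can be computed from a matrix of pointwise log-likelihoods. This is a sketch of the standard definition (lppd minus the summed posterior variances of the pointwise log-likelihood), not a reimplementation of rethinking's `WAIC()`. It assumes hypothetical data (290 successes in 504 trials) and draws from a conjugate Beta(1, 1) posterior as a stand-in for quap samples of an intercept-only model, so pWAIC should come out close to 1, in line with the value of about 0.94 for m11.1 above.

```r
set.seed(42)
# Hypothetical 0/1 outcomes: 290 successes in 504 trials
y <- c(rep(1, 290), rep(0, 214))

# Conjugate Beta(1, 1) posterior for p, sampled directly (stand-in for quap/MCMC draws)
p_draws <- rbeta(4000, 1 + sum(y), 1 + sum(1 - y))

# Pointwise log-likelihood matrix: rows = posterior draws, columns = observations
ll <- sapply(y, function(yi) dbinom(yi, size = 1, prob = p_draws, log = TRUE))

lppd   <- sum(log(colMeans(exp(ll))))   # log pointwise predictive density
p_waic <- sum(apply(ll, 2, var))        # effective number of parameters
waic   <- -2 * (lppd - p_waic)
c(lppd = lppd, pWAIC = p_waic, WAIC = waic)
```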