11E1. If an event has probability 0.35, what are the log-odds of this event?

p <- 0.35
log( p/(1-p) )
## [1] -0.6190392

11E2. If an event has log-odds 3.2, what is the probability of this event?

l <- 3.2
exp(l)/(1+exp(l))
## [1] 0.9608343

11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?

exp(1.7)
## [1] 5.473947
# Each one-unit increase in the predictor multiplies the odds of the outcome by exp(1.7), about 5.47.

11E4. Why do Poisson regressions sometimes require the use of an offset? Provide an example.

# Because counts are only comparable when they are measured over the same exposure. Including the log of the exposure as an offset puts all observations on a common scale.
# For example, if events are counted daily at some sites and weekly at others, adding the log of the number of days observed as an offset models a single underlying daily rate for every observation.
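# A minimal sketch with hypothetical counts: the same daily rate observed over different exposures, with the log exposure entering the linear model as an offset (its coefficient implicitly fixed at 1).
library(rethinking)
d_off <- data.frame(
    y = c(3, 2, 4, 20, 18),          # event counts (hypothetical)
    exposure = c(1, 1, 1, 7, 7) )    # days observed per count
d_off$log_exp <- log(d_off$exposure)
m_off <- quap(
    alist(
        y ~ dpois( lambda ),
        log(lambda) <- log_exp + a ,  # offset: log exposure
        a ~ dnorm( 0 , 1 )
    ) , data=d_off )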

11M1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?

# Because the aggregated binomial likelihood includes the multiplicity term (the binomial coefficient counting the possible orderings of the successes), while the disaggregated 0/1 form does not. That term does not depend on the parameters, so it only adds a constant on the log scale and the posterior is unchanged; but the raw likelihood values, and information criteria computed from them, differ between the two formats.

11M2. If a coefficient in a Poisson regression has value 1.7, what does this imply about the change in the outcome?

# It implies that a one-unit increase in the predictor multiplies the expected count (lambda) by exp(1.7), about 5.47, because the log link makes predictor effects multiplicative on the outcome scale.

11M3. Explain why the logit link is appropriate for a binomial generalized linear model.

# Because the parameter of a binomial likelihood is a probability, constrained to lie between zero and one, while a linear model can take on any real value. The logit link maps the (0, 1) probability scale onto the whole real line, so the linear model can never imply an impossible probability.

11M4. Explain why the log link is appropriate for a Poisson generalized linear model.

# Because the Poisson rate (lambda) must be positive, while a linear model can take on any real value. The log link guarantees lambda = exp(linear model) > 0, and it implies that predictor effects are multiplicative on the count scale.

11M5. What would it imply to use a logit link for the mean of a Poisson generalized linear model? Can you think of a real research problem for which this would make sense?

# A logit link would constrain lambda, the mean of the Poisson, to lie between 0 and 1 rather than anywhere in [0, +inf). That could make sense when the expected count has a known ceiling of one event per unit of exposure, for example a rare event that can occur at most about once per observation window.
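# Illustration (base R's plogis is the inverse logit): whatever value the linear model takes, the implied rate stays between 0 and 1.
plogis( c(-10, 0, 10) )
# roughly 4.5e-05, 0.5, 0.99995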

11M6. State the constraints for which the binomial and Poisson distributions have maximum entropy. Are the constraints different at all for binomial and Poisson? Why or why not?

# The constraints are the same. The binomial distribution has maximum entropy when each trial can result in only one of two un-ordered outcomes and the expected value is constant. The Poisson is the limiting case of the binomial with many trials and a small probability per trial, so it inherits those same constraints and adds no new ones; its equality of mean and variance is a consequence, not an extra constraint.

11M7. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0,10). Re-estimate the posterior using both ulam and quap. Do the differences increase or decrease? Why?

library(rethinking)
data(chimpanzees)
d <- chimpanzees
d$treatment <- 1 + d$prosoc_left + 2*d$condition

m11.1 <- quap(
    alist(
        pulled_left ~ dbinom( 1 , p ) ,
        logit(p) <- a ,
        a ~ dnorm( 0 , 10 )
) , data=d )

m11.2 <- quap(
    alist(
        pulled_left ~ dbinom( 1 , p ) ,
        logit(p) <- a + b[treatment] ,
        a ~ dnorm( 0 , 1.5 ),
        b[treatment] ~ dnorm( 0 , 10 )
    ) , data=d )
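# The exercise itself asks for the actor-intercept model m11.4 fit both ways. A sketch of those fits (results not reproduced here): fit the same formula with quap and with ulam, then repeat with the relaxed Normal(0, 10) prior on the actor intercepts and compare the marginal posteriors with precis().
dat_list <- list(
    pulled_left = d$pulled_left,
    actor = d$actor,
    treatment = as.integer(d$treatment) )
f_m11.4 <- alist(
    pulled_left ~ dbinom( 1 , p ) ,
    logit(p) <- a[actor] + b[treatment] ,
    a[actor] ~ dnorm( 0 , 1.5 ),
    b[treatment] ~ dnorm( 0 , 0.5 ) )
m11.4_quap <- quap( f_m11.4 , data=dat_list )
m11.4_ulam <- ulam( f_m11.4 , data=dat_list , chains=4 )
precis( m11.4_quap , depth=2 )
precis( m11.4_ulam , depth=2 )
# Relaxed prior on the actor intercepts:
f_relaxed <- alist(
    pulled_left ~ dbinom( 1 , p ) ,
    logit(p) <- a[actor] + b[treatment] ,
    a[actor] ~ dnorm( 0 , 10 ),
    b[treatment] ~ dnorm( 0 , 0.5 ) )
m11.4r_quap <- quap( f_relaxed , data=dat_list )
m11.4r_ulam <- ulam( f_relaxed , data=dat_list , chains=4 )
precis( m11.4r_quap , depth=2 )
precis( m11.4r_ulam , depth=2 )
# Expectation: the quap and ulam posteriors should agree closely for most parameters, but the intercept for actor 2 (who almost always pulled left) has a long right tail that the symmetric quadratic approximation cannot capture. Relaxing the prior to Normal(0, 10) lets that tail extend even further, so the differences between quap and ulam should increase.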

11M8. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?

data(Kline)
d <- Kline
d$P <- scale( log(d$population) )
d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )
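# A sketch of the refit without Hawaii, following the chapter's interaction model m11.10 (results not reproduced here). Note that P is re-standardized on the reduced sample.
d2 <- d[ d$culture != "Hawaii" , ]
dat2 <- list(
    T = d2$total_tools,
    P = as.numeric( scale( log(d2$population) ) ),
    cid = ifelse( d2$contact=="high" , 2 , 1 ) )
m11.10_noHI <- ulam(
    alist(
        T ~ dpois( lambda ),
        log(lambda) <- a[cid] + b[cid]*P,
        a[cid] ~ dnorm( 3 , 0.5 ),
        b[cid] ~ dnorm( 0 , 0.2 )
    ) , data=dat2 , chains=4 , log_lik=TRUE )
precis( m11.10_noHI , depth=2 )
# Hawaii is both the largest population and an influential point in the original fit, so dropping it should noticeably weaken the estimated effect of population, especially for low-contact societies.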

11H1. Use WAIC or PSIS to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.

d <- chimpanzees
d$treatment <- 1 + d$prosoc_left + 2*d$condition
m11.1 <- quap(
    alist(
        pulled_left ~ dbinom( 1 , p ) ,
        logit(p) <- a ,
        a ~ dnorm( 0 , 10 )
) , data=d )

m11.2 <- quap(
    alist(
        pulled_left ~ dbinom( 1 , p ) ,
        logit(p) <- a + b[treatment] ,
        a ~ dnorm( 0 , 1.5 ),
        b[treatment] ~ dnorm( 0 , 10 )
    ) , data=d )
set.seed(167)

m11.3 <- quap(
    alist(
        pulled_left ~ dbinom( 1 , p ) ,
        logit(p) <- a + b[treatment] ,
        a ~ dnorm( 0 , 1.5 ),
        b[treatment] ~ dnorm( 0 , 0.5 )
    ) , data=d )
set.seed(1999)
prior <- extract.prior( m11.3 , n=1e4 )
p <- sapply( 1:4 , function(k) inv_logit( prior$a + prior$b[,k] ) )

mean( abs( p[,1] - p[,2] ) )
## [1] 0.09838663
dat_list <- list(
    pulled_left = d$pulled_left,
    actor = d$actor,
    treatment = as.integer(d$treatment) )

m11.4 <- ulam(
    alist(
        pulled_left ~ dbinom( 1 , p ) ,
        logit(p) <- a[actor] + b[treatment] ,
        a[actor] ~ dnorm( 0 , 1.5 ),
        b[treatment] ~ dnorm( 0 , 0.5 )
    ) , data=dat_list , chains=4 , log_lik=TRUE )
## (Stan sampling progress output from ulam omitted: 4 chains, 1000 iterations each, half warmup)
precis( m11.4 , depth=2 )
##             mean        sd        5.5%       94.5%     n_eff    Rhat4
## a[1] -0.45452838 0.3278455 -0.97428979  0.05633733  618.6859 1.007790
## a[2]  3.87570394 0.7275517  2.76657713  5.14747922 1306.5162 1.002516
## a[3] -0.75030298 0.3319683 -1.27857582 -0.22199795  658.6755 1.006012
## a[4] -0.74646697 0.3353481 -1.28305292 -0.23289459  671.8760 1.010255
## a[5] -0.45025996 0.3204826 -0.97250608  0.03983956  872.4934 1.001939
## a[6]  0.47748575 0.3135428 -0.01167141  0.98821730  629.0855 1.009149
## a[7]  1.95036354 0.4097641  1.29196633  2.60978487  986.1479 1.001770
## b[1] -0.03940276 0.2789146 -0.47517233  0.39103998  683.5683 1.007147
## b[2]  0.48303991 0.2798537  0.04147342  0.91432370  633.8484 1.005394
## b[3] -0.38082565 0.2764895 -0.82897775  0.05710524  632.3770 1.006771
## b[4]  0.37491068 0.2740707 -0.06246763  0.80819481  592.1209 1.008426
compare(m11.1,m11.2,m11.3,m11.4)
## Warning in compare(m11.1, m11.2, m11.3, m11.4): Not all model fits of same class.
## This is usually a bad idea, because it implies they were fit by different algorithms.
## Check yourself, before you wreck yourself.
##           WAIC        SE    dWAIC      dSE    pWAIC       weight
## m11.4 532.5113 18.911418   0.0000       NA 8.605149 1.000000e+00
## m11.3 682.4238  9.151896 149.9125 18.37400 3.614829 2.798488e-33
## m11.2 683.0118  9.698420 150.5005 18.45498 3.973591 2.085636e-33
## m11.1 687.9868  7.112314 155.4755 18.93168 1.023361 1.733464e-34
compare(m11.1,m11.2,m11.3,m11.4, func=PSIS)
## Warning in compare(m11.1, m11.2, m11.3, m11.4, func = PSIS): Not all model fits of same class.
## This is usually a bad idea, because it implies they were fit by different algorithms.
## Check yourself, before you wreck yourself.
##           PSIS        SE    dPSIS      dSE     pPSIS       weight
## m11.4 532.5646 18.931982   0.0000       NA 8.6317936 1.000000e+00
## m11.3 682.5845  9.033233 150.0199 18.36321 3.6560748 2.652119e-33
## m11.2 683.0146  9.636632 150.4500 18.44684 3.9801957 2.138978e-33
## m11.1 687.9020  7.186091 155.3374 18.94611 0.9805965 1.857411e-34
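# Interpretation: m11.4, the only model with per-actor intercepts, has WAIC/PSIS roughly 150 points lower than the others and carries essentially all of the model weight. Most of the variation in pulled_left comes from stable differences among the individual actors (handedness preferences), not from the treatment; the treatment-only models m11.2 and m11.3 barely improve on the intercept-only model m11.1.
# The warning appears because m11.1 through m11.3 were fit with quap while m11.4 was fit with ulam, so the comparison mixes fitting algorithms.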