Chapter 11 - God Spiked the Integers

This chapter described some of the most common generalized linear models, those used to model counts. It is important to never convert counts to proportions before analysis, because doing so destroys information about sample size. A fundamental difficulty with these models is that parameters are on a different scale, typically log-odds (for binomial) or log-rate (for Poisson), than the outcome variable they describe. Therefore computing implied predictions is even more important than before.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard (H).

Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.

Questions

11E1. If an event has probability 0.35, what are the log-odds of this event?

# log-odds = log(p / (1 - p))
log_odds <- log(0.35/(1-0.35))
log_odds
## [1] -0.6190392

11E2. If an event has log-odds 3.2, what is the probability of this event?

# inverse logit: p = 1 / (1 + exp(-log_odds))
prob <- 1/(1+exp(-3.2))
prob
## [1] 0.9608343

11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?

# exponentiate to convert the log-odds coefficient into an odds ratio
coeff <- exp(1.7)
coeff
## [1] 5.473947
#A coefficient of 1.7 on the log-odds scale implies that a one-unit increase in the predictor multiplies the odds of the outcome by exp(1.7), about 5.47, i.e. a proportional increase in the odds of roughly 447%.

11E4. Why do Poisson regressions sometimes require the use of an offset? Provide an example.

#Poisson outcomes are counts accumulated over some exposure (time, area, number of opportunities), and an offset is needed when observations differ in their exposure. Adding the log of the exposure to the linear model puts all observations on the same rate scale. For example, if one hospital logs admissions daily and another logs them weekly, an offset of log(days observed) lets a single model estimate the common daily admission rate.
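
#Below is a minimal sketch of an offset in practice; the hospital setup, true rate, and exposure values are simulated purely for illustration.
library(rethinking)

# simulated counts: 30 daily logs and 4 weekly logs, same true rate of 1.5 events/day
set.seed(11)
days <- c(rep(1, 30), rep(7, 4))
y <- rpois(length(days), lambda = 1.5 * days)
dat <- list(y = y, log_days = log(days))

# the offset log_days enters the linear model with a fixed coefficient of 1
m_offset <- quap(
  alist(
    y ~ dpois(lambda),
    log(lambda) <- log_days + a,
    a ~ dnorm(0, 1)
  ),
  data = dat
)
exp(coef(m_offset))   # should recover a daily rate near 1.5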

11M1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?

#The aggregated likelihood contains an extra multiplicity term because of the way the data are organized: dbinom() includes the binomial coefficient, which counts all the orderings in which the successes could have appeared across the trials. The disaggregated 0/1 likelihood has no such term, because each observation is a single trial in one fixed order.
#The multiplicity term makes the aggregated probabilities larger (log-probabilities less negative), so PSIS/WAIC scores computed from aggregated data end up smaller. Parameter inference is unchanged, because the term does not depend on the parameters.
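
#A quick numerical check; the particular counts (6 successes in 9 trials with p = 0.2) are arbitrary.
p <- 0.2
# aggregated: dbinom() includes the log of the binomial coefficient
log_agg <- dbinom(6, size = 9, prob = p, log = TRUE)
# disaggregated: nine Bernoulli trials in one particular order
log_dis <- sum(dbinom(c(1,1,1,1,1,1,0,0,0), size = 1, prob = p, log = TRUE))
log_agg - log_dis   # equals log(choose(9, 6)), the multiplicity term
log(choose(9, 6))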

11M2. If a coefficient in a Poisson regression has value 1.7, what does this imply about the change in the outcome?

#Poisson regression uses a log link, so a one-unit increase in the predictor adds 1.7 to the log of the expected count. Equivalently, the expected count is multiplied by exp(1.7), about 5.47.
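
#A small check that the effect is multiplicative; the intercept and predictor value below are arbitrary.
a <- 0.5; b <- 1.7; x <- 2
lambda_x <- exp(a + b * x)          # expected count at x
lambda_x1 <- exp(a + b * (x + 1))   # expected count at x + 1
lambda_x1 / lambda_x                # exp(1.7), about 5.47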

11M3. Explain why the logit link is appropriate for a binomial generalized linear model.

#The parameter a binomial GLM models is a probability, constrained to lie between 0 and 1. The logit link maps that constrained interval onto the entire real line, where the linear model lives; its inverse, the logistic function, maps any value of the linear model back to a valid probability.
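
#A minimal illustration using base R's plogis()/qlogis() as the inverse-logit and logit: any value of the linear model maps into (0, 1), and the logit maps it back.
eta <- c(-10, -1, 0, 1, 10)   # arbitrary linear-model values
plogis(eta)                   # inverse-logit: every result lies in (0, 1)
qlogis(plogis(eta))           # logit recovers the original values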

11M4. Explain why the log link is appropriate for a Poisson generalized linear model.

#The mean of a Poisson distribution is a rate, defined only over positive real values. The log link maps this positive parameter onto the entire real line for the linear model, and its inverse, the exponential, guarantees the predicted mean is always positive, as the expected value of a count must be.
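
#The same kind of check for the log link, with the same arbitrary linear-model values.
eta <- c(-10, -1, 0, 1, 10)   # arbitrary linear-model values
exp(eta)                      # inverse of the log link: always positive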

11M5. What would it imply to use a logit link for the mean of a Poisson generalized linear model? Can you think of a real research problem for which this would make sense?

#Using a logit link for the mean of a Poisson model would constrain the expected count lambda to lie between 0 and 1. This could make sense when each unit of exposure can produce at most one event on average, so the rate has a known ceiling of 1; for example, modeling the expected number of a rare event per observation window, where the count per window effectively never exceeds one.

11M6. State the constraints for which the binomial and Poisson distributions have maximum entropy. Are the constraints different at all for binomial and Poisson? Why or why not?

#The binomial distribution has maximum entropy when each trial produces one of two events and the expected value is constant (fixed probability of success). The Poisson has maximum entropy under the same constraints, because it is a special case of the binomial: the limit with many trials and a small probability of success, holding the expected count fixed. So the constraints are not actually different; the Poisson inherits them from the binomial.

11M7. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Plot and compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0,10). Re-estimate the posterior using both ulam and quap. Plot and compare the posterior distributions. Do the differences increase or decrease? Why?

data("chimpanzees")
df <-  chimpanzees
df$recipient <- NULL

#Estimating with quap, using the relaxed Normal(0,10) priors on the actor intercepts
m <- quap(
          alist(
            pulled_left ~ dbinom(1,p),
            logit(p) <- a[actor] + (xp + xpc * condition)*prosoc_left,
            a[actor] ~ dnorm(0,10),
            xp ~ dnorm(0,10),
            xpc ~ dnorm(0,10)
          ),
          data = df
)
pairs(m)

#The posterior means and standard deviations from quap and MCMC are very close for most parameters, but the quadratic approximation forces every marginal posterior to be Gaussian. Actor 2 pulled the left lever on every trial, so its intercept has a strongly right-skewed posterior under MCMC that quap cannot capture; the MCMC posterior standard deviations are slightly higher for that reason. Relaxing the priors to Normal(0,10) increases the differences, because the flatter priors let the skewed likelihood dominate the shape of the posterior.
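
#A minimal sketch of the MCMC counterpart with the same relaxed Normal(0,10) priors, fit with ulam; running it produces the comparison described above.
# ulam needs a trimmed data list with no unused columns
dat <- list(
  pulled_left = df$pulled_left,
  actor = df$actor,
  prosoc_left = df$prosoc_left,
  condition = df$condition
)
m_mcmc <- ulam(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + (xp + xpc * condition) * prosoc_left,
    a[actor] ~ dnorm(0, 10),
    xp ~ dnorm(0, 10),
    xpc ~ dnorm(0, 10)
  ),
  data = dat, chains = 4
)
# side-by-side comparison of the two posteriors
plot(coeftab(m, m_mcmc))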

11M8. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?

data(Kline)
d <- Kline
d$P <- scale( log(d$population) )
d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )
d
##       culture population contact total_tools mean_TU            P contact_id
## 1    Malekula       1100     low          13     3.2 -1.291473310          1
## 2     Tikopia       1500     low          22     4.7 -1.088550750          1
## 3  Santa Cruz       3600     low          24     4.0 -0.515764892          1
## 4         Yap       4791    high          43     5.0 -0.328773359          2
## 5    Lau Fiji       7400    high          33     5.0 -0.044338980          2
## 6   Trobriand       8000    high          19     4.0  0.006668287          2
## 7       Chuuk       9200    high          40     3.8  0.098109204          2
## 8       Manus      13000     low          28     6.6  0.324317564          1
## 9       Tonga      17500    high          55     5.4  0.518797917          2
## 10     Hawaii     275000     low          71     6.6  2.321008320          1
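
#A minimal sketch of the refit without Hawaii, patterned on the chapter's interaction model (the dnorm(3,0.5) and dnorm(0,0.2) priors follow the book). Since Hawaii is the most influential point (largest population, most tools), expect the association between log population and total tools to weaken once it is dropped, especially for low-contact societies.
d2 <- d[ d$culture != "Hawaii", ]
d2$P <- as.numeric(scale(log(d2$population)))   # re-standardize without Hawaii
dat2 <- list(T = d2$total_tools, P = d2$P, cid = d2$contact_id)
m_nohawaii <- ulam(
  alist(
    T ~ dpois(lambda),
    log(lambda) <- a[cid] + b[cid] * P,
    a[cid] ~ dnorm(3, 0.5),
    b[cid] ~ dnorm(0, 0.2)
  ),
  data = dat2, chains = 4
)
precis(m_nohawaii, depth = 2)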

11H1. Use WAIC or PSIS to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.

# intercept-only model
m2 <- quap(
          alist(
            pulled_left ~ dbinom(1,p),
            logit(p) <- x,
            x ~ dnorm(0,10)
          ),
          data = df
)

# adds an effect of the prosocial option
m3 <- quap(
          alist(
            pulled_left ~ dbinom(1,p),
            logit(p) <- x + xp*prosoc_left,
            x ~ dnorm(0, 10),
            xp ~ dnorm(0,10)
          ),
          data = df
)

# adds the interaction with condition
m4 <- quap(
          alist(
            pulled_left ~ dbinom(1,p),
            logit(p) <- x + (xp + xpc*condition)*prosoc_left,
            x ~ dnorm(0, 10),
            xp ~ dnorm(0,10),
            xpc ~ dnorm(0,10)
          ),
          data = df
)

# adds a unique intercept for each actor
m5 <- quap(
          alist(
            pulled_left ~ dbinom(1,p),
            logit(p) <- x[actor] + (xp + xpc*condition)*prosoc_left,
            x[actor] ~ dnorm(0, 10),
            xp ~ dnorm(0,10),
            xpc ~ dnorm(0,10)
          ),
          data = df
)
#compare the models
compare(m2, m3, m4, m5)
##        WAIC        SE    dWAIC      dSE      pWAIC       weight
## m5 545.3054 18.784009   0.0000       NA 13.4221409 1.000000e+00
## m3 680.6520  9.289898 135.3466 18.14412  2.0775940 4.072504e-30
## m4 682.5859  9.256050 137.2805 18.06981  3.1188982 1.548517e-30
## m2 687.8825  7.206802 142.5771 18.97656  0.9703851 1.095919e-31
#m5, the model with a unique intercept for each actor, has by far the lowest WAIC and carries essentially all of the model weight, while m2-m4 sit roughly 135-143 WAIC units higher with weights near zero. The pWAIC of about 13 for m5 reflects the extra flexibility from its seven actor intercepts. Its larger standard error (18.8 vs. about 7-9 for the others) indicates more variation in pointwise predictive accuracy, but the WAIC differences dwarf the dSE, so the preference for m5 is decisive: handedness varies strongly across individual chimpanzees, and a model that ignores the actors predicts much worse.