Assignment #9

Questions

11E1. If an event has probability 0.35, what are the log-odds of this event?

log( 0.35 / (1 - 0.35))

## [1] -0.6190392

11E2. If an event has log-odds 3.2, what is the probability of this event?

1 / (1 + exp(-3.2))

## [1] 0.9608343
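
# Base R ships both conversions, so the two hand-written formulas above can be cross-checked. A minimal sketch: qlogis() is the logit (probability to log-odds) and plogis() is its inverse. The variable names are only illustrative.

```r
# qlogis() is the logit function: probability -> log-odds
log_odds <- qlogis(0.35)           # same as log(0.35 / (1 - 0.35))

# plogis() is the inverse logit (logistic): log-odds -> probability
prob <- plogis(3.2)                # same as 1 / (1 + exp(-3.2))

# The two functions undo each other
round_trip <- plogis(qlogis(0.35))

log_odds
prob
all.equal(round_trip, 0.35)
```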

11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?

exp(1.7)

## [1] 5.473947

# A one-unit increase in the predictor for this coefficient multiplies the odds of the outcome by exp(1.7), about 5.47. That is an increase in the odds of about 447%.
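
# The proportional change in odds is the same for every starting point, but the change in probability is not. A small sketch, assuming three hypothetical baseline probabilities (p0 is not part of the question):

```r
b <- 1.7                           # coefficient from the question
p0 <- c(0.10, 0.50, 0.90)          # hypothetical baseline probabilities

odds0 <- p0 / (1 - p0)             # baseline odds
odds1 <- odds0 * exp(b)            # odds after a one-unit increase in the predictor
p1 <- odds1 / (1 + odds1)          # back to the probability scale

odds_ratio <- odds1 / odds0        # identical for every baseline: exp(1.7)
prob_change <- p1 - p0             # differs from baseline to baseline

round(odds_ratio, 3)
round(prob_change, 3)
```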

11E4. Why do Poisson regressions sometimes require the use of an offset? Provide an example.

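# A Poisson rate is a number of events per unit of exposure (time, area, and so on). When observations differ in exposure, their counts are not directly comparable, and an offset (the log of the exposure) puts them on a common scale. For example, suppose we want to compare the sales of two stores. Store A reports its sales daily while Store B reports them every 2 days. Each observation from Store A then has an exposure of 1 day and each observation from Store B an exposure of 2 days. The offset is the log of the exposure time in days, and it is added to the linear model.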
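
# The store example can be sketched with simulated data and base R's glm(). Everything numeric here is an assumption made for illustration: the daily rates, the number of reports, and the seed. The point is only that offset(log(days)) puts both stores on a per-day scale.

```r
set.seed(11)                              # arbitrary seed, for reproducibility
n <- 2000                                 # hypothetical number of reports per store
rate <- c(A = 1.5, B = 0.5)               # assumed true daily sales rates

sales <- data.frame(
  store = rep(c("A", "B"), each = n),
  days  = rep(c(1, 2), each = n)          # exposure: A reports daily, B every 2 days
)
# Expected count = daily rate * number of days covered by the report
sales$y <- rpois(2 * n, lambda = rate[sales$store] * sales$days)

# The offset enters the linear model with its coefficient fixed at 1
fit <- glm(y ~ 0 + store + offset(log(days)), family = poisson, data = sales)
daily_rate <- exp(coef(fit))              # estimated sales per day for each store
round(daily_rate, 2)
```

# Without the offset, Store B's coefficient would describe sales per 2-day report rather than sales per day.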

11M1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?

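# The aggregated likelihood contains an extra factor, the binomial coefficient choose(n, k), which counts the orderings in which k successes can occur in n trials. The disaggregated likelihood treats each 0/1 trial separately, so this factor does not appear. Because the factor does not depend on p, it is a constant that shifts the log-likelihood without changing the posterior, so inference is unaffected.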
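
# A quick numerical check of this point, using a hypothetical vector of nine 0/1 outcomes (x and the candidate values of p are made up for illustration):

```r
x <- c(1, 0, 1, 1, 0, 1, 1, 0, 1)    # hypothetical outcomes: 6 successes in 9 trials
p <- c(0.2, 0.5, 0.8)                # candidate values for the success probability

# Disaggregated: product of Bernoulli likelihoods, one per trial
lik_disagg <- sapply(p, function(pr) prod(dbinom(x, size = 1, prob = pr)))

# Aggregated: a single binomial likelihood for the total count
lik_agg <- dbinom(sum(x), size = length(x), prob = p)

# The ratio is the same for every p and equals the number of orderings
lik_ratio <- lik_agg / lik_disagg
lik_ratio
choose(length(x), sum(x))
```

# Since the ratio does not change with p, the two likelihoods differ only by a constant.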

11M2. If a coefficient in a Poisson regression has value 1.7, what does this imply about the change in the outcome?

# If the predictor associated with this coefficient increases by one unit, the rate lambda is multiplied by exp(1.7), about 5.5. In other words, on average about 5.5 times as many events happen in the same time interval.

11M3. Explain why the logit link is appropriate for a binomial generalized linear model.

# The main parameter of a binomial generalized linear model is a probability.
# The logit link maps a probability (i.e. a value between 0 and 1) onto the whole real line, so the mapped value can be set equal to a linear model.
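
# A short sketch of the mapping, with hypothetical values of the linear predictor (eta is an illustrative name):

```r
eta <- c(-10, -2, 0, 2, 10)        # hypothetical values of the linear predictor
p <- plogis(eta)                   # inverse logit: maps the real line into (0, 1)
round(p, 5)

# The logit takes each probability back to the linear predictor
eta_back <- qlogis(p)
all.equal(eta_back, eta)
```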

11M4. Explain why the log link is appropriate for a Poisson generalized linear model.

# The rate lambda is constrained to be positive. The log function maps positive values onto the whole real line, so log(lambda) can be set equal to a linear model, and exponentiating the linear model always returns a positive rate.
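
# A minimal sketch of the log link at work. The intercept and predictor values are hypothetical; the slope reuses the 1.7 from 11M2.

```r
a <- 1                              # hypothetical intercept
b <- 1.7                            # slope (the value used in 11M2)
x <- c(-3, -1, 0, 1, 2)             # hypothetical predictor values

eta <- a + b * x                    # linear model: can be any real number
lambda <- exp(eta)                  # inverse of the log link: always positive

lambda
step_ratio <- lambda[4] / lambda[3] # one-unit step from x = 0 to x = 1
step_ratio                          # equals exp(1.7)
```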

11M5. What would it imply to use a logit link for the mean of a Poisson generalized linear model? Can you think of a real research problem for which this would make sense?

# Using a logit link for the parameter lambda in a Poisson model would imply that lambda is constrained to be between 0 and 1.
# If lambda is between 0 and 1, then there is on average less than one event per interval. This could make sense for rare events whose expected count can never exceed one per interval, although individual counts above one would still be possible.
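
# A sketch of what the logit link does to a Poisson rate, with hypothetical values of the linear predictor. Note that a rate below one bounds the expected count, not the individual counts.

```r
eta <- c(-5, 0, 5)                     # hypothetical values of the linear predictor
lambda <- plogis(eta)                  # inverse logit: rate stays strictly between 0 and 1

# Probability of seeing two or more events in one interval at each rate
p_two_plus <- ppois(1, lambda, lower.tail = FALSE)
round(cbind(lambda, p_two_plus), 4)
```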

11M6. State the constraints for which the binomial and Poisson distributions have maximum entropy. Are the constraints different at all for binomial and Poisson? Why or why not?

# Both the binomial and the Poisson distribution have maximum entropy when:
# 1. each trial results in one of two unordered events and
# 2. the expected value is constant.

# The Poisson distribution is a special (limiting) case of the binomial, in which the number of trials is very large and the probability of an event on each trial is very small. It therefore has maximum entropy under the same constraints.
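
# The relationship between the two distributions can be checked numerically. A sketch, assuming a hypothetical expected value of 2: as the number of trials n grows with n * p held fixed, the binomial probabilities approach the Poisson ones.

```r
lambda <- 2                                   # hypothetical fixed expected value
k <- 0:15                                     # counts at which to compare the two distributions
n <- c(10, 100, 1000, 10000)                  # number of trials, growing

# Largest gap between Binomial(n, lambda / n) and Poisson(lambda) over k
max_gap <- sapply(n, function(size) {
  max(abs(dbinom(k, size = size, prob = lambda / size) - dpois(k, lambda)))
})
signif(max_gap, 2)                            # shrinks as n grows
```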

11M7. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0,10). Re-estimate the posterior using both ulam and quap. Do the differences increase or decrease? Why?

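library(rethinking)
data("chimpanzees")
d <- chimpanzees
d$recipient <- NULL  # drop the unused recipient column, which contains NA values

# quadratic approximation (quap) with the relaxed Normal(0, 10) priors
q2 <- quap(alist(
  pulled_left ~ dbinom( 1, p),
  logit(p) <- a[actor] + (bp + bpC*condition)*prosoc_left ,
  a[actor] ~ dnorm(0,10),
  bp ~ dnorm(0,10),
  bpC ~ dnorm(0,10)
) ,
data=d)
pairs(q2)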

11M8. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?

data(Kline)
d <- Kline
d$P <- scale( log(d$population) )
d$contact_id <- ifelse( d$contact=="high", 2, 1)
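
# The preparation above still includes Hawaii. A possible continuation is sketched below. It assumes the rethinking package is installed, and it assumes the interaction model and priors from the chapter (separate intercept and slope for each contact level). It is fit with quap here rather than ulam to keep the sketch light. The names d2, dat2 and m_noHawaii are illustrative.

```r
library(rethinking)

data(Kline)
d <- Kline
d$contact_id <- ifelse(d$contact == "high", 2, 1)

# Drop Hawaii, then re-standardize log population on the remaining societies
d2 <- d[d$culture != "Hawaii", ]
d2$P <- as.numeric(scale(log(d2$population)))

# Keep only the variables the model uses
dat2 <- list(
  total_tools = d2$total_tools,
  P           = d2$P,
  contact_id  = d2$contact_id
)

# Assumed model and priors, not taken from the answer above
m_noHawaii <- quap(
  alist(
    total_tools ~ dpois(lambda),
    log(lambda) <- a[contact_id] + b[contact_id] * P,
    a[contact_id] ~ dnorm(3, 0.5),
    b[contact_id] ~ dnorm(0, 0.2)
  ),
  data = dat2
)
precis(m_noHawaii, depth = 2)
```

# Fitting the same model to the full data and comparing the two precis() tables would show how much the estimates depend on Hawaii.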

11H1. Use WAIC or PSIS to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.

data("chimpanzees")

d <- chimpanzees

m11.1 <- map(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a ,
    a ~ dnorm(0,10)
  ),
  data=d )

m11.2 <- map(
  alist(
    pulled_left ~ dbinom(1, p) ,
    logit(p) <- a + bp*prosoc_left ,
    a ~ dnorm(0,10) ,
    bp ~ dnorm(0,10)
  ),
  data=d )

m11.3 <- map(
  alist(
    pulled_left ~ dbinom(1, p) ,
    logit(p) <- a + (bp + bpC*condition)*prosoc_left ,
    a ~ dnorm(0,10) ,
    bp ~ dnorm(0,10) ,
    bpC ~ dnorm(0,10)
  ), data=d )

m11.4 <- map(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + (bp + bpC*condition)*prosoc_left,
    a[actor] ~ dnorm(0, 10),
    bp ~ dnorm(0, 10),
    bpC ~ dnorm(0, 10)
  ),
  data = d)

compare(m11.1,m11.2,m11.3,m11.4)

##           WAIC        SE    dWAIC      dSE      pWAIC       weight
## m11.4 549.9380 18.618504   0.0000       NA 15.7574776 1.000000e+00
## m11.2 680.6383  9.425802 130.7002 18.09042  2.0701622 4.157230e-29
## m11.3 682.2474  9.453927 132.3094 18.02927  2.9536308 1.859414e-29
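## m11.1 687.8677  7.141689 137.9296 18.90440  0.9637598 1.119330e-30

# m11.4, the model with a unique intercept for each actor, has by far the lowest WAIC and takes essentially all of the Akaike weight. The three simpler models trail it by roughly 130 to 138 WAIC units, about seven times the standard error of the difference (dSE of about 18). The simpler models are nearly indistinguishable from one another, so differences between individual chimpanzees account for much more of the variation in pulled_left than the treatment variables do.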

Zhengyuan Huang

2021-02-10

Chapter 11 - God Spiked the Integers