Assignment #9

Questions

11E1. If an event has probability 0.35, what are the log-odds of this event?

log(0.35/(1-0.35))

## [1] -0.6190392

11E2. If an event has log-odds 3.2, what is the probability of this event?

logistic(3.2)

## [1] 0.9608343

#log(0.96/(1-0.96))
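The two conversions above can also be checked with base R alone. This is a minimal sketch, assuming that `logistic()` in the answer above comes from the rethinking package; `qlogis()` and `plogis()` are the base R logit and inverse-logit functions.

```r
# Base R equivalents of the logit and inverse-logit transformations
p <- 0.35
log_odds <- qlogis(p)          # same as log(p / (1 - p))
print(log_odds)

# Converting log-odds of 3.2 back to a probability
prob <- plogis(3.2)            # same as 1 / (1 + exp(-3.2))
print(prob)

# Round trip: the two functions are inverses of each other
print(all.equal(plogis(qlogis(p)), p))
```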

11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?

exp(1.7)

## [1] 5.473947

# A one-unit increase in x multiplies the odds of the outcome by exp(1.7), about 5.47, which is a proportional increase in the odds of roughly 447%.
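To see why exp(1.7) acts as a multiplier on the odds, the sketch below compares the odds at two predictor values one unit apart. The intercept `a` and the predictor value `x` are hypothetical and chosen only for illustration.

```r
a <- -1      # hypothetical intercept
b <- 1.7     # coefficient from the question
x <- 0.5     # hypothetical predictor value

# Odds corresponding to a probability
odds <- function(p) p / (1 - p)

p0 <- plogis(a + b * x)         # probability at x
p1 <- plogis(a + b * (x + 1))   # probability at x + 1

# The ratio of the odds equals exp(b), whatever a and x are
odds_ratio <- odds(p1) / odds(p0)
print(odds_ratio)
print(exp(b))
```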

11E4. Why do Poisson regressions sometimes require the use of an offset? Provide an example.

# The Poisson distribution arises as a limiting case of the binomial. Sometimes it is more relevant to model rates than counts, which matters when observations do not all cover the same amount of exposure (for example, time). Five cases over 1 year should not be treated the same as five cases over 10 years. If Tx is the exposure time for observations with covariate x, then log(Tx) is added to the linear model as the offset.
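An offset can be sketched with base R's `glm()`. The data below are simulated and entirely hypothetical (the sample size, the true rate, and the exposure times are made up for illustration); the point is only that `offset(log(exposure))` lets the model estimate a rate per unit of time.

```r
set.seed(11)
n <- 200
x <- rnorm(n)                             # hypothetical covariate
exposure <- runif(n, min = 1, max = 10)   # hypothetical observation time in years
rate <- exp(-1 + 0.5 * x)                 # hypothetical true rate per year
y <- rpois(n, lambda = rate * exposure)   # counts grow with exposure

# log(exposure) enters the linear predictor with its coefficient fixed at 1
fit <- glm(y ~ x + offset(log(exposure)), family = poisson)
print(coef(fit))
```

The estimated coefficients are on the log-rate scale, so they should land near the simulated values of -1 and 0.5 rather than being distorted by the unequal exposure times.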

11M1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?

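# The likelihood changes between the two formats because the aggregated form includes an extra multiplicity factor, the binomial coefficient choose(n, k), which counts the number of orderings of k successes in n trials. The disaggregated form treats each trial as its own 0/1 outcome in a fixed order, so that factor is absent. The factor does not depend on p, so inference is unaffected.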
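The difference between the two likelihoods can be checked numerically. This is a small sketch with a hypothetical vector of 0/1 outcomes and a hypothetical value of `p`; it compares the product of the individual trial likelihoods with the aggregated binomial likelihood.

```r
p <- 0.3                                      # hypothetical probability
outcomes <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)   # hypothetical 0/1 trials
n <- length(outcomes)
k <- sum(outcomes)

# Disaggregated: one Bernoulli likelihood per trial, multiplied together
lik_disagg <- prod(dbinom(outcomes, size = 1, prob = p))

# Aggregated: a single binomial likelihood for k successes in n trials
lik_agg <- dbinom(k, size = n, prob = p)

# The two differ by a constant that does not depend on p
print(lik_agg / lik_disagg)
print(choose(n, k))
```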

11M2. If a coefficient in a Poisson regression has value 1.7, what does this imply about the change in the outcome?

exp(1.7)

## [1] 5.473947

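# A one-unit increase in x multiplies the expected count (lambda) by exp(1.7), about 5.47. The change is multiplicative, not an additive change of 5.47 units.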
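The effect of the coefficient on the expected count can be illustrated as follows. The intercept `a` and the predictor values in `x` are hypothetical; the sketch shows that each one-unit step multiplies lambda by the same factor, while the absolute differences are not constant.

```r
b <- 1.7                 # coefficient from the question
a <- 0.2                 # hypothetical intercept
x <- c(0, 1, 2)          # hypothetical predictor values, one unit apart

lambda <- exp(a + b * x)                          # expected counts under a log link
ratios <- lambda[-1] / lambda[-length(lambda)]    # ratio for each one-unit step
diffs <- diff(lambda)                             # absolute change for each step

print(ratios)   # constant and equal to exp(b)
print(diffs)    # grows with x
```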

11M3. Explain why the logit link is appropriate for a binomial generalized linear model.

# The logit link connects a parameter constrained between zero and one to the whole real line. The logit function is logit(p_i) = log(p_i/(1-p_i)).

# Here p_i is a probability, so the linear model can take any real value while p_i always stays between zero and one. That is why the link works for a binomial GLM.

# logit is only defined on (0, 1), so the curve is drawn inside that interval

curve(logit, from = 0.001, to = 0.999)
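The mapping between the real line and the unit interval can be checked directly. This is a minimal sketch using base R's `plogis()` (inverse logit) and `qlogis()` (logit); the values in `eta` are hypothetical linear predictor values.

```r
eta <- c(-10, -2, 0, 2, 10)   # hypothetical linear predictor values

# The inverse logit turns any real number into a valid probability
p <- plogis(eta)
print(p)
print(all(p > 0 & p < 1))

# Applying the logit recovers the original linear predictor
print(all.equal(qlogis(p), eta))
```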

11M4. Explain why the log link is appropriate for a Poisson generalized linear model.

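# The mean of a Poisson distribution must be positive. The log link maps positive values onto the whole real line, so exp() of any linear predictor gives a valid positive mean.

# log is only defined for positive values, so the curve starts just above zero

curve(log, from = 0.001, to = 100000)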
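The role of the log link can be shown with a few numbers. In this sketch the values in `eta` are hypothetical linear predictor values; exponentiating them always gives a positive mean, which is what a Poisson model requires.

```r
eta <- c(-5, -1, 0, 1, 5)   # hypothetical linear predictor values

# The inverse of the log link is exp(), which is positive for any input
lambda <- exp(eta)
print(lambda)
print(all(lambda > 0))

# Taking the log recovers the linear predictor
print(all.equal(log(lambda), eta))
```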

11M5. What would it imply to use a logit link for the mean of a Poisson generalized linear model? Can you think of a real research problem for which this would make sense?

# A logit link would imply that the mean mu lies between zero and one. The Poisson distribution is defined by a single parameter, and the premise of a Poisson regression is that the GLM models a count with an unknown maximum, so this constraint is unusual.

# It could make sense when the expected count is known to stay within a particular range.

# For example, a COVID-19 testing problem could be constrained with a link of the form log(p/(s-p)), which keeps p between zero and s.
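A link of the form log(p/(s-p)) can be sketched as a scaled logit. Treating `s` as a known upper bound on the mean is an assumption made here for illustration, and the function names `scaled_logit` and `inv_scaled_logit` are hypothetical helpers, not part of any package.

```r
s <- 50   # hypothetical upper bound on the mean

# Scaled logit link and its inverse (hypothetical helper functions)
scaled_logit <- function(mu, s) log(mu / (s - mu))
inv_scaled_logit <- function(eta, s) s * plogis(eta)

eta <- c(-4, 0, 4)               # hypothetical linear predictor values
mu <- inv_scaled_logit(eta, s)   # implied means, always between 0 and s
print(mu)

# The link recovers the linear predictor from the mean
print(all.equal(scaled_logit(mu, s), eta))
```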

11M6. State the constraints for which the binomial and Poisson distributions have maximum entropy. Are the constraints different at all for binomial and Poisson? Why or why not?

# The binomial distribution is the maximum entropy distribution under two constraints: 1. discrete binary outcomes 2. constant probability (a constant expected value across trials)

# The distribution is defined by the number of trials and the probability. The experiment reduces to a series of independent and identical Bernoulli trials with only two outcomes. The Poisson distribution is derived as a limiting form of the binomial, where n -> ∞ and p -> 0 while the expected count np stays constant. Since this does not change the underlying constraints, the Poisson is a maximum entropy distribution under the same constraints.
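The limiting relationship between the two distributions can be checked numerically. In this sketch the expected count `lambda` and the values of `n` are hypothetical; the binomial probabilities with p = lambda / n are compared with the Poisson probabilities as n grows.

```r
lambda <- 3                      # hypothetical expected count, held fixed
k <- 0:10                        # counts at which the two pmfs are compared
n_values <- c(10, 100, 10000)    # increasing number of trials

# Largest pointwise gap between Binomial(n, lambda / n) and Poisson(lambda)
max_abs_diff <- sapply(n_values, function(n) {
  max(abs(dbinom(k, size = n, prob = lambda / n) - dpois(k, lambda)))
})
print(max_abs_diff)
```

The gap shrinks as n increases and p = lambda / n decreases, which is the sense in which the Poisson is a limiting form of the binomial.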

11M7. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Plot and compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0,10). Re-estimate the posterior using both ulam and quap. Plot and compare the posterior distributions. Do the differences increase or decrease? Why?

library(rethinking)
data('chimpanzees')

d<- chimpanzees
d$recipient<- NULL

m11_7<- rethinking::quap(
    alist(
        pulled_left ~dbinom(1,p),
        logit(p)<- a[actor] + (bp+bpc*condition)*prosoc_left,
        a[actor] ~dnorm(0,10),
        bp ~ dnorm(0,10),
        bpc ~ dnorm(0,10)
    ),
    data = d
)

pairs(m11_7)

11M8. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?

library('dplyr')
data(Kline)
kDat <- Kline
kDat <- kDat %>% dplyr::mutate(cid=ifelse(contact == "high", 2, 1),
                               stdPop=standardize(log(population))) %>% filter(culture != "Hawaii")

dataList<- list(
    totTools = kDat$total_tools,
    stdPop = kDat$stdPop,
    cid = as.integer(kDat$cid)
)

m11_8 <- ulam(
  alist(
    totTools ~ dpois(lambda),
    log(lambda) <- a[cid] + b[cid]*stdPop,
    a[cid] ~ dnorm(3, 0.5),
    b[cid] ~ dnorm(0, 0.2)
  ),data = dataList, chains = 4, cores = 4
)

pairs(m11_8)

m11_8 %>% precis(2)

##           mean         sd        5.5%     94.5%    n_eff     Rhat4
## a[1] 3.1793203 0.12729884  2.96582405 3.3782197 1691.368 0.9998414
## a[2] 3.6117057 0.07382473  3.49626185 3.7277584 1779.533 1.0020825
## b[1] 0.1902573 0.12816463 -0.00860378 0.3971389 1653.918 1.0004470
## b[2] 0.1909199 0.15284178 -0.05823923 0.4389852 1765.421 0.9997861

11H1. Use WAIC or PSIS to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.

data("chimpanzees")

d2 <- chimpanzees

m11h1_1 <- rethinking::quap(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a,
    a ~ dnorm(0,10)
  ),
  data = d2 )

m11h1_2 <- rethinking::quap(
  alist(
    pulled_left ~ dbinom(1, p) ,
    logit(p) <- a + bp*prosoc_left,
    a ~ dnorm(0,10) ,
    bp ~ dnorm(0,10)
  ),
  data = d2 )

m11h1_3 <- rethinking::quap(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a + (bp + bpC*condition)*prosoc_left,
    a ~ dnorm(0,10),
    bp ~ dnorm(0,10),
    bpC ~ dnorm(0,10)
  ), 
  data = d2 )

m11h1_4 <- rethinking::quap(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + (bp + bpC*condition)*prosoc_left,
    a[actor] ~ dnorm(0, 10),
    bp ~ dnorm(0, 10),
    bpC ~ dnorm(0, 10)
  ),
  data = d2)


compare(m11h1_1,m11h1_2,m11h1_3,m11h1_4)

##             WAIC        SE    dWAIC      dSE      pWAIC       weight
## m11h1_4 549.9617 18.554308   0.0000       NA 15.5689313 1.000000e+00
## m11h1_2 680.5102  9.232976 130.5485 17.98034  2.0057272 4.484884e-29
## m11h1_3 682.1850  9.317678 132.2233 17.92136  2.9205058 1.941175e-29
## m11h1_1 687.9396  7.156139 137.9780 18.83602  0.9996638 1.092599e-30

# m11h1_4, the model with a unique intercept for each actor, has the lowest WAIC (about 550) and receives essentially all of the model weight. Its advantage over the next model (dWAIC of about 131) is many times the standard error of that difference (about 18). The three simpler models are within a few WAIC units of each other, so they are hard to tell apart.

JianweiLi

2021-04-05

Chapter 11 - God Spiked the Integers
