This chapter described some of the most common generalized linear models, those used to model counts. It is important to never convert counts to proportions before analysis, because doing so destroys information about sample size. A fundamental difficulty with these models is that parameters are on a different scale, typically log-odds (for binomial) or log-rate (for Poisson), than the outcome variable they describe. Therefore computing implied predictions is even more important than before.
Place each answer inside the code chunk (grey box). Each code chunk should contain a text response or code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard (H).
Finally, upon completion, name your final output .html file YourName_ANLY505-Year-Semester.html, publish the assignment to your RPubs account, and submit the link to Canvas. Each question is worth 5 points.
11E1. If an event has probability 0.35, what are the log-odds of this event?
p = 0.35
log(p/(1-p))
## [1] -0.6190392
11E2. If an event has log-odds 3.2, what is the probability of this event?
logistic(3.2)
## [1] 0.9608343
# The probability is about 0.96. As a check, log(0.9608 / (1 - 0.9608)) ≈ 3.2 recovers the stated log-odds.
11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?
exp(1.7)
## [1] 5.473947
# A one-unit increase in the predictor multiplies the odds of the outcome by exp(1.7) ≈ 5.47, i.e. roughly a 447% increase in the odds.
11E4. Why do Poisson regressions sometimes require the use of an offset? Provide an example.
# An offset is needed when the counts were accumulated over different exposures (different lengths of time, areas, or numbers of opportunities), so that the model describes a rate rather than a raw count. For example, 10 hikes recorded over 1 year reflect a very different rate than 10 hikes over 10 years. Adding the log of the exposure (e.g. log(T_i)) to the linear model as an offset, with its coefficient fixed at 1, puts all observations on a common rate scale.
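A minimal sketch of how the offset enters the model, using simulated data in the spirit of the chapter's monastery-style example (all variable names below are made up for illustration):
library(rethinking)
set.seed(11)
# One site records counts per day, another per week, so exposures differ
y_daily  <- rpois(30, 1.5)       # 30 daily counts, true rate 1.5 per day
y_weekly <- rpois(4, 0.5 * 7)    # 4 weekly counts, true rate 0.5 per day
d_off <- data.frame(
  y        = c(y_daily, y_weekly),
  log_expo = c(rep(log(1), 30), rep(log(7), 4)),   # log exposure = the offset
  site     = c(rep(0, 30), rep(1, 4))
)
m_offset <- quap(
  alist(
    y ~ dpois(lambda),
    log(lambda) <- log_expo + a + b * site,  # offset enters with coefficient fixed at 1
    a ~ dnorm(0, 1),
    b ~ dnorm(0, 1)
  ), data = d_off
)
precis(m_offset)  # a and b are now interpretable on the per-day rate scale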
11M1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?
# The aggregated form includes a multiplicity term, the binomial coefficient, which counts all the orderings in which the observed successes could have occurred; the disaggregated (0/1 Bernoulli) form does not. This constant factor changes the value of the likelihood, and therefore of WAIC/PSIS, but it does not depend on the parameters, so the posterior distribution and the inferences are unchanged.
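A quick numeric check of that multiplicity term (illustrative numbers, 6 successes in 9 trials):
p <- 0.7
agg    <- dbinom(6, size = 9, prob = p)                           # aggregated likelihood
disagg <- prod(dbinom(c(1,1,1,1,1,1,0,0,0), size = 1, prob = p))  # Bernoulli product
agg / disagg   # ratio equals choose(9, 6) = 84, the number of orderings
choose(9, 6)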
11M2. If a coefficient in a Poisson regression has value 1.7, what does this imply about the change in the outcome?
exp(1.7)
## [1] 5.473947
# With the log link, a one-unit increase in the predictor multiplies the expected count (lambda) by exp(1.7) ≈ 5.47.
11M3. Explain why the logit link is appropriate for a binomial generalized linear model.
# The logit link maps a probability, which is constrained to lie in (0, 1), onto the entire real line: logit(p_i) = log(p_i / (1 - p_i)).
# In a binomial GLM the parameter p_i is a probability of success, so whatever value the linear model takes, the inverse-logit guarantees the implied p_i stays between 0 and 1.
# logit() is only defined on (0, 1), so plot it over that range
curve(logit(x), from = 0.01, to = 0.99, xlab = "probability", ylab = "log-odds")
11M4. Explain why the log link is appropriate for a Poisson generalized linear model.
# The log link maps a strictly positive parameter, the Poisson mean lambda, onto the whole real line; its inverse, exp(), guarantees that lambda stays positive no matter what value the linear model takes.
# log() is only defined for positive values, so plot it over a positive range
curve(log(x), from = 0.01, to = 100, xlab = "lambda", ylab = "log(lambda)")
11M5. What would it imply to use a logit link for the mean of a Poisson generalized linear model? Can you think of a real research problem for which this would make sense?
# A logit link for the Poisson mean would constrain lambda to (0, 1), so the expected count per observation could never exceed one. That could make sense when each unit of exposure allows at most one event, for example modeling the expected number of positive Covid-19 tests per person per day. More generally, a scaled logit such as log(lambda / (M - lambda)) would bound the expected count below a known maximum M.
11M6. State the constraints for which the binomial and Poisson distributions have maximum entropy. Are the constraints different at all for binomial and Poisson? Why or why not?
# The binomial is the maximum entropy distribution for counts of discrete binary outcomes when the number of trials is fixed and the expected value (the probability of success on each trial) is constant across trials. The Poisson arises as the limit of the binomial when n -> infinity and p -> 0 with np held constant, so its constraints are effectively the same: independent binary events with a constant expected rate. Because the constraints do not change in the limit, the Poisson is also a maximum entropy distribution under those constraints.
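To see the limiting relationship numerically (illustrative numbers, lambda = n*p = 2):
n <- 1e4; p <- 2 / n   # large n, tiny p, so np = 2
k <- 0:8
round(rbind(binomial = dbinom(k, size = n, prob = p),
            poisson  = dpois(k, lambda = n * p)), 4)   # nearly identical probabilities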
11M7. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Plot and compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0,10). Re-estimate the posterior using both ulam and quap. Plot and compare the posterior distributions. Do the differences increase or decrease? Why?
data('chimpanzees')
d <- chimpanzees
d$recipient <- NULL   # drop the NA-filled column before fitting
# Quadratic approximation with a unique intercept for each actor
m11_7 <- quap(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + (bp + bpc * condition) * prosoc_left,
    a[actor] ~ dnorm(0, 10),
    bp ~ dnorm(0, 10),
    bpc ~ dnorm(0, 10)
  ),
  data = d
)
pairs(m11_7)
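The question also asks for the MCMC counterpart; below is a sketch of the same model refit with ulam (reusing d from above) so the two posteriors can be compared:
d_trim <- list(
  pulled_left = d$pulled_left,
  actor = d$actor,
  condition = d$condition,
  prosoc_left = d$prosoc_left
)
m11_7_mcmc <- ulam(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + (bp + bpc * condition) * prosoc_left,
    a[actor] ~ dnorm(0, 10),
    bp ~ dnorm(0, 10),
    bpc ~ dnorm(0, 10)
  ), data = d_trim, chains = 4, cores = 4
)
precis(m11_7_mcmc, depth = 2)
pairs(m11_7_mcmc)
# Most parameters look nearly identical under quap and ulam, but actor 2 always
# pulled the left lever, so with the wide Normal(0, 10) prior its intercept a[2]
# has a long right tail in the MCMC posterior that quap's symmetric Gaussian
# cannot capture; a tighter prior on the actor intercepts shrinks that difference.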
11M8. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?
library('dplyr')
## Warning: package 'dplyr' was built under R version 4.0.5
data(Kline)
kDat <- Kline
kDat <- kDat %>%
  dplyr::mutate(cid = ifelse(contact == "high", 2, 1),
                stdPop = standardize(log(population))) %>%
  filter(culture != "Hawaii")
dataList <- list(
  totTools = kDat$total_tools,
  stdPop = kDat$stdPop,
  cid = as.integer(kDat$cid)
)
m11_8 <- ulam(
  alist(
    totTools ~ dpois(lambda),
    log(lambda) <- a[cid] + b[cid] * stdPop,
    a[cid] ~ dnorm(3, 0.5),
    b[cid] ~ dnorm(0, 0.2)
  ), data = dataList, chains = 4, cores = 4
)
pairs(m11_8)
m11_8 %>% precis(2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a[1] 3.1815692 0.12118453 2.98720343 3.3790179 1644.467 0.9993562
## a[2] 3.6103645 0.07240292 3.49151389 3.7225798 1869.625 1.0013828
## b[1] 0.1927150 0.12555077 -0.01408238 0.3924939 1720.325 0.9987851
## b[2] 0.1961452 0.15388000 -0.05168369 0.4391618 1755.596 0.9995042
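To see what dropping Hawaii changes, the same model can be refit on the full Kline data and the two precis tables compared side by side (a sketch reusing the workflow above):
kFull <- Kline %>%
  dplyr::mutate(cid = ifelse(contact == "high", 2, 1),
                stdPop = standardize(log(population)))
dataListFull <- list(
  totTools = kFull$total_tools,
  stdPop = kFull$stdPop,
  cid = as.integer(kFull$cid)
)
m11_8_full <- ulam(
  alist(
    totTools ~ dpois(lambda),
    log(lambda) <- a[cid] + b[cid] * stdPop,
    a[cid] ~ dnorm(3, 0.5),
    b[cid] ~ dnorm(0, 0.2)
  ), data = dataListFull, chains = 4, cores = 4
)
precis(m11_8_full, 2)
# Hawaii is a high-population, low-contact society with many tools and is highly
# influential (it has the largest Pareto k in the chapter), so dropping it
# noticeably weakens the population-tools slope for the low-contact societies.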
11H1. Use WAIC or PSIS to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.
data("chimpanzees")
d2 <- chimpanzees
# Four models of increasing complexity, all fit with quadratic approximation
m11h1_1 <- quap(              # intercept only
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a,
    a ~ dnorm(0, 10)
  ),
  data = d2)
m11h1_2 <- quap(              # add the prosocial-option effect
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a + bp * prosoc_left,
    a ~ dnorm(0, 10),
    bp ~ dnorm(0, 10)
  ),
  data = d2)
m11h1_3 <- quap(              # add the interaction with condition (partner present)
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a + (bp + bpC * condition) * prosoc_left,
    a ~ dnorm(0, 10),
    bp ~ dnorm(0, 10),
    bpC ~ dnorm(0, 10)
  ),
  data = d2)
m11h1_4 <- quap(              # unique intercept for each actor
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + (bp + bpC * condition) * prosoc_left,
    a[actor] ~ dnorm(0, 10),
    bp ~ dnorm(0, 10),
    bpC ~ dnorm(0, 10)
  ),
  data = d2)
compare(m11h1_1,m11h1_2,m11h1_3,m11h1_4)
## WAIC SE dWAIC dSE pWAIC weight
## m11h1_4 541.2689 18.978585 0.0000 NA 11.3178162 1.000000e+00
## m11h1_2 680.6890 9.289821 139.4201 18.41362 2.0962568 5.312698e-31
## m11h1_3 682.5351 9.265899 141.2661 18.34107 3.0925264 2.110796e-31
## m11h1_1 687.9407 7.044027 146.6717 19.16187 0.9994704 1.414608e-32
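# Interpretation: the actor-intercept model (m11h1_4) has a WAIC roughly 140
# points lower than the other three and carries essentially all of the model
# weight. Most of the variation in pulled_left is between actors (individual
# handedness preferences), while adding the prosocial-option and condition
# predictors (m11h1_2, m11h1_3) improves only slightly on the intercept-only
# model (m11h1_1).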