Chapter 12 - Monsters and Mixtures

This chapter introduced several new types of regression, all of which are generalizations of generalized linear models (GLMs). Ordered logistic models are useful for categorical outcomes with a strict ordering. They are built by attaching a cumulative link function to a categorical outcome distribution. Zero-inflated models mix together two different outcome distributions, allowing us to model outcomes with an excess of zeros. Models for overdispersion, such as beta-binomial and gamma-Poisson, draw the expected value of each observation from a distribution that changes shape as a function of a linear model.

Place each answer inside a code chunk (grey box). Each code chunk should contain a text response or code that completes or answers the question or activity. Include plots whenever a question requests them. Problems are labeled Easy (E), Medium (M), and Hard (H).

Finally, upon completion, name your final output .html file YourName_ANLY505-Year-Semester.html, publish the assignment to your RPubs account, and submit the link to Canvas. Each question is worth 5 points.

Questions

12E1. What is the difference between an ordered categorical variable and an unordered one? Define and then give an example of each.

# An ordered categorical variable has levels with a natural ranking, such as 
# table sizes (small, medium, large) or level of happiness at work (low, 
# medium, high). The levels are ordered, but the distances between adjacent 
# levels are not necessarily equal.

# An unordered categorical variable has no inherent ordering among its levels. 
# For example, the five regions of the USA (Northeast, Southwest, West, 
# Southeast, and Midwest) can be listed in any order without changing their 
# meaning.

12E2. What kind of link function does an ordered logistic regression employ? How does it differ from an ordinary logit link?

# An ordered logistic regression employs a cumulative logit, or 
# log-cumulative-odds, link. It differs from an ordinary logit link in that it 
# maps the CUMULATIVE probability of all levels less than or equal to a given 
# level onto the linear model, whereas an ordinary logit link maps the 
# probability (odds) of a single outcome value. Modeling cumulative odds is 
# what lets the model respect the ordering of the outcome levels.
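# To make the contrast concrete, here is a minimal sketch (with made-up 
# probabilities for four ordered levels) comparing the two links.
p <- c(0.1, 0.3, 0.4, 0.2)      # made-up probabilities of each level
log(p / (1 - p))                # ordinary logit: log-odds of each single level
cp <- cumsum(p)                 # cumulative probabilities
log(cp / (1 - cp))              # ordered link: log-cumulative-odds; last is Inf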

12E3. When count data are zero-inflated, using a model that ignores zero-inflation will tend to induce which kind of inferential error?

# Ignoring zero-inflation tends to UNDERESTIMATE the rate of events. The extra 
# zeros may come from a separate process (e.g., no sampling occurred at all), 
# but a plain count model attributes all of them to the count process itself. 
# Since a count distribution with extra zeros added to it has a lower mean, 
# the model is pulled toward a lower rate estimate than the true event rate.
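# A quick simulation (made-up parameter values) illustrates the bias: the 
# sample mean of zero-inflated counts, which is the Poisson MLE of the rate, 
# falls below the true rate.
set.seed(10)
true_lambda <- 5                          # true event rate (made up)
p_zero <- 0.3                             # chance of an extra structural zero
zero <- rbinom(1e4, 1, p_zero)            # 1 = forced zero
y <- (1 - zero) * rpois(1e4, true_lambda)
mean(y)                                   # about (1 - 0.3) * 5 = 3.5, well below 5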

12E4. Over-dispersion is common in count data. Give an example of a natural process that might produce over-dispersed counts. Can you also give an example of a process that might produce under-dispersed counts?

# Over-dispersion
# Over-dispersion means the observed variance is higher than the model's 
# outcome distribution implies. Binomial and Poisson distributions have no 
# separate variance parameter: their variance is determined by the mean 
# (unlike, say, a normal distribution, which has its own scale parameter). 
# For example, if we count wild cats in different areas of a forest over a 
# given period of time, the counts will likely be over-dispersed, because the 
# rate at which wild cats are observed varies strongly across areas.

# Under-dispersion
# Under-dispersion means the observed variance is lower than expected. It can 
# occur when observations constrain one another, for example through 
# autocorrelation across adjacent subgroups. Using the same example, if we 
# count wild cats in each area in half-day intervals, successive counts depend 
# on the previous ones (a cat counted in the morning is unlikely to be counted 
# again in the afternoon), which tends to produce under-dispersed counts.
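# A small simulation (made-up values) of the mixture mechanism behind 
# over-dispersion: drawing each count's rate from a gamma distribution 
# inflates the variance well above the mean.
set.seed(12)
y_pois <- rpois(1e4, 5)                       # plain Poisson: variance ~ mean
c(mean(y_pois), var(y_pois))
rates <- rgamma(1e4, shape = 2, rate = 0.4)   # rates vary across areas; mean = 5
y_mix <- rpois(1e4, rates)                    # gamma-Poisson: variance > mean
c(mean(y_mix), var(y_mix))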

12M1. At a certain university, employees are annually rated from 1 to 4 on their productivity, with 1 being least productive and 4 most productive. In a certain department at this certain university in a certain year, the numbers of employees receiving each rating were (from 1 to 4): 12, 36, 7, 41. Compute the log cumulative odds of each rating.

rate <- c(12, 36, 7, 41)               # employees at ratings 1 through 4
q <- rate / sum(rate)                  # proportion at each rating
p <- cumsum(q)                         # cumulative proportion at or below each rating
cum_log_odds_rate <- log(p / (1 - p))  # log cumulative odds
cum_log_odds_rate
## [1] -1.9459101  0.0000000  0.2937611        Inf
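# The final value is Inf because the cumulative probability of the highest 
# rating is 1, so its log odds are undefined; this is why an ordered model 
# estimates only K - 1 intercepts. Inverting the link recovers the cumulative 
# proportions as a sanity check:
round(1 / (1 + exp(-cum_log_odds_rate)), 4)  # matches p; the Inf maps back to 1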

12M2. Make a version of Figure 12.5 for the employee ratings data given just above.

# cumulative proportion of employees at or below each rating
proportion <- cumsum(rate / sum(rate))
plot(x = 1:4, y = proportion, type = "b", ylim = c(0, 1),
     xlab = "rating", ylab = "cumulative proportion")
# vertical black segments: cumulative probability of each rating
segments(1:4, 0, 1:4, proportion)
# offset blue segments: discrete probability of each individual rating
for (i in 1:4) {
  segments(i + 0.05, c(0, proportion)[i], i + 0.05, proportion[i], col = "blue")
}

12M3. Can you modify the derivation of the zero-inflated Poisson distribution (ZIPoisson) from the chapter to construct a zero-inflated binomial distribution?

# Following the ZIPoisson derivation: let p0 be the probability of the extra 
# zero-generating process, and q the per-trial success probability of a 
# binomial with n trials.

# The probability of a zero, mixing together both processes:
# Pr(0 | p0, q, n) = p0 + (1 - p0) * (1 - q)^n

# The probability of any particular non-zero observation y:
# Pr(y | p0, q, n) = (1 - p0) * (n! / (y! * (n - y)!)) * q^y * (1 - q)^(n - y)
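# As an illustration (not the rethinking package's implementation), a minimal 
# R density function built directly from the two expressions above:
dzibinom_sketch <- function(y, p0, q, n, log = FALSE) {
  ll <- ifelse(y == 0,
               log(p0 + (1 - p0) * (1 - q)^n),             # mixed zero probability
               log(1 - p0) + dbinom(y, n, q, log = TRUE))  # non-zero: binomial part
  if (log) ll else exp(ll)
}
dzibinom_sketch(0, p0 = 0.2, q = 0.5, n = 10)  # example call, made-up parameters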

12H1. In 2014, a paper was published that was entitled “Female hurricanes are deadlier than male hurricanes.” As the title suggests, the paper claimed that hurricanes with female names have caused greater loss of life, and the explanation given is that people unconsciously rate female hurricanes as less dangerous and so are less likely to evacuate. Statisticians severely criticized the paper after publication. Here, you’ll explore the complete data used in the paper and consider the hypothesis that hurricanes with female names are deadlier. Load the data with:

data(Hurricanes)
d <- Hurricanes

Acquaint yourself with the columns by inspecting the help ?Hurricanes. In this problem, you’ll focus on predicting deaths using femininity of each hurricane’s name. Fit and interpret the simplest possible model, a Poisson model of deaths using femininity as a predictor. You can use quap or ulam. Plot your results of prior predictive simulation. Compare the model to an intercept-only Poisson model of deaths. Plot the results over the raw data. How strong is the association between femininity of name and deaths? Which storms does the model fit (retrodict) well? Which storms does it fit poorly?

# standardize the femininity rating
d$fmnnty_std <- (d$femininity - mean(d$femininity)) / sd(d$femininity)

# m1: Poisson model of deaths with standardized femininity as predictor
m1 <- map2stan(
  alist(
    deaths ~ dpois(lambda),
    log(lambda) <- a + bf*fmnnty,
    a ~ dnorm(0,10),
    bf ~ dnorm(0,1)),
  data=list(
    deaths=d$deaths,
    fmnnty=d$fmnnty_std), 
  chains=4)
## (Stan sampling progress output omitted: 4 chains, 2000 iterations each 
## with 1000 warmup, all completed without warnings)
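# The question also asks for a prior predictive simulation. Below is a minimal 
# sketch under m1's priors (a ~ dnorm(0,10), bf ~ dnorm(0,1)); it reveals that 
# these flat priors allow absurdly large expected death counts before seeing 
# any data.
set.seed(11)
N <- 50
a_prior  <- rnorm(N, 0, 10)
bf_prior <- rnorm(N, 0, 1)
x_seq <- seq(-2, 1.5, length.out = 30)
plot(NULL, xlim = range(x_seq), ylim = c(0, 300),
     xlab = "femininity (std)", ylab = "prior expected deaths")
for (i in 1:N)
  lines(x_seq, exp(a_prior[i] + bf_prior[i] * x_seq), col = col.alpha("black", 0.3))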
# m0: intercept-only Poisson model, for comparison
m0 <- map2stan(
  alist(
    deaths ~ dpois(lambda),
    log(lambda) <- a,
    a ~ dnorm(0,10)),
  data=list(deaths=d$deaths), 
  chains=4)
## (Stan sampling progress output omitted: 4 chains, 2000 iterations each 
## with 1000 warmup, all completed without warnings)
# Comparing the two models by WAIC:
compare(m0, m1)
##        WAIC        SE   dWAIC      dSE    pWAIC       weight
## m1 4406.715  995.3076  0.0000       NA 128.0429 1.000000e+00
## m0 4449.339 1075.2995 42.6242 145.4279  78.2631 5.549735e-10
# plot raw data
plot(d$fmnnty_std, d$deaths, pch=16,
      col=rangi2, xlab="femininity", ylab="deaths")

# compute model-based trend
pred_dat <- list(fmnnty = seq(from = -2, to = 1.5, length.out = 30))
lambda <- link(m1, data=pred_dat)
lambda.mu <- apply(lambda, 2, mean)
lambda.PI <- apply(lambda, 2, PI)

# superimpose trend
lines(pred_dat$fmnnty, lambda.mu)
shade(lambda.PI, pred_dat$fmnnty)

# compute sampling distribution
deaths_sim <- sim(m1, data=pred_dat)
deaths_sim.PI <- apply(deaths_sim, 2, PI)

# superimpose sampling interval as dashed lines
lines(pred_dat$fmnnty, deaths_sim.PI[1,], lty=2)
lines(pred_dat$fmnnty, deaths_sim.PI[2,], lty=2)


# The model finds a weak positive association between femininity and deaths. 
# The shaded trend interval is so narrow it is barely visible, and it misses 
# many of the actual death tolls to the right, a clear sign of over-dispersion. 
# The model including femininity fits better by WAIC than the intercept-only 
# model, but the plot shows this advantage is driven by a few influential 
# outliers: four storms with more than 100 deaths, all with names rated highly 
# feminine. These storms pull the trend upward and led the authors 
# (Jung et al. 2014, PNAS) to conclude that female-named hurricanes are 
# deadlier than male-named ones. The model retrodicts the many low-death 
# storms reasonably well but fits the high-death storms poorly, which is 
# exactly where the over-dispersion shows.
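# To identify which storms the model retrodicts poorly, one option (a sketch 
# using m1's posterior and the observed predictor values) is to rank storms by 
# a Pearson-style residual:
lambda_obs <- link(m1, data = list(fmnnty = d$fmnnty_std))
lambda_mu <- apply(lambda_obs, 2, mean)
pearson_resid <- (d$deaths - lambda_mu) / sqrt(lambda_mu)  # large = poor fit
d$name[order(abs(pearson_resid), decreasing = TRUE)][1:5]  # worst-fit storms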
