Chapter 12 - Monsters and Mixtures

This chapter introduced several new types of regression, all of which are generalizations of generalized linear models (GLMs). Ordered logistic models are useful for categorical outcomes with a strict ordering. They are built by attaching a cumulative link function to a categorical outcome distribution. Zero-inflated models mix together two different outcome distributions, allowing us to model outcomes with an excess of zeros. Models for overdispersion, such as beta-binomial and gamma-Poisson, draw the expected value of each observation from a distribution that changes shape as a function of a linear model.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes or answers the question or activity requested. Make sure to include plots if the question requests them.

Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.

Questions

12-1. At a certain university, employees are annually rated from 1 to 4 on their productivity, with 1 being least productive and 4 most productive. In a certain department at this certain university in a certain year, the numbers of employees receiving each rating were (from 1 to 4): 12, 36, 7, 41. Compute the log cumulative odds of each rating.

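# Counts of employees receiving ratings 1-4; the trailing 0 pads the vector
# so that employees[(i+1):5] is still defined when i = 4
employees <- c(12, 36, 7, 41, 0)

# Log cumulative odds: log( Pr(y <= k) / Pr(y > k) ) for k = 1..4
for (i in 1:4) {
  print(log(sum(employees[1:i]) / sum(employees[(i + 1):5])))
}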
## [1] -1.94591
## [1] 0
## [1] 0.2937611
## [1] Inf
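The same numbers can be computed without a loop. The sketch below converts the counts to cumulative proportions and applies base R's `qlogis()`, which is the logit function log(p / (1 - p)). It should reproduce the loop output above, including `Inf` for rating 4, whose cumulative proportion is exactly 1.

```r
# Observed counts for ratings 1-4
n <- c(12, 36, 7, 41)

# Cumulative proportion Pr(y <= k) for k = 1..4
cum_p <- cumsum(n) / sum(n)

# Log cumulative odds: qlogis(p) computes log(p / (1 - p))
log_cum_odds <- qlogis(cum_p)
log_cum_odds
```

The last value is infinite because nothing lies above the highest rating, which is why ordered logistic models only estimate cutpoints for the first K - 1 categories.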

12-2. Make a version of Figure 12.5 for the employee ratings data given just above.

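# Observed counts for ratings 1-4, as proportions and cumulative proportions
n <- c(12, 36, 7, 41)
p <- n / sum(n)
cum_p <- cumsum(p)

# Cumulative proportion of each rating
plot(
  y = cum_p,
  x = 1:4,
  type = "b",
  ylim = c(0, 1),
  xlab = "rating",
  ylab = "cumulative proportion"
)
# Vertical lines from zero up to each cumulative proportion
segments(1:4, 0, 1:4, cum_p)
# Green lines, offset slightly to the right: the discrete proportion of each
# rating, drawn from the previous cumulative proportion up to the current one
for (i in 1:4) {
  segments(i + 0.05, c(0, cum_p)[i], i + 0.05, cum_p[i], col = "green")
}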
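The green segments in Figure 12.5 encode the relationship Pr(y = k) = Pr(y <= k) - Pr(y <= k - 1), with Pr(y <= 0) = 0. As a quick sanity check on the plot, differencing the cumulative proportions should give back the original proportions:

```r
n <- c(12, 36, 7, 41)
p <- n / sum(n)
cum_p <- cumsum(p)

# Successive differences of the cumulative proportions (with a leading 0)
# are the lengths of the green segments, i.e. the discrete proportions
p_recovered <- diff(c(0, cum_p))
all.equal(p_recovered, p)
```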

12-3. In 2014, a paper was published that was entitled “Female hurricanes are deadlier than male hurricanes.” As the title suggests, the paper claimed that hurricanes with female names have caused greater loss of life, and the explanation given is that people unconsciously rate female hurricanes as less dangerous and so are less likely to evacuate. Statisticians severely criticized the paper after publication. Here, you’ll explore the complete data used in the paper and consider the hypothesis that hurricanes with female names are deadlier.

Acquaint yourself with the columns by inspecting the help ?Hurricanes. In this problem, you’ll focus on predicting deaths using femininity of each hurricane’s name. Fit and interpret the simplest possible model, a Poisson model of deaths using femininity as a predictor. You can use quap or ulam. Compare the model to an intercept-only Poisson model of deaths. How strong is the association between femininity of name and deaths? Which storms does the model fit (retrodict) well? Which storms does it fit poorly?

library(rethinking)
data(Hurricanes)  # the data set name is case-sensitive: Hurricanes, not hurricanes
data1 <- Hurricanes

# Poisson model of deaths with femininity as the only predictor
m1 <- quap(
  alist(
    deaths ~ dpois(lambda),
    log(lambda) <- a + bF * femininity,
    a ~ dnorm(0, 10),
    bF ~ dnorm(0, 5)
  ),
  data = data1)

# Intercept-only Poisson model for comparison
m2 <- quap(
  alist(
    deaths ~ dpois(lambda),
    log(lambda) <- a,
    a ~ dnorm(0, 10)
  ),
  data = data1)

compare(m1, m2)

##        WAIC        SE    dWAIC      dSE     pWAIC       weight
## m1 4410.517  996.0753  0.00000       NA 134.35939 1.000000e+00
## m2 4448.556 1074.1789 38.03882 144.0952  82.56393 5.495093e-09

# Posterior predictive simulations of deaths for each storm
y <- sim(m1)
y.mean <- colMeans(y)
y.PI <- apply(y, 2, PI)  # 89% intervals by default

# Observed deaths (filled points) against femininity, with predicted means
# (open points) and 89% prediction intervals (vertical segments)
plot(y = data1$deaths, x = data1$femininity, col = rangi2,
     ylab = "deaths", xlab = "femininity", pch = 16)
points(y = y.mean, x = data1$femininity, pch = 1)
segments(x0 = data1$femininity, x1 = data1$femininity, y0 = y.PI[1, ], y1 = y.PI[2, ])

# Mean prediction and interval bounds as lines, ordered by femininity
ord <- order(data1$femininity)
lines(y = y.mean[ord], x = data1$femininity[ord])
lines(y = y.PI[1, ord], x = data1$femininity[ord], lty = 2)
lines(y = y.PI[2, ord], x = data1$femininity[ord], lty = 2)
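To answer "which storms does the model fit poorly?" numerically rather than by eye, one option is to flag every storm whose observed death count falls outside the central 89% interval of a Poisson distribution at its expected count. The sketch below uses only base R's `qpois()`; the helper name `flag_poor_fit` is hypothetical, and the numbers in the example are made up for illustration, not taken from the Hurricanes data.

```r
# Hypothetical helper: TRUE where an observed count lies outside the central
# `prob` interval of a Poisson distribution with expected count `lambda`
flag_poor_fit <- function(y, lambda, prob = 0.89) {
  lower <- qpois((1 - prob) / 2, lambda)
  upper <- qpois(1 - (1 - prob) / 2, lambda)
  y < lower | y > upper
}

# Toy example (made-up numbers): three storms, each expected to cause 20 deaths
observed <- c(20, 200, 0)
expected <- c(20, 20, 20)
flag_poor_fit(observed, expected)
```

With the Poisson model above, the expected counts could come from `colMeans(link(m1))`. Note that this check ignores posterior uncertainty in lambda, unlike the `sim()`-based intervals in the plot, so treat it as a rough screen. Under a pure Poisson, the interval is narrow (its variance equals its mean), so very deadly storms and storms with zero deaths will both tend to be flagged.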

12-4. Counts are nearly always over-dispersed relative to Poisson. So fit a gamma-Poisson (aka negative-binomial) model to predict deaths using femininity. Show that the over-dispersed model no longer shows as precise a positive association between femininity and deaths, with an 89% interval that overlaps zero. Can you explain why the association diminished in strength?

library(rethinking)
data(Hurricanes)
d <- Hurricanes
d$fem_std <- (d$femininity - mean(d$femininity)) / sd(d$femininity)  # standardized femininity
dat <- list(D = d$deaths, F = d$fem_std)

# Recall that the gamma-Poisson has two parameters: one for the rate and one for
# the dispersion of rates. The dispersion term spreads the distribution out, and
# larger values of the dispersion parameter make the distribution more similar
# to a pure Poisson process.
# To keep the comparison meaningful, we keep the same priors as before. The model
# needs an additional scale (dispersion) parameter, for which we postulate a
# simple exponential prior.
# The resulting predictions have clearly more spread than those of the pure
# Poisson model. The second model also has a larger effective number of samples,
# which suggests less autocorrelation in its chains. We can quantify the
# difference between the two models with WAIC as well.
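The claim that larger dispersion values approach a pure Poisson can be checked by simulation. The sketch below assumes the chapter's parameterization, in which the gamma-Poisson variance is lambda + lambda^2 / phi; base R's `rnbinom()` with `size = phi` and `mu = lambda` has exactly that variance. The mean of 20 and the phi values are arbitrary choices for illustration.

```r
set.seed(12)
mu <- 20          # arbitrary expected count
n_draws <- 1e5
phis <- c(1, 10, 1000)

# Gamma-Poisson draws: rnbinom(size = phi, mu = mu) has variance mu + mu^2 / phi
sample_var <- sapply(phis, function(phi) var(rnbinom(n_draws, size = phi, mu = mu)))
theory_var <- mu + mu^2 / phis

# Pure Poisson draws for reference: the variance equals the mean
poisson_var <- var(rpois(n_draws, mu))

round(cbind(phi = phis, sample_var, theory_var), 1)
poisson_var
```

Small phi gives a variance many times the mean, while phi = 1000 is nearly indistinguishable from the Poisson. This extra variance is what lets a gamma-Poisson model absorb a few extremely deadly storms without bending the femininity slope toward them.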

12-5. In the data, there are two measures of a hurricane’s potential to cause death: damage_norm and min_pressure. Consult ?Hurricanes for their meanings. It makes some sense to imagine that femininity of a name matters more when the hurricane is itself deadly. This implies an interaction between femininity and either or both of damage_norm and min_pressure. Fit a series of models evaluating these interactions. Interpret and compare the models. In interpreting the estimates, it may help to generate counterfactual predictions contrasting hurricanes with masculine and feminine names. Are the effect sizes plausible?

library(rethinking)
data(Hurricanes)
d <- Hurricanes
d$fem_std <- (d$femininity - mean(d$femininity)) / sd(d$femininity)  # standardized femininity
dat <- list(D = d$deaths, F = d$fem_std)
dat$P <- standardize(d$min_pressure)  # standardized minimum pressure
dat$S <- standardize(d$damage_norm)   # standardized normalized damage

# I start with a basic model that builds on the previous gamma-Poisson model by
# adding an interaction between femininity and min_pressure (as minimum pressure
# gets lower, a storm grows stronger). At this point, the model makes less of a
# distinction between masculine and feminine hurricanes overall.
# damage_norm scales multiplicatively, so the distances between storms grow fast
# toward the right-hand side of the plot. This is difficult for the model to
# account for, which is why its fit is underwhelming.
# So why is the femininity-by-damage interaction so strong? Probably because of
# the 3-4 highly influential feminine storms in the upper-right corner of the
# plot, which make it look as if feminine storms are especially deadly when they
# are damaging to begin with. Personally, I don't trust this association: I see
# no logical reason for it, and it is most likely an artefact of the limited data.
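A counterfactual contrast helps judge whether such an interaction gives plausible effect sizes. With a log link and the linear model log(lambda) = a + bF*F + bS*S + bFS*F*S, the ratio of expected deaths for a feminine versus a masculine name at the same damage level S is exp((bF + bFS*S) * (F_fem - F_masc)); the intercept and the bS*S term cancel. The sketch below uses hypothetical coefficient values chosen only for illustration (they are NOT estimates from the Hurricanes data); in practice you would plug in posterior samples.

```r
# Hypothetical coefficients, for illustration only (not fitted estimates)
bF  <- 0.1
bFS <- 0.3

# Ratio of expected deaths, feminine (F = 1) vs masculine (F = -1) name,
# at standardized damage S; the intercept and main effect of S cancel out
fem_ratio <- function(S, bF, bFS, F_fem = 1, F_masc = -1) {
  exp((bF + bFS * S) * (F_fem - F_masc))
}

S_grid <- c(-1, 0, 1, 2)
round(fem_ratio(S_grid, bF, bFS), 2)
```

Because the contrast is multiplicative, even a modest positive bFS makes the feminine/masculine ratio grow quickly for the most damaging storms, which is exactly where the few influential feminine storms sit. For min_pressure the direction is reversed, since lower pressure means a stronger storm.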