| itle: ‘Assignment #10’ |
| ubtitle: ‘Chapter 12’ |
| uthor: “Ronak Jain” |
| ate: “2021-10-14” |
| utput: html_document |
This chapter introduced several new types of regression, all of which are generalizations of generalized linear models (GLMs). Ordered logistic models are useful for categorical outcomes with a strict ordering. They are built by attaching a cumulative link function to a categorical outcome distribution. Zero-inflated models mix together two different outcome distributions, allowing us to model outcomes with an excess of zeros. Models for overdispersion, such as beta-binomial and gamma-Poisson, draw the expected value of each observation from a distribution that changes shape as a function of a linear model.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them.
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.
12-1. At a certain university, employees are annually rated from 1 to 4 on their productivity, with 1 being least productive and 4 most productive. In a certain department at this certain university in a certain year, the numbers of employees receiving each rating were (from 1 to 4): 12, 36, 7, 41. Compute the log cumulative odds of each rating.
n <- c( 12, 36 , 7 , 41 )
p <- n / sum(n)
cump = cumsum(p)
log(cump / (1 - cump))
## [1] -1.9459101 0.0000000 0.2937611 Inf
12-2. Make a version of Figure 12.5 for the employee ratings data given just above.
n = c(12, 36, 7, 41)
p = n / sum(n)
cump = cumsum(p)
plot(
y = cump,
x = 1:4,
type = "b",
ylim = c(0, 1)
)
segments(1:4, 0, 1:4, cump)
for (i in 1:4) {
segments(i + 0.05, c(0, cump)[i], i + 0.05, cump[i], col = "green")
}
12-3. In 2014, a paper was published that was entitled “Female hurricanes are deadlier than male hurricanes.”191 As the title suggests, the paper claimed that hurricanes with female names have caused greater loss of life, and the explanation given is that people unconsciously rate female hurricanes as less dangerous and so are less likely to evacuate. Statisticians severely criticized the paper after publication. Here, you’ll explore the complete data used in the paper and consider the hypothesis that hurricanes with female names are deadlier.
Acquaint yourself with the columns by inspecting the help ?Hurricanes. In this problem, you’ll focus on predicting deaths using femininity of each hurricane’s name. Fit and interpret the simplest possible model, a Poisson model of deaths using femininity as a predictor. You can use quap or ulam. Compare the model to an intercept-only Poisson model of deaths. How strong is the association between femininity of name and deaths? Which storms does the model fit (retrodict) well? Which storms does it fit poorly?
library(rethinking)
data(Hurricanes)
data1 <- Hurricanes
m1 <- map(
alist(
deaths ~ dpois( lambda ),
log(lambda) <- a + bF*femininity,
a ~ dnorm(0,10),
bF ~ dnorm(0,5)
) ,
data=data1)
m2 <- map(
alist(
deaths ~ dpois( lambda ),
log(lambda) <- a ,
a ~ dnorm(0,10)
) ,
data=data1)
compare(m1,m2)
## WAIC SE dWAIC dSE pWAIC weight
## m1 4422.702 1003.02 0.00000 NA 138.77600 9.999995e-01
## m2 4451.638 1077.32 28.93621 141.6727 80.48083 5.206917e-07
y <- sim(m1)
y.mean <- colMeans(y)
y.PI <- apply(y, 2, PI)
plot(y=data1$deaths, x=data1$femininity, col=rangi2, ylab="deaths", xlab="femininity", pch=16)
points(y=y.mean, x=data1$femininity, pch=1)
segments(x0=data1$femininity, x1= data1$femininity, y0=y.PI[1,], y1=y.PI[2,])
lines(y= y.mean[order(data1$femininity)], x=sort(data1$femininity))
lines( y.PI[1,order(data1$femininity)], x=sort(data1$femininity), lty=2 )
lines( y.PI[2,order(data1$femininity)], x=sort(data1$femininity), lty=2 )
# Based on the above plot we can clearly see that the relationship between deaths and feminitiy of hurricanes is not very strong and it looks the feminine hurricanes do have higher number of deaths.
12-4. Counts are nearly always over-dispersed relative to Poisson. So fit a gamma-Poisson (aka negative-binomial) model to predict deaths using femininity. Show that the over-dispersed model no longer shows as precise a positive association between femininity and deaths, with an 89% interval that overlaps zero. Can you explain why the association diminished in strength?
library(rethinking)
data(Hurricanes)
d <- Hurricanes # load data on object called d
d$fem_std <- (d$femininity - mean(d$femininity)) / sd(d$femininity) # standardised femininity
dat <- list(D = d$deaths, F = d$fem_std)
mH2 <- ulam(
alist(
D ~ dgampois(lambda, scale),
log(lambda) <- a + bF * F,
a ~ dnorm(1, 1),
bF ~ dnorm(0, 1),
scale ~ dexp(1) # strictly positive hence why exponential prior
),
data = dat, chains = 4, cores = 4, log_lik = TRUE
)
precis(mH2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a 2.9768960 0.15173921 2.74303304 3.2185314 1881.872 1.0019032
## bF 0.2117032 0.15793262 -0.04848839 0.4602784 1801.903 0.9994932
## scale 0.4499314 0.06319212 0.35394818 0.5588968 1814.489 1.0007143
#Our previously identified positive relationship between standardised femininity of hurricane name and death toll is still there albeit slightly diminished in magnitude. However, the credible interval around it has widened considerably and overlaps zero now.
#Let’s compare the estimates of our models side by side:
# model formula
f <- alist(
D ~ dpois(lambda), # poisson outcome distribution
log(lambda) <- a + bF * F, # log-link for lambda with linear model
# priors in log-space, 0 corresponds to outcome of 1
a ~ dnorm(1, 1),
bF ~ dnorm(0, 1)
)
mH1 <- ulam(f, data = dat, chains = 4, cores = 4, log_lik = TRUE)
precis(mH1)
## mean sd 5.5% 94.5% n_eff Rhat4
## a 3.0001480 0.02336658 2.963271 3.0363959 1279.216 1.0005851
## bF 0.2396465 0.02542728 0.199307 0.2801151 1330.173 0.9992721
#Based on the analysis I think there is higher uncertainty in the new model compared to the old one even when thogh when the predictions are very similar in both. Also the models do not account for high death count due to storms.
12-5. In the data, there are two measures of a hurricane’s potential to cause death: damage_norm and min_pressure. Consult ?Hurricanes for their meanings. It makes some sense to imagine that femininity of a name matters more when the hurricane is itself deadly. This implies an interaction between femininity and either or both of damage_norm and min_pressure. Fit a series of models evaluating these interactions. Interpret and compare the models. In interpreting the estimates, it may help to generate counterfactual predictions contrasting hurricanes with masculine and feminine names. Are the effect sizes plausible?
library(rethinking)
data(Hurricanes)
d <- Hurricanes # load data on object called d
d$fem_std <- (d$femininity - mean(d$femininity)) / sd(d$femininity) # standardised femininity
dat <- list(D = d$deaths, F = d$fem_std)
dat$P <- standardize(d$min_pressure)
dat$S <- standardize(d$damage_norm)
mH3a <- ulam(
alist(
D ~ dgampois(lambda, scale),
log(lambda) <- a + bF * F + bP * P + bFP * F * P,
a ~ dnorm(1, 1),
c(bF, bP, bFP) ~ dnorm(0, 1),
scale ~ dexp(1)
),
data = dat, cores = 4, chains = 4, log_lik = TRUE
)
precis(mH3a)
## mean sd 5.5% 94.5% n_eff Rhat4
## a 2.7578019 0.14026320 2.54804161 2.9886691 2033.630 1.0001743
## bFP 0.2927512 0.15330404 0.05422522 0.5352309 1859.179 1.0009154
## bP -0.6683991 0.14198860 -0.89827329 -0.4464214 1769.804 0.9994191
## bF 0.2978721 0.14239279 0.06186658 0.5174821 2201.774 1.0001314
## scale 0.5525406 0.08120198 0.43243004 0.6855105 2049.784 1.0005165
#Based on the above analysis it looks like bP value is negative which shows that as severe the storm is more people die and lower the pressure is. The interaction effect bFP seems to be positive.
P_seq <- seq(from = -3, to = 2, length.out = 1e2) # pressure sequence
# 'masculine' storms
d_pred <- data.frame(F = -1, P = P_seq)
lambda_m <- link(mH3a, data = d_pred)
lambda_m.mu <- apply(lambda_m, 2, mean)
lambda_m.PI <- apply(lambda_m, 2, PI)
# 'feminine' storms
d_pred <- data.frame(F = 1, P = P_seq)
lambda_f <- link(mH3a, data = d_pred)
lambda_f.mu <- apply(lambda_f, 2, mean)
lambda_f.PI <- apply(lambda_f, 2, PI)
# Plotting, sqrt() to make differences easier to spot, can't use log because there are storm with zero deaths
plot(dat$P, sqrt(dat$D),
pch = 1, lwd = 2, col = ifelse(dat$F > 0, "red", "dark gray"),
xlab = "minimum pressure (std)", ylab = "sqrt(deaths)"
)
lines(P_seq, sqrt(lambda_m.mu), lty = 2)
shade(sqrt(lambda_m.PI), P_seq)
lines(P_seq, sqrt(lambda_f.mu), lty = 1, col = "red")
shade(sqrt(lambda_f.PI), P_seq, col = col.alpha("red", 0.2))
# Our models expects the male storms to be less deadly compared to the female ones but as the pressure drops the difference between the two also diminishes.
# Now adding the other variable damage caused by each storm to our model:
mH3b <- ulam(
alist(
D ~ dgampois(lambda, scale),
log(lambda) <- a + bF * F + bS * S + bFS * F * S,
a ~ dnorm(1, 1),
c(bF, bS, bFS) ~ dnorm(0, 1),
scale ~ dexp(1)
),
data = dat, chains = 4, cores = 4, log_lik = TRUE
)
precis(mH3b)
## mean sd 5.5% 94.5% n_eff Rhat4
## a 2.5714607 0.1329740 2.36290226 2.7834553 2813.533 0.9997421
## bFS 0.3143231 0.2059420 -0.02899507 0.6293951 2183.267 0.9990035
## bS 1.2520829 0.2142457 0.92560946 1.6075934 2421.165 0.9985959
## bF 0.0890769 0.1271696 -0.11138189 0.2936303 2514.069 0.9992753
## scale 0.6847341 0.1070618 0.52131582 0.8670977 2904.998 0.9999683
# We can clearly see above that the effect of feminine storm has been totally normalized with bF values as 0.09 and there is a strong interaction efficient bFS positive 0.31.
S_seq <- seq(from = -1, to = 5.5, length.out = 1e2) # damage sequence
# 'masculine' storms
d_pred <- data.frame(F = -1, S = S_seq)
lambda_m <- link(mH3b, data = d_pred)
lambda_m.mu <- apply(lambda_m, 2, mean)
lambda_m.PI <- apply(lambda_m, 2, PI)
# 'feminine' storms
d_pred <- data.frame(F = 1, S = S_seq)
lambda_f <- link(mH3b, data = d_pred)
lambda_f.mu <- apply(lambda_f, 2, mean)
lambda_f.PI <- apply(lambda_f, 2, PI)
# plot
plot(dat$S, sqrt(dat$D),
pch = 1, lwd = 2, col = ifelse(dat$F > 0, "red", "dark gray"),
xlab = "normalized damage (std)", ylab = "sqrt(deaths)"
)
lines(S_seq, sqrt(lambda_m.mu), lty = 2)
shade(sqrt(lambda_m.PI), S_seq)
lines(S_seq, sqrt(lambda_f.mu), lty = 1, col = "red")
shade(sqrt(lambda_f.PI), S_seq, col = col.alpha("red", 0.2))
# We can clearly see above that there is not much difference between feminine and masculine storms at this time. It also looks like damage storms scales exponentially as we can see towards the right side of the plot.
#The interaction effect looks to be positive and high thats beause of the three feminine storms on the right top corner of the plot. It looks like feminine storms which are damaging are really dangerous.