Assignment #10

Questions

12-1. At a certain university, employees are annually rated from 1 to 4 on their productivity, with 1 being least productive and 4 most productive. In a certain department at this certain university in a certain year, the numbers of employees receiving each rating were (from 1 to 4): 12, 36, 7, 41. Compute the log cumulative odds of each rating.

df <- c( 12, 36 , 7 , 41 )
p <- df / sum(df)
p

## [1] 0.12500000 0.37500000 0.07291667 0.42708333

cum_p <- cumsum(p)
cum_p

## [1] 0.1250000 0.5000000 0.5729167 1.0000000

log_odds <- log(cum_p / (1 - cum_p))
log_odds

## [1] -1.9459101  0.0000000  0.2937611        Inf
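
The last value is Inf because the cumulative proportion for the highest rating is exactly 1, so its log odds are infinite. Only the first three values are informative. A minimal sketch in base R, assuming the names `ratings` and `cutpoints` (not in the original), that keeps the finite log cumulative odds and checks them by inverting the logit:

```r
# Counts of employees at ratings 1..4, as given in the problem
ratings <- c(12, 36, 7, 41)

# Cumulative proportions; the last one is always 1
cum_p <- cumsum(ratings) / sum(ratings)

# qlogis() is the logit, log(p / (1 - p)); drop the last category,
# whose log cumulative odds are infinite
cutpoints <- qlogis(cum_p[-length(cum_p)])
round(cutpoints, 3)

# plogis() is the inverse logit, so this recovers the first three
# cumulative proportions
plogis(cutpoints)
```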

12-2. Make a version of Figure 12.5 for the employee ratings data given just above.

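# cumulative proportion at each rating (df holds the counts from 12-1)
c_prop <- cumsum(df / sum(df))
plot(x = 1:4, y = c_prop, type = "b", ylim = c(0, 1),
     xlab = "rating", ylab = "cumulative proportion")
# vertical segments from zero up to each cumulative proportion
segments(1:4, 0, 1:4, c_prop)
# blue segments, offset slightly to the right, show each rating's own share
segments(1:4 + 0.05, c(0, c_prop[-4]), 1:4 + 0.05, c_prop, col = "blue")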

12-3. In 2014, a paper was published that was entitled “Female hurricanes are deadlier than male hurricanes.” As the title suggests, the paper claimed that hurricanes with female names have caused greater loss of life, and the explanation given is that people unconsciously rate female hurricanes as less dangerous and so are less likely to evacuate. Statisticians severely criticized the paper after publication. Here, you’ll explore the complete data used in the paper and consider the hypothesis that hurricanes with female names are deadlier.

Acquaint yourself with the columns by inspecting the help ?Hurricanes. In this problem, you’ll focus on predicting deaths using femininity of each hurricane’s name. Fit and interpret the simplest possible model, a Poisson model of deaths using femininity as a predictor. You can use quap or ulam. Compare the model to an intercept-only Poisson model of deaths. How strong is the association between femininity of name and deaths? Which storms does the model fit (retrodict) well? Which storms does it fit poorly?

library(rethinking)
data(Hurricanes)
df <- Hurricanes
str(df)

## 'data.frame':    92 obs. of  8 variables:
##  $ name        : Factor w/ 83 levels "Able","Agnes",..: 38 77 1 9 47 20 40 60 27 33 ...
##  $ year        : int  1950 1950 1952 1953 1953 1954 1954 1954 1955 1955 ...
##  $ deaths      : int  2 4 3 1 0 60 20 20 0 200 ...
##  $ category    : int  3 3 1 1 1 3 3 4 3 1 ...
##  $ min_pressure: int  960 955 985 987 985 960 954 938 962 987 ...
##  $ damage_norm : int  1590 5350 150 58 15 19321 3230 24260 2030 14730 ...
##  $ female      : int  1 0 0 1 1 1 1 1 1 1 ...
##  $ femininity  : num  6.78 1.39 3.83 9.83 8.33 ...

df$fmnnty_std <- ( df$femininity - mean(df$femininity))/sd(df$femininity)
df1 <- list(deaths = df$deaths, fmn = df$fmnnty_std)

# Poisson model
m1 <- quap(
  alist(
    deaths ~ dpois(lambda),
    log(lambda) <- a+ bF * fmn,
    a ~ dnorm(0,10),
    bF ~ dnorm(0,1)
  ), data = df1)

# intercept-only
df2 <- list(deaths = df$deaths)
m2 <- quap(
  alist(
    deaths ~ dpois(lambda),
    log(lambda) <- a,
    a ~ dnorm(0,10)
  ), data = df2
)

compare(m1,m2)

##        WAIC       SE    dWAIC      dSE     pWAIC     weight
## m1 4436.462 1010.026  0.00000       NA 138.74544 0.99866479
## m2 4449.697 1076.195 13.23466 137.1479  78.80234 0.00133521
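
To judge how strongly the comparison favors m1, the WAIC difference can be set against its standard error. A rough sketch using only the values printed in the table above; the variable names and the 1.96 multiplier (a normal approximation for a roughly 95% interval) are assumptions, not part of the original analysis:

```r
# Values copied from the compare(m1, m2) table
d_waic <- 13.23466   # dWAIC for m2
d_se   <- 137.1479   # dSE, standard error of the difference

# Approximate 95% interval for the WAIC difference (normal approximation)
diff_interval <- d_waic + c(-1, 1) * 1.96 * d_se
diff_interval

# Model weights are proportional to exp(-dWAIC / 2)
rel_lik <- exp(-0.5 * c(m1 = 0, m2 = d_waic))
waic_weight <- rel_lik / sum(rel_lik)
round(waic_weight, 5)
```

The interval spans zero by a wide margin, so although nearly all of the weight goes to m1, the difference between the two models is small relative to its uncertainty.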

# Plotting

plot( df$fmnnty_std , df$deaths , pch=16 ,xlab="femininity" , ylab="deaths")
pred_dat <- list( fmn = seq(from = -2, to = 1.5, length.out = 30) )
lambda <- link(m1,data=pred_dat)
lambda.mu <- apply(lambda,2,mean)
lambda.PI <- apply(lambda,2,PI)

# posterior mean of lambda across the femininity range
lines(pred_dat$fmn, lambda.mu)

shade( lambda.PI , pred_dat$fmn )

12-4. Counts are nearly always over-dispersed relative to Poisson. So fit a gamma-Poisson (aka negative-binomial) model to predict deaths using femininity. Show that the over-dispersed model no longer shows as precise a positive association between femininity and deaths, with an 89% interval that overlaps zero. Can you explain why the association diminished in strength?

data(Hurricanes)
d <- Hurricanes
d$fem_std <- (d$femininity - mean(d$femininity)) / sd(d$femininity) 
dat <- list(D = d$deaths, F = d$fem_std)

The relationship does not appear to be very strong, but the trend is still positive, which suggests that hurricanes with more feminine names are associated with more deaths. The second model fits well, whereas the model using femininity as the only predictor fits poorly.
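
One way to see why the association becomes less precise under a gamma-Poisson likelihood is to compare the two likelihoods on over-dispersed counts. The sketch below is an illustration under stated assumptions, not the analysis of the Hurricanes data: it uses simulated data, assumed true coefficients, and maximum-likelihood fits with `glm` and `MASS::glm.nb` in place of `quap` or `ulam`. All variable names are hypothetical.

```r
library(MASS)  # for glm.nb(); ships with standard R installations

set.seed(12)
n <- 92                      # same number of storms as in Hurricanes
fem <- rnorm(n)              # simulated standardized femininity
mu <- exp(3 + 0.2 * fem)     # assumed true mean deaths per storm

# Gamma-Poisson draws: mean mu, variance mu + mu^2 / size
deaths_sim <- rnbinom(n, size = 0.5, mu = mu)

fit_pois <- glm(deaths_sim ~ fem, family = poisson)
fit_nb   <- glm.nb(deaths_sim ~ fem)

# Standard error of the femininity slope under each likelihood
se_pois <- summary(fit_pois)$coefficients["fem", "Std. Error"]
se_nb   <- summary(fit_nb)$coefficients["fem", "Std. Error"]
c(poisson = se_pois, gamma_poisson = se_nb)
```

The Poisson likelihood forces the variance to equal the mean, so a few storms with very large death counts are treated as strong evidence about the slope. The gamma-Poisson likelihood lets each storm have its own rate, which absorbs those extreme counts and widens the uncertainty around the slope.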

12-5. In the data, there are two measures of a hurricane’s potential to cause death: damage_norm and min_pressure. Consult ?Hurricanes for their meanings. It makes some sense to imagine that femininity of a name matters more when the hurricane is itself deadly. This implies an interaction between femininity and either or both of damage_norm and min_pressure. Fit a series of models evaluating these interactions. Interpret and compare the models. In interpreting the estimates, it may help to generate counterfactual predictions contrasting hurricanes with masculine and feminine names. Are the effect sizes plausible?

library(rethinking)
data(Hurricanes)
d <- Hurricanes 
d$fem_std <- (d$femininity - mean(d$femininity)) / sd(d$femininity) 
dat <- list(D = d$deaths, F = d$fem_std)
dat$P <- standardize(d$min_pressure)
dat$S <- standardize(d$damage_norm)
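
The counterfactual contrast suggested in the question can be sketched as follows. This is only an illustration of the technique: it uses simulated data with assumed coefficients and a maximum-likelihood Poisson `glm` rather than a `quap` or `ulam` fit to the Hurricanes data, and the names `sim`, `fem`, `dmg`, `grid`, and `contrast` are hypothetical.

```r
set.seed(5)
n <- 500
sim <- data.frame(fem = rnorm(n), dmg = rnorm(n))

# Assumed true coefficients, chosen only for illustration
sim$deaths <- rpois(n, exp(2 + 0.1 * sim$fem + 0.8 * sim$dmg +
                           0.2 * sim$fem * sim$dmg))

# Poisson regression with a femininity-by-damage interaction
fit_int <- glm(deaths ~ fem * dmg, family = poisson, data = sim)

# Counterfactual grid: masculine (-1) vs feminine (+1) names
# at low (-1) and high (+1) standardized damage
grid <- expand.grid(fem = c(-1, 1), dmg = c(-1, 1))
grid$expected <- predict(fit_int, newdata = grid, type = "response")

# Feminine-minus-masculine difference in expected deaths, by damage level
contrast <- tapply(grid$expected, grid$dmg, diff)
contrast
```

With a positive interaction, the difference between feminine and masculine names is larger at high damage than at low damage. Whether a difference of that size is plausible is the question to ask of the real estimates.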

Chapter 12

Yuman Liang

2021-10-11

Chapter 12 - Monsters and Mixtures
