Assignment #10 - Hanyang Li_ANLY505-2020-LateSummer

Questions

12E1. What is the difference between an ordered categorical variable and an unordered one? Define and then give an example of each.

# An ordered categorical variable has levels with a fixed order (A > B > C > D), but the distances between adjacent levels need not be equal. Example: developmental stages. An unordered categorical variable has levels with no inherent order. Example: furniture types.

12E2. What kind of link function does an ordered logistic regression employ? How does it differ from an ordinary logit link?

# Ordered logistic regression uses a cumulative logit link: the log of the cumulative odds, that is, the odds of observing a given value or any smaller value. An ordinary logit link is the log-odds of one particular value.
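# A minimal sketch of the contrast, assuming three ordered categories with made-up probabilities (the names and values in `p` are illustrative, not from the assignment):

```r
# Hypothetical probabilities for three ordered categories (assumed for illustration)
p <- c(low = 0.2, mid = 0.5, high = 0.3)

# Ordinary logit: log-odds of one particular category, here "mid"
logit_mid <- qlogis(p[["mid"]])

# Cumulative logit: log-odds of "mid" or any lower category
cum_logit_mid <- qlogis(sum(p[c("low", "mid")]))

c(ordinary = logit_mid, cumulative = cum_logit_mid)
```

# The two values differ because the cumulative version also counts the probability of every lower category.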

12E3. When count data are zero-inflated, using a model that ignores zero-inflation will tend to induce which kind of inferential error?

# It will underestimate the rate of the count process. The model attributes every zero to the count process, so it concludes that the process yields zero more often than it really does, when some of the zeroes arise from a different process.
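# A small simulation can illustrate this. It is only a sketch: the rate `true_rate` and the zero probability `p_zero` are assumed values, and the plain Poisson estimate is taken to be the sample mean (its maximum-likelihood estimate).

```r
set.seed(1)
n <- 10000
true_rate <- 5   # rate of the count process (assumed value)
p_zero <- 0.3    # probability of a zero from the separate process (assumed value)

# Each observation is either a zero from the separate process or a Poisson count
is_zero <- rbinom(n, 1, p_zero)
y <- ifelse(is_zero == 1, 0, rpois(n, true_rate))

# A Poisson model that ignores zero-inflation estimates the rate as the sample mean
naive_rate <- mean(y)
c(true_rate = true_rate, naive_rate = naive_rate)
```

# The naive estimate lands near (1 - p_zero) * true_rate, which is below the true rate.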

12E4. Over-dispersion is common in count data. Give an example of a natural process that might produce over-dispersed counts. Can you also give an example of a process that might produce underdispersed counts?

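# Over-dispersion: an unmeasured variable causes the rate to vary from case to case, so the counts are a mixture of processes with different rates and vary more than a single Poisson rate allows.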
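# As a rough illustration, counts whose rate varies from case to case can be simulated and compared with the Poisson expectation that the variance equals the mean. The gamma distribution for the rates and its parameters are assumptions chosen for the sketch.

```r
set.seed(1)
n <- 10000

# Each case gets its own rate, standing in for an unmeasured variable (assumed gamma, mean 4)
rates <- rgamma(n, shape = 2, rate = 0.5)
y <- rpois(n, rates)

# For a pure Poisson process these two numbers would be roughly equal
c(mean = mean(y), variance = var(y))
```

# The variance comes out well above the mean, which is the signature of over-dispersion.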

12M1. At a certain university, employees are annually rated from 1 to 4 on their productivity, with 1 being least productive and 4 most productive. In a certain department at this certain university in a certain year, the numbers of employees receiving each rating were (from 1 to 4): 12, 36, 7, 41. Compute the log cumulative odds of each rating.

nr_employees <- c(12,36,7,41,0) 
# append a zero so that the final cumulative odds are 96/0, giving a log-odds of Inf

for(i in 1:4){
  print(log(sum(nr_employees[1:i])/sum(nr_employees[(i+1):5])))
}

## [1] -1.94591
## [1] 0
## [1] 0.2937611
## [1] Inf
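# The same numbers can be obtained without the loop or the padding zero. This is an alternative sketch using `cumsum`; the variable names `ratings`, `cum_p`, and `log_cum_odds` are introduced here for illustration.

```r
ratings <- c(12, 36, 7, 41)

# Cumulative proportion of employees at or below each rating
cum_p <- cumsum(ratings) / sum(ratings)

# Log cumulative odds; the last value is Inf because the cumulative proportion is 1
log_cum_odds <- log(cum_p / (1 - cum_p))
log_cum_odds
```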

12M2. Make a version of Figure 12.5 for the employee ratings data given just above.

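# proportion of employees at each rating, then the cumulative proportion for ratings 1 to 4
prop <- nr_employees / sum(nr_employees)
cum_prop <- cumsum(prop)[1:4]
plot(y = cum_prop, x = 1:4, type = "b", ylim = c(0, 1), xlab = "rating", ylab = "cumulative proportion")
# vertical lines from zero up to each cumulative proportion
segments(1:4, 0, 1:4, cum_prop)
# blue segments, offset to the right, show the share contributed by each rating alone
for (i in 1:4) {
  segments(i + 0.05, c(0, cum_prop)[i], i + 0.05, cum_prop[i], col = "blue")
}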

12M3. Can you modify the derivation of the zero-inflated Poisson distribution (ZIPoisson) from the chapter to construct a zero-inflated binomial distribution?

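# Likelihood for the binomial: Pr(y | n, p) = choose(n, y) * p^y * (1 - p)^(n - y),
# where y is the number of successes in n trials.

# Let p_not_work be the probability that the process is inactive and yields a zero. A zero can then arise in two ways:
# Pr(0) = p_not_work + (1 - p_not_work) * (1 - p_success)^n_trials
# A non-zero count can only arise when the process is active:
# Pr(n_success) = (1 - p_not_work) * choose(n_trials, n_success) * p_success^n_success * (1 - p_success)^(n_trials - n_success), for n_success > 0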
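# A zero-inflated binomial mixes a process that always yields zero (with probability p_not_work) and an ordinary binomial. A minimal sketch of its probability mass function follows; `dzibinom_sketch` is a hypothetical helper written for illustration, and the parameter values in the check are assumptions.

```r
# Hypothetical helper: zero-inflated binomial probability mass
dzibinom_sketch <- function(y, p_not_work, n_trials, p_success) {
  # Probability of y successes when the binomial process is active
  active <- (1 - p_not_work) * dbinom(y, size = n_trials, prob = p_success)
  # Zeroes also arise when the process is inactive
  ifelse(y == 0, p_not_work + active, active)
}

# Sanity check with assumed values: the probabilities over 0..n_trials sum to 1
probs <- dzibinom_sketch(0:10, p_not_work = 0.2, n_trials = 10, p_success = 0.3)
sum(probs)
```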

12H1. In 2014, a paper was published that was entitled “Female hurricanes are deadlier than male hurricanes.” As the title suggests, the paper claimed that hurricanes with female names have caused greater loss of life, and the explanation given is that people unconsciously rate female hurricanes as less dangerous and so are less likely to evacuate. Statisticians severely criticized the paper after publication. Here, you’ll explore the complete data used in the paper and consider the hypothesis that hurricanes with female names are deadlier. Load the data with:

library(rethinking)
data(Hurricanes)

m11H1.map <- map(
  alist(
    deaths ~ dpois( lambda ),
    log(lambda) <- a + bF*femininity,
    a ~ dnorm(0,10),
    bF ~ dnorm(0,5)
  ) ,
  data=Hurricanes)

m11H1.map2 <- map(
  alist(
    deaths ~ dpois( lambda ),
    log(lambda) <- a ,
    a ~ dnorm(0,10)
  ) ,
  data=Hurricanes)

compare(m11H1.map,m11H1.map2)

##                WAIC        SE    dWAIC      dSE     pWAIC       weight
## m11H1.map  4410.385  996.3371  0.00000       NA 132.55770 1.000000e+00
## m11H1.map2 4457.072 1080.8846 46.68675 148.1524  78.93138 7.279482e-11
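# One rough way to read the table is to put a normal-approximation interval around the WAIC difference, using the dWAIC and dSE values printed above. This is a sketch of that arithmetic only; the 1.96 multiplier assumes an approximately normal difference.

```r
# Values copied from the compare() output above
d_waic <- 46.68675
d_se <- 148.1524

# Approximate 95% interval for the difference in WAIC
interval <- d_waic + c(-1, 1) * 1.96 * d_se
interval
```

# The interval spans zero, so despite the lopsided weights the two models are hard to tell apart on WAIC.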

y <- sim(m11H1.map)

y.mean <- colMeans(y)
y.PI <- apply(y, 2, PI)


# plot the simulated predictions (mean and prediction interval) against the observed deaths for each storm
plot(y=Hurricanes$deaths, x=Hurricanes$femininity, col=rangi2, ylab="deaths", xlab="femininity", pch=16)
points(y=y.mean, x=Hurricanes$femininity, pch=1)
segments(x0=Hurricanes$femininity, x1= Hurricanes$femininity, y0=y.PI[1,], y1=y.PI[2,])

lines(y= y.mean[order(Hurricanes$femininity)],  x=sort(Hurricanes$femininity))
lines( y.PI[1,order(Hurricanes$femininity)],  x=sort(Hurricanes$femininity), lty=2 )
lines( y.PI[2,order(Hurricanes$femininity)],  x=sort(Hurricanes$femininity), lty=2 )

# The relationship does not look very strong, but the deadliest hurricanes tend to have more feminine names.

Acquaint yourself with the columns by inspecting the help ?Hurricanes. In this problem, you’ll focus on predicting deaths using femininity of each hurricane’s name. Fit and interpret the simplest possible model, a Poisson model of deaths using femininity as a predictor. You can use quap or ulam. Compare the model to an intercept-only Poisson model of deaths. How strong is the association between femininity of name and deaths? Which storms does the model fit (retrodict) well? Which storms does it fit poorly?


Hanyang Li

2020-10-12

Chapter 12 - Monsters and Mixtures
