Assignment #10

Questions

12E1. What is the difference between an ordered categorical variable and an unordered one? Define and then give an example of each.

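#An ordered categorical variable takes discrete values that have a meaningful rank order, although the distances between adjacent values need not be equal. Example: customer satisfaction ratings for a product. An unordered categorical variable takes discrete values with no inherent ranking. Example: marital status.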
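As a small illustration of the distinction, base R can represent both kinds of variable as factors. The values below are made up for the example and are not taken from any data set.

```r
# Ordered categorical: satisfaction levels have a meaningful rank
satisfaction <- factor(
  c("low", "high", "medium", "low"),
  levels = c("low", "medium", "high"),
  ordered = TRUE
)

# Unordered categorical: marital status has labels but no rank
marital <- factor(c("single", "married", "divorced", "single"))

is.ordered(satisfaction)   # TRUE
is.ordered(marital)        # FALSE

# Rank comparisons are only defined for the ordered factor
satisfaction < "high"
```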

12E2. What kind of link function does an ordered logistic regression employ? How does it differ from an ordinary logit link?

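#An ordered logistic regression uses a cumulative logit (log-cumulative-odds) link. The linear model is set equal to the log-odds of observing a given value or any smaller value, log(Pr(y ≤ k)/(1 − Pr(y ≤ k))). An ordinary logit link is instead the log-odds of one particular event, log(p/(1 − p)).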

12E3. When count data are zero-inflated, using a model that ignores zero-inflation will tend to induce which kind of inferential error?

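#A model that ignores zero-inflation will tend to underestimate the rate of the count process. The extra zeros arise from a separate process, but the model attributes them to the count process and so concludes that events are rarer than they really are.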
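A short simulation can make this error concrete. The values of `p_zero` and `lambda` below are assumptions chosen for illustration only.

```r
set.seed(12)
n      <- 10000
p_zero <- 0.3   # assumed probability of a structural zero
lambda <- 2     # assumed true rate of the count process

# Mix a zero-generating process with a Poisson count process
structural_zero <- rbinom(n, size = 1, prob = p_zero)
y <- (1 - structural_zero) * rpois(n, lambda)

mean(y)            # naive Poisson rate estimate, near (1 - p_zero) * lambda
mean(y == 0)       # observed share of zeros
dpois(0, lambda)   # share of zeros the count process alone would produce
```

The naive rate estimate sits below the true `lambda` because the structural zeros are averaged in with the counts.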

12E4. Over-dispersion is common in count data. Give an example of a natural process that might produce over-dispersed counts. Can you also give an example of a process that might produce underdispersed counts?

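#Over-dispersion is the presence of greater variability in the data than the assumed model expects. For a Poisson model, it means the variance of the counts exceeds their mean. It commonly arises when the underlying rate differs across units. Example: disease cases counted per town, where towns differ in exposure. Under-dispersion means less variability than expected. Example: counts that are regulated or capped, such as clutch sizes in birds, which can vary less than a Poisson process with the same mean.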
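Both kinds of dispersion can be sketched by simulation. The gamma and binomial parameters below are assumptions picked for illustration. The binomial example is under-dispersed only relative to a Poisson distribution with the same mean.

```r
set.seed(12)
n <- 10000

# Over-dispersion: each unit has its own rate, drawn from a gamma with mean 5
rates  <- rgamma(n, shape = 2, rate = 0.4)
y_over <- rpois(n, rates)

# Under-dispersion: a count capped at 10 with a high success probability
y_under <- rbinom(n, size = 10, prob = 0.9)

# A Poisson variable has variance equal to its mean
c(mean = mean(y_over),  var = var(y_over))
c(mean = mean(y_under), var = var(y_under))
```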

12M1. At a certain university, employees are annually rated from 1 to 4 on their productivity, with 1 being least productive and 4 most productive. In a certain department at this certain university in a certain year, the numbers of employees receiving each rating were (from 1 to 4): 12, 36, 7, 41. Compute the log cumulative odds of each rating.

n <- c( 12, 36 , 7 , 41 )
q <- n / sum(n)
q

## [1] 0.12500000 0.37500000 0.07291667 0.42708333

sum(q)

## [1] 1

p <- cumsum(q)
p

## [1] 0.1250000 0.5000000 0.5729167 1.0000000

log(p/(1-p))

## [1] -1.9459101  0.0000000  0.2937611        Inf
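As a check on the calculation, the log cumulative odds can be mapped back to the original proportions. This is a minimal sketch that uses only base R. `plogis` is the inverse logit.

```r
n <- c(12, 36, 7, 41)
cum_p <- cumsum(n / sum(n))
log_cum_odds <- log(cum_p / (1 - cum_p))   # the last value is Inf

# The inverse logit maps each cutpoint back to a cumulative probability
cum_back <- plogis(log_cum_odds)

# Differencing recovers the probability of each individual rating
q_back <- diff(c(0, cum_back))
q_back
```

The last log cumulative odds is infinite because the cumulative probability of the highest rating is 1, which is why an ordered logit model with four categories needs only three cutpoints.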

12M2. Make a version of Figure 12.5 for the employee ratings data given just above.

plot(1:4 , p , xlab="rating" , ylab="cumulative proportion" ,
xlim=c(0.7,4.3), ylim=c(0,1) , xaxt="n")
axis(1, at=1:4, labels=1:4)

#plot gray cumulative probability lines
for (x in 1:4 ) lines( c(x,x) , c(0,p[x]) , col="gray" , lwd=2)

#plot blue discrete probability segments
for (x in 1:4 )
lines(c(x,x)+0.1 , c(p[x]-q[x],p[x]) , col="slateblue" , lwd=2)

#add number labels
text(1:4+0.2 , p-q/2 , labels=1:4 , col="slateblue")

12M3. Can you modify the derivation of the zero-inflated Poisson distribution (ZIPoisson) from the chapter to construct a zero-inflated binomial distribution?

#The probability of a zero, mixing together both processes, is: Pr(0|p0, q, n) = p0 + (1 − p0)(1 − q)^n

#The probability of any particular non-zero observation y is: Pr(y|p0, q, n) = (1 − p0) (n!/(y!(n − y)!)) q^y (1 − q)^(n − y)
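The two expressions can be written as a density function and checked numerically. `dzibinom_sketch` is a hypothetical helper name, and the parameter values in the call are arbitrary.

```r
# Hypothetical helper: density of a zero-inflated binomial
# p0 = probability of a structural zero, q = success probability, n = trials
dzibinom_sketch <- function(y, p0, q, n) {
  ifelse(
    y == 0,
    p0 + (1 - p0) * dbinom(0, size = n, prob = q),
    (1 - p0) * dbinom(y, size = n, prob = q)
  )
}

probs <- dzibinom_sketch(0:10, p0 = 0.2, q = 0.3, n = 10)
sum(probs)   # a valid distribution sums to 1, up to floating-point error
probs[1]     # probability of a zero: p0 + (1 - p0) * (1 - q)^n
```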

12H1. In 2014, a paper was published that was entitled “Female hurricanes are deadlier than male hurricanes.” As the title suggests, the paper claimed that hurricanes with female names have caused greater loss of life, and the explanation given is that people unconsciously rate female hurricanes as less dangerous and so are less likely to evacuate. Statisticians severely criticized the paper after publication. Here, you’ll explore the complete data used in the paper and consider the hypothesis that hurricanes with female names are deadlier. Load the data with:

library(rethinking)
data(Hurricanes)

d <- Hurricanes
d$fmnnty_std <- (d$femininity - mean(d$femininity))/sd(d$femininity)
m1 <- map2stan(
  alist(
    deaths ~ dpois(lambda),
    log(lambda) <- a + bf*fmnnty,
    a ~ dnorm(0,10),
    bf ~ dnorm(0,1)),
  data=list(
    deaths=d$deaths,
    fmnnty=d$fmnnty_std), 
  chains=4)

## Computing WAIC

precis(m1)

##         mean         sd      5.5%     94.5%    n_eff     Rhat4
## a  3.0011407 0.02366863 2.9636728 3.0390109 2554.734 0.9998257
## bf 0.2380494 0.02466798 0.1983633 0.2771667 2209.651 0.9998868

#Intercept only model:
m0 <- map2stan(
  alist(
    deaths ~ dpois(lambda),
    log(lambda) <- a,
    a ~ dnorm(0,10)),
  data=list(deaths=d$deaths), 
  chains=4)

## Computing WAIC

# Compare the two models:
compare(m0, m1)

##        WAIC       SE    dWAIC      dSE     pWAIC       weight
## m1 4416.272 1000.596  0.00000       NA 131.67302 1.000000e+00
## m0 4461.532 1081.759 45.26048 145.0229  85.25903 1.485289e-10

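# m1, which includes femininity, has the lower WAIC and receives essentially all of the Akaike weight. However, the WAIC difference (45.3) is much smaller than its standard error (145.0), so the two models cannot be reliably distinguished on predictive accuracy. The coefficient bf (mean 0.24, 89% interval 0.20 to 0.28) is reliably positive: a one standard deviation increase in femininity multiplies expected deaths by about exp(0.24) ≈ 1.27.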
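The size of the association is easier to judge on the outcome scale. The sketch below plugs in the posterior means from the precis(m1) output above. Exponentiating posterior means is only an approximation to summarizing the full posterior.

```r
# Posterior means copied from the precis(m1) output
a  <- 3.0011407
bf <- 0.2380494

exp(a)                 # expected deaths for a name of average femininity
exp(bf)                # multiplicative change per 1 SD increase in femininity
exp(a + bf) - exp(a)   # extra expected deaths at +1 SD of femininity

# WAIC difference relative to its standard error, from compare(m0, m1)
45.26048 / 145.0229
```

A ratio well below 1 in the last line indicates that the WAIC difference is small compared with its uncertainty.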

# Now in order to see which hurricanes the model retrodicts well, I’ll compute and plot the expected death count, 89% interval of the expectation, and 89% interval of the expected distribution of deaths with Poisson sampling.

# plot raw data
plot(d$fmnnty_std , d$deaths , pch=16 ,
      col=rangi2 , xlab="femininity" , ylab="deaths")

# compute model-based trend
pred_dat <- list(fmnnty = seq(from = -2, to = 1.5, length.out = 30))
lambda <- link(m1,data=pred_dat)

lambda.mu <- apply(lambda,2,mean)
lambda.PI <- apply(lambda,2,PI)

# superimpose trend
lines(pred_dat$fmnnty , lambda.mu)
shade(lambda.PI , pred_dat$fmnnty)

# compute sampling distribution
deaths_sim <- sim(m1,data=pred_dat)

deaths_sim.PI <- apply(deaths_sim,2,PI)

# superimpose sampling interval as dashed lines
lines(pred_dat$fmnnty , deaths_sim.PI[1,] , lty=2)
lines(pred_dat$fmnnty , deaths_sim.PI[2,] , lty=2)

# The 89% interval of the expected value is so narrow that it cannot be seen, and the sampling distribution is not much wider. Femininity accounts for very little of the variation in deaths, especially at the high end. The deaths are strongly over-dispersed relative to the model, which is very common when fitting Poisson models. As a result, this homogeneous Poisson model does a poor job for most of the hurricanes in the sample, since most of them lie outside the dashed prediction boundaries.
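The share of observations falling outside a Poisson prediction interval can be computed directly. The sketch below uses simulated counts rather than the hurricane data. `outside_share` is a hypothetical helper, and the negative-binomial parameters are assumptions chosen to mimic strong over-dispersion.

```r
# Hypothetical helper: share of counts outside a central Poisson interval
outside_share <- function(y, lambda, prob = 0.89) {
  lo <- qpois((1 - prob) / 2, lambda)
  hi <- qpois(1 - (1 - prob) / 2, lambda)
  mean(y < lo | y > hi)
}

set.seed(12)
y_pois <- rpois(1000, lambda = 20)              # counts that match the model
y_over <- rnbinom(1000, mu = 20, size = 0.5)    # over-dispersed, same mean

outside_share(y_pois, 20)   # roughly the nominal miss rate or less
outside_share(y_over, 20)   # far larger when counts are over-dispersed
```

With real data, the same function could be applied to the observed deaths and the fitted expected values for each storm.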

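# re-plot deaths against femininity of names (Hurricanes has no `names` column, so the y-axis must be deaths for the overlaid trend to make sense)
plot(d$fmnnty_std, d$deaths, pch = 16, col = rangi2,
     xlab = "femininity", ylab = "deaths")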

# compute model-based trend
pred_dat2 <- list(fmnnty = seq(from = -2, to = 1.5, length.out = 30))
lambda <- link(m1, data = pred_dat2)

lambda.mu2 <- apply(lambda, 2, mean)
lambda.PI2 <- apply(lambda, 2, PI)

# superimpose trend
lines(pred_dat2$fmnnty, lambda.mu2)
shade(lambda.PI2, pred_dat2$fmnnty)

# compute sampling distribution
deaths_sim2 <- sim(m1, data = pred_dat2)

deaths_sim.PI2 <- apply(deaths_sim2, 2, PI)

# superimpose sampling interval as dashed lines
lines(pred_dat2$fmnnty, deaths_sim.PI2[1, ], lty = 2)
lines(pred_dat2$fmnnty, deaths_sim.PI2[2, ], lty = 2)

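# In the plot of femininity against deaths, the predicted trend rises only slightly with femininity. The model retrodicts reasonably well the storms whose death counts lie near the trend, and it fits poorly the storms with very large death tolls. Overall, the plain Poisson model is a poor description of these data.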

Acquaint yourself with the columns by inspecting the help ?Hurricanes. In this problem, you’ll focus on predicting deaths using femininity of each hurricane’s name. Fit and interpret the simplest possible model, a Poisson model of deaths using femininity as a predictor. You can use quap or ulam. Compare the model to an intercept-only Poisson model of deaths. How strong is the association between femininity of name and deaths? Which storms does the model fit (retrodict) well? Which storms does it fit poorly?


Shidong Li

2021-02-15

Chapter 12 - Monsters and Mixtures
