Assignment #10

Questions

12E1. What is the difference between an ordered categorical variable and an unordered one? Define and then give an example of each.

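#An ordered categorical variable has levels with a natural ordering, although the distances between adjacent levels are not necessarily equal; an example is a rating from 1 to 5. The levels of an unordered categorical variable have no inherent ranking, so they cannot be compared as greater or lesser; an example is eye color.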

12E2. What kind of link function does an ordered logistic regression employ? How does it differ from an ordinary logit link?

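#Ordered logistic regression employs a cumulative logit link. The ordinary logit link is the log-odds of a single event's probability, whereas the cumulative logit is the log-odds of a cumulative probability, that is, the sum of the probabilities of all levels less than or equal to a given one.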
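As a minimal sketch of how the cumulative logit link is inverted, the base R snippet below turns cutpoints on the log cumulative odds scale into per-level probabilities. The cutpoint values are made up for illustration and do not come from any data set in this assignment.

```r
# Hypothetical cutpoints on the log cumulative odds scale (illustrative values only)
cutpoints <- c(-1.5, 0, 1.2)

# Inverse cumulative logit: Pr(y <= k) for k = 1..3;
# the highest level always has cumulative probability 1
cum_prob <- c(plogis(cutpoints), 1)

# The probability of each level is the difference of adjacent cumulative probabilities
level_prob <- diff(c(0, cum_prob))

print(round(cum_prob, 3))
print(round(level_prob, 3))
```

An ordinary logit link would map a single probability to the real line; here three cutpoints jointly determine four level probabilities.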

12E3. When count data are zero-inflated, using a model that ignores zero-inflation will tend to induce which kind of inferential error?

#It will tend to underestimate the true rate of the process, lambda for a Poisson regression or the probability p for a binomial regression, because the extra zeros pull the estimated mean down.
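A small simulation can illustrate this bias. The sketch below assumes a structural-zero probability of 0.3 and a true rate of 4 (both arbitrary choices) and compares the true rate with the naive estimate that ignores zero-inflation.

```r
set.seed(12)
n <- 10000
p_zero <- 0.3   # assumed probability of a structural zero
lambda <- 4     # assumed true Poisson rate

# Each observation is either a structural zero or a Poisson draw
is_zero <- rbinom(n, size = 1, prob = p_zero)
y <- (1 - is_zero) * rpois(n, lambda)

# Naive estimate of lambda from a plain Poisson model is the sample mean
lambda_naive <- mean(y)
print(c(true = lambda, naive = lambda_naive))
```

The naive estimate should land near (1 - p_zero) * lambda rather than lambda itself.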

12E4. Over-dispersion is common in count data. Give an example of a natural process that might produce over-dispersed counts. Can you also give an example of a process that might produce underdispersed counts?

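#Over-dispersion is the presence of greater variability in a data set than the statistical model expects. It can arise when counts are pooled from units that have different underlying rates, for example the number of fish caught per visitor when visitors differ in skill and effort. Under-dispersed counts could arise when all observations smaller than a threshold are replaced with the threshold itself.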
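Both cases can be sketched with simulated data. The snippet below assumes a gamma distribution of rates for the over-dispersed case and a threshold of 2 for the under-dispersed case; these values are illustrative only. A Poisson variable has a variance-to-mean ratio of 1, so ratios above or below 1 indicate over- or under-dispersion.

```r
set.seed(12)
n <- 10000

# Over-dispersion: each unit has its own rate, drawn from a gamma distribution
rates <- rgamma(n, shape = 2, rate = 0.5)   # mean rate of 4
y_over <- rpois(n, rates)

# Under-dispersion: Poisson counts with every value below the threshold
# replaced by the threshold itself
threshold <- 2
y_under <- pmax(rpois(n, 2), threshold)

# Variance-to-mean ratio; equals 1 in expectation for a Poisson variable
dispersion <- function(y) var(y) / mean(y)
print(c(over = dispersion(y_over), under = dispersion(y_under)))
```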

12M1. At a certain university, employees are annually rated from 1 to 4 on their productivity, with 1 being least productive and 4 most productive. In a certain department at this certain university in a certain year, the numbers of employees receiving each rating were (from 1 to 4): 12, 36, 7, 41. Compute the log cumulative odds of each rating.

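# counts of employees receiving ratings 1 to 4
rating <- c(12, 36, 7, 41)
# cumulative proportion at or below each rating
cum <- cumsum(rating) / sum(rating)
# log cumulative odds; the last value is Inf because the cumulative proportion is 1
log_cum_odds <- log(cum / (1 - cum))
print(log_cum_odds)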

## [1] -1.9459101  0.0000000  0.2937611        Inf

12M2. Make a version of Figure 12.5 for the employee ratings data given just above.

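# cumulative proportion at each rating
plot(1:4, cum, xlab = 'rating', ylab = 'cumulative proportion', ylim = c(0, 1))
lines(1:4, cum)
prev <- 0
for (i in 1:4) {
  # black bar: cumulative proportion up to rating i
  lines(c(i, i), c(0, cum[i]), lwd = 4)
  # blue bar: the share contributed by rating i alone
  lines(c(i + 0.03, i + 0.03), c(prev, cum[i]), lwd = 4, col = 'blue')
  prev <- cum[i]
}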

12M3. Can you modify the derivation of the zero-inflated Poisson distribution (ZIPoisson) from the chapter to construct a zero-inflated binomial distribution?

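#Yes: replace the Poisson likelihood with the binomial one. Writing p_zero for the probability of a structural zero and q for the success probability in n binomial trials, a zero can come from either process, so Pr(0) = p_zero + (1 - p_zero)(1 - q)^n. A nonzero count can only come from the binomial process, so for y > 0, Pr(y) = (1 - p_zero) choose(n, y) q^y (1 - q)^(n - y).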
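A minimal base R sketch of a zero-inflated binomial probability function follows. The function name `dzib` and its argument names are hypothetical, chosen only for this illustration; the check at the end confirms that the probabilities over all possible counts sum to 1 up to floating-point error.

```r
# Hypothetical helper: zero-inflated binomial probability mass function
# p_zero: probability of a structural zero
# size, q: number of binomial trials and success probability
dzib <- function(y, p_zero, size, q) {
  binom <- dbinom(y, size = size, prob = q)
  # zeros get the extra structural-zero mass; other counts are scaled down
  ifelse(y == 0, p_zero + (1 - p_zero) * binom, (1 - p_zero) * binom)
}

# Illustrative parameter values
probs <- dzib(0:10, p_zero = 0.2, size = 10, q = 0.3)
print(sum(probs))
```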

12H1. In 2014, a paper was published that was entitled “Female hurricanes are deadlier than male hurricanes.” As the title suggests, the paper claimed that hurricanes with female names have caused greater loss of life, and the explanation given is that people unconsciously rate female hurricanes as less dangerous and so are less likely to evacuate. Statisticians severely criticized the paper after publication. Here, you’ll explore the complete data used in the paper and consider the hypothesis that hurricanes with female names are deadlier. Load the data with:

library(rethinking)
data(Hurricanes)
d <- Hurricanes
str(d)

## 'data.frame':    92 obs. of  8 variables:
##  $ name        : Factor w/ 83 levels "Able","Agnes",..: 38 77 1 9 47 20 40 60 27 33 ...
##  $ year        : int  1950 1950 1952 1953 1953 1954 1954 1954 1955 1955 ...
##  $ deaths      : int  2 4 3 1 0 60 20 20 0 200 ...
##  $ category    : int  3 3 1 1 1 3 3 4 3 1 ...
##  $ min_pressure: int  960 955 985 987 985 960 954 938 962 987 ...
##  $ damage_norm : int  1590 5350 150 58 15 19321 3230 24260 2030 14730 ...
##  $ female      : int  1 0 0 1 1 1 1 1 1 1 ...
##  $ femininity  : num  6.78 1.39 3.83 9.83 8.33 ...

library(ggplot2)
ggplot(d, aes(as.factor(female), femininity))+geom_boxplot()

ggplot(d, aes(as.factor(female), deaths))+geom_violin()

ggplot(d, aes(femininity, deaths))+geom_point() + geom_smooth()

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# intercept-only Poisson model of deaths
m11h1.int <- quap(alist(
  deaths ~ dpois(lambda),
  log(lambda) ~ a,
  a ~ dnorm(0, 10)
),
data=d
)
precis(m11h1.int)

##       mean         sd     5.5%    94.5%
## a 3.027739 0.02294244 2.991073 3.064406

# Poisson model of deaths with femininity as a predictor
m11h1.fem <- quap(alist(
  deaths ~ dpois(lambda),
  log(lambda) ~ a + b_fem*femininity,
  b_fem ~ dnorm(0, 10),
  a ~ dnorm(0, 10)
),
data=d
)
precis(m11h1.fem)

##             mean          sd       5.5%      94.5%
## b_fem 0.07387302 0.007890426 0.06126259 0.08648344
## a     2.50035263 0.063295955 2.39919347 2.60151179
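Because the model uses a log link, the coefficients are easier to read after exponentiating. The sketch below copies the posterior means from the precis output above and converts them to expected deaths. The femininity values 1 and 11 are assumed here to roughly span the scale, and using posterior means alone ignores posterior uncertainty.

```r
# Posterior means copied from the precis output above
a <- 2.50035263
b_fem <- 0.07387302

# Multiplicative change in expected deaths per one-unit increase in femininity
rate_ratio <- exp(b_fem)

# Expected deaths at femininity 1 and 11 (assumed to roughly span the scale)
lambda_low <- exp(a + b_fem * 1)
lambda_high <- exp(a + b_fem * 11)

print(c(rate_ratio = rate_ratio, low = lambda_low, high = lambda_high))
```

On this reading, expected deaths roughly double from the least to the most feminine names, while staying far below the largest observed death counts.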

(cmp <- compare(m11h1.int, m11h1.fem))

##               WAIC       SE   dWAIC      dSE     pWAIC       weight
## m11h1.fem 4417.055 1000.432  0.0000       NA 128.18215 9.999998e-01
## m11h1.int 4447.659 1075.087 30.6036 141.5163  78.14946 2.262109e-07

plot(cmp)

plot(coeftab(m11h1.int, m11h1.fem))

postcheck(m11h1.fem, window = 100)
abline(h=40, col='red')
abline(h=10, col='blue')
abline(h=mean(d$deaths), lty=2)

d.predict <- data.frame(femininity=seq(1,11,0.1))
lambda.sample <- link(m11h1.fem, d.predict)
lambda.avg <- apply(lambda.sample, 2, mean )
lambda.pi <- apply(lambda.sample, 2, PI )

count.sample <- sim(m11h1.fem, data = d.predict)
count.avg <- apply(count.sample, 2, mean )
count.pi <- apply(count.sample, 2, PI )

plot(d$femininity, d$deaths, xlim=c(0,12), col='blue', pch=16)
lines(d.predict$femininity, lambda.avg)
shade(lambda.pi, d.predict$femininity)

lines(d.predict$femininity, count.avg, col='red')
shade(count.pi, d.predict$femininity)

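#The association is positive and the coefficient is tightly estimated (b_fem is about 0.07, with an 89% interval from 0.06 to 0.09), but the predictive evidence is weak: the WAIC difference of 30.6 is far smaller than its standard error of 141.5. The expected death counts from the model cannot reach the deadliest storms, so those are fit poorly, and there are probably other (confounding) variables that explain the deaths.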
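One way to see why a plain Poisson model struggles here is to compare the variance of the death counts with their mean. The sketch below uses only the first ten values of deaths shown in the str(d) output, not the full data set, so it is a rough indication rather than a proper dispersion test.

```r
# First ten death counts, copied from the str(d) output (not the full data set)
deaths_head <- c(2, 4, 3, 1, 0, 60, 20, 20, 0, 200)

# A Poisson distribution has variance equal to its mean,
# so this ratio should be near 1 if the Poisson assumption holds
ratio <- var(deaths_head) / mean(deaths_head)
print(ratio)
```

A ratio far above 1 points to over-dispersion, which is consistent with the large pWAIC values in the comparison table.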

Acquaint yourself with the columns by inspecting the help ?Hurricanes. In this problem, you’ll focus on predicting deaths using femininity of each hurricane’s name. Fit and interpret the simplest possible model, a Poisson model of deaths using femininity as a predictor. You can use quap or ulam. Compare the model to an intercept-only Poisson model of deaths. How strong is the association between femininity of name and deaths? Which storms does the model fit (retrodict) well? Which storms does it fit poorly?


Yucheng Hu

2020-10-13

Chapter 12 - Monsters and Mixtures
