Assignment #10

Questions

12E1. What is the difference between an ordered categorical variable and an unordered one? Define and then give an example of each.

#An ordered categorical variable takes values that have a natural order (ranking), although the distances between adjacent levels are not necessarily equal. An unordered categorical variable has levels with no inherent order.
#For example, a risk-level variable with the values Low risk, Medium risk, and High risk is ordered, while a color variable with the values red, yellow, and blue is unordered.
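In R the distinction is explicit in how a factor is declared. This is a minimal sketch using base R's `factor(..., ordered = TRUE)`. The `risk` and `color` vectors are illustrative values only.

```r
# Ordered factor: the levels carry a ranking, so comparisons such as < are meaningful
risk <- factor(c("Medium", "Low", "High"),
               levels = c("Low", "Medium", "High"), ordered = TRUE)
risk < "High"           # compares each value against the "High" level
max(risk)               # the highest-ranked level present

# Unordered factor: the levels are just labels
color <- factor(c("red", "yellow", "blue"))
is.ordered(color)       # FALSE; color < "red" would give NA with a warning
```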

12E2. What kind of link function does an ordered logistic regression employ? How does it differ from an ordinary logit link?

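#Ordered logistic regression uses a cumulative logit link: it models the log-cumulative-odds of each outcome level k, log[Pr(y ≤ k) / (1 − Pr(y ≤ k))], with a separate intercept (cutpoint) for each level.
#An ordinary logit link models the log-odds of a single event, log[p / (1 − p)]. Under the cumulative link, the probability of each individual level is obtained by subtracting adjacent cumulative probabilities, which is how the model respects the ordering of the outcomes.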
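A short sketch of the cumulative link in base R, going from cutpoints to the probability of each level. The cutpoints `alpha` and the linear predictor `phi` are made-up values, and the sign convention logit(Pr(y ≤ k)) = alpha_k − phi is assumed here (some texts write alpha_k + phi instead).

```r
# Hypothetical cutpoints alpha_k on the log-cumulative-odds scale (must be increasing)
alpha <- c(-1.5, 0, 1.2)
phi <- 0.5                      # hypothetical linear predictor for one case

# Cumulative logit link: logit(Pr(y <= k)) = alpha_k - phi, for k = 1, 2, 3
cum_p <- plogis(alpha - phi)

# Pr(y = k) is the difference between adjacent cumulative probabilities; Pr(y <= 4) = 1
probs <- diff(c(0, cum_p, 1))
round(probs, 3)
```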

12E3. When count data are zero-inflated, using a model that ignores zero-inflation will tend to induce which kind of inferential error?

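# Ignoring zero-inflation tends to underestimate the rate of the count process (λ in a Poisson model). The extra zeros come from a separate process, but the model attributes them to the count process, which pulls its estimated mean toward zero.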
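A small simulation illustrates the bias. It assumes a zero-inflation probability of 0.2 and a true Poisson rate of 1 (both arbitrary choices). For an intercept-only Poisson model the maximum-likelihood estimate of λ is the sample mean, which lands near (1 − 0.2) × 1 = 0.8 instead of 1.

```r
set.seed(1)
n <- 1e5
p_zero <- 0.2   # assumed probability of an extra (structural) zero
lambda <- 1     # assumed true rate of the count process

# Zero-inflated Poisson draws: a structural zero with probability p_zero, else a Poisson count
y <- ifelse(runif(n) < p_zero, 0, rpois(n, lambda))

# Poisson MLE of lambda that ignores the zero-inflation
lambda_hat <- mean(y)
lambda_hat
```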

12E4. Over-dispersion is common in count data. Give an example of a natural process that might produce over-dispersed counts. Can you also give an example of a process that might produce underdispersed counts?

#In a credit-approval process, if a bank applies no restrictions and books every applicant, the counts of booked accounts across credit-score bands will be over-dispersed: they vary more than a Poisson model predicts (variance greater than the mean), because the booked applicants are highly heterogeneous. On the other hand, if the bank applies a very high credit-score cutoff, the booked accounts will be under-dispersed (variance smaller than the mean), because they are all concentrated in a narrow band of high credit scores.
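The general mechanisms can be checked by simulation. This sketch is not tied to the credit example: over-dispersion arises when each unit has its own rate (here a gamma-Poisson mixture with an assumed mean rate of 5), and under-dispersion appears, for example, in counts bounded by a fixed number of trials with a high success probability (here an assumed Binomial(10, 0.9)).

```r
set.seed(2)
n <- 1e5

# Over-dispersion: rates vary across units (gamma with mean 5), counts are Poisson given the rate
rates <- rgamma(n, shape = 2, rate = 2 / 5)
over <- rpois(n, rates)

# Under-dispersion: counts capped at 10 trials with a high success probability
under <- rbinom(n, size = 10, prob = 0.9)

c(mean = mean(over), var = var(over))    # variance well above the mean
c(mean = mean(under), var = var(under))  # variance well below the mean
```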

12M1. At a certain university, employees are annually rated from 1 to 4 on their productivity, with 1 being least productive and 4 most productive. In a certain department at this certain university in a certain year, the numbers of employees receiving each rating were (from 1 to 4): 12, 36, 7, 41. Compute the log cumulative odds of each rating.

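CNT <- c(12, 36, 7, 41)   # number of employees with ratings 1, 2, 3, 4

for (i in 1:4) {
  # log of (count at or below rating i) / (count above rating i)
  print(log(sum(CNT[1:i]) / sum(CNT[-(1:i)])))
}

## [1] -1.94591
## [1] 0
## [1] 0.2937611
## [1] Inf

#The log cumulative odds of the top rating is +Inf, because Pr(rating ≤ 4) = 1.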
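As a cross-check, base R's `qlogis(p)` computes log(p / (1 − p)) directly, so applying it to the cumulative proportions should reproduce the same log cumulative odds.

```r
CNT <- c(12, 36, 7, 41)
cum_p <- cumsum(CNT) / sum(CNT)   # Pr(rating <= k) for k = 1..4
lco <- qlogis(cum_p)              # log-cumulative-odds; the last value is Inf
round(lco, 4)
```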

12M2. Make a version of Figure 12.5 for the employee ratings data given just above.

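prop <- CNT / sum(CNT)       # proportion of employees at each rating
cum_prop <- cumsum(prop)     # cumulative proportion, Pr(rating <= k)

plot(x = 1:4, y = cum_prop, type = "b", ylim = c(0, 1),
     xlab = "rating", ylab = "cumulative proportion")
segments(1:4, 0, 1:4, cum_prop)   # cumulative probability of each rating
# offset segments: the jump from the previous cumulative value is Pr(rating = k)
segments(1:4 + 0.05, c(0, cum_prop[1:3]), 1:4 + 0.05, cum_prop, col = "pink")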
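The offset segments in this figure encode the discrete probabilities. A quick check: the length of each segment, the jump between consecutive cumulative proportions, should equal the raw proportion of employees at that rating.

```r
CNT <- c(12, 36, 7, 41)
cum_prop <- cumsum(CNT) / sum(CNT)

# Length of each offset segment = jump in cumulative proportion = Pr(rating = k)
seg_len <- diff(c(0, cum_prop))
round(seg_len, 4)
```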

12M3. Can you modify the derivation of the zero-inflated Poisson distribution (ZIPoisson) from the chapter to construct a zero-inflated binomial distribution?

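#  Following the ZIPoisson derivation, let $\pi$ be the probability that the zero-generating process produces a zero, and replace the Poisson likelihood with a binomial likelihood with $n$ trials and success probability $p$:
#  $\Pr(0 \mid \pi, n, p) = \pi + (1 - \pi)(1 - p)^n$
#  $\Pr(x \mid x > 0, \pi, n, p) = (1 - \pi) \binom{n}{x} p^x (1 - p)^{n - x}$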
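The mixture can be written as a probability mass function in a few lines. `zib_pmf` is a hypothetical helper written for this sketch (not a function from base R or the rethinking package), and the parameter values below are arbitrary. The check confirms that the probabilities sum to 1 and that the zero case matches $\pi + (1 - \pi)(1 - p)^n$.

```r
# Hypothetical helper: zero-inflated binomial probability mass function
# pi_zero = probability that the extra zero-generating process fires
zib_pmf <- function(x, size, prob, pi_zero) {
  d <- (1 - pi_zero) * dbinom(x, size, prob)   # binomial part, weighted by 1 - pi_zero
  d + ifelse(x == 0, pi_zero, 0)               # extra probability mass at zero
}

probs <- zib_pmf(0:10, size = 10, prob = 0.3, pi_zero = 0.2)
sum(probs)     # should be 1
probs[1]       # Pr(0) = 0.2 + 0.8 * 0.7^10
```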

12H1. In 2014, a paper was published that was entitled “Female hurricanes are deadlier than male hurricanes.” As the title suggests, the paper claimed that hurricanes with female names have caused greater loss of life, and the explanation given is that people unconsciously rate female hurricanes as less dangerous and so are less likely to evacuate. Statisticians severely criticized the paper after publication. Here, you’ll explore the complete data used in the paper and consider the hypothesis that hurricanes with female names are deadlier. Load the data with:

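library(rethinking)
data(Hurricanes)

data <- Hurricanes

# M1: Poisson model of deaths with femininity of the storm's name as a predictor (log link)
M1 <- quap(
  alist(
    deaths ~ dpois(lambda),
    log(lambda) <- a + bF * femininity,
    a ~ dnorm(0, 10),
    bF ~ dnorm(0, 5)
  ),
  data = data)

# M2: intercept-only Poisson model, for comparison
M2 <- quap(
  alist(
    deaths ~ dpois(lambda),
    log(lambda) <- a,
    a ~ dnorm(0, 10)
  ),
  data = data)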

compare(M1,M2)

##        WAIC       SE    dWAIC      dSE    pWAIC       weight
## M1 4418.571 1001.309  0.00000       NA 127.7574 1.000000e+00
## M2 4456.914 1080.074 38.34348 144.3757  83.2533 4.718659e-09
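The dWAIC and dSE columns of this table can be turned into a rough interval for the difference between the two models. This sketch assumes a normal approximation to the WAIC difference and simply copies the two numbers printed for M2; it does not refit anything.

```r
# Numbers copied from the compare(M1, M2) output (M2 relative to M1)
dWAIC <- 38.34348
dSE   <- 144.3757

z <- qnorm(0.995)                      # about 2.58 for a two-sided 99% interval
interval_99 <- dWAIC + c(-1, 1) * z * dSE
round(interval_99, 1)                  # the interval spans zero by a wide margin
dWAIC / dSE                            # the difference is well under one standard error
```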

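library(ggplot2)

p_deaths <- ggplot(data = data, aes(x = femininity, y = deaths)) +
  geom_point(color = "blue") +
  stat_smooth(method = "lm") +   # straight-line smoother, used only as a visual guide
  labs(x = "Femininity", y = "Deaths")

p_deaths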

## `geom_smooth()` using formula 'y ~ x'

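#The scatterplot shows no clear trend in deaths as femininity increases. WAIC favors M1, but the difference (dWAIC = 38.3) is much smaller than its standard error (dSE = 144.4), so the evidence for an association between femininity and deaths is weak. The very large pWAIC values for such simple models suggest that a few storms with extremely high death counts dominate the fit.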
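Because M1 uses a log link, femininity acts multiplicatively on the expected number of deaths: raising femininity by one unit multiplies λ by exp(bF). The values of `a` and `bF` below are made up (hypothetical, not the posterior estimates from M1) and only illustrate that arithmetic; with the fitted model one would plug in the posterior means, for example from `precis(M1)` in the rethinking package.

```r
# Hypothetical parameter values, NOT the posterior estimates from M1
a  <- 2.5
bF <- 0.07

# Expected deaths at femininity f under the log link of M1
expected_deaths <- function(f) exp(a + bF * f)

ratio <- expected_deaths(6) / expected_deaths(5)   # effect of a one-unit increase
c(ratio = ratio, exp_bF = exp(bF))                 # the two agree
```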

Acquaint yourself with the columns by inspecting the help ?Hurricanes. In this problem, you’ll focus on predicting deaths using femininity of each hurricane’s name. Fit and interpret the simplest possible model, a Poisson model of deaths using femininity as a predictor. You can use quap or ulam. Compare the model to an intercept-only Poisson model of deaths. How strong is the association between femininity of name and deaths? Which storms does the model fit (retrodict) well? Which storms does it fit poorly?

Tenghe Li

2021-02-17

Chapter 12 - Monsters and Mixtures