Mock Exam: Advanced Statistics 2

Mock exam

50 mins for mock exam
Password: 2mock
We can’t give technical help with RStudio during the exam
Some questions in the mock exam are meant to stretch your grade potential to higher bands
Report answers to at least 3 decimal places; you can copy and paste the R output
Mock is available until next Friday; you have unlimited attempts

Notes

(Record this part of the session)

Question 1

Q: You toss an unbiased coin three times. What is the probability of getting at least one TAIL?

A: If we look at all possible outcomes of tossing an unbiased coin three times:

  Toss1 Toss2 Toss3
1     H     H     H
2     T     H     H
3     H     T     H
4     T     T     H
5     H     H     T
6     T     H     T
7     H     T     T
8     T     T     T

We can see: (1) there are 8 possible outcomes (because \(2^3 = 8\)), (2) only 1 of them (H H H) contains no TAIL.

The other seven outcomes include at least one TAIL, so the probability of getting at least one TAIL with three tosses is:

1 - 1/8

[1] 0.875
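
The same count can be checked by brute force. This is a minimal sketch, assuming we rebuild the table of outcomes with expand.grid; the object names (outcomes, has_tail) are illustrative and not part of the exam.

```r
# Enumerate all 2^3 outcomes of three tosses (one row per outcome)
outcomes <- expand.grid(Toss1 = c("H", "T"),
                        Toss2 = c("H", "T"),
                        Toss3 = c("H", "T"),
                        stringsAsFactors = FALSE)

# TRUE for every row that contains at least one "T"
has_tail <- rowSums(outcomes == "T") >= 1

n_outcomes <- nrow(outcomes)           # number of possible outcomes
p_at_least_one_tail <- mean(has_tail)  # proportion of rows with a TAIL
p_at_least_one_tail
```

Because every outcome is equally likely for an unbiased coin, the proportion of rows with a TAIL is the probability we want.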

Alternatively, using the product rule:

Probability of TAIL (or HEAD)

1/2

[1] 0.5

Probability of 3 HEADs in a row (and therefore no TAIL)

1/2 * 1/2 * 1/2 # or 1/2^3

[1] 0.125

The probability of at least one TAIL is therefore

1 - (1/2 * 1/2 * 1/2)

[1] 0.875
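
A third route, sketched here as an optional cross-check: the number of TAILs in three fair tosses follows a binomial distribution, so base R's dbinom and pbinom should give the same answer. The variable names are illustrative.

```r
# Number of TAILs in 3 fair tosses ~ Binomial(size = 3, prob = 0.5)
p_no_tail <- dbinom(0, size = 3, prob = 0.5)  # P(exactly zero TAILs)
p_at_least_one <- 1 - p_no_tail               # complement rule

# Same quantity as an upper tail: P(number of TAILs > 0)
p_upper_tail <- pbinom(0, size = 3, prob = 0.5, lower.tail = FALSE)

c(p_no_tail, p_at_least_one, p_upper_tail)
```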

Question 2

Q: An urn contains 9 red marbles and 6 black ones. You reach in and randomly draw a marble. You note down what colour it is and you do not put it back in. You then draw again. What is the probability of drawing a red marble first and then drawing another red one?

A: The two draws are dependent (the first marble is not put back), so we use the general product rule, which involves a conditional probability:

\(P(\text{Red on 1st} \cap \text{Red on 2nd}) = P(\text{Red on 1st}) \times P(\text{Red on 2nd} \mid \text{Red on 1st})\)

Assign given information:

n_red_marbles <- 9 # red marbles
n_black_marbles <- 6 # black marbles
n_marbles <- n_red_marbles + n_black_marbles #  total of marbles in urn

Probability of red marble on first draw:

p_red_marble_1st <- n_red_marbles / n_marbles
p_red_marble_1st

[1] 0.6

Probability of a red marble on the second draw, given a red marble on the first. The first red marble was NOT put back in the urn (see question):

p_red_marble_2nd <- (n_red_marbles - 1) / (n_marbles - 1)
p_red_marble_2nd

[1] 0.5714286

Joint probability of drawing a red marble and then another red marble:

p_red_marble_1st * p_red_marble_2nd

[1] 0.3428571
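
As an optional cross-check, drawing two marbles without replacement is a hypergeometric setting, so the same value should come out of a counting argument or of base R's dhyper. This is a sketch; the names n_red, n_black, p_by_counting and p_by_dhyper are illustrative.

```r
n_red <- 9    # red marbles in the urn
n_black <- 6  # black marbles in the urn

# Counting: ways to pick 2 red marbles / ways to pick any 2 marbles
p_by_counting <- choose(n_red, 2) / choose(n_red + n_black, 2)

# Hypergeometric: P(2 red in k = 2 draws without replacement)
p_by_dhyper <- dhyper(2, m = n_red, n = n_black, k = 2)

# Product rule, as in the worked answer
p_by_product <- (n_red / (n_red + n_black)) *
  ((n_red - 1) / (n_red + n_black - 1))

c(p_by_counting, p_by_dhyper, p_by_product)
```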

Question 3

Q: If the probability of an event happening is 0.43, what are the odds of the event happening?

A: \(\text{odds} = \frac{P}{(1-P)}\)

0.43 / (1 - 0.43)

[1] 0.754386
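
If you convert between probabilities and odds often, two small helpers can make the direction of the conversion explicit. A minimal sketch; prob_to_odds and odds_to_prob are hypothetical helper names, not base R functions.

```r
# Hypothetical helpers (not part of base R)
prob_to_odds <- function(p) p / (1 - p)            # odds = P / (1 - P)
odds_to_prob <- function(odds) odds / (1 + odds)   # inverse conversion

odds_043 <- prob_to_odds(0.43)
odds_043

# Round trip: converting the odds back should return the probability
p_back <- odds_to_prob(odds_043)
p_back
```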

Question 4

Q: If the log odds of something happening is 2, what is the probability that it will happen?

A: Convert the log odds to a probability with the inverse logit, \(\frac{1}{1 + \exp(-2)}\), or equivalently use plogis:

plogis(2)

[1] 0.8807971
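
The conversion can also be done in two explicit steps (log odds to odds, then odds to probability). The sketch below shows that both routes agree with plogis; the variable names are illustrative.

```r
log_odds_value <- 2

# Step 1: log odds -> odds
odds_value <- exp(log_odds_value)

# Step 2: odds -> probability
p_two_step <- odds_value / (1 + odds_value)

# Direct inverse-logit formula
p_inverse_logit <- 1 / (1 + exp(-log_odds_value))

c(p_two_step, p_inverse_logit, plogis(log_odds_value))

# qlogis is the inverse of plogis: probability -> log odds
log_odds_back <- qlogis(p_two_step)
```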

Question 5

Q: Using the affairs.csv file \([\dots]\) [r]un a binary logistic model with had_affair as an outcome, and gender, children, and religiousness as predictors.

What is the coefficient for the religiousness variable?

library(tidyverse)

df <- read_csv('https://raw.githubusercontent.com/mark-andrews/ntupsychology-data/main/data-sets/affairs.csv')

df <- mutate(df, had_affair = affairs > 0)

model <- glm(had_affair ~ gender + religiousness + children,
             data = df,
             family = binomial(link = "logit"))

b <- coef(model)
b['religiousness']

religiousness 
   -0.3052493

Question 6

Q: Without using predict, what are the odds of having an affair for a male without children who has a religiousness score of 5?

log_odds <- b[1] + b['gendermale'] * 1 + b["childrenyes"] * 0 + b["religiousness"] * 5
exp(log_odds)

[1] 0.1050485

Question 7

Q: Without using predict, what is the probability of having an affair for a female with children who has a religiousness score of 5?

log_odds <- b[1] + b['gendermale'] * 0 + b["childrenyes"] * 1 + b["religiousness"] * 5
plogis(log_odds)

[1] 0.1700162
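
When revising (outside the exam, where predict is not allowed for these questions), it can help to confirm that the manual calculation and predict agree. The sketch below uses the built-in mtcars data as a stand-in, not the affairs data; the model, the predictor values and all object names are assumptions made for illustration.

```r
# Stand-in logistic regression on built-in data (NOT the affairs model)
fit_demo <- glm(am ~ wt + hp, data = mtcars,
                family = binomial(link = "logit"))
b_demo <- coef(fit_demo)

# Manual linear predictor (log odds) for wt = 3, hp = 120
log_odds_manual <- unname(b_demo["(Intercept)"] +
                            b_demo["wt"] * 3 +
                            b_demo["hp"] * 120)

# Same quantities from predict()
new_car <- data.frame(wt = 3, hp = 120)
log_odds_predict <- unname(predict(fit_demo, newdata = new_car, type = "link"))
p_predict <- unname(predict(fit_demo, newdata = new_car, type = "response"))

# Manual odds and probability
odds_manual <- exp(log_odds_manual)
p_manual <- plogis(log_odds_manual)

all.equal(log_odds_manual, log_odds_predict)
all.equal(p_manual, p_predict)
```

The pattern is the same as in Questions 6 and 7: multiply each coefficient by the value of its predictor, sum to get the log odds, then apply exp for odds or plogis for a probability.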

Question 8

Q: What is the deviance for the logistic regression model, i.e. the model with gender, children, and religiousness as predictors?

deviance(model)

[1] 649.3511
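
For a binary (0/1) outcome, the deviance is \(-2\) times the log-likelihood of the fitted model. The sketch below illustrates this on a stand-in model fitted to the built-in mtcars data, not the affairs data; the object names are illustrative.

```r
# Stand-in binary logistic regression (NOT the affairs model)
fit_demo <- glm(am ~ wt + hp, data = mtcars,
                family = binomial(link = "logit"))

dev_builtin <- deviance(fit_demo)

# -2 x log-likelihood reported by logLik()
dev_from_loglik <- -2 * as.numeric(logLik(fit_demo))

# -2 x sum of Bernoulli log-probabilities of the observed outcomes
y <- mtcars$am
p_hat <- fitted(fit_demo)
dev_by_hand <- -2 * sum(dbinom(y, size = 1, prob = p_hat, log = TRUE))

c(dev_builtin, dev_from_loglik, dev_by_hand)
```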

Question 9

Q: A random variable \(x\) is distributed according to a Poisson distribution with rate parameter \(\lambda =\) 2.8. What is the probability that \(x\) takes the value of 5?

dpois(5, 2.8)

[1] 0.08721363
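
dpois evaluates the Poisson probability mass function, \(P(x = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}\). A short sketch computing it by hand for comparison; lambda, k and p_by_hand are illustrative names.

```r
lambda <- 2.8  # rate parameter
k <- 5         # value of interest

# Poisson probability mass function written out
p_by_hand <- exp(-lambda) * lambda^k / factorial(k)

c(p_by_hand, dpois(k, lambda))
```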

Question 10

Q: \([\dots]\) Perform a Poisson regression that models the number of cigs smoked as a function of age and ethnicity. What is the factor by which the rate of cigs smoked changes as we go from non_white to white?

df2 <- read_csv("https://raw.githubusercontent.com/mark-andrews/ntupsychology-data/main/data-sets/smoking.csv")

df2 <- mutate(df2, ethnicity = ifelse(white == 1, 'white', 'not_white'))

model2 <- glm(cigs ~ age + ethnicity,
              data = df2,
              family = poisson(link = 'log'))

exp(coef(model2)['ethnicitywhite'])

ethnicitywhite 
      1.032144
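
Exponentiating a Poisson regression coefficient gives a rate ratio: the factor by which the predicted rate changes when that predictor moves from its reference level to the other level, with the other predictors held fixed. The sketch below illustrates this on the built-in warpbreaks data as a stand-in for the smoking data; the model and object names are assumptions made for illustration.

```r
# Stand-in Poisson regression (NOT the smoking model)
fit_pois <- glm(breaks ~ wool + tension, data = warpbreaks,
                family = poisson(link = "log"))

# Rate ratio for wool B relative to wool A (the reference level)
rate_ratio <- unname(exp(coef(fit_pois)["woolB"]))

# Predicted rates for wool A and wool B at the same tension level
new_data <- data.frame(
  wool = factor(c("A", "B"), levels = levels(warpbreaks$wool)),
  tension = factor(c("L", "L"), levels = levels(warpbreaks$tension))
)
rates <- unname(predict(fit_pois, newdata = new_data, type = "response"))
ratio_from_predictions <- rates[2] / rates[1]

c(rate_ratio, ratio_from_predictions)
```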

Question 11

Q: What is the upper limit of the 95% confidence interval for the logarithm of the rate of cigs smoked for the predictor age?

confint.default(model2, parm = 'age')[2]

[1] -0.002505636
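
confint.default returns a Wald interval: the estimate plus or minus qnorm(0.975) standard errors, on the scale of the coefficient (here, the log of the rate). The sketch below reproduces the upper limit by hand on a stand-in model fitted to the built-in warpbreaks data, not the smoking data; object names are illustrative.

```r
# Stand-in Poisson regression (NOT the smoking model)
fit_pois <- glm(breaks ~ wool + tension, data = warpbreaks,
                family = poisson(link = "log"))

est <- unname(coef(fit_pois)["woolB"])              # coefficient (log scale)
se <- unname(sqrt(diag(vcov(fit_pois)))["woolB"])   # its standard error

# Wald 95% limits by hand
wald_lower <- est - qnorm(0.975) * se
wald_upper <- est + qnorm(0.975) * se

# confint.default returns a 1 x 2 matrix: [1] is lower, [2] is upper
ci <- confint.default(fit_pois, parm = "woolB")
c(wald_upper, ci[2])
```

Note that confint (without .default) uses profile likelihood for glm objects, so its limits can differ slightly from the Wald limits.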

Question 12

Q: Rerun the above model using a zero-inflated Poisson model. What is the probability of someone who is 20 years old and white being in the zero distribution group?

library(pscl)

model3 <- zeroinfl(cigs ~ age + ethnicity,
  dist = 'poisson',
  data = df2)

b <- coef(model3, model = 'zero')

log_odds <- b[1] + b['age'] * 20 + b["ethnicitywhite"] * 1
plogis(log_odds)

[1] 0.5541451

Question 13

Q: Use a Vuong test to test if the zero-inflated Poisson model is a better model than the original Poisson. What is the absolute value of the Z test statistic for this null-hypothesis test?

Note. “Absolute” means the value without its sign (e.g., the absolute value of both -2 and 2 is 2).

vuong(model3, model2)

Vuong Non-Nested Hypothesis Test-Statistic: 
(test-statistic is asymptotically distributed N(0,1) under the
 null that the models are indistinguishible)
-------------------------------------------------------------
              Vuong z-statistic             H_A    p-value
Raw                    27.35054 model1 > model2 < 2.22e-16
AIC-corrected          27.33725 model1 > model2 < 2.22e-16
BIC-corrected          27.30606 model1 > model2 < 2.22e-16

# Raw   27.35054

Question 14

Q: Using the sleepstudy data in the lme4 package, perform a multilevel linear model with Reaction as outcome variable, Days as fixed effect predictor, and random intercepts that vary by Subject.

What is the fixed-effects slope coefficient?

library(lme4)
model4 <- lmer(Reaction ~ Days + (1|Subject), data=sleepstudy)
fixef(model4)['Days']

    Days 
10.46729

Question 15

Q: What is the standard deviation of the inter-subject variability (i.e., the standard deviation of the parent normal distribution of the random intercepts)?

VarCorr(model4)

 Groups   Name        Std.Dev.
 Subject  (Intercept) 37.124  
 Residual             30.991

# Subject  (Intercept) 37.124