Question 1 Consider the binomial distribution X ∼ Bin(n, p). • If n = 9, p = 0.4, what is the probability that X ≤ 4? Verify your answer numerically

pbinom(4, 9, 0.4)
## [1] 0.7334323
• If n = 100, p = 0.45, what is the probability that X ≥ 50? Calculate the exact probability, and the value obtained by using the Gaussian approximation to the binomial.

# Exact probability: P(X >= 50) = 1 - P(X <= 49)
1 - pbinom(49, 100, 0.45)
## [1] 0.1827282
# Gaussian approximation: X is approximately N(np, np(1-p)) = N(45, 24.75)
z <- (50 - 45) / sqrt(24.75)
1 - pnorm(z)
## [1] 0.1574393

• (hard) If n = 2r for some integer r, and p = 0.5, consider phalf(n) = Prob(X = r) (that is, exactly half the observations are successes and half are failures). What is phalf(10) and phalf(100)? For extra credit, find the minimum value of n such that phalf(n) is less than 0.01.

# First check n = 10 directly: P(X = 5) for X ~ Bin(10, 0.5)
n <- 10
x <- n/2
p <- 0.5
dbinom(x, n, p)
## [1] 0.2460938

# Define the function phalf(n) to calculate the probability of exactly half successes and half failures
phalf <- function(n) {
  r <- n/2
  p <- 0.5
  dbinom(r, n, p)
}

# Calculate phalf(10) and phalf(100)
phalf_10 <- phalf(10)
phalf_100 <- phalf(100)

# Print the results
cat("phalf(10) =", phalf_10, "\n")
## phalf(10) = 0.2460938
cat("phalf(100) =", phalf_100, "\n")
## phalf(100) = 0.07958924
# Find the minimum value of n such that phalf(n) is less than 0.01
# Find the minimum even value of n such that phalf(n) is less than 0.01.
# Start at 2, not 1: n must be even since we need half successes and half
# failures (starting at 1 gives a non-integer x warning and a wrong answer).
min_n <- 2
while (phalf(min_n) >= 0.01) {
  min_n <- min_n + 2
}
cat("The minimum value of n such that phalf(n) is less than 0.01 is", min_n, "\n")
## The minimum value of n such that phalf(n) is less than 0.01 is 6366
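As a cross-check on the loop (a sketch using Stirling's approximation, not part of the original run): phalf(n) ≈ sqrt(2/(πn)), so phalf(n) drops below 0.01 once n exceeds about 2/(π × 0.01²).

# Stirling's approximation gives phalf(n) ~ sqrt(2 / (pi * n)), so
# phalf(n) < 0.01 roughly when n > 2 / (pi * 0.01^2), i.e. near 6366
2 / (pi * 0.01^2)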

Question 2 In this question we will evaluate type I and type II error probabilities for one-sided tests. We will consider normally distributed data, with known unit variance and independent observations. We will use H0 : µ = 0 for the null and H1 : µ = 1 for the alternative, unless otherwise stated. Suppose we have n = 6 observations x1, . . . , x6. If the null is true, what is the sampling distribution of the sample mean (that is, of x̄ = (1/6)(x1 + · · · + x6))?

Under the null, each observation is N(0, 1), so the sample mean of n = 6 independent observations is itself normal: x̄ ~ N(0, 1/6), i.e. mean 0 and standard deviation 1/sqrt(6) ≈ 0.408. We can verify this by simulating many sample means (the simulation must average 6 observations per replicate, not just draw 6 raw observations):

n <- 6
mu <- 0
sigma <- 1

# Simulate many replications of the sample mean of 6 observations
sample_means <- replicate(1e5, mean(rnorm(n, mu, sigma)))
hist(sample_means, breaks = 50, main = "Sampling Distribution of the Sample Mean (n = 6)", xlab = "Sample mean")
sd(sample_means)  # should be close to 1/sqrt(6) ≈ 0.408

• We want a test with size α = 0.05. This test is to be of the form “reject H0 if the sample mean x exceeds T” (where T is a value to be determined). You will recall that α is the probability of rejecting H0 when true. Find an appropriate value of T.

alpha <- 0.05
# Under H0 the sample mean is N(0, 1/6), so T is its upper 5% point
T <- qnorm(1 - alpha, mean = 0, sd = 1/sqrt(6))
cat("Reject the null hypothesis if the sample mean exceeds T =", T, "\n")
## Reject the null hypothesis if the sample mean exceeds T = 0.6715087

• Calculate β, the probability of failing to reject the null hypothesis when the alternative is true, and state the power of the test.

alpha <- 0.05
cutoff <- qnorm(1 - alpha, mean = 0, sd = 1/sqrt(6))
# beta is the probability the sample mean falls below the cutoff when mu = 1;
# under H1 the sample mean is N(1, 1/6)
beta <- pnorm(cutoff, mean = 1, sd = 1/sqrt(6))
cat("The beta is:", beta, "\n")
## The beta is: 0.2105149
power <- 1 - beta
cat("The power is:", power, "\n")
## The power is: 0.7894851

• Consider a test of size α = 0.01. Define the power of a test, and calculate the power of this test.

The power of a test is the probability of rejecting the null hypothesis when the alternative is true; equivalently, power = 1 − β.

alpha <- 0.01
cutoff <- qnorm(1 - alpha, mean = 0, sd = 1/sqrt(6))
power <- 1 - pnorm(cutoff, mean = 1, sd = 1/sqrt(6))
cat("The power is:", power, "\n")
## The power is: 0.5490026

• Now we will consider the case where the null and alternative hypotheses are very close. We will have H0 : µ = 0 but now H1 : µ = 0.02. Now how many observations are needed to ensure α is at most 0.01 and the power is at least 0.99?

alpha <- 0.01
power <- 0.99
delta <- 0.02
sigma <- 1

# The rejection cutoff is qnorm(1 - alpha) * sigma / sqrt(n); requiring power
# 0.99 at mu = delta gives n >= ((qnorm(1 - alpha) + qnorm(power)) * sigma / delta)^2
z_alpha <- qnorm(1 - alpha)
z_beta <- qnorm(power)

n <- ((z_alpha + z_beta) * sigma / delta)^2

cat("The sample size required is:", n)
## The sample size required is: 54118.94

Rounding up, at least 54119 observations are needed.
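As a sanity check (a sketch, not part of the original run), the rounded-up sample size should deliver the required size and power:

# With n = 54119 the test "reject if the sample mean exceeds
# qnorm(0.99) / sqrt(n)" has size 0.01 by construction; its power at
# mu = 0.02 should come out at (just over) 0.99
n_check <- 54119
cutoff <- qnorm(0.99, mean = 0, sd = 1 / sqrt(n_check))
1 - pnorm(cutoff, mean = 0.02, sd = 1 / sqrt(n_check))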

Question 3 Consider the following dataset:

(fuel <- c(0.75, 0.52, 0.82, 1.03, 0.89, 0.93, 0.71))
## [1] 0.75 0.52 0.82 1.03 0.89 0.93 0.71

The numbers correspond to the amount of fuel burnt by a new type of high-efficiency engine under a randomised test load. A value of 1 corresponds to the same fuel efficiency as the old engine, values greater than one correspond to more fuel burned (hence lower efficiency), and values less than one correspond to greater efficiency.

• One-sided or two-sided test?

I will use a one-sided test, because the research question is specifically whether the new high-efficiency engine burns less fuel than the old one. The corresponding one-sided alternative hypothesis is that the new engine's mean fuel consumption is less than 1.

A two-sided test would have been appropriate if the question had been phrased more generally, asking whether there is any difference in fuel consumption between the old and new engines. In that case the null hypothesis would still be that the new engine's mean fuel consumption equals 1, and the alternative would be that it differs from 1 in either direction.

• State a sensible null hypothesis and state the precise definition of p-value

A sensible null hypothesis is that there is no difference in fuel consumption between the old and new engines. More precisely, the null hypothesis is that the mean of the new engine's fuel measurements equals 1, the value corresponding to the old engine.

The precise definition of the p-value is the probability, computed under the assumption that the null hypothesis is true, of obtaining a test statistic at least as extreme as the one actually observed. In other words, if the null hypothesis were correct, the p-value is the probability of seeing data as extreme as, or more extreme than, the data in hand.

If the p-value is small (typically below a conventional threshold such as 0.05), the observed data would be unlikely under the null hypothesis, which constitutes evidence against it. A large p-value indicates that the observed data are compatible with the null hypothesis, so there is insufficient evidence to reject it.
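To make this concrete (an illustration with hypothetical numbers, anticipating the test below): for a one-sided, lower-tail t test the p-value is the probability mass at or below the observed statistic.

# One-sided (lower-tail) p-value for a hypothetical observed t statistic
# of -3.06 with 6 degrees of freedom
pt(-3.06, df = 6)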

• Test your hypothesis using a Student t test and interpret (10 marks)

# Create the vector of fuel data
fuel <- c(0.75, 0.52, 0.82, 1.03, 0.89, 0.93, 0.71)

# One-sample t-test of H0: mean = 1 against the one-sided alternative
# that the new engine burns less fuel (mean < 1)
t.test(fuel, mu = 1, alternative = "less")
## 
##  One Sample t-test
## 
## data:  fuel
## t = -3.059, df = 6, p-value = 0.01113
## alternative hypothesis: true mean is less than 1
## 95 percent confidence interval:
##      -Inf 0.9296533
## sample estimates:
## mean of x 
## 0.8071429

The p-value is about 0.011, below 0.05, so we reject the null hypothesis at the 5% level: the data provide evidence that the new engine burns less fuel than the old one.
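As a cross-check (not part of the original output), the same statistic and one-sided p-value can be computed by hand:

# Hand computation of the t statistic and its one-sided (lower-tail) p-value
t_stat <- (mean(fuel) - 1) / (sd(fuel) / sqrt(length(fuel)))
pt(t_stat, df = length(fuel) - 1)   # matches t.test above (about 0.011)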

• Interpret the confidence interval reported by R in such a way that a nonstatistician could understand it

Based on the sample, R reports a one-sided 95% confidence interval of (−∞, 0.93) for the new engine's mean fuel consumption. In plain terms: given these seven test runs, we can be fairly confident (at the 95% level) that the new engine burns, on average, less than 93% of the fuel the old engine would, and in particular less than 1, so the new engine does appear to be more efficient.

It is important to remember that the confidence interval does not guarantee that the true mean lies in this range; it is a range of values that are plausible given the data. A larger sample would typically produce a narrower interval, and hence a more precise estimate.

• (harder) Now suppose that the observations remain the same as above, but the experimenter was somewhat lazy, and only wrote down “<1” if the observation was less than one or “>1” if it was greater than one. Given this simplified dataset, state a sensible null, test it, interpret, and compare the result with that of the t-test.

# Record only whether each observation is below or above 1; six of the
# seven observations are below 1
fuel_simplified <- ifelse(fuel < 1, "<1", ">1")
fuel_simplified
## [1] "<1" "<1" "<1" ">1" "<1" "<1" "<1"
# Sign test: under the null each observation falls below 1 with probability
# 0.5; the one-sided alternative (matching the t-test) is P(obs < 1) > 0.5
binom.test(sum(fuel_simplified == "<1"), length(fuel_simplified), p = 0.5, alternative = "greater")
## 
##  Exact binomial test
## 
## data:  sum(fuel_simplified == "<1") and length(fuel_simplified)
## number of successes = 6, number of trials = 7, p-value = 0.0625
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
##  0.4792970 1.0000000
## sample estimates:
## probability of success 
##              0.8571429

Instead of knowing the actual values, we only know whether each observation is below or above 1, so the previous t-test no longer applies: it requires the measurements themselves. Instead we can use a sign test, i.e. an exact binomial test. A sensible null hypothesis is that each observation falls below 1 with probability 0.5; the one-sided alternative, matching the direction of the t-test, is that this probability exceeds 0.5.

The binomial test gives a p-value of 0.0625. This exceeds the conventional 0.05 level, so we fail to reject the null: six low readings out of seven is not quite strong enough evidence on its own. By contrast, the t-test on the full measurements did reject the null (p ≈ 0.011). Reducing each observation to its sign discards the magnitudes and therefore costs statistical power.
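As a check, the sign-test p-value can be computed directly from the binomial distribution:

# Probability of at least 6 of 7 observations falling below 1 under a
# fair 50/50 null: (choose(7, 6) + choose(7, 7)) / 2^7
sum(dbinom(6:7, 7, 0.5))   # = 8/128 = 0.0625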

Question 4 Here we consider the amount of data needed to perform hypothesis testing. Suppose we are testing a coin using observations of tosses. • We wish to test H0 : p = 1/2 against an alternative of HA : p = 0.6 (in this question use one-sided tests only). How many tosses are needed to guarantee a size α ≤ 0.05 and β ≤ 0.2?

# Define parameters
p0 <- 0.5      # null hypothesis
p1 <- 0.6      # alternative hypothesis
alpha <- 0.05
beta <- 0.2

# One-sided test, so use qnorm(1 - alpha) rather than qnorm(1 - alpha/2).
# The normal approximation to the binomial gives
#   n = ((z_alpha * sqrt(p0(1-p0)) + z_beta * sqrt(p1(1-p1))) / (p1 - p0))^2
z_alpha <- qnorm(1 - alpha)
z_beta <- qnorm(1 - beta)
n <- ((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))) / (p1 - p0))^2
ceiling(n)
## [1] 153

So about 153 tosses are needed to guarantee α ≤ 0.05 and β ≤ 0.2 under the normal approximation.
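A quick exact check (a sketch; the normal approximation above ignores the discreteness of the binomial, so the exact power at n = 153 can land slightly below 80%):

# Exact size and power at n = 153 using the binomial directly
n_toss <- 153
k <- qbinom(0.95, n_toss, 0.5) + 1   # reject H0 if the number of heads is >= k
1 - pbinom(k - 1, n_toss, 0.5)       # achieved size (at most 0.05 by construction)
1 - pbinom(k - 1, n_toss, 0.6)       # achieved power: near 0.8, possibly a little below due to discreteness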

Question 5

Consider a variable X known to have a Poisson distribution. We will consider a null hypothesis that λ = 3.1 (this question will consider one-sided tests only). Suppose we observe X = 7.

• State the precise definition of p-value for our observation of 7 events

The p-value is the probability, computed under the assumption that the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed.

Here the test statistic is the observed count X = 7, and larger counts are more extreme, so the p-value for this observation is the probability of seeing an X value of 7 or higher given that the true value of λ is 3.1: P(X ≥ 7 | λ = 3.1).

As calculated below, this p-value is 0.03880421.

• Calculate the p-value for this observation and interpret

# Calculate the p-value: P(X >= 7) = 1 - P(X <= 6)
lambda <- 3.1
x <- 7
p_value <- 1 - ppois(x - 1, lambda)

p_value
## [1] 0.03880421

Since 0.0388 < 0.05, observing 7 or more events would be fairly surprising if λ were really 3.1, so at the 5% significance level we would reject the null hypothesis.
# Set the parameters
lambda <- 3.1

# Create a sequence of x values
x_vals <- seq(0, 15, by = 1)

# Calculate the corresponding Poisson probabilities
poisson_probs <- dpois(x_vals, lambda)

# Plot the Poisson probability mass function; type = "h" (a spike plot)
# suits a discrete distribution better than a connected line
plot(x_vals, poisson_probs, type = "h", lwd = 2, col = "blue",
     xlab = "Number of events", ylab = "Probability",
     main = "Poisson Distribution with Lambda = 3.1")

• For the observation of 7 events, plot a likelihood function for λ, choosing a sensible range.

# Set the observed value of X
x <- 7

# Define a sensible range of values for lambda: the likelihood peaks at
# the MLE lambda-hat = 7, so extend the range well past it
lambda_vals <- seq(0.01, 20, length.out = 1000)

# Calculate the corresponding likelihoods
likelihoods <- dpois(x, lambda_vals)

# Plot the likelihood function
plot(lambda_vals, likelihoods, type = "l", lwd = 2, col = "blue",
     xlab = "Lambda", ylab = "Likelihood",
     main = "Likelihood Function for Poisson Distribution with X = 7")

• Again for the observation of 7 events, plot a log-likelihood function for λ, and estimate a credible interval for λ. A rough estimate is fine.

# Set the observed value of X
x <- 7

# Define a range of values for lambda
lambda_vals <- seq(0.01, 50, length.out = 1000)

# Calculate the corresponding log-likelihoods
log_likelihoods <- dpois(x, lambda_vals, log = TRUE)

# Plot the log-likelihood function
plot(lambda_vals, log_likelihoods, type = "l", lwd = 2, col = "blue",
     xlab = "Lambda", ylab = "Log-Likelihood",
     main = "Log-Likelihood Function for Poisson Distribution with X = 7")

# Mark the maximum-likelihood estimate (lambda-hat = x = 7)
abline(v = x, lty = 2, col = "red")
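A rough interval can be read off the curve (a sketch: take the λ values within about 2 log-likelihood units of the maximum at λ̂ = 7):

# Lambda values within 2 log-likelihood units of the maximum form a rough
# credible interval; with these data this comes out at roughly 3 to 14
max_ll <- max(log_likelihoods)
range(lambda_vals[log_likelihoods > max_ll - 2])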

• Give an example of observations drawn from a Poission distribution in your daily life. Give either a confidence interval or credible interval [hint: credible intervals are much easier] for the mean of your observations.

As an everyday example, the number of emails arriving in my inbox in a given hour is plausibly Poisson: there are many independent potential senders, each with a small probability of emailing in that hour. Suppose I observe 7 emails in one hour. A rough discrete Bayesian analysis follows (using a coarse grid of integer λ values and, for simplicity, a prior concentrated around 3.1; a Gamma prior would be more natural, but a rough interval is fine):

# Set the observed value
x_observed <- 7

# Define a coarse grid of candidate values for lambda
lambda_values <- seq(0, 15, by = 1)

# Calculate the likelihood of the observation under each candidate lambda
likelihoods <- dpois(x_observed, lambda = lambda_values)

# The posterior is proportional to the prior times the likelihood
prior <- dpois(lambda_values, lambda = 3.1)
posterior_probs <- prior * likelihoods

# Normalise to obtain the posterior distribution over the grid
posterior_density <- posterior_probs / sum(posterior_probs)

# Calculate the cumulative posterior density
cumulative_posterior_density <- cumsum(posterior_density)

# Find the index of the lambda value where the cumulative posterior density is closest to 0.025
lower_index <- which.min(abs(cumulative_posterior_density - 0.025))

# Find the index of the lambda value where the cumulative posterior density is closest to 0.975
upper_index <- which.min(abs(cumulative_posterior_density - 0.975))

# Estimate the credible interval
lower_bound <- lambda_values[lower_index]
upper_bound <- lambda_values[upper_index]
cat("Credible Interval (95%):", lower_bound, "to", upper_bound, "\n")
## Credible Interval (95%): 2 to 8
# Plot the likelihood function with a line for the observed value of x
plot(lambda_values, likelihoods, type = "l", xlab = "lambda", ylab = "likelihood")
abline(v = x_observed, lty = 2, col = "red")

Question 7

A certain department in AUT has 11 staff including 5 professors. Each staff member has their own office. Everyone wants an office with a window but there are only 7 offices with windows. All 5 professors have a window in their office.

• In the context of assessing whether professors are more likely to secure an office with a window than non-professors, state a sensible null hypothesis

A sensible null hypothesis is that whether or not a staff member is a professor has no bearing on their chance of getting an office with a window. That is, the proportion of professors with window offices is the same as the proportion of non-professors with window offices.

We can write the null hypothesis as:

H0: P(window office | professor) = P(window office | non-professor).

Under this hypothesis, any observed difference in the proportions is attributable to random chance: being a professor confers no advantage in obtaining an office with a window.

• Bearing in mind that professors generally like to spend their day looking out of the window, and also bearing in mind that professors are more likely than non-professors to be able to choose an office with a window (and to displace non-professors who have nice offices), is a one-sided test or a two-sided test more appropriate? Justify.

A one-sided test is more appropriate here, because we have a clear prior expectation about the direction of any effect: professors are better placed than non-professors to choose a window office (and to displace non-professors who have one).

With a one-sided test we ask whether the proportion of professors with window offices is significantly higher than the proportion of non-professors with window offices, which matches this expectation.

A two-sided test would be appropriate only if we had no prior expectation about the direction of the effect. Since there is no plausible mechanism by which professors would be less likely to obtain a window office, spending power on testing that direction would be wasteful, so a one-sided test is the better choice in this situation.

• Using R idiom such as fisher.test(x), test your null hypothesis. Interpret your result in a way in which a busy professor, who has a window in her office but is not a professor of statistics, could understand.

# Contingency table of staff type against office type. All 5 professors
# have windows; the remaining 2 window offices go to 2 of the 6
# non-professors. byrow = TRUE so that rows are filled as labelled.
table <- matrix(c(5, 0, 2, 4), nrow = 2, byrow = TRUE, dimnames = list(c("Professors", "Non-Professors"), c("Window", "No Window")))

# Conduct a one-sided Fisher's exact test
fisher.test(table, alternative = "greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  table
## p-value = 0.04545
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  1.054377      Inf
## sample estimates:
## odds ratio 
##        Inf

Fisher's exact test assesses the null hypothesis that professors and non-professors are equally likely to have window offices, against the one-sided alternative that professors are more likely to. The p-value is 0.045, just below 0.05, so there is (mild) statistical evidence to reject the null. In plain terms: if windows had nothing to do with rank, an allocation in which every single professor ends up with a window would occur less than 5% of the time by chance, so it looks like being a professor helps you get a window, perhaps through seniority, influence, or first pick of offices. The estimated odds ratio is reported as infinite simply because no professor is without a window, which points in the same direction.
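As a check, the one-sided p-value is just the hypergeometric probability that all 5 professors land among the 7 window offices when the 7 are allocated at random among the 11 staff:

# Number of random allocations giving every professor a window, divided by
# the total number of ways to allocate the 7 window offices among 11 staff
choose(6, 2) / choose(11, 7)   # = 15/330, about 0.04545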

• Give an example of a two-by-two contingency table that you encounter in your personal day-to-day life. State what your observation is, what your null is and what it means, and carry out a Fisher’s exact test. State whether a one-sided or two-sided test is used, and justify this. Interpret your findings in non-statistical language

# Create the contingency table with counts of each combination of gender and coffee preference
coffee_table <- matrix(c(5, 4, 3, 6), nrow = 2, byrow = TRUE, dimnames = list(c("Male", "Female"), c("Black coffee", "Milk coffee")))

# Conduct the Fisher's exact test
fisher.test(coffee_table)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  coffee_table
## p-value = 0.6372
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   0.265176 25.529627
## sample estimates:
## odds ratio 
##   2.371817

My observation is that male and female colleagues may differ in their coffee preference. My null hypothesis is that there is no association between gender and coffee preference: the proportion preferring black coffee over milk coffee is the same for men and women.

To test this I run a Fisher's exact test. I use a two-sided test, since I have no prior expectation about which group might have the higher proportion of black-coffee drinkers.

In non-statistical terms, Fisher's exact test asks whether the split between black and milk coffee differs between my male and female colleagues by more than chance alone would explain. Here the p-value is 0.64, far above 0.05, so I fail to reject the null hypothesis: with only 18 colleagues, a split like this is entirely unremarkable, and there is no evidence that men and women differ in their coffee preference.

Question 8 A fire station logs the number of callouts occurring each day for a year and tabulates the results:

o <- c(c0=8, c1=12, c2=36, c3=54, c4=67, c5=66, c6=41, c7=37, c8=23, c9=10, c10=11)

This means that on 8 days there were zero callouts, on 12 days there was one callout, on 36 days there were two callouts, and so on up to 11 days when there were 10 callouts. We wish to test the hypothesis that the number of callouts is distributed as a Poisson distribution.

• Give a plausible reason why the Poisson distribution might be appropriate

The Poisson distribution models the number of events occurring in a fixed interval of time or space when the events are independent of one another and occur at a roughly constant rate, with the probability of an event in a short interval proportional to that interval's length. Fire callouts plausibly satisfy these conditions: each day is a fixed interval, callouts arise from many independent potential sources, individual callouts are rare relative to the population served, and (at least as a first approximation) the underlying rate is constant across the year. So the Poisson distribution is a reasonable model for the daily callout count.

• Verify that the dataset contains 365 observations. Calculate the number of callouts in the year and then calculate the average number of callouts per day.

• Use your estimated value of the mean number of callouts per day as λ in the Poisson distribution to calculate the probability of having 0, 1, 2, . . . , > 10 callouts on any day. Remember that the final probability is “10 callouts or more” so ensure that your probabilities sum to one.

• Use R to calculate the expected number of days with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ≥ 10 callouts in the year. • Use R to calculate the Badness-of-fit B = Σ (o − e)²/e

• Calculate a p-value for your B and interpret (note that the degrees of freedom is now 11 − 1 − 1 = 9: there are 11 categories, minus one because the total is known, minus another one because the expectation uses an estimated value for λ).

o <- c(c0=8, c1=12, c2=36, c3=54, c4=67, c5=66, c6=41, c7=37, c8=23, c9=10, c10=11)

# The entries count days, so their sum should be 365
total_days <- sum(o)
cat("The dataset contains", total_days, "observations (days)\n")
## The dataset contains 365 observations (days)
num_callouts <- sum(o * 0:10)
num_callouts
## [1] 1733
avg_callouts_per_day <- num_callouts / 365
avg_callouts_per_day
## [1] 4.747945
lambda <- avg_callouts_per_day

# Probabilities of exactly 0, 1, ..., 9 callouts on any day
probs <- dpois(0:9, lambda)

# The final category is "10 callouts or more" (not "11 or more"), taken
# as the remainder so that the probabilities sum to one
prob_10_or_more <- 1 - sum(probs)

# Print the probabilities
cat(paste0("Probability of ", 0:9, " callouts on any day: ", round(probs, 4), "\n"))
## Probability of 0 callouts on any day: 0.0087
##  Probability of 1 callouts on any day: 0.0412
##  Probability of 2 callouts on any day: 0.0977
##  Probability of 3 callouts on any day: 0.1547
##  Probability of 4 callouts on any day: 0.1836
##  Probability of 5 callouts on any day: 0.1743
##  Probability of 6 callouts on any day: 0.1379
##  Probability of 7 callouts on any day: 0.0936
##  Probability of 8 callouts on any day: 0.0555
##  Probability of 9 callouts on any day: 0.0293
cat(paste0("Probability of 10 or more callouts on any day: ", round(prob_10_or_more, 4)))
## Probability of 10 or more callouts on any day: 0.0236
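The remaining parts can be completed along these lines (a sketch; the numbers quoted in the comments are approximate):

# Expected number of days in each category: 365 times the category
# probability, with the last category being "10 or more"
e <- 365 * c(probs, prob_10_or_more)

# Badness-of-fit B = sum of (o - e)^2 / e over the 11 categories
B <- sum((o - e)^2 / e)
B   # comes out at roughly 11

# p-value from the chi-squared distribution with 11 - 1 - 1 = 9 degrees of
# freedom (one lost to the known total, one to the estimated lambda)
pchisq(B, df = 9, lower.tail = FALSE)   # roughly 0.26: no evidence against the Poisson model overall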

• Someone observes that the number of zero-callout days is quite high. Formulate a sensible null hypothesis and test it. Interpret and give a plausible reason for your finding.

# Observed number of zero-callout days
observed_zeros <- 8

# Under the fitted Poisson model the probability that a given day has zero
# callouts is dpois(0, lambda), so the expected number of such days is
lambda <- avg_callouts_per_day
p_zero <- dpois(0, lambda)
expected_zeros <- 365 * p_zero
expected_zeros   # about 3.2 days, against 8 observed

# H0: the count of zero-callout days is Binomial(365, p_zero);
# one-sided alternative: more zero-callout days than the model predicts
binom.test(observed_zeros, 365, p = p_zero, alternative = "greater")

The test gives a small p-value (around 0.02, below 0.05), so we reject the null hypothesis: there are significantly more zero-callout days (8) than the fitted Poisson model predicts (about 3.2). A plausible explanation is that the callout rate is not actually constant across days, as the Poisson model assumes: some days (public holidays, for example, or days with unusually mild weather) may have a systematically lower callout rate, inflating the number of zero-callout days relative to a single-rate Poisson model.