Homework 9

Worked with Brogan Pietrocini and Kaden Buckley

The following table displays the time (measured continuously in minutes) until the first goal was scored in each of 20 professional hockey games. The sample mean is 12.1 minutes, the sample median is 8.9 minutes, and the sample SD is 13.1 minutes.

a. You wish to estimate the population median time until the first goal is scored. Describe in detail in words how you could use the sample data and simulation to find an appropriate 95% bootstrap percentile confidence interval.

Step 1: Create a spinner with 20 equal sectors, one for each of the 20 observed times until the first goal.

Step 2: Spin the spinner 20 times, recording the value each time, to simulate a single bootstrap sample (this samples with replacement).

Step 3: Find the sample median of the simulated values.

Step 4: Repeat many times, recording the sample median for each simulated sample.

Step 5: Find the 2.5th and 97.5th percentiles of the simulated sample medians.

Step 6: These two percentiles are the endpoints of the 95% bootstrap percentile confidence interval for the population median.

b. Code and run the simulation from the previous part. Summarize the results; provide a histogram of the bootstrap distribution, the value of the bootstrap SE, and a 95% confidence interval.

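data = read.csv("hockey.csv")  # observed times until the first goal

x = c(data$minutes)

n = length(x)  # sample size (20 games)

Nrep = 10000  # number of bootstrap samples

M = rep(NA, Nrep)  # storage for the bootstrap sample medians

for (r in 1:Nrep) {
  y = sample(x, n, replace = TRUE)  # resample the data with replacement
  M[r] = median(y)
}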

hist(M)

sd(M)

## [1] 2.102498

median(x) + 2 * c(-1, 1) * sd(M)

## [1]  4.695004 13.104996

quantile(M, c(0.025, 0.975))

##  2.5% 97.5% 
##  4.15 12.70

c. Using the above information, compute the endpoints of each of the following bootstrap 95% confidence intervals for the population median.

i. Normal interval: sample median ± 2 SE

= 8.90 ± 2(2.10) = (4.70, 13.10)

ii. Percentile interval: (4.15, 12.70)

d. Write a clearly worded sentence reporting the bootstrap percentile confidence interval from the previous part in context.

We estimate with 95% confidence that the population median time until the first goal is scored in professional hockey games is between 4.15 and 12.70 minutes.

Two different machines (A and B) that fill packages of a certain candy are calibrated to a weight of 50 grams. Naturally, the weights of individual packages vary somewhat in the production process, but too much variation is undesirable. A small sample of packages is taken from each machine; the weights (grams) are in the following table.

Machines: Bootstrap

a. Describe in detail in words how you could use the sample data and simulation to find a 95% bootstrap percentile confidence interval for the ratio of the variances of package weights for the two machines.

Step 1: Create 2 spinners, each with 5 equal sectors: one labeled with Machine A’s 5 observed weights and the other with Machine B’s 5 observed weights. Spin each spinner 5 times to simulate one bootstrap sample from each machine.

Step 2: Find the ratio of variances for the simulated samples: the variance of the simulated Machine A weights divided by the variance of the simulated Machine B weights.

Step 3: Repeat many times, recording the ratio of variances of package weights for each simulated sample.

Step 4: Find the 2.5th and 97.5th percentiles of the simulated ratios; these are the endpoints of the 95% bootstrap percentile confidence interval for the ratio of variances of package weights.

b. Write code to implement the procedure from the previous part, and run the simulation to find a 95% bootstrap percentile confidence interval. Include your code and output.

# Define the data for machines A and B
machine_A <- c(47.1, 48.7, 50.1, 50.2, 50.5)
machine_B <- c(48.6, 50.5, 50.6, 51.4, 52.0)

# Define sample sizes for each machine
n_a <- length(machine_A)
n_b <- length(machine_B)

# Set the number of simulation repetitions
N_reps <- 10000

# Initialize vectors to store sample variances and variance ratios
sample_var_a <- numeric(N_reps)
sample_var_b <- numeric(N_reps)
var_ratio <- numeric(N_reps)

# Perform the simulation
for (r in 1:N_reps) {
  sample_a <- sample(machine_A, n_a, replace = TRUE)
  sample_b <- sample(machine_B, n_b, replace = TRUE)
  
  sample_var_a[r] <- var(sample_a)
  sample_var_b[r] <- var(sample_b)
  
  var_ratio[r] <- sample_var_a[r] / sample_var_b[r]
}

# Create a histogram of variance ratios
hist(var_ratio, breaks = 500, xlim = c(0, 25))

# Calculate and print the 95% confidence interval for the variance ratio
quantile(var_ratio, c(0.025, 0.975))

##        2.5%       97.5% 
##  0.01634383 17.42264648

c. Write a clearly worded sentence reporting the bootstrap percentile confidence interval from the previous part in context.

From the bootstrap percentile CI, we estimate with 95% confidence that the variance of package weights for Machine A is between 0.016 and 17.42 times the variance of package weights for Machine B.

(Same setup as the previous problem.) Two different machines (A and B) that fill packages of a certain candy are calibrated to a weight of 50 grams. Naturally, the weights of individual packages vary somewhat in the production process, but too much variation is undesirable. A small sample of packages is taken from each machine; the weights (grams) are in the following table. Describe in detail how you could use simulation to approximate the p-value of the permutation test which uses the ratio of variances as the test statistic.

The null hypothesis states that the variance of package weights is the same for the 2 machines (the ratio of variances is 1), so the machine labels are interchangeable.

Step 1: Use 10 cards, one for each weight in the table, with the weight written on the card.

Step 2: Shuffle the cards and deal 5 to represent Machine A, then deal the remaining 5 to represent Machine B (without replacement).

Step 3: After shuffling and dealing, find the variance of the weights dealt to Machine A and the variance of the weights dealt to Machine B, and compute the ratio Var(A)/Var(B).

Step 4: Repeat many times, each time recording the hypothetical ratio of variances.

Step 5: To estimate the p-value, find the proportion of repetitions for which the simulated ratio of variances is at least as extreme as the observed ratio, 2.02/1.65 = 1.22 (the sample variances of the Machine A and Machine B weights).
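
The steps above can be sketched in R as follows. This is a minimal sketch, not a definitive implementation: the data are copied from the bootstrap code in the previous problem, the seed and the names `pooled`, `perm_ratio`, and `observed_ratio` are illustrative choices, and "as extreme" is assumed to be two-sided, treating a ratio r and its reciprocal 1/r as equally far from 1 by comparing on the log scale.

```r
# Data copied from the bootstrap code in the previous problem
machine_A <- c(47.1, 48.7, 50.1, 50.2, 50.5)
machine_B <- c(48.6, 50.5, 50.6, 51.4, 52.0)

n_a <- length(machine_A)
pooled <- c(machine_A, machine_B)  # the 10 "cards"

# Observed test statistic: ratio of sample variances, A over B
observed_ratio <- var(machine_A) / var(machine_B)

set.seed(9)  # assumption: arbitrary seed, only for reproducibility
N_reps <- 10000
perm_ratio <- numeric(N_reps)

for (r in 1:N_reps) {
  shuffled <- sample(pooled)       # shuffle all 10 values, no replacement
  perm_a <- shuffled[1:n_a]        # first 5 dealt to "Machine A"
  perm_b <- shuffled[-(1:n_a)]     # remaining 5 dealt to "Machine B"
  perm_ratio[r] <- var(perm_a) / var(perm_b)
}

# Assumption: two-sided p-value, measuring distance from a ratio of 1 on
# the log scale. The small tolerance guards against floating-point ties.
p_value <- mean(abs(log(perm_ratio)) >= abs(log(observed_ratio)) - 1e-12)
p_value
```

For a one-sided alternative (Machine A more variable than Machine B), the last step would instead be `mean(perm_ratio >= observed_ratio)`.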


Oliver Sheehan
