Author: Daniele Melotti
Date: Dec 2023
See the related GitHub repository for code, data and list of tasks.
We start off by loading the sample data with the repair times of Atlas Telecommunications and summarizing it:
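The loading code itself is not shown; a minimal sketch, assuming the data sits in a CSV file (the name atlas.csv is a placeholder, see the repository for the actual file), could look like this:
# Hypothetical loading step; "atlas.csv" is an assumed file name
atlas <- read.csv("atlas.csv", stringsAsFactors = FALSE)
str(atlas)
summary(atlas)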
## 'data.frame': 1687 obs. of 2 variables:
## $ Time : num 17.5 2.4 0 0.65 22.23 ...
## $ Group: chr "ILEC" "ILEC" "ILEC" "ILEC" ...
## Time Group
## Min. : 0.000 Length:1687
## 1st Qu.: 0.750 Class :character
## Median : 3.630 Mode :character
## Mean : 8.522
## 3rd Qu.: 7.350
## Max. :191.600
We are dealing with a data frame containing two variables: Time, the numeric repair time in minutes, and Group, a character label for the customer group. There are 1687 observations; the mean repair time is 8.522 minutes, while the median is 3.63 minutes.
Let’s see how the repair times of Atlas telecommunications are distributed:
# Taking into account only the Time variable
time <- atlas$Time
plot(density(time), main = "Atlas Repair Times", lwd = 2, col = "tomato")
abline(v = mean(time), lty = "dashed")
The plot shows that most repair times fall within roughly 10 minutes; however, in some cases the repair time stretches to about 3 hours.
Let’s suppose that Atlas claims it takes an average of 7.6 minutes to repair phone services for its customers. Given that the New York Public Utilities Commission (PUC) would like to check for any deviation from this claim, we consider a two-tailed scenario, where the null hypothesis states that the average repair time equals 7.6 minutes, and the alternative hypothesis states that the average repair time differs from 7.6 minutes. Hence:
H0: μ = 7.6 minutes
Ha: μ ≠ 7.6 minutes
Earlier we saw that the sample mean is 8.522 minutes. We can use this estimate to build a confidence interval:
# Computing mean, standard deviation and standard error
mean_time <- mean(time)
sd_time <- sd(time)
se <- sd_time/sqrt(length(time))
# 99% CI
mean_time + c(-2.58, 2.58) * se
## [1] 7.593073 9.450946
The interval indicates that we can be 99% confident that the mean repair time lies between 7.59 and 9.45 minutes. This interval barely includes the repair time claimed under the null hypothesis.
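As a side note (not part of the original analysis), the rounded z value of 2.58 can be swapped for the exact t critical value; with 1686 degrees of freedom the two are nearly identical, so the interval barely changes:
# 99% CI using the exact t critical value instead of the rounded 2.58
t_crit <- qt(0.995, df = length(time) - 1)
mean_time + c(-t_crit, t_crit) * se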
The t-test is well suited for evaluating whether there is a significant difference between Atlas’ claimed repair time and the computed sample mean. The t-statistic is calculated by subtracting the hypothesized mean from the sample mean and then dividing the result by the standard error:
t = (x̄ − μ0) / SE
Let’s calculate it on our own:
# Null hypothesis' mean
mu_0 <- 7.6
# Computing the t-statistic
t_stat <- (mean_time - mu_0)/se
t_stat
## [1] 2.560762
We found a t-statistic equal to 2.56. Let’s also compute the t-statistic’s p-value:
# Computing the degrees of freedom
df <- length(time) - 1
# Computing the one-sided (upper-tail) p-value; for the two-tailed test it is compared against alpha/2
p_value <- 1 - pt(t_stat, df)
p_value
## [1] 0.005265342
The t-statistic measures the departure of the estimated mean from its hypothesized value in units of standard error. If the t-statistic falls outside the 99% critical range (-2.58, 2.58), that is, roughly 2.58 standard errors on either side of zero, it is unlikely that the null distribution of t (which is centered on 0) is the proper distribution for our data, and we would have to consider the alternative distribution, which shares the same shape as the null distribution. Our t-statistic (2.56) sits just inside the upper boundary of the (-2.58, 2.58) interval.
As the number of degrees of freedom is large, we can approximate the t-distribution with a normal distribution. The p-value helps us determine whether or not the t-statistic is statistically significant. Because we are conducting a two-tailed test, the 0.01 significance level is split between the two tails, so the one-sided p-value computed above must be compared against 0.005. Our p-value of 0.0053 is slightly larger than that threshold, which means we cannot reject the null hypothesis, leaving us with a borderline case. In other words, Atlas’ claimed repair time is consistent with the data at the 99% confidence level, but given how borderline the result is, this conclusion is debatable and should be taken with a pinch of salt.
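As a cross-check (not part of the original workflow), base R’s t.test() runs the same two-tailed test in a single call; it reports the t-statistic computed above, a two-sided p-value (twice the one-sided value), and a 99% confidence interval based on the exact t critical value, leading to the same borderline non-rejection:
# One-sample, two-tailed t-test against the claimed 7.6 minutes
t.test(time, mu = mu_0, conf.level = 0.99)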
We have performed a traditional t-test, which left us with a non-rejection. Now, we are going to employ bootstrapping on three statistics (means, differences of means, and t-statistics) and see whether all of these methods share the same outcome as the traditional t-test.
We start off with the bootstrapped means. We create a function that takes a sample of the original data with replacement and calculates the mean of that sample. Then, we use replicate() to repeat this process 2000 times, which gives us 2000 means and allows us to compute a 99% confidence interval of the mean:
# 2000 bootstraps
boots <- 2000
# Bootstrapped means function
compute_sample_mean <- function(sample0) {
resample <- sample(sample0, length(sample0), replace = TRUE)
mean(resample)
}
# Setting seed
set.seed(150)
# Computing the 99% CI of the mean
boot_99 <- replicate(boots, compute_sample_mean(time))
# Note: the lower tail here is 2.5%; a symmetric 99% percentile interval would use 0.005
perc_99_CI <- quantile(boot_99, probs = c(0.025, 0.995))
perc_99_CI
## 2.5% 99.5%
## 7.847522 9.440592
The interval above does not include the hypothesized average repair time, which means that we can reject the null hypothesis and point out that the average repair time is different from 7.6 minutes. Let’s see if the next methods lead to the same conclusion.
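For reference, the same kind of percentile interval can also be obtained with the boot package; the following is only a sketch and was not part of the original analysis:
# Percentile bootstrap CI for the mean via the boot package
library(boot)
boot_mean <- function(data, indices) mean(data[indices])
set.seed(150)
boot_out <- boot(time, statistic = boot_mean, R = boots)
boot.ci(boot_out, conf = 0.99, type = "perc")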
For the difference of means, we create a function with a structure similar to the one above; however, instead of returning the mean, we subtract the hypothesized mean of 7.6 minutes from the bootstrapped mean. Then, we use replicate() to repeat the process 2000 times and compute a 99% confidence interval of the differences:
# Bootstrapped differences function
boot_mean_diffs <- function(sample0, mean_hyp) {
resample <- sample(sample0, length(sample0), replace = TRUE)
return(mean(resample) - mean_hyp)
}
# Setting seed
set.seed(350)
# Computing the 99% CI of the differences
mean_diffs <- replicate(boots, boot_mean_diffs(time, mu_0))
diff_99_CI <- quantile(mean_diffs, probs = c(0.005, 0.995))
diff_99_CI
## 0.5% 99.5%
## 0.07479911 1.88919864
As zero is not included in the interval, we can reject the null hypothesis and say that the average repair time is different from 7.6 minutes. Indeed, if the claimed repair time were accurate, there would not be a significant difference between that time and the bootstrapped means; the differences would have been much closer to zero, and the interval would likely have contained zero.
Our last bootstrap computes multiple t-statistics after resampling the data. Again, we bootstrap 2000 times. The boot_t_stat() function takes care of resampling the data, computing the difference between the mean of the resampled data and the hypothesized mean, computing the standard error, and finally the t-statistic:
# Bootstrapped t-stat function
boot_t_stat <- function(sample0, mean_hyp) {
resample <- sample(sample0, length(sample0), replace = TRUE)
diff <- mean(resample) - mean_hyp
se <- sd(resample)/sqrt(length(resample))
return(diff/se)
}
# Setting seed
set.seed(450)
# Computing the 99% CI of the t-statistics
t_boots <- replicate(boots, boot_t_stat(time, mu_0))
t_99_CI <- quantile(t_boots, probs = c(0.005, 0.995))
t_99_CI
## 0.5% 99.5%
## 0.02478382 4.57617000
Even in this case, zero is not included in the interval, which means that we can reject the null hypothesis and say that the average repair time is different from 7.6 minutes.
Let’s build visualizations for the three bootstrapped analyses we just ran:
# Bootstrapped means
plot(density(boot_99), main = "Bootstrapped Means", ylab = "", xlab = "Minutes")
abline(v = mean(boot_99), lwd = 1.5, col = "tomato") # mean of bootstrapped means
abline(v = perc_99_CI, lwd = 1.5, col = "cornflowerblue") # confidence interval boundaries
abline(v = mu_0, lwd = 1.5, lty = "dashed", col = "purple") # 7.6 minutes claim
The plot confirms that the claimed average repair time falls outside the 99% confidence interval built from the bootstrapped means.
# Bootstrapped difference of means
plot(density(mean_diffs), main = "Bootstrapped Difference of Means", ylab = "", xlab = "Minutes")
abline(v = mean(mean_diffs), lwd = 1.5, col = "tomato") # mean
abline(v = diff_99_CI, lwd = 1.5, col = "cornflowerblue") # confidence interval boundaries
abline(v = 0, lwd = 1.5, lty = "dashed", col = "purple") # zero
Similarly, zero is not included in the 99% confidence interval of the bootstrapped difference of means.
# Bootstrapped t-statistics
plot(density(t_boots), main = "Bootstrapped t-statistics", ylab = "", xlab = "t-statistic")
abline(v = mean(t_boots), lwd = 1.5, col = "tomato") # mean of bootstrapped t-statistics
abline(v = t_99_CI, lwd = 1.5, col = "cornflowerblue") # confidence interval boundaries
abline(v = 0, lwd = 1.5, lty = "dashed", col = "purple") # zero
And finally, the last visualization also shows that zero is (barely) not included in the confidence interval generated from the bootstrapped t-statistics.
The traditional t-test leads to the conclusion that we cannot reject the null hypothesis. According to the three bootstrap analyses, however, we can reject the null hypothesis and conclude that the mean repair time of Atlas is not equal to 7.6 minutes, although it is worth noting that the results might have been slightly affected by the choice of seed.
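One simple way to gauge that sensitivity, sketched below with arbitrarily chosen seeds (not part of the original analysis), is to re-run the difference-of-means bootstrap a few times and check whether zero ever enters the 99% percentile interval:
# Re-run the difference-of-means bootstrap under several (arbitrary) seeds
for (s in c(1, 42, 350, 999)) {
  set.seed(s)
  diffs <- replicate(boots, boot_mean_diffs(time, mu_0))
  print(quantile(diffs, probs = c(0.005, 0.995)))
}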
The comprehensive analysis of Atlas Telecommunications’ service repair times, encompassing both traditional statistical tests and advanced bootstrapping methodologies, provides a nuanced understanding of the company’s performance against its claims. While the t-test and bootstrapped t-statistic present borderline results, indicating a close call in directly supporting or refuting Atlas’ claims, the bootstrapped analyses of means and difference of means offer a more decisive stance on rejecting the null hypothesis. This suggests that, statistically, Atlas’ actual repair times may not align with its claimed performance.
Considering the bootstrap results, each analysis showed that the hypothesized value (7.6 minutes for the means, zero for the differences and t-statistics) fell on the left tail of the bootstrapped distribution, which suggests that Atlas’ repair times are likely higher on average than the claimed 7.6 minutes. The company should consider the following: