A Comparison of the Traditional Confidence Interval, the Nonparametric Bootstrap Confidence Interval, and the Parametric Bootstrap Confidence Interval in R with a Walkthrough of the Theory
By Michael Archibeque
Key Words: Bootstrap, Bootstrapping, Glivenko-Cantelli, Nonparametric, Parametric, Confidence Interval
Abstract: This article explores the different theoretical assumptions necessary to construct a Traditional Confidence Interval, a Nonparametric Bootstrap Confidence Interval, and a Parametric Bootstrap Confidence Interval. Throughout the article, a single data set is used to demonstrate each procedure in the R programming language, showing that the three methods produce similar results when the data happen to follow a known probability distribution. The Cumulative Distribution Function (CDF), the Empirical Cumulative Distribution Function (ECDF), and the Glivenko–Cantelli Theorem are discussed.
Introduction
A data set from a random experiment is a collection of random variates. These random variates are numeric values assigned by functions called random variables to represent the outcomes of experimental trials (Hogg, Tanis, & Zimmerman, 2015, p. 41). The outcomes of random variables, known as statistical relations, are, unlike functional relations, subject to variation according to a probability distribution (Kutner, Nachtsheim, & Neter, 2004, p. 3). However, so long as one trial does not influence the outcome of another and each trial is conducted with the same procedure, these random variables should be independent and identically distributed. Thus, the data set should contain information from which statistical inferences and estimates can be made. Taking these definitions as given, this paper explores the assumptions of the Traditional Confidence Interval, the Nonparametric Bootstrap Confidence Interval, and the Parametric Bootstrap Confidence Interval, and contrasts them by demonstrating how the three methods use the same collection of random variates to reach similar conclusions while making different numbers of assumptions about the underlying distribution.
Throughout this paper, the specific example used is a study of the differences in the heights of young men measured in the morning and again in the evening. Forty-one students at a boarding school served as subjects. Each student's height was measured (in millimeters) in the morning and in the evening, and every student was taller in the morning. The difference in height between morning and evening for each subject is a variate. The random variables are the unseen processes that change each height from morning to evening. The distribution of these random variables is unknown, but they can be treated as independent and identically distributed because the height of one subject does not influence the height of another and the measurement procedure was the same for every observation.
In R version 3.1.2 (R Core Team, 2014), the statistical programming language used throughout this article, the data set for this study appears below. Here x is a vector whose 41 elements contain the differences between the participants' morning and evening heights in millimeters (Suess & Trumbo, Unpublished 2nd Edition).
x = c(8.50, 9.75, 9.75, 6.00, 4.00, 10.75, 9.25, 13.25, 10.50,
12.00, 11.25, 14.50, 12.75, 9.25, 11.00, 11.00, 8.75, 5.75,
9.25, 11.50, 11.75, 7.75, 7.25, 10.75, 7.00, 8.00, 13.75,
5.50, 8.25, 8.75, 10.25, 12.50, 4.50, 10.75, 6.75, 13.25,
14.75, 9.00, 6.25, 11.75, 6.25)
The Traditional Confidence Interval
In traditional statistics, some assumptions must be made in order to draw inferences from the data. Chief among them is an assumption about the distribution of the random variables, not just for the subjects in the experiment, but for everyone who might be subject to the experimental condition, regardless of whether or not they participated in the study.
There are a few clues to the type of distribution that influences the outcome of a random variable. For example, probability distributions generally apply to only one of two types of data, discrete or continuous, so this information alone can eliminate many candidate distributions (Hogg, Tanis, & Zimmerman, 2015, pp. 43, 88). There are then ways of testing whether the data conform to the general pattern of the presumed distribution.
For our specific example, the fractional values of the measurements indicate this is continuous data, and the Normal Distribution is among those probability distributions that may apply, so we can test the assumption of Normality.
To test the assumption of Normality, we plotted a Quantile-Quantile Plot (Q-Q Plot) and performed a Shapiro-Wilk Test. When the quantiles of the data are plotted against the Normal quantiles, the data quantiles follow the same pattern as the Normal Distribution quantiles. Furthermore, the Shapiro-Wilk Test, which has a null hypothesis of Normality, produces a p-value that causes us to fail to reject the null hypothesis and continue with the assumption of Normality.
# Q-Q Plot of the Data
qqnorm(x, datax=TRUE)
# Q-Q Line of the Data
qqline(x, datax=TRUE)
# Shapiro-Wilk Test
shapiro.test(x)
##
## Shapiro-Wilk normality test
##
## data: x
## W = 0.98383, p-value = 0.817
With the assumption of Normality reasonably supported, traditional statistics allows inferences and estimates to be made about the population even though its mean and standard deviation are unknown, by using the T Distribution in place of the Normal Distribution.
For this particular example, because the population mean and standard deviation are unknown but the data behave like the Normal Distribution, a Traditional Confidence Interval for the true mean difference between morning and evening heights can be constructed from the T Distribution at whatever level of confidence is selected, by way of the formula:
\[\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\] where \[s^2 = \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^2}{n-1}\]
The basic structure of the Traditional Confidence Interval is:
# Traditional Confidence Interval of the Mean Difference
n = length(x)    # The number of elements in vector x
x.bar = mean(x)  # The mean of the values in vector x
s = sd(x)        # The standard deviation of the values in vector x
# Lower and Upper Limits of the Confidence Interval
x.bar - qt(c(.975,.025), n-1) * s / sqrt(n) # using the T Distribution with the sample mean and standard deviation
## [1] 8.734251 10.460871
# Traditional T-Test with CI using the Base R function
t.test(x)
##
## One Sample t-test
##
## data: x
## t = 22.469, df = 40, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 8.734251 10.460871
## sample estimates:
## mean of x
## 9.597561
From this outcome, we can conclude that we are 95% confident that the true mean difference in height is between approximately 8.73 and 10.46 millimeters.
However, what happens if all indications are that the data do not follow the Normal Distribution, or any other known distribution? Traditionally, the T Distribution based Confidence Interval can still be used if the data are continuous and the population is thought to be mound-shaped. Yet, if the distribution is heavy-tailed, skewed, or has no recognizable shape, then nonparametric methods need to be employed (Ott & Longnecker, 2010, p. 256).
The Nonparametric Bootstrap (NPB)
Among these nonparametric methods is the Nonparametric Bootstrap (NPB). The central idea of the NPB is to resample from the original data set, producing a large number of replications from which the sampling distribution can be approximated (Ott & Longnecker, 2010, p. 259). All that is known about the population is that it is capable of producing the n particular observations at hand, and all that is assumed is that the data were sampled at random from this unknown population, that the random variables are independent and identically distributed, and that there is enough useful information within the data to approximate the Cumulative Distribution Function (CDF) of the population.
The approximation of the CDF is done by using information from the Empirical Cumulative Distribution Function (ECDF), denoted \(F_n(x)\), which is the distribution built from the observed data points alone, with no additional information (Hogg, Tanis, & Zimmerman, 2015, p. 227). It is defined as:
\[F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{A}(X_i), \qquad A = (-\infty,\, x],\]
where
\[I_{A}\]
is an indicator function defined as:
\[I_{A}(x)= \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}\]
where A is a subset of all possible values of X; for the ECDF, A is the set of values less than or equal to x.
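As a small illustration of this definition, the sketch below (using a hypothetical helper named Fn.manual that is not part of the original analysis) evaluates the ECDF of the height differences at a point t as the proportion of observations falling in \(A = (-\infty, t]\), and checks that it agrees with R's built-in ecdf() function.
# Hypothetical helper (not part of the original analysis):
# the ECDF at t is the proportion of observations falling in A = (-infinity, t]
Fn.manual = function(t, data) mean(data <= t)
Fn.manual(10, x)  # ECDF of the height differences evaluated at t = 10 mm
ecdf(x)(10)       # the same value from R's built-in ecdf() function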
But how is this ECDF useful? According to the Glivenko–Cantelli Theorem:
\[\|F_n-F\|_\infty=\sup_{x\in\mathbb{R}}|F_n(x)-F(x)|\rightarrow 0\ \text{almost surely}\]
the uniform norm of the difference between the piecewise ECDF, \(F_n\), and the underlying CDF, \(F\), is the supremum of the absolute difference between \(F_n(x)\) and \(F(x)\), which almost surely approaches zero as \(n\rightarrow\infty\). Thus, as n increases towards infinity, the ECDF and the underlying CDF eventually become indistinguishable. It is therefore possible to use the limited information in the sample to estimate features of the underlying CDF by "relying on the analogy between the sample and the population" (Mooney & Duval, 1993, p. 11).
# ECDF of the height differences
plot.ecdf(x)
# ECDF with the CDF of the fitted Normal distribution overlaid
# (the Normal parameters are computed from the data before calling curve(),
#  because inside curve() the symbol x refers to the plotting grid, not the data)
mu = mean(x); sigma = sd(x)
plot.ecdf(x)
curve(pnorm(x, mu, sigma), add = TRUE)
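The Glivenko–Cantelli Theorem can also be illustrated with a small simulation sketch, shown below, that is not part of the original analysis: samples are drawn from a known standard Normal CDF, and the largest gap between the ECDF and the true CDF shrinks as n grows.
# Sketch: sup |F_n - F| shrinks as n increases (samples drawn from a known N(0,1) CDF)
set.seed(1)
sup.gap = function(n) {
  z = rnorm(n)                           # sample of size n from the known CDF
  grid = seq(-4, 4, length.out = 1000)   # evaluation points
  max(abs(ecdf(z)(grid) - pnorm(grid)))  # approximate uniform norm ||F_n - F||
}
sapply(c(10, 100, 1000, 10000), sup.gap) # the maximum gap shrinks as n grows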
The idea behind bootstrapping is that the variability found in the CDF will be mimicked as the number of replications \(B\) increases (Chernick, 2008, p. 11). Because bootstrapping requires hundreds to several thousand resampling replications, the algorithm must be executed by computer, which is why code for the statistical programming language R is featured prominently throughout this article.
In our specific example, we are creating a 95% confidence interval for the mean difference in the heights from morning to evening.
To construct the confidence interval, 95% of the possible values of \(V=\bar{X}-\mu\) must be captured in the interval, which is represented in the following chain of equalities:
\[ P\{v_L \leq V=\bar{X}-\mu\leq v_U\}=\\ P\{v_L-\bar{X}\leq -\mu \leq v_U-\bar{X}\}=\\ P\{\bar{X}-v_L\geq \mu \geq \bar{X}-v_U\}=\\ P\{\bar{X}-v_U\leq\mu\leq\bar{X}-v_L\}=0.95\]
where V is the difference between the sample mean and the population mean, the lower limit \(v_L\) is set to the 0.025 quantile of V, and the upper limit \(v_U\) is set to the 0.975 quantile of V, so that 0.95 of the possible values fall within the interval.
Using the Bootstrap Quantile Method, the lower and upper bounds of the confidence interval are isolated using the formula:
\[ (\bar{X}-\hat{v}_U,\ \bar{X}-\hat{v}_L)=\\ (\bar{X}-\{\bar{X}_U^*-\hat{\mu}\},\ \bar{X}-\{\bar{X}_L^*-\hat{\mu}\})=\\ (\bar{X}+\hat{\mu}-\bar{X}_U^*,\ \bar{X}+\hat{\mu}-\bar{X}_L^*)=\\ (2\bar{X}-\bar{X}_U^*,\ 2\bar{X}-\bar{X}_L^*)\]
where \(\hat{v}_U\) and \(\hat{v}_L\) are estimated by the corresponding quantiles of the bootstrap differences \(\bar{X}^*-\hat{\mu}\), and the last step follows because the bootstrap estimate of the population mean is \(\hat{\mu}=\bar{X}\).
The basic structure of the Nonparametric Bootstrap Method is:
# Nonparametric Bootstrap Confidence Interval
set.seed(42) # Setting the random generator seed
n = length(x); # The size of the original sample
x.bar = mean(x) # The mean of the original sample.
B = 10000 # The large number of resamples
x.resampled = sample(x, B*n, replace=TRUE) # B*n values forming the B resamples of size n
B.Rows.N.Cols.X.Resampled = matrix(x.resampled, nrow=B) # B x n matrix of resamples
x.bar.stars = rowMeans(B.Rows.N.Cols.X.Resampled) # vector of B `x-bar-star's
resampled.v = x.bar.stars - x.bar # vector of v-star's
U_L.quantiles = quantile(resampled.v, c(.975,.025)) # Estimate quantiles of V
x.bar - U_L.quantiles # Nonparametric Bootstrap Confidence Interval
## 97.5% 2.5%
## 8.77439 10.44512
The NPB Confidence Interval is similar to the Traditional Confidence Interval: we are 95% confident that the true mean difference is between approximately 8.77 and 10.45 millimeters.
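For comparison, a similar interval can be obtained with the boot package that ships with R; this is a sketch rather than part of the original analysis, and its "basic" interval type uses the same \(2\bar{X}-\bar{X}^*\) quantile form derived above (the resulting limits will differ slightly because the package draws its own resamples).
# Sketch: the same style of NPB interval via the boot package
library(boot)
set.seed(42)
boot.means = boot(data = x, statistic = function(d, i) mean(d[i]), R = 10000) # B = 10000 resampled means
boot.ci(boot.means, conf = 0.95, type = "basic") # "basic" = 2*x.bar - quantiles of the resampled means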
The Parametric Bootstrap (PB)
So far, we have examined the assumptions and performed an estimate using a traditional parametric method, the Traditional Confidence Interval, and an estimate using a non-traditional nonparametric method, the NPB Confidence Interval, which has remarkably fewer prerequisite assumptions than the Traditional Confidence Interval but yields similar results.
The freedom of making so few assumptions may make it tempting not to expand any further on the Bootstrap Method. However, there are some theoretical points of interest that yield an interesting technique called the Parametric Bootstrap (PB).
Earlier, we noted that the difference between the piecewise ECDF, \(F_n\), and the underlying CDF, \(F\), almost surely approaches zero as \(n\rightarrow\infty\), which allows the relationship between the ECDF and the CDF to be used to make estimates about the population from the sample. "If we assume further that \(F\) is absolutely continuous, then smoothed distributions are natural and . . . Taking this a step further, if we assume that \(F\) has a parametric form . . . then the appropriate estimator for \(F\) would be . . . with the maximum likelihood estimates . . ." (Chernick, 2008, p. 124). Thus, the NPB technique of resampling can be executed with \(F_n\) replaced by an estimated CDF, \(\hat{F}_\text{distribution}\), where the maximum likelihood estimates (MLEs) from the sample fill in the parameters of the estimated distribution (Efron, The Jackknife, the Bootstrap and Other Resampling Plans, 1982, p. 30).
For our specific example, the difference in height experiment, the assumption of Normality has already been reasonably supported with a Q-Q Plot and the Shapiro-Wilk Test en route to the earlier Traditional Confidence Interval, so that assumption will be leveraged to demonstrate the Parametric Bootstrap Method. The sampling space containing our 41 original data points is replaced with a sample space based on the estimated Normal CDF, with the observed sample mean and sample standard deviation plugged in as its parameters, and then the bootstrap algorithm is executed. This creates a different set of possibilities within the elements of the resampling vector and matrix: rather than being filled by sampling with replacement from the exact values in the original data set, the elements of the vector and matrix are filled with draws from the full range of values possible under the estimated CDF.
The basic structure of the Parametric Bootstrap Method is:
# Parametric Bootstrap Confidence Interval
set.seed(42) # Setting the random generator seed
n = length(x) # The size of the original sample
x.bar = mean(x) # The MLE of the mean
x.sample.sd = sd(x) # The sample standard deviation, used as the plug-in estimate of sigma
B = 10000 # The large number of samples generated
pseudo.rnorm.x.bar.sd.sims = rnorm(B*n, x.bar, x.sample.sd) # The B*n random draws from the fitted Normal
B.Rows.N.Cols.X.sims = matrix(pseudo.rnorm.x.bar.sd.sims, nrow=B) # B x n matrix of samples
x.bar.sims = rowMeans(B.Rows.N.Cols.X.sims) # Vector of the B sample means
parametric.resampled.v = x.bar.sims - x.bar # vector of v-star's
U_L.quantiles = quantile(parametric.resampled.v, c(.975,.025)) # estimated quantiles of V
x.bar - U_L.quantiles # parametric bootstrap CI of the mean
## 97.5% 2.5%
## 8.752424 10.432419
As expected, the PB Confidence Interval yields results similar to those produced by the Traditional Confidence Interval and the NPB: we are 95% confident that the true mean difference in height is between approximately 8.75 and 10.43 millimeters.
It has been noted that the theoretical calculations needed for a PB are impossible outside of a narrow family of models (Efron, The Jackknife, the Bootstrap and Other Resampling Plans, 1982, p. 30). However, while the PB is rarely seen in practice, its use has been justified in simulation situations and when an asymptotic distribution does not provide a good small sample approximation (Chernick, 2008).
Conclusion
Overall, the Traditional Confidence Interval, the Nonparametric Bootstrap Confidence Interval, and the Parametric Bootstrap Confidence Interval consistently produced similar results. Of these methods, the Nonparametric Bootstrap demonstrates itself to be a powerful tool in practice when working with small data sets and limited background information.
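For reference, the three 95% intervals reported above can be collected side by side; the sketch below simply restates the numerical output shown earlier.
# Recap of the three 95% confidence intervals reported above
intervals = rbind(
  Traditional   = c(8.734251, 10.460871),
  Nonparametric = c(8.77439, 10.44512),
  Parametric    = c(8.752424, 10.432419)
)
colnames(intervals) = c("lower", "upper")
intervals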
Bibliography
Chernick, M. R. (2008). Bootstrap Methods: A Guide for Practitioners and Researchers. Hoboken, NJ: John Wiley & Sons, Inc.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1-26.
Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.
Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York, NY: Chapman & Hall.
Hogg, R. V., Tanis, T. E., & Zimmerman, D. (2015). Probability and Statistical Inference. Upper Saddle River, NJ: Pearson Higher Education.
Kutner, M. H., Nachtsheim, C. J., & Neter, J. (2004). Applied Linear Regression Models, 4th Edition. New York, NY: McGraw-Hill/Irwin.
Mooney, C. Z., & Duval, R. D. (1993). Bootstrapping: A Nonparametric Approach to Statistical Inference. Newbury Park, CA: Sage Publications.
Ott, R. L., & Longnecker, M. (2010). An Introduction to Statistical Methods and Data Analysis, 6th Ed. Belmont, CA: Brooks/Cole.
R Core Team. (2014). R version 3.1.2 (2014-10-31) – "Pumpkin Helmet". R Foundation for Statistical Computing.
Suess, E. A., & Trumbo, B. E. (Unpublished 2nd Edition). Introduction to Probability Simulation and Gibbs Sampling with R, 2nd Edition.