Assignment Objectives
Understand the theoretical basis of Bootstrap sampling methods
for approximating sampling distributions.
Assess the performance of Bootstrap sampling distributions
against exact and asymptotic sampling distributions.
Implement the bootstrap sampling algorithm and construct sampling
distributions using R.
Use of AI Tools
Policy on AI Tool Use: Students must adhere to the
AI tool policy specified in the course syllabus. The direct copying of
AI-generated content is strictly prohibited. All submitted work must
reflect your own understanding; where external tools are consulted,
content must be thoroughly rephrased and synthesized in your own
words.
Code Inclusion Requirement: Any code included in
your essay must be properly commented to explain the purpose and/or
expected output of key code lines. Submitting AI-generated code without
meaningful, student-added comments will not be accepted.
Asymptotic Distribution of Sample Variance
Assume that \(\{ x_1, x_2, \cdots, x_n \}\) is an i.i.d. sample from
\(F(x)\) with \(\mu = E[X]\)
and \(\sigma^2 = \text{var}(X)\).
Denote the sample mean by \(\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i\) and the
sample variance by
\[
s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2
\]
If \(n\) is large and \(\mu_4 < \infty\), then approximately
\[
s^2 \sim N\left(\sigma^2, \frac{\mu_4-\sigma^4}{n} \right)
\]
where \(\mu_4 = E[(X_i - \mu)^4]\)
is the 4th central moment, which can be estimated by
\[
\hat{\mu}_4 = \frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^4.
\]
Note: This describes the asymptotic convergence of
the sample variance, following from the central limit theorem (CLT). The
sample size required for this approximation to hold is
situation-dependent.
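The large-\(n\) approximation for \(s^2\) can be checked by simulation. The R sketch below assumes, purely for illustration, an Exponential(1) population, for which \(\sigma^2 = 1\) and \(\mu_4 = 9\), so the asymptotic standard deviation of \(s^2\) is \(\sqrt{8/n}\). The sample size, number of replications, and seed are arbitrary choices, not part of the assignment. The last lines compute the plug-in estimate \(\hat{\mu}_4\) from a single sample.

```r
# Sketch: check s^2 approx N(sigma^2, (mu_4 - sigma^4) / n) by simulation.
# Assumed population (illustration only): Exponential(rate = 1), for which
# sigma^2 = 1 and mu_4 = 9, so the asymptotic variance of s^2 is 8 / n.
set.seed(42)                 # arbitrary seed, for reproducibility
n    <- 100                  # sample size (illustrative)
reps <- 5000                 # number of simulated samples
sim_s2 <- replicate(reps, var(rexp(n, rate = 1)))  # var() uses the 1/(n-1) divisor

c(mean_s2 = mean(sim_s2),          # should be close to sigma^2 = 1
  sd_s2   = sd(sim_s2),            # simulated SD of s^2
  asym_sd = sqrt((9 - 1^2) / n))   # sqrt((mu_4 - sigma^4) / n)

# Plug-in estimate of mu_4 from one sample, as in the formula above
x <- rexp(n, rate = 1)
mu4_hat <- mean((x - mean(x))^4)   # (1/n) * sum((x_i - xbar)^4)
mu4_hat
```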
Question 1: Asymptotic vs Bootstrap Sampling
Distributions
Write an essay summarizing the concepts of Asymptotic and Bootstrap
Sampling Distributions, along with their key applications. Your
discussion should be grounded in your personal understanding of the
material. Any external sources including AI tools consulted must be
clearly cited.
Essay Prompt: Discuss the concepts of the bootstrap
sampling plan, the bootstrap sampling distribution, and the asymptotic
sampling distribution in the context of statistics (e.g., sample mean
and variance) computed from an independent and identically distributed
(i.i.d.) sample. Your discussion should:
Clearly outline the key assumptions required for each
method.
Explain the practical application of each distribution.
Provide guidance on when and why one should be preferred over the
other in statistical inference.
Statistical sampling is carried out to learn about a population of
interest. Because the population distribution is usually unknown, the
sampling distribution of a statistic, such as the sample mean or sample
variance, is used to describe how that statistic varies from sample to
sample and to make inferences about the corresponding population
parameter. This supports estimation and prediction based on sample
statistics. There are several approaches to approximating a sampling
distribution, and two of these are the asymptotic sampling distribution
and the bootstrap sampling distribution. Both methods are useful when the
exact sampling distribution cannot be derived.
The asymptotic sampling distribution is one way to approximate the
sampling distribution of a statistic and use it to make inferences about
the population from which the data were drawn. It describes the limiting
distribution of the statistic as the sample size n grows infinitely
large, and in practice it is used as an approximation when n is large but
finite. For example, for the sample variance, as n grows large,
\(s^2\) is approximately \(N\left(\sigma^2, \frac{\mu_4-\sigma^4}{n} \right)\).
This makes the asymptotic sampling distribution useful for building
confidence intervals and tests for population parameters, which are
usually unknown in the random sampling process.
The key assumptions of the asymptotic sampling distribution are that
the observations are independent and identically distributed, that the
sample size is sufficiently large, and that the relevant population
moments are finite: a finite variance for the sample mean, and a finite
fourth moment \(\mu_4\) for the sample variance. An important outcome is
that, when these assumptions hold, the asymptotic distribution of
statistics such as the sample mean and variance is normal. This follows
from the Central Limit Theorem, which states that as the sample size
grows, the sampling distribution of the (suitably standardized) sample
mean approaches a normal distribution, regardless of the shape of the
population distribution.
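As an illustration of the CLT, the following R sketch repeatedly samples from a skewed population and compares the spread of the sample means with the CLT prediction \(\sigma/\sqrt{n}\). The Exponential(1) population (mean and standard deviation both equal to 1), \(n = 50\), and the seed are assumptions chosen only for the example.

```r
# Sketch of the CLT: the population is Exponential(rate = 1), which is
# strongly right-skewed (assumed here only for illustration).
set.seed(1)                      # arbitrary seed, for reproducibility
n    <- 50                       # sample size (illustrative)
reps <- 5000                     # number of simulated samples
sim_means <- replicate(reps, mean(rexp(n, rate = 1)))

# Population mean and sd are both 1, so the CLT predicts xbar approx N(1, 1/n)
c(simulated_mean = mean(sim_means),
  simulated_sd   = sd(sim_means),
  clt_sd         = 1 / sqrt(n))

# Histogram of the simulated means with the CLT normal curve overlaid
hist(sim_means, freq = FALSE,
     main = "Sample means from Exp(1), n = 50", xlab = "Sample mean")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE, col = "red", lwd = 2)
```

Even though each individual observation comes from a skewed distribution, the histogram of the means should look close to the red normal curve.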
The bootstrap sampling distribution is another valuable way to
approximate a sampling distribution. The bootstrap plan repeatedly draws
samples from the empirical distribution of the observed data, with
replacement. The empirical distribution, \(\hat{F}_n\), which places
probability \(1/n\) on each observation, serves as a stand-in for the
unknown population distribution \(F\). The bootstrap sampling process
draws many samples of size n with replacement from \(\hat{F}_n\),
computes the statistic on each one, and uses the resulting collection of
values as an approximation to the sampling distribution, simulating what
repeating the data collection would do. As the sample size grows,
\(\hat{F}_n\) gets closer to \(F\), so the bootstrap sampling
distribution becomes a better approximation of the true sampling
distribution.
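The bootstrap plan can be sketched in a few lines of R. The helper name boot_dist() is hypothetical (it is not a function from a package), and the toy data are for illustration only. ecdf() from base R's stats package shows the empirical distribution \(\hat{F}_n\) that the resampling draws from.

```r
# Hypothetical helper: nonparametric bootstrap distribution of a statistic.
# Each replicate draws n values from F_hat_n (the observed data) with
# replacement and recomputes the statistic on the resample.
boot_dist <- function(x, stat, B = 10000) {
  n <- length(x)
  # Indexing with sample.int() avoids sample()'s special case for length-1 x
  replicate(B, stat(x[sample.int(n, size = n, replace = TRUE)]))
}

set.seed(7)                        # arbitrary seed, for reproducibility
toy <- c(2, 4, 4, 7, 9)            # toy data, for illustration only
F_hat <- ecdf(toy)                 # empirical CDF: mass 1/n on each observation
F_hat(4)                           # 3 of the 5 values are <= 4, so 0.6
toy_boot_means <- boot_dist(toy, mean, B = 1000)
summary(toy_boot_means)            # spread of the bootstrap sample means
```

Passing the statistic as a function argument lets the same plan produce the bootstrap distribution of the mean, the variance, or any other statistic.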
The key assumptions of the bootstrap sampling method are that the data
are independent and identically distributed, and that the original
sample is representative of the overall population. The bootstrap method
described here is nonparametric, so it does not assume a normal (or any
other parametric) population distribution.
The bootstrap method can be used to create bootstrap confidence
intervals, which are interpreted in the same way as standard confidence
intervals. For example, a 95% bootstrap percentile interval takes the
2.5th and 97.5th percentiles of the bootstrap statistics; the procedure
is designed so that intervals built this way would capture the true
parameter in approximately 95% of repeated samples. Additionally, the
standard deviation of the bootstrap statistics estimates the standard
error of the statistic, and the bootstrap distribution itself
approximates the sampling distribution, which can be used for further
inference about the population.
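A minimal R sketch of both uses follows, assuming the simple percentile method for the interval (other bootstrap intervals, such as BCa, exist and can behave better). The helper name boot_summary(), the toy data, and the seed are illustrative assumptions.

```r
# Hypothetical helper: bootstrap standard error and percentile interval
boot_summary <- function(x, stat, B = 10000, level = 0.95) {
  n <- length(x)
  # Bootstrap replicates of the statistic, resampling with replacement
  reps <- replicate(B, stat(x[sample.int(n, size = n, replace = TRUE)]))
  alpha <- 1 - level
  list(
    se = sd(reps),                                    # bootstrap standard error
    ci = quantile(reps, c(alpha / 2, 1 - alpha / 2))  # percentile interval
  )
}

set.seed(3)                                  # arbitrary seed
toy <- c(12, 15, 9, 20, 17, 11, 14, 18)      # toy data, for illustration only
res <- boot_summary(toy, mean, B = 2000)
res$se   # estimated standard error of the sample mean
res$ci   # 2.5% and 97.5% percentiles of the bootstrap means
```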
While both the asymptotic and bootstrap sampling distributions are
useful for approximating sampling distributions and computing quantities
such as standard errors and confidence intervals, there are cases where
one is preferable to the other. The asymptotic sampling distribution
works well for large samples; however, if the sample size is not
sufficiently large, its assumptions are not met and the normal
approximation may be poor. So, in the case of a small sample size, the
bootstrap method is often the better choice, although it still relies on
the sample being representative of the population, which is harder to
guarantee when n is very small. The bootstrap is also convenient when the
asymptotic variance is difficult to derive or estimate, for example
because it requires \(\mu_4\) for the sample variance. However, if the
sample size is sufficiently large and all other assumptions of the
asymptotic distribution are met, the asymptotic sampling distribution is
a natural choice, since it gives a closed-form result without
simulation. Overall, both methods are useful for understanding how a
statistic computed from a particular sample varies, which in turn helps
describe the likely characteristics of the overall population.
Question 2: Daily Coffee Sales (in mL) at Two Different Cafe
Locations
This data set represents the volume of regular brewed coffee sold per
day (in milliliters) at two different cafe locations over a period of 50
days.
2850, 3200, 2900, 3100, 2950, 7800, 8100, 7900, 3300, 3050, 4000, 4200, 3150, 3400, 7700, 8200,
3250, 4400, 3100, 4200, 4500, 4800, 4300, 8500, 8200, 8900, 8700, 3250, 3000, 4600, 4100, 8400,
8800, 3350, 4700, 3100, 8100, 3050, 8300, 4100, 3100, 8300, 8900, 8200, 4400, 4500, 3250, 4600,
8400, 3300, 4200, 4500, 4800, 4300, 8500
We are interested in finding the sampling distribution of sample
means that will be used for various inferences about the underlying
population mean.
- Based on the given data, can the Central Limit Theorem be used to
derive the asymptotic sampling distribution of the sample mean? Justify
your answer.
Before doing anything else, I will create a data vector called ‘coffee’
with the values above and plot a histogram of these values.
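# Daily volumes of brewed coffee sold (mL), entered from the data above
coffee <- c(2850, 3200, 2900, 3100, 2950, 7800, 8100, 7900, 3300, 3050, 4000, 4200, 3150, 3400, 7700, 8200, 3250, 4400, 3100, 4200, 4500, 4800, 4300, 8500, 8200, 8900, 8700, 3250, 3000, 4600, 4100, 8400, 8800, 3350, 4700, 3100, 8100, 3050, 8300, 4100, 3100, 8300, 8900, 8200, 4400, 4500, 3250, 4600, 8400, 3300, 4200, 4500, 4800, 4300, 8500)
# Histogram of the raw observations to check the shape of the data
hist(coffee, main = "Daily Coffee Sales", xlab = "Volume (mL)")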

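The histogram shows that the coffee observations appear to be bimodal,
with one cluster between about 2,850 and 4,800 mL and another between
about 7,700 and 8,900 mL. The asymptotic sampling distribution of the
sample mean does not require the data to be normal, only that the
observations are i.i.d. with finite mean and variance and that the
sample size is sufficiently large. Assuming the daily sales are
independent, the conditions hold: the daily volumes are bounded, so the
mean and variance are finite, and with n = 55 observations the sample
size is sufficiently large. So the Central Limit Theorem can be used to
derive the asymptotic sampling distribution of the sample mean,
\(\bar{x} \approx N(\mu, \sigma^2/n)\). However, it is still worth noting
that the data themselves do not follow a normal distribution; the CLT
applies to the sample mean, not to the individual observations.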
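The CLT-based asymptotic distribution can be written down directly by plugging the sample mean and standard deviation into \(N(\mu, \sigma^2/n)\). The sketch below is one way to do this in R; clt_mean_approx() is a hypothetical helper, and replacing \(\mu\) and \(\sigma\) with their sample estimates is the usual plug-in assumption. The coffee part only runs if the coffee vector from above has been created.

```r
# Hypothetical helper: plug-in CLT approximation xbar approx N(mu, sigma^2 / n)
clt_mean_approx <- function(x) {
  n <- length(x)
  c(center = mean(x),            # estimate of mu
    se     = sd(x) / sqrt(n))    # estimate of sigma / sqrt(n)
}

# Apply to the coffee data, if it exists in the session
if (exists("coffee")) {
  clt_coffee <- clt_mean_approx(coffee)
  print(clt_coffee)
  # Approximate 95% interval for the population mean based on the CLT
  print(unname(clt_coffee["center"] +
                 c(-1, 1) * qnorm(0.975) * clt_coffee["se"]))
}
```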
- Apply the bootstrap method to estimate the sampling distribution
(often called the bootstrap sampling distribution) of the sample mean.
Generate a kernel density estimate from the bootstrap sample means and
plot it. Then, use this bootstrap distribution to validate your
conclusion from part (a). Make sure your visuals are effective in
enhancing the presentation of these results.
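set.seed(123)  # fix the random seed so the bootstrap results are reproducible
B <- 10000     # number of bootstrap resamples
# Resample the data with replacement B times and record each resample's mean
boot_means <- replicate(B, mean(sample(coffee, replace = TRUE)))
# Kernel density estimate of the bootstrap sample means
plot(density(boot_means),
     main = "Bootstrap Sampling Distribution of Sample Mean",
     xlab = "Sample Mean (mL)")
# Red vertical line marks the observed sample mean of the original data
abline(v = mean(coffee), col = "red", lwd = 2)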

The bootstrap approach shows that the sampling distribution of the
sample mean is approximately normal: the kernel density estimate above
has the bell shape of a normal curve centered near the observed sample
mean. This validates the conclusion made in part (a) that it is
reasonable to apply the Central Limit Theorem: because the sample size is
sufficiently large, the sampling distribution of the mean is
approximately normal even though the original data are bimodal. The
bootstrap method works well in this case for approximating the sampling
distribution of the sample mean.
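To make the validation more direct, the CLT normal curve can be overlaid on the bootstrap kernel density and the two standard errors compared. This sketch assumes the objects coffee and boot_means from the code above; se_compare() is a hypothetical helper, and the colors and legend are illustrative choices.

```r
# Hypothetical helper: bootstrap SE versus CLT plug-in SE s / sqrt(n)
se_compare <- function(x, boot_stats) {
  c(boot_se = sd(boot_stats),            # SD of the bootstrap sample means
    clt_se  = sd(x) / sqrt(length(x)))   # CLT-based standard error
}

# Runs only if coffee and boot_means from the code above exist
if (exists("coffee") && exists("boot_means")) {
  se <- se_compare(coffee, boot_means)
  print(se)
  plot(density(boot_means),
       main = "Bootstrap KDE vs CLT Normal: Sample Mean",
       xlab = "Sample Mean (mL)")
  # Dashed blue curve: CLT approximation N(xbar, s^2 / n)
  curve(dnorm(x, mean = mean(coffee), sd = se["clt_se"]),
        add = TRUE, col = "blue", lty = 2, lwd = 2)
  legend("topright", legend = c("Bootstrap KDE", "CLT normal"),
         col = c("black", "blue"), lty = c(1, 2), lwd = c(1, 2))
}
```

The bootstrap standard error is expected to be slightly smaller than \(s/\sqrt{n}\), by a factor of about \(\sqrt{(n-1)/n}\), because resampling from \(\hat{F}_n\) uses the divide-by-\(n\) variance.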
- Repeat the analysis in parts (a) and (b) for the sample
variance.
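# Resample with replacement B times and record each resample's variance
boot_vars <- replicate(B, var(sample(coffee, replace = TRUE)))
# Kernel density estimate of the bootstrap sample variances
plot(density(boot_vars),
     main = "Bootstrap Sampling Distribution of Sample Variance",
     xlab = "Sample Variance (mL^2)")
# Red vertical line marks the observed sample variance of the original data
abline(v = var(coffee), col = "red", lwd = 2)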

The bootstrap distribution of the sample variance shows a noticeable
skew to the left, although the skew is not severe. This is not
surprising: the sampling distribution of \(s^2\) depends on the fourth
central moment \(\mu_4\), so it is more sensitive to the shape of the
data and converges to normality more slowly than the sampling
distribution of the mean. For part (a), the asymptotic result
\(s^2 \approx N\left(\sigma^2, \frac{\mu_4-\sigma^4}{n}\right)\) still
applies in principle, because the bounded data have a finite fourth
moment, but with n = 55 the visible skew suggests that the normal
approximation is only rough. In this case, the bootstrap method is a good
choice for approximating the sampling distribution of the sample variance
because it is nonparametric and does not require normality. This means
that despite the slight left skew, the bootstrap distribution remains a
good basis for inference about the population variance.
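The bootstrap distribution of \(s^2\) can also be compared against the asymptotic normal approximation from the beginning of this assignment, \(N\left(\sigma^2, \frac{\mu_4-\sigma^4}{n}\right)\), with \(\sigma^2\) estimated by \(s^2\) and \(\mu_4\) by \(\hat{\mu}_4\). The R sketch below assumes the objects coffee and boot_vars from the code above; asym_s2() is a hypothetical helper name.

```r
# Hypothetical helper: plug-in asymptotic distribution of s^2,
# N(s^2, (mu4_hat - s^4) / n), with mu4_hat = (1/n) * sum((x_i - xbar)^4)
asym_s2 <- function(x) {
  n   <- length(x)
  s2  <- var(x)                    # sample variance, 1/(n-1) divisor
  mu4 <- mean((x - mean(x))^4)     # plug-in 4th central moment
  c(center = s2, se = sqrt((mu4 - s2^2) / n))
}

# Runs only if coffee and boot_vars from the code above exist
if (exists("coffee") && exists("boot_vars")) {
  a <- asym_s2(coffee)
  print(c(boot_se = sd(boot_vars), asym_se = unname(a["se"])))
  plot(density(boot_vars),
       main = "Bootstrap KDE vs Asymptotic Normal: Sample Variance",
       xlab = "Sample Variance (mL^2)")
  # Dashed blue curve: asymptotic normal approximation for s^2
  curve(dnorm(x, mean = a["center"], sd = a["se"]),
        add = TRUE, col = "blue", lty = 2, lwd = 2)
  legend("topright", legend = c("Bootstrap KDE", "Asymptotic normal"),
         col = c("black", "blue"), lty = c(1, 2), lwd = c(1, 2))
}
```

If the asymptotic curve is symmetric while the bootstrap density is visibly skewed, the gap between them is one way to judge how far the normal approximation is from holding at this sample size.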
