Topic 4: Sampling Distributions


These are the solutions for Computer Lab 5.

Please note that answers are given to 4 decimal places of accuracy at most.


1 Sample mean distribution
(normal population distribution)

1.1

\(\overline{X} \sim N(12, 1.3768)\). Note here that \(\dfrac{\sigma^2}{n} = \dfrac{34.42}{25} = 1.3768\).
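As a quick check of the arithmetic, the variance and standard deviation of \(\overline{X}\) can be computed directly in R. This sketch assumes, as above, a population mean of 12, a population variance of 34.42 and a sample size of 25; the variable names are our own.

```r
# Population parameters (from 1.1) and sample size
mu     <- 12
sigma2 <- 34.42
n      <- 25

# Variance and standard deviation of the sample mean
var_xbar <- sigma2 / n      # 1.3768
sd_xbar  <- sqrt(var_xbar)  # this is the value pnorm()/qnorm() expect as `sd`
c(var_xbar, sd_xbar)
```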

1.2

Example R code for these questions is provided below:

1.2.1

pnorm(10, mean = 12, sd = sqrt(1.3768)) # Note that we have to use the standard deviation, not variance here
## [1] 0.04414475

So \(P(\overline{X} < 10) = 0.0441\).
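Equivalently, we can standardise first and use the standard normal distribution; `pnorm()` with its default `mean = 0, sd = 1` then gives the same probability. A small sketch:

```r
sd_xbar <- sqrt(34.42 / 25)

# Standardise: Z = (Xbar - mu) / (sigma / sqrt(n))
z <- (10 - 12) / sd_xbar
z          # roughly -1.70
pnorm(z)   # same as pnorm(10, mean = 12, sd = sd_xbar)
```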

1.2.2

pnorm(15, mean = 12, sd = sqrt(1.3768)) - pnorm(4, mean = 12, sd = sqrt(1.3768)) 
## [1] 0.994717

So \(P(4 \leq \overline{X} \leq 15) = 0.9947\).

1.2.3

1 - pnorm(13, mean = 12, sd = sqrt(1.3768)) 
## [1] 0.197039

So \(P(\overline{X} > 13)= 0.1970\).

1.3

1.3.1

\(P(\overline{X} < 10)\) is the probability that the sample mean number of fuel tanks per craft is less than 10 (given that the population mean is 12 and assuming we took a random sample of 25). This probability is equal to approximately \(4.41\%\).

1.3.2

\(P(4 \leq \overline{X} \leq 15)\) is the probability that the sample mean number of fuel tanks per craft is between 4 and 15 (given that the population mean is 12 and assuming we took a random sample of 25). This probability is equal to approximately \(99.47\%\).

1.3.3

\(P(\overline{X} > 13)\) is the probability that the sample mean number of fuel tanks per craft is greater than 13 (given that the population mean is 12 and assuming we took a random sample of 25). This probability is equal to approximately \(19.7\%\).
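These interpretations can be checked by simulation: draw many random samples of size 25 from the normal population, compute each sample mean, and count how often each event occurs. The number of repetitions (10,000) and the seed are arbitrary choices; the simulated proportions should land close to, but not exactly on, the probabilities above.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
reps <- 10000
n    <- 25

# Each column is one sample of size n from N(12, 34.42); colMeans gives the sample means
samples <- matrix(rnorm(reps * n, mean = 12, sd = sqrt(34.42)), nrow = n)
xbar    <- colMeans(samples)

p_less10  <- mean(xbar < 10)
p_4_to_15 <- mean(xbar >= 4 & xbar <= 15)
p_more13  <- mean(xbar > 13)
c(p_less10, p_4_to_15, p_more13)
```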

1.4

Here we need to find the number \(a\) such that \(P(\overline{X} \leq a) = 0.4\), i.e. the 40th percentile of the distribution of \(\overline{X}\). To solve this in R, we have

a <- qnorm(0.4, mean = 12, sd = sqrt(1.3768)) 
a
## [1] 11.70273

Thus, the sample mean number of fuel tanks per craft must be at most about 11.70 for this condition to be satisfied; that is, \(P(\overline{X} \leq 11.70) = 0.4\).
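Plugging \(a\) back into `pnorm()` confirms that it is the 40th percentile of the sampling distribution of \(\overline{X}\), since `qnorm()` and `pnorm()` are inverses of one another:

```r
a <- qnorm(0.4, mean = 12, sd = sqrt(1.3768))
pnorm(a, mean = 12, sd = sqrt(1.3768))  # returns 0.4 (up to rounding error)
```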

1.5

Similar to 1.4 above, we would like to find the number \(b\) such that \(P(\overline{X} > b) = 0.1\). Since \(P(\overline{X} > b) = 1 - P(\overline{X} \leq b)\), this is equivalent to \(P(\overline{X} \leq b) = 0.9\). To solve this in R, we have

b <- qnorm(0.9, mean = 12, sd = sqrt(1.3768)) 
b
## [1] 13.50374

Thus, the sample mean number of fuel tanks per craft must exceed about 13.50 for this condition to be satisfied; that is, \(P(\overline{X} > 13.50) = 0.1\).
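Rather than rearranging by hand, `qnorm()` can work with the upper tail directly through its `lower.tail` argument: with `lower.tail = FALSE` it returns the value that has the given probability above it. A sketch showing that the two routes agree:

```r
sd_xbar <- sqrt(1.3768)

b_rearranged <- qnorm(0.9, mean = 12, sd = sd_xbar)                     # P(Xbar <= b) = 0.9
b_upper      <- qnorm(0.1, mean = 12, sd = sd_xbar, lower.tail = FALSE)  # P(Xbar > b) = 0.1
c(b_rearranged, b_upper)
```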

1.6

Similar to 1.5 above, we would like to find the number \(c\) such that \(P(\overline{X} > c) = 0.05\). Since \(P(\overline{X} > c) = 1 - P(\overline{X} \leq c)\), this is equivalent to \(P(\overline{X} \leq c) = 0.95\). To solve this in R, we have

c <- qnorm(0.95, mean = 12, sd = sqrt(1.3768)) 
c
## [1] 13.93002

Thus, the sample mean number of fuel tanks per craft must exceed about 13.93 for this condition to be satisfied; that is, \(P(\overline{X} > 13.93) = 0.05\).
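The same answer can be written as \(c = \mu + z_{0.95}\,\sigma/\sqrt{n}\), where \(z_{0.95} \approx 1.6449\) is the 95th percentile of the standard normal distribution. In R (the name `c_val` is our own, chosen to avoid confusion with R's `c()` function):

```r
z95   <- qnorm(0.95)                # standard normal 95th percentile, about 1.6449
c_val <- 12 + z95 * sqrt(1.3768)    # mu + z * sd(Xbar)
c_val
```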

1.7

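We need the two values that cut off \(2.5\%\) of the distribution of \(\overline{X}\) in each tail, i.e. its 0.025 and 0.975 quantiles. In R, we have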

lower <- qnorm(0.025, mean = 12, sd = sqrt(1.3768)) 
upper <- qnorm(0.975, mean = 12, sd = sqrt(1.3768)) 
c(lower, upper)
## [1]  9.700235 14.299765

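So the central \(95\%\) of values of the sample mean number of fuel tanks per craft lies in the range (9.70, 14.30); that is, \(P(9.70 \leq \overline{X} \leq 14.30) = 0.95\).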
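As a check, the probability between the two limits should be 0.95, and the limits should sit symmetrically about the mean of 12:

```r
sd_xbar <- sqrt(1.3768)
lower <- qnorm(0.025, mean = 12, sd = sd_xbar)
upper <- qnorm(0.975, mean = 12, sd = sd_xbar)

pnorm(upper, mean = 12, sd = sd_xbar) - pnorm(lower, mean = 12, sd = sd_xbar)  # 0.95
c(12 - lower, upper - 12)  # equal distances, about 2.30 each
```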

1.8

Unfortunately, because our sample size is less than 30, and because we do not know the distribution of the population, we cannot actually apply the Central Limit Theorem. Therefore we are unable to determine the distribution of \(\overline{X}\) here.

2 Sample mean distribution
(unknown population distribution)

2.1

\(\overline{X} \stackrel{\tiny \text{approx.}}\sim N\left(35, 32.5^2 / 60\right)\). Note here that we are given \(\sigma\), not \(\sigma^2\).

It follows that \(\dfrac{\sigma^2}{n} = \dfrac{32.5^2}{60} \approx 17.6042\). We can say that \(\overline{X}\) is approximately normally distributed due to the Central Limit Theorem, whose conditions are satisfied since our sample size (\(n = 60\)) is larger than 30.
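Because we are given \(\sigma = 32.5\) rather than \(\sigma^2\), the standard deviation of \(\overline{X}\) can be computed either as the square root of \(\sigma^2/n\) or directly as \(\sigma/\sqrt{n}\); both give the same value:

```r
sigma <- 32.5
n     <- 60

sd_a <- sqrt(sigma^2 / n)   # square root of the variance, 17.60417
sd_b <- sigma / sqrt(n)     # sigma / sqrt(n), the standard error of the mean
c(sd_a, sd_b)               # both about 4.1957
```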

2.2

Example R code for these questions is provided below:

2.2.1

pnorm(42, mean = 35, sd = sqrt(32.5^2 / 60)) 
## [1] 0.9523781

So \(P(\overline{X} < 42) = 0.9524\).
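The Central Limit Theorem only gives an approximation. To get a feel for its accuracy at \(n = 60\), we can pick a hypothetical right-skewed population with the same mean (35) and standard deviation (32.5). Here we assume a gamma distribution, which is our own choice and not part of the lab; for gamma data the sample mean has an exact gamma distribution, so the exact probability can be compared with the normal approximation:

```r
mu    <- 35
sigma <- 32.5
n     <- 60

# Hypothetical gamma population matched to mean mu and sd sigma:
# mean = shape * scale, variance = shape * scale^2
shape <- (mu / sigma)^2
scale <- sigma^2 / mu

# For iid gamma data, Xbar ~ Gamma(n * shape, scale / n) exactly
p_exact  <- pgamma(42, shape = n * shape, scale = scale / n)
p_normal <- pnorm(42, mean = mu, sd = sigma / sqrt(n))   # CLT approximation
c(exact = p_exact, normal = p_normal)
```

Under this assumed population the two probabilities should differ only slightly; a more heavily skewed population would need a larger \(n\) for the same accuracy.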

2.2.2

pnorm(38, mean = 35, sd = sqrt(32.5^2 / 60)) - pnorm(18.5, mean = 35, sd = sqrt(32.5^2 / 60))  
## [1] 0.7626573

So \(P(18.5 \leq \overline{X} \leq 38) = 0.7627\).

2.2.3

1 - pnorm(26, mean = 35, sd = sqrt(32.5^2 / 60)) 
## [1] 0.9840251

So \(P(\overline{X} > 26)= 0.9840\).
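R can also compute upper-tail probabilities directly via the `lower.tail` argument of `pnorm()`, which avoids the `1 - ...` step (and is more accurate for probabilities far out in the upper tail):

```r
sd_xbar <- 32.5 / sqrt(60)
p1 <- 1 - pnorm(26, mean = 35, sd = sd_xbar)
p2 <- pnorm(26, mean = 35, sd = sd_xbar, lower.tail = FALSE)
c(p1, p2)   # identical for practical purposes
```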

2.2.4

pnorm(30, mean = 35, sd = sqrt(32.5^2 / 60)) +
  1 - pnorm(36, mean = 35, sd = sqrt(32.5^2 / 60))
## [1] 0.5225017

So \(P(\overline{X} < 30) + P(\overline{X} > 36)= 0.5225\).
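Since the events \(\overline{X} < 30\) and \(\overline{X} > 36\) cannot both happen, their probabilities add; equivalently, the answer is one minus the probability that \(\overline{X}\) lands between 30 and 36. A quick check in R:

```r
sd_xbar <- 32.5 / sqrt(60)

p_tails  <- pnorm(30, mean = 35, sd = sd_xbar) +
  pnorm(36, mean = 35, sd = sd_xbar, lower.tail = FALSE)
p_middle <- pnorm(36, mean = 35, sd = sd_xbar) - pnorm(30, mean = 35, sd = sd_xbar)
c(p_tails, 1 - p_middle)   # both about 0.5225
```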

2.3

2.3.1

\(P(\overline{X} < 42)\) is the probability that the sample mean asteroid diameter is less than 42 km (given that the population mean is 35 km and assuming we took a random sample of 60 asteroids). This probability is equal to approximately \(95.24\%\).

2.3.2

\(P(18.5 \leq \overline{X} \leq 38)\) is the probability that the sample mean asteroid diameter is between 18.5 km and 38 km (given that the population mean is 35 km and assuming we took a random sample of 60 asteroids). This probability is equal to approximately \(76.27\%\).

2.3.3

\(P(\overline{X} > 26)\) is the probability that the sample mean asteroid diameter is greater than 26 km (given that the population mean is 35 km and assuming we took a random sample of 60 asteroids). This probability is equal to approximately \(98.40\%\).

2.3.4

\(P(\overline{X} < 30) + P(\overline{X} > 36)\) is the probability that the sample mean asteroid diameter is either less than 30 km or greater than 36 km (given that the population mean is 35 km and assuming we took a random sample of 60 asteroids). This probability is equal to approximately \(52.25\%\).

2.4

Given our sample, with \(\overline{X} \stackrel{\tiny \text{approx.}}\sim N\left(35, 32.5^2 / 60 \right)\), we can compute \(P(\overline{X} \geq 42)\). In R, we have

1 - pnorm(42, mean = 35, sd = sqrt(32.5^2 / 60))
## [1] 0.04762194

As \(P(\overline{X} \geq 42) = 0.0476\), it is probably safe to say the miner is exaggerating: if the population mean diameter really is 35 km, there is less than a \(5\%\) chance that a random sample of 60 asteroids would have a mean diameter of 42 km or greater. (This could potentially be because the asteroids have gotten smaller due to repeated mining over the years, but we will not give the miner the benefit of the doubt.)
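Another way to see this is through the standardised value of 42: it lies about 1.67 standard errors above 35, beyond the cut-off of about 1.645 that marks the upper \(5\%\) of the standard normal distribution.

```r
z_42 <- (42 - 35) / (32.5 / sqrt(60))  # standardised value of 42
z_42                                    # about 1.67
z_42 > qnorm(0.95)                      # TRUE: 42 is in the upper 5% tail
```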

2.5

Unfortunately, we cannot determine the distribution of \(\overline{X}\) for this new asteroid cluster, as we do not know the population distribution, and the sample size is too small (\(15 < 30\)) to apply the Central Limit Theorem.

2.6

Example R code is shown below.

example2 <- rnorm(1000, mean = 35, sd = sqrt((32.5^2)/60))
hist(example2, main = "Histogram of 1000 sample means, n = 60",
     xlab = "Asteroid mean diameter (kms)", col = "skyblue", freq = FALSE)
curve(dnorm(x, mean = 35, sd = sqrt((32.5^2)/60)), add = TRUE, col = "blue", lwd = 2)

Note that your results may look slightly different, as the data are randomly generated, but with 1000 simulated means the histogram should still closely follow the normal curve, and your answers to 2.2 should be consistent with its shape. It is also worth noting that the average of the simulated means (35.18 in our run) is close to the population mean of 35: the average of 1000 sample means is typically much closer to the population mean than any single sample mean would be.
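To compare the simulated means with the theoretical sampling distribution numerically, rather than only by eye, we can compare their mean and standard deviation with 35 and \(32.5/\sqrt{60} \approx 4.196\), and the proportion below 42 with the 0.9524 found in 2.2.1. The seed below is an arbitrary choice that makes the run reproducible; other seeds give slightly different numbers.

```r
set.seed(2024)  # arbitrary seed
example2 <- rnorm(1000, mean = 35, sd = sqrt((32.5^2) / 60))

c(sim_mean = mean(example2), sim_sd = sd(example2),
  theory_sd = 32.5 / sqrt(60))
mean(example2 < 42)   # compare with P(Xbar < 42) = 0.9524 from 2.2.1
```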

2.7

Example R code is shown below.

# Divide the plot window into four
par(mfrow = c(2, 2), cex = 0.8, mex = 0.8)

# 4 sample sizes are to be considered.  These are stored in ns
ns <- c(30, 60, 200, 1000)

# For each choice of n, we will simulate 10,000 sample means
trials <- 10000

# The below 'for' loop cycles through the choices of n stored in ns
for(n in ns){
  # Draw `trials` values directly from N(35, 32.5^2/n); each value
  # represents one sample mean for a sample of size n
  norm.means <- rnorm(trials, mean = 35, sd = sqrt(32.5^2/n))
  hist(norm.means, freq = FALSE, breaks = 20, col = "red", xlim = c(20, 50),
       xlab = expression(bar(x)),
       main = paste("Histogram of means, n = ", n))
  curve(dnorm(x, mean = 35, sd = sqrt(32.5^2/n)), add = TRUE,
        col = "blue", lwd = 2) # Overlay the normal density on the histogram that, according to
  # the Central Limit Theorem, will approximate the distribution of
  # the sample mean if n is sufficiently large.
}

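Note that as the sample size \(n\) increases, the simulated sample means stay centred at 35, but their spread decreases, because the standard deviation of \(\overline{X}\) is \(\sigma/\sqrt{n} = 32.5/\sqrt{n}\). The Central Limit Theorem adds that, for large enough \(n\), the shape of this distribution is approximately normal.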
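The loop above draws each sample mean directly from the normal distribution that the Central Limit Theorem predicts, so it illustrates the shrinking spread rather than the theorem itself. A sketch of a more direct check simulates raw samples of size \(n\), averages each one, and compares the spread of those means with \(32.5/\sqrt{n}\). It assumes a hypothetical right-skewed gamma population with mean 35 and standard deviation 32.5 (our own choice, since the lab does not specify the population), and uses 2,000 trials per \(n\) to keep memory use modest.

```r
set.seed(42)  # arbitrary seed
trials <- 2000
ns     <- c(30, 60, 200, 1000)

# Hypothetical gamma population with mean 35 and sd 32.5 (an assumption)
shape <- (35 / 32.5)^2
scale <- 32.5^2 / 35

sim_sd <- sapply(ns, function(n) {
  # Each column is one raw sample of size n; colMeans gives the sample means
  xbar <- colMeans(matrix(rgamma(trials * n, shape = shape, scale = scale), nrow = n))
  sd(xbar)
})

data.frame(n = ns, simulated_sd = sim_sd, theory_sd = 32.5 / sqrt(ns))
```

Histograms of these simulated means, drawn as in the loop above, would also show the skew of the population fading as \(n\) grows.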


That’s everything covered! If there were any parts you were unsure about, take a look back over the relevant sections of the Topic 4 material.


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.