Topic 4: Sampling Distributions


In Topic 4, we learnt about sampling distributions. In this computer lab, we will consider different calculations relating to the Central Limit Theorem and the distribution of the sample mean.

In Lecture 4, we covered material that will be very helpful for today’s computer lab. If you have not already watched Lecture 4, you may wish to do so now.

After working through the questions in this computer lab, you will be ready to complete Quiz 5. If you have time during today’s lab, you may like to work on the quiz.

For all the questions in this lab, where relevant, please round your answers to 4 decimal places of accuracy.

1 Sample mean distribution
(normal population distribution)

🏑 Imagine that the year is 2040, and you are running an inventory on deep-space mining craft for your company. You know that the number of fuel tanks per craft is normally distributed, with population mean \(\mu = 12\) fuel tanks per craft, and population variance \(\sigma^2 = 34.42\).

Suppose that you select a random sample of \(n=25\) spacecraft in the shipyard to assess.

1.1

Let \(\overline{X}\) denote the sample mean. Write down the distribution of \(\overline{X}\) in mathematical notation.

1.2

💻 Using the distribution defined in 1.1, carry out the following calculations in R. Remember, it can be helpful to sketch pictures of the distribution when working out these different probabilities.

Hint: The pnorm R function will be helpful.
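
As a rough illustration of how pnorm handles the three kinds of probability asked for in the sub-questions (left tail, between two values, right tail), here is a minimal sketch. The values mu = 50, sigma2 = 100 and n = 16 are made up for this example and are not the values from this question, and the variable names are hypothetical.

```r
# Hypothetical values for illustration only (NOT the values from this lab)
mu <- 50        # assumed population mean
sigma2 <- 100   # assumed population variance
n <- 16         # assumed sample size

# Standard deviation of the sample mean: sqrt(sigma^2 / n)
se <- sqrt(sigma2 / n)

# P(Xbar < 48): area to the left of 48
p_less <- pnorm(48, mean = mu, sd = se)

# P(47 <= Xbar <= 52): area to the left of 52 minus area to the left of 47
p_between <- pnorm(52, mean = mu, sd = se) - pnorm(47, mean = mu, sd = se)

# P(Xbar > 53): area to the right of 53
p_greater <- pnorm(53, mean = mu, sd = se, lower.tail = FALSE)

round(c(p_less, p_between, p_greater), 4)
```

Note that pnorm expects a standard deviation, not a variance, so the variance of the sample mean is square-rooted before it is passed to the sd argument.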


🎧 Online students 💬 For each of the following sub-questions selected by the facilitator, enter your answer next to the question in the shared Google Doc.


1.2.1

Find the probability \(P(\overline{X} < 10)\).

1.2.2

Find the probability \(P(4 \leq \overline{X} \leq 15)\).

1.2.3

Find the probability \(P(\overline{X} > 13)\).

1.3

🏑 Explain in words what each of the probabilities you have obtained in 1.2 above represents, in this context.


🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.


1.4

💻 Suppose that the probability that \(\overline{X}\) is less than a certain number of fuel tanks is equal to 40%. If the true population mean is \(\mu = 12\) fuel tanks per craft, and we are assessing our sample of \(n=25\) spacecraft, what must this number of fuel tanks be?

Hint: You can use R for this question too (think back to Computer Lab 4). Click on the code chunk below for some guidance if you are stuck - you will have to fill in the missing ...’s:

# We can use the qnorm function for this type of question
qnorm(..., ..., ...)
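
To illustrate the general idea, the sketch below finds a value with a given probability to its left, then checks it with pnorm. The numbers (mean 50, standard deviation 2.5, probability 30%) are hypothetical and chosen only for this example. They are not the values needed for this question.

```r
# Hypothetical values for illustration only (NOT the values from this lab)
mu <- 50    # assumed mean of the sample mean
se <- 2.5   # assumed standard deviation of the sample mean

# Value q such that P(Xbar < q) = 0.30
q_lower <- qnorm(0.30, mean = mu, sd = se)
q_lower

# Round trip: pnorm should return the probability we started with
pnorm(q_lower, mean = mu, sd = se)
```

Because the probability is below 50%, the value returned lies below the mean, which is a quick sanity check worth making on your own answer.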


🎧 Online students 💬 Enter your answer next to the question in the shared Google Doc.


1.5

💻 Suppose that there is only a 10% probability of observing more than a certain number of fuel tanks for \(\overline{X}\). If the true population mean is \(\mu = 12\), and we are assessing our sample of \(n=25\) spacecraft, what must this number of fuel tanks be?
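
When the stated probability refers to the upper tail, there are two equivalent ways to ask qnorm for the cut-off. The sketch below shows both, using hypothetical values (mean 50, variance 100, n = 16, upper-tail probability 20%) that are assumptions for this example and not the values in this question.

```r
# Hypothetical values for illustration only (NOT the values from this lab)
mu <- 50
se <- sqrt(100 / 16)   # assumed sigma^2 = 100 and n = 16

# Value with 20% of the distribution ABOVE it, computed two equivalent ways
q_upper_a <- qnorm(0.20, mean = mu, sd = se, lower.tail = FALSE)
q_upper_b <- qnorm(1 - 0.20, mean = mu, sd = se)

round(c(q_upper_a, q_upper_b), 4)
```

Either form works. The lower.tail = FALSE version keeps the probability as it is stated in the question, which some people find less error-prone.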


🎧 Online students 💬 Enter your answer next to the question in the shared Google Doc.


1.6

💻 Repeat 1.5, but this time suppose that there is only a 5% probability of observing more than a certain number of fuel tanks for \(\overline{X}\).


🎧 Online students 💬 Enter your answer next to the question in the shared Google Doc.


1.7

💻 Suppose that, just as in 1.5 and 1.6, we are assessing the fuel tanks in our sample of \(n=25\) spacecraft, and we know that the true population mean is \(\mu = 12\) fuel tanks per craft. This time, however, we would like to determine a range of values for the number of fuel tanks per craft, within which the majority of spacecraft will lie.

Firstly, we would like to determine the number of fuel tanks for \(\overline{X}\) such that there is only a 2.5% probability of observing less than this number of fuel tanks on a craft.

Secondly, we would like to determine the number of fuel tanks for \(\overline{X}\) such that there is only a 2.5% probability of observing more than this number of fuel tanks on a craft.

Compute these two values now, and report them as an interval (e.g. if the lower number is 8 and the higher number is 10, write the interval as (8, 10).)
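
Since qnorm accepts a vector of probabilities, both ends of an interval like this can be obtained in one call. The sketch below computes a central 80% interval for hypothetical values (mean 50, standard deviation 2.5). Both the values and the 80% level are assumptions for this example and differ from those in this question.

```r
# Hypothetical values for illustration only (NOT the values from this lab)
mu <- 50
se <- 2.5

# Central 80% interval: 10% in each tail
interval <- qnorm(c(0.10, 0.90), mean = mu, sd = se)
round(interval, 4)

# Check: the probability between the two end points should be 0.80
diff(pnorm(interval, mean = mu, sd = se))
```

Because the normal distribution is symmetric, the two end points sit the same distance either side of the mean.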


🎧 Online students 💬 Enter your answer next to the question in the shared Google Doc.


1.8

💻 Now that you have finished your inventory of this shipyard, you travel to another shipyard in a neighbouring state. Here, most of the craft are in use, so you are only able to assess \(20\) craft. You do not know the distribution of fuel tanks for the craft in this shipyard, but are told they have a mean of \(14\), and a standard deviation of \(7.55\).

Let \(\overline{X}\) denote the sample mean number of fuel tanks for craft from this new shipyard. Write down the distribution of \(\overline{X}\) in mathematical notation.


🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.


2 Sample mean distribution
(unknown population distribution)

Suppose that you are engaged in an asteroid-belt mining operation. \(60\) asteroids of various diameters are currently being mined. These \(60\) asteroids can be considered a random sample of asteroids taken from the overall population (all asteroids in the belt). While the distribution of the asteroids’ diameters is unknown, suppose scanning tools have revealed that the mean diameter of the asteroids in the belt is \(35\) kms, with \(\sigma = 32.5\).

2.1

💻 Let \(\overline{X}\) denote the sample mean. Write down the distribution of \(\overline{X}\) in mathematical notation. What special result does your answer rely upon?

2.2

💻 Using this distribution, carry out the following calculations in R. Remember, it can be helpful to sketch pictures of the distribution when working out these different probabilities.


🎧 Online students 💬 For each of the following sub-questions selected by the facilitator, enter your answer next to the question in the shared Google Doc.


2.2.1

Find the probability \(P(\overline{X} < 42)\).

2.2.2

Find the probability \(P(18.5 \leq \overline{X} \leq 38)\).

2.2.3

Find the probability \(P(\overline{X} > 26)\).

2.2.4

Find the value of \(P(\overline{X} < 30) + P(\overline{X} > 36)\).

2.3

🏑 Explain in words what each of the probabilities you have obtained in 2.2 above represents, in this context.


🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.


2.4

🏑 A fellow miner claims that the last time they mined this asteroid belt, all the asteroids were at least 42 kms in diameter. Based on your sample data, do you believe they are telling the truth?


🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.


2.5

🏑 Suppose that another mining team, operating on a nearby asteroid cluster, has had to haul \(15\) asteroids back to your station for further processing. The distribution of asteroid diameters in this cluster is unknown, with \(\mu = 19.25\) and \(\sigma^2 = 8.1\). Let \(\overline{X}\) denote the sample mean. What is the distribution of \(\overline{X}\)? Make sure to clearly explain your answer.


🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.


2.6

🏑 We can simulate the sample mean of a sample of 60 asteroids, with the characteristics of those you are mining in 2, using the following code:

set.seed(16)
example <- rnorm(1, mean = 35, sd = sqrt((32.5^2)/60))

This gives us an estimate of 37 for the sample mean \(\overline{x}\).

Rather than relying on this one simulated sample mean value, we could repeat this process many times, to see how the sample mean varies from sample to sample. Suppose we simulate 1000 sample means, using a sample of 60 asteroids for each estimate (as outlined above). We can then plot a histogram of these 1000 sample means as follows:

example2 <- rnorm(1000, mean = 35, sd = sqrt((32.5^2)/60))
hist(example2, main = "Histogram of 1000 sample means, n = 60",
     xlab = "Asteroid mean diameter (kms)", col = "skyblue", freq = FALSE)
curve(dnorm(x, mean = 35, sd = sqrt((32.5^2)/60)), add = TRUE, col = "blue", lwd = 2)

If we compute the mean of these estimated sample means, we obtain the value 35.18. Note that we have also added a curve to this histogram, representing the normal density curve we would obtain for this data via the Central Limit Theorem.

What can you observe from this histogram? Do your answers to 2.2 seem to align with the shape of the histogram?
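
The code above draws each sample mean directly from the normal distribution suggested by the Central Limit Theorem. As an alternative sketch, we could simulate the raw diameters themselves from a skewed population and compute the mean of each sample. The true population distribution is unknown, so the exponential distribution used here is purely an assumption for illustration. Note that an exponential population with mean 35 has standard deviation 35, not 32.5. The names pop_mean and sim_means are hypothetical.

```r
set.seed(16)

# ASSUMPTION: an exponential population with mean 35 kms (so its sd is also 35,
# not 32.5). The true population distribution in this lab is unknown.
pop_mean <- 35
n <- 60          # sample size, as in this lab
trials <- 1000   # number of simulated samples

# For each trial, draw n raw diameters and record their sample mean
sim_means <- replicate(trials, mean(rexp(n, rate = 1 / pop_mean)))

# Compare the simulated means with the CLT approximation N(35, 35^2 / 60)
hist(sim_means, freq = FALSE, col = "skyblue",
     main = "Sample means from a skewed population, n = 60",
     xlab = "Asteroid mean diameter (kms)")
curve(dnorm(x, mean = pop_mean, sd = pop_mean / sqrt(n)),
      add = TRUE, col = "blue", lwd = 2)

mean(sim_means)  # should be close to 35
sd(sim_means)    # should be close to 35 / sqrt(60)
```

Even though the individual diameters in this sketch are strongly right-skewed, the histogram of sample means should look roughly bell-shaped, which is the behaviour the Central Limit Theorem describes.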

2.7

🏑 Suppose that after a period of reduced activity, a rare metal is discovered in some of the asteroids. As a result, mining activity increases. First, mining expands from 30 asteroids to (once more) cover 60 asteroids, then 200 asteroids, and finally reaches a peak of 1000 asteroids.

For each of these sample sizes, we will now simulate 10000 sample means, using appropriate mean and standard deviation values.

Run the R code below to simulate these sample means, and produce histograms and associated normal density curves, for each of these situations. (Make sure to read the comments, but don’t worry if you don’t fully understand all the steps just yet.)

# Divide the plot window into four
par(mfrow = c(2, 2), cex = 0.8, mex = 0.8)

# 4 sample sizes are to be considered.  These are stored in ns
ns <- c(30, 60, 200, 1000)

# For each choice of n, we will simulate 10,000 sample means
trials <- 10000

# The 'for' loop below cycles through the choices of n stored in ns
for (n in ns) {
  # Randomly generate 'trials' values from the normal distribution being
  # considered - each value represents the mean of a sample of size n
  norm.means <- rnorm(trials, mean = 35, sd = sqrt(32.5^2 / n))
  hist(norm.means, freq = FALSE, breaks = 20, col = "red", xlim = c(20, 50),
       xlab = expression(bar(x)),
       main = paste("Histogram of means, n = ", n))
  # Overlay the normal density on the histogram that, according to
  # the Central Limit Theorem, will approximate the distribution of
  # the sample mean if n is sufficiently large.
  curve(dnorm(x, mean = 35, sd = sqrt(32.5^2 / n)), add = TRUE,
        col = "blue", lwd = 2)
}

Compare the histograms produced by this code to each other (note that the second one should look similar to the histogram you obtained in 2.6). What do you notice about the shape and characteristics of the histogram, as \(n\) increases?

Hint: Refer to Computer Lab 4, and the Topic 4 Readings, if you are not sure how to approach this question.
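
One possible way to put numbers alongside the visual comparison is to compute the standard deviation of the simulated sample means for each \(n\) and set it next to the theoretical value \(\sigma / \sqrt{n}\). The sketch below is one option only. The seed and the names sim_sd, theory_sd and spread_table are hypothetical choices for this example.

```r
set.seed(1)  # arbitrary seed, chosen only so the example is reproducible

ns <- c(30, 60, 200, 1000)
trials <- 10000

# For each n, simulate 'trials' sample means and record their standard deviation
sim_sd <- sapply(ns, function(n) {
  sd(rnorm(trials, mean = 35, sd = sqrt(32.5^2 / n)))
})

# Theoretical standard deviation of the sample mean: sigma / sqrt(n)
theory_sd <- 32.5 / sqrt(ns)

spread_table <- data.frame(n = ns,
                           simulated = round(sim_sd, 4),
                           theoretical = round(theory_sd, 4))
spread_table
```

Reading down the table alongside the four histograms may help you describe how the spread of the sample mean changes with \(n\).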


Well done, that’s everything for today’s core lab!


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.