We’re going to look at a specific example to illustrate the concept of the sampling distribution of the mean. We’ll use the variable carat in the data frame diamonds, which comes with the ggplot2 package. Before we start, we need to make sure the package is loaded.

library(ggplot2)

OK, now we have the data, so let’s look at the distribution of carat weights and compute the population parameters \(\mu\) and \(\sigma\), treating all the diamonds in the data frame as the population.

hist(diamonds$carat, main = "Histogram of Raw Data", xlab = "Carat")

# Treat the full diamonds data frame as the population
pop.mu <- mean(diamonds$carat)
pop.sigma <- sd(diamonds$carat)

pop.mu
## [1] 0.7979397
pop.sigma
## [1] 0.4740112
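One caveat: R’s sd() divides by \(N-1\), so strictly it returns the sample standard deviation, whereas a population \(\sigma\) divides by \(N\). The sketch below compares the two; the name pop.sigma.N is just an illustrative choice, not something from the original analysis.

```r
library(ggplot2)

carat <- diamonds$carat
N <- length(carat)            # number of diamonds in the "population" (53,940)

pop.mu <- mean(carat)
pop.sigma <- sd(carat)                           # divides by N - 1
pop.sigma.N <- sqrt(mean((carat - pop.mu)^2))    # divides by N

N
pop.sigma - pop.sigma.N       # tiny positive difference because N is large
```

With \(N = 53{,}940\) the two values differ by only a few millionths, so using sd() for \(\sigma\) is harmless here.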

Now let’s estimate the mean from a very small sample of size 10. Do this several times to get a feel for the kind of values we see.

# Sample size 10: draw 10 carat values at random (without replacement)
x10 <- sample(diamonds$carat, 10)
mean(x10)
## [1] 1.053
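Rather than re-running the chunk by hand, replicate() can repeat the draw for you. This is a small sketch; five repetitions is an arbitrary choice, and the values will differ on every run because no seed is set.

```r
library(ggplot2)

# Five independent samples of size 10; each element is one sample mean
means10 <- replicate(5, mean(sample(diamonds$carat, 10)))
means10

# How far each estimate lands from the population mean
means10 - mean(diamonds$carat)
```

Changing 10 to 1000 in the sample() call gives the same comparison for the larger samples.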

Now let’s estimate the mean from a larger sample of size 1,000. Repeat this several times as well to get a feel for the kind of values we see.

# Sample size 1,000: draw 1,000 carat values at random (without replacement)
x1000 <- sample(diamonds$carat, 1000)
mean(x1000)
## [1] 0.80495

What happened? How do the sample means based on samples of size 1,000 differ from those based on samples of size 10?
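One way to explore this question is to build the two sampling distributions directly: draw many samples of each size, keep each sample mean, and compare the centre and spread of those means. This is a minimal sketch; reps = 1000 and set.seed(1) are arbitrary choices, not part of the original exercise. For comparison, theory says the standard deviation of the sample mean is approximately \(\sigma/\sqrt{n}\).

```r
library(ggplot2)

set.seed(1)                      # arbitrary seed, for reproducibility only
carat <- diamonds$carat
pop.mu <- mean(carat)
pop.sigma <- sd(carat)
reps <- 1000                     # number of samples drawn per sample size

# Sampling distribution of the mean for n = 10 and n = 1,000
means10   <- replicate(reps, mean(sample(carat, 10)))
means1000 <- replicate(reps, mean(sample(carat, 1000)))

# Both sets of means are centred near the population mean
c(mean(means10), mean(means1000), pop.mu)

# Their spread shrinks roughly like sigma / sqrt(n)
c(sd(means10), pop.sigma / sqrt(10))
c(sd(means1000), pop.sigma / sqrt(1000))

# Plot the two sampling distributions on a common x-axis
xlim <- range(means10, means1000)
par(mfrow = c(2, 1))
hist(means10, xlim = xlim, main = "Sample means, n = 10")
hist(means1000, xlim = xlim, main = "Sample means, n = 1,000")
par(mfrow = c(1, 1))
```

Because sample() draws without replacement from a finite population of 53,940 diamonds, the spread for \(n = 1{,}000\) should come out slightly below \(\sigma/\sqrt{n}\), by a finite-population factor of roughly 0.99.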