Short simulation example to show that the sample mean (x bar) is a random variable

Generate a sample of 10 values (n = 10) from a variable X that follows a normal distribution with population mean μ = 20 and population standard deviation σ = 2.

set.seed(1)
sample1= rnorm(n= 10, mean= 20, sd= 2)
sample1
##  [1] 18.74709 20.36729 18.32874 23.19056 20.65902 18.35906 20.97486 21.47665
##  [9] 21.15156 19.38922

The mean of sample1 (x bar) is:

round (mean (sample1), 3)
## [1] 20.264

What do we get if we generate a second sample?

set.seed(2)
sample2= rnorm(n= 10, mean= 20, sd= 2)
sample2
##  [1] 18.20617 20.36970 23.17569 17.73925 19.83950 20.26484 21.41591 19.52060
##  [9] 23.96895 19.72243

The mean of sample2 is:

round (mean (sample2), 3)
## [1] 20.422

Conclusion: As the two samples show, each new sample contains different values, so its mean differs as well (20.264 for sample1 versus 20.422 for sample2). Because the sample mean changes from one sample to the next, it is a random variable rather than a constant.
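
The same point can be checked with more than two samples. The sketch below is an illustration, not part of the original example: it uses replicate() to draw five samples in one call and returns one mean per sample. The seed value 123 and the name five_means are arbitrary choices made here.

```r
# Draw five independent samples of 10 values from N(mean = 20, sd = 2)
# and compute the mean of each one.
set.seed(123)  # arbitrary seed, used only to make the run reproducible
five_means <- replicate(5, mean(rnorm(n = 10, mean = 20, sd = 2)))
round(five_means, 3)
```

Each of the five means should sit near 20 without being equal to it or to the others.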

Let’s generate 100 samples of 10 values each, compute the mean of each sample, and compare all 100 sample means to the population mean of 20.

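# Preallocate one slot per sample mean instead of growing the vector
sample_means= numeric(100)

for(i in 1:100){
  set.seed(i)  # a different seed per sample keeps every draw reproducible
  sample_means[i]= round(mean(rnorm(n= 10, mean= 20, sd= 2)), 2)
}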

Let’s plot the results. The horizontal line marks the population mean of 20:

plot (sample_means, xlab= "Sample", ylab= "Xbar", yaxt = "n")
axis(2, at=seq(18,22,0.5), labels=seq(18,22,0.5))
abline(h= 20)
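
The plot can be complemented with a numerical summary. The following is a minimal sketch, assuming the same seeds (1 to 100) as the loop above. It compares the observed spread of the sample means with the theoretical standard deviation of the sample mean, σ/√n, which is 2/√10 ≈ 0.632 for n = 10. The names means_n10 and theoretical_se are introduced here for illustration, and the means are left unrounded so the summary is not affected by rounding.

```r
# Rebuild the 100 sample means (n = 10) with the same seeds as above,
# this time without rounding.
means_n10 <- numeric(100)
for (i in 1:100) {
  set.seed(i)
  means_n10[i] <- mean(rnorm(n = 10, mean = 20, sd = 2))
}

mean(means_n10)                 # average of the 100 sample means
sd(means_n10)                   # observed spread of the sample means
theoretical_se <- 2 / sqrt(10)  # sigma / sqrt(n)
theoretical_se
```

The average of the sample means should be close to 20, and their standard deviation close to the theoretical value, although neither is expected to match exactly with only 100 samples.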

What changes if each sample has 50 values instead of 10? In other words, what changes if the sample size is increased from 10 to 50?

Let’s generate 100 samples of 50 values each, compute the mean of each sample, and compare all 100 sample means to the population mean of 20.

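# Preallocate one slot per sample mean instead of growing the vector
sample_means2= numeric(100)

for(i in 1:100){
  set.seed(i)  # a different seed per sample keeps every draw reproducible
  sample_means2[i]= round(mean(rnorm(n= 50, mean= 20, sd= 2)), 2)
}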

Let’s plot the results. Again, the horizontal line marks the population mean of 20:

plot (sample_means2, xlab= "Sample", ylab= "Xbar", yaxt = "n")
axis(2, at=seq(18,22,0.5), labels=seq(18,22,0.5))
abline(h= 20)
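
To make the effect of the sample size easier to see, the two simulations can be placed side by side on a common scale. The sketch below is one possible way to do this, not the original author's code: simulate_means() is a hypothetical helper defined here, the seeds 1 to 100 mirror the loops above, and the means are left unrounded. It also tabulates the observed standard deviation of the sample means next to the theoretical value σ/√n.

```r
# Hypothetical helper: means of `n_samples` samples of size `sample_size`,
# using seeds 1..n_samples as in the loops above.
simulate_means <- function(sample_size, n_samples = 100) {
  sapply(seq_len(n_samples), function(i) {
    set.seed(i)
    mean(rnorm(n = sample_size, mean = 20, sd = 2))
  })
}

means_10 <- simulate_means(10)
means_50 <- simulate_means(50)

# Observed spread versus the theoretical value sigma / sqrt(n)
spread <- data.frame(
  n           = c(10, 50),
  observed_sd = c(sd(means_10), sd(means_50)),
  theoretical = 2 / sqrt(c(10, 50))
)
spread

# Same y-axis limits in both panels so the spreads are directly comparable
y_limits <- range(means_10, means_50)
old_par <- par(mfrow = c(1, 2))
plot(means_10, ylim = y_limits, xlab = "Sample", ylab = "Xbar", main = "n = 10")
abline(h = 20)
plot(means_50, ylim = y_limits, xlab = "Sample", ylab = "Xbar", main = "n = 50")
abline(h = 20)
par(old_par)
```

If the simulation behaves as the theory predicts, the points for n = 50 cluster more tightly around 20 than those for n = 10. The sample mean is still a random variable, but its variability shrinks as the sample size grows.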