Confidence intervals for the sample mean using R

Kellie Stilson

Sampling Distribution. Suppose you draw a random sample of size 9 from a population with a normal distribution and compute the sample mean x ̄. Imagine doing this 100 times so that you have 100 sample means.

(a)Generate 9 random numbers, normally distributed with mean 80 and standard deviation 5, and repeat this 100 times:

dta <- replicate(100, rnorm(9, mean = 80, sd = 5))
dim(dta)

## [1]   9 100

(b) Calculate the sample means:

xbar <- apply(dta, 2, mean)

Compare the distribution of sample means with the population distribution:

par(mfrow = c(2, 1))
hist(dta[1, ], nclass = 20, col = "lightpink", xla = "values for 100 subjects", 
    main = "population measurements", xlim = c(60, 100))
hist(xbar, nclass = 20, col = "skyblue", xlab = "means for samples of size 9", 
    main = "sample means", xlim = c(60, 100))

plot of chunk unnamed-chunk-3

par(mfrow = c(1, 1))

Generally, the population distribution seem to follow a normal distribution more closely than the sample means, as seen in the graph. Additionally, the population distribution covers a wider range of data.

Confidence Intervals I. Suppose n = 9 men are selected at random from a population. Assume the heights of the men in this population are normal with mean 69 inches and standard deviation 3 inches. Simulate the results of this selection 100 times and in each case find a 90 % confidence interval for the mean. (a) In order to do this, generate 100 columns of normally distributed random numbers (mean 69, standard deviation 3), each column has 9 rows.

heights <- replicate(100, rnorm(9, mean = 69, sd = 3))

(b) Calculate 90 % confidence intervals for each row:

tint <- matrix(NA, nrow = dim(heights)[2], ncol = 2)
for (i in 1:dim(heights)[2]) {
    temp <- t.test(heights[, i], conf.level = 0.9)
    tint[i, ] <- temp$conf.int
}
colnames(tint) <- c("low", "up")

What is the width of each confidence interval?

width <- apply(tint, 1, diff)
tint <- cbind(tint, width)
tint <- data.frame(tint)
tint

##       low    up width
## 1   67.83 70.90 3.071
## 2   67.84 72.02 4.172
## 3   66.70 69.23 2.534
## 4   66.71 71.02 4.305
## 5   65.57 68.94 3.370
## 6   69.09 72.84 3.758
## 7   68.43 71.56 3.122
## 8   66.88 69.73 2.856
## 9   65.09 69.82 4.735
## 10  68.05 72.35 4.298
## 11  67.32 72.33 5.012
## 12  66.37 69.89 3.525
## 13  64.71 69.46 4.750
## 14  67.91 71.09 3.188
## 15  68.01 70.85 2.843
## 16  66.69 68.73 2.035
## 17  65.03 68.63 3.600
## 18  66.79 69.71 2.912
## 19  66.20 70.74 4.535
## 20  67.10 69.94 2.839
## 21  67.70 70.93 3.232
## 22  67.67 71.05 3.383
## 23  66.00 69.77 3.773
## 24  66.73 69.79 3.052
## 25  68.16 72.26 4.097
## 26  66.35 70.57 4.215
## 27  67.08 72.04 4.964
## 28  68.33 71.91 3.579
## 29  67.55 69.00 1.450
## 30  68.09 70.99 2.905
## 31  66.94 70.91 3.968
## 32  68.48 70.45 1.970
## 33  67.33 70.91 3.588
## 34  67.14 70.82 3.675
## 35  66.20 71.63 5.428
## 36  66.09 70.41 4.316
## 37  66.42 71.09 4.665
## 38  69.59 72.44 2.847
## 39  67.30 70.90 3.604
## 40  66.58 69.72 3.136
## 41  67.18 69.98 2.794
## 42  67.81 71.03 3.216
## 43  68.00 70.77 2.772
## 44  66.23 69.20 2.968
## 45  68.17 71.32 3.149
## 46  68.15 70.85 2.695
## 47  66.33 70.65 4.312
## 48  67.48 70.11 2.626
## 49  65.47 69.59 4.121
## 50  67.58 72.16 4.585
## 51  65.90 70.57 4.671
## 52  65.79 68.77 2.987
## 53  67.76 70.24 2.477
## 54  68.08 72.94 4.860
## 55  67.88 71.79 3.912
## 56  66.49 70.21 3.714
## 57  67.72 70.71 2.995
## 58  66.36 69.98 3.622
## 59  66.21 68.71 2.499
## 60  66.36 71.14 4.780
## 61  67.37 72.50 5.132
## 62  67.01 70.28 3.269
## 63  68.99 73.95 4.956
## 64  67.07 71.08 4.010
## 65  64.21 69.64 5.429
## 66  68.76 71.96 3.206
## 67  65.96 70.13 4.172
## 68  67.66 70.82 3.159
## 69  66.62 71.94 5.318
## 70  65.55 71.20 5.653
## 71  67.23 70.73 3.500
## 72  66.81 69.54 2.726
## 73  66.25 69.66 3.419
## 74  68.37 70.87 2.500
## 75  66.70 71.83 5.133
## 76  66.58 70.83 4.251
## 77  66.80 70.83 4.026
## 78  67.34 70.75 3.405
## 79  67.80 71.62 3.822
## 80  67.04 70.30 3.251
## 81  68.57 72.37 3.806
## 82  67.49 71.78 4.294
## 83  68.50 72.06 3.564
## 84  68.16 72.13 3.968
## 85  67.88 72.00 4.121
## 86  66.11 69.76 3.655
## 87  68.46 70.80 2.338
## 88  68.12 70.48 2.364
## 89  66.84 70.08 3.239
## 90  68.43 71.18 2.746
## 91  66.17 68.92 2.753
## 92  66.73 71.23 4.501
## 93  67.12 69.46 2.338
## 94  67.54 70.42 2.884
## 95  66.32 72.07 5.753
## 96  67.04 69.66 2.621
## 97  68.06 71.59 3.533
## 98  69.50 71.99 2.483
## 99  67.08 70.63 3.554
## 100 67.98 70.65 2.678

indx <- (tint$low <= 69) & (tint$up >= 69)
sum(indx)

## [1] 91

(d) How many intervals do you expect to contain μ?

Since the confidence level is 90% then of the 100 samples, I expect 90% of them (i.e. 90 samples to contain \( \mu \))

(e) Do all of the intervals have the same width? Why or why not?

No the intervals do not have the same width because the two endpoints of the interval involve a random variable (specifically the midpoint of region is randomly generated).

Confidence Intervals II. Suppose you computed 95% confidence intervals instead of 90 % confidence intervals. Would you expect more or fewer intervals to cover 69? Would they be narrower or wider? My initial answer: I expect there to be more intervals and that the intervals will be wider. Check

tint <- matrix(NA, nrow = dim(heights)[2], ncol = 2)
for (i in 1:dim(heights)[2]) {
    temp <- t.test(heights[, i], conf.level = 0.95)
    tint[i, ] <- temp$conf.int
}
colnames(tint) <- c("low", "up")
width <- apply(tint, 1, diff)
tint <- cbind(tint, width)
tint <- data.frame(tint)
tint

##       low    up width
## 1   67.46 71.27 3.809
## 2   67.34 72.52 5.174
## 3   66.39 69.54 3.142
## 4   66.20 71.53 5.339
## 5   65.17 69.35 4.179
## 6   68.64 73.29 4.660
## 7   68.06 71.93 3.872
## 8   66.54 70.08 3.542
## 9   64.52 70.39 5.872
## 10  67.53 72.86 5.330
## 11  66.72 72.94 6.216
## 12  65.94 70.32 4.371
## 13  64.14 70.03 5.891
## 14  67.52 71.48 3.953
## 15  67.67 71.19 3.525
## 16  66.45 68.97 2.523
## 17  64.59 69.06 4.464
## 18  66.44 70.05 3.612
## 19  65.66 71.28 5.624
## 20  66.76 70.28 3.521
## 21  67.31 71.32 4.008
## 22  67.26 71.46 4.196
## 23  65.55 70.23 4.678
## 24  66.37 70.15 3.785
## 25  67.67 72.75 5.081
## 26  65.85 71.08 5.227
## 27  66.48 72.64 6.156
## 28  67.90 72.34 4.439
## 29  67.38 69.18 1.798
## 30  67.74 71.34 3.602
## 31  66.47 71.39 4.921
## 32  68.25 70.69 2.443
## 33  66.89 71.34 4.450
## 34  66.70 71.26 4.558
## 35  65.55 72.28 6.732
## 36  65.57 70.93 5.352
## 37  65.86 71.65 5.785
## 38  69.25 72.78 3.531
## 39  66.87 71.34 4.469
## 40  66.21 70.09 3.888
## 41  66.85 70.31 3.465
## 42  67.42 71.41 3.988
## 43  67.67 71.11 3.437
## 44  65.88 69.56 3.681
## 45  67.79 71.70 3.906
## 46  67.83 71.17 3.342
## 47  65.82 71.16 5.347
## 48  67.17 70.43 3.256
## 49  64.98 70.09 5.110
## 50  67.03 72.71 5.685
## 51  65.34 71.13 5.793
## 52  65.43 69.13 3.704
## 53  67.47 70.54 3.072
## 54  67.49 73.52 6.027
## 55  67.41 72.26 4.851
## 56  66.05 70.65 4.606
## 57  67.36 71.07 3.715
## 58  65.93 70.42 4.492
## 59  65.91 69.01 3.099
## 60  65.79 71.72 5.928
## 61  66.76 73.12 6.364
## 62  66.62 70.67 4.054
## 63  68.40 74.54 6.146
## 64  66.59 71.56 4.972
## 65  63.56 70.29 6.732
## 66  68.37 72.35 3.975
## 67  65.46 70.64 5.173
## 68  67.28 71.20 3.918
## 69  65.98 72.58 6.594
## 70  64.87 71.88 7.010
## 71  66.81 71.15 4.340
## 72  66.49 69.87 3.380
## 73  65.84 70.07 4.239
## 74  68.07 71.17 3.101
## 75  66.08 72.45 6.366
## 76  66.07 71.34 5.271
## 77  66.32 71.31 4.993
## 78  66.94 71.16 4.223
## 79  67.34 72.08 4.739
## 80  66.65 70.69 4.031
## 81  68.11 72.83 4.720
## 82  66.97 72.30 5.325
## 83  68.07 72.49 4.419
## 84  67.68 72.60 4.920
## 85  67.39 72.50 5.110
## 86  65.67 70.20 4.532
## 87  68.18 71.08 2.899
## 88  67.83 70.76 2.931
## 89  66.45 70.47 4.017
## 90  68.10 71.51 3.406
## 91  65.84 69.25 3.413
## 92  66.19 71.77 5.582
## 93  66.84 69.74 2.899
## 94  67.19 70.77 3.576
## 95  65.63 72.76 7.134
## 96  66.73 69.98 3.250
## 97  67.63 72.01 4.381
## 98  69.21 72.29 3.079
## 99  66.65 71.06 4.408
## 100 67.65 70.98 3.321

indx <- (tint$low <= 69) & (tint$up >= 69)
sum(indx)

## [1] 97

I found that there were more intervals, and that the width was wider.

Confidence Intervals III. Suppose you took samples of size n = 100 instead of n = 9. Would you expect more or fewer 90% confidence intervals to cover 69? Would they be narrower or wider?

My Initial Answer: I expect the number of intervals to cover 69 to be roughly the same, but perhaps slightly more than 90 because each sample has more data points covered.

heights <- replicate(100, rnorm(100, mean = 69, sd = 3))

(b) Calculate 90 % confidence intervals for each row:

tint <- matrix(NA, nrow = dim(heights)[2], ncol = 2)
for (i in 1:dim(heights)[2]) {
    temp <- t.test(heights[, i], conf.level = 0.9)
    tint[i, ] <- temp$conf.int
}
colnames(tint) <- c("low", "up")
width <- apply(tint, 1, diff)
tint <- cbind(tint, width)
tint <- data.frame(tint)
tint

##       low    up  width
## 1   68.72 69.66 0.9407
## 2   68.57 69.54 0.9726
## 3   68.17 69.12 0.9488
## 4   68.48 69.56 1.0829
## 5   68.11 69.14 1.0350
## 6   68.22 69.16 0.9335
## 7   68.67 69.71 1.0416
## 8   68.47 69.47 1.0085
## 9   69.01 70.11 1.0918
## 10  69.49 70.53 1.0395
## 11  68.32 69.36 1.0406
## 12  68.79 69.76 0.9716
## 13  68.85 69.78 0.9329
## 14  68.42 69.42 1.0031
## 15  68.20 69.30 1.0949
## 16  68.67 69.60 0.9284
## 17  68.66 69.69 1.0308
## 18  68.25 69.30 1.0459
## 19  68.52 69.50 0.9821
## 20  68.37 69.35 0.9800
## 21  68.20 69.15 0.9498
## 22  68.78 69.74 0.9651
## 23  67.93 69.06 1.1376
## 24  69.10 70.09 0.9907
## 25  68.97 69.88 0.9090
## 26  68.49 69.54 1.0470
## 27  68.37 69.41 1.0394
## 28  68.81 69.75 0.9423
## 29  68.14 69.13 0.9919
## 30  68.46 69.53 1.0710
## 31  68.73 69.61 0.8771
## 32  68.44 69.48 1.0314
## 33  68.41 69.34 0.9268
## 34  68.16 69.16 0.9953
## 35  68.62 69.60 0.9858
## 36  68.43 69.39 0.9637
## 37  68.36 69.31 0.9537
## 38  68.90 69.83 0.9374
## 39  68.44 69.44 1.0066
## 40  68.28 69.28 1.0002
## 41  68.73 69.67 0.9407
## 42  68.48 69.47 0.9858
## 43  68.37 69.42 1.0452
## 44  68.26 69.23 0.9744
## 45  68.38 69.51 1.1251
## 46  68.25 69.10 0.8494
## 47  68.42 69.57 1.1531
## 48  68.61 69.53 0.9123
## 49  68.99 70.11 1.1196
## 50  68.11 69.17 1.0670
## 51  68.37 69.36 0.9893
## 52  68.75 69.75 0.9939
## 53  68.51 69.47 0.9621
## 54  68.34 69.31 0.9664
## 55  68.19 69.19 1.0035
## 56  68.43 69.31 0.8749
## 57  68.46 69.40 0.9333
## 58  68.36 69.36 0.9956
## 59  68.30 69.23 0.9327
## 60  68.68 69.60 0.9254
## 61  68.48 69.41 0.9363
## 62  68.94 69.90 0.9625
## 63  68.76 69.79 1.0321
## 64  67.66 68.61 0.9494
## 65  67.82 68.81 0.9821
## 66  67.96 68.93 0.9637
## 67  68.58 69.66 1.0811
## 68  68.51 69.61 1.1012
## 69  67.91 68.94 1.0301
## 70  68.28 69.24 0.9659
## 71  68.53 69.54 1.0074
## 72  68.59 69.62 1.0258
## 73  68.46 69.45 0.9927
## 74  68.14 69.12 0.9724
## 75  68.28 69.29 1.0020
## 76  69.12 70.23 1.1103
## 77  68.32 69.40 1.0797
## 78  67.73 68.76 1.0318
## 79  68.87 69.95 1.0788
## 80  68.63 69.64 1.0117
## 81  68.26 69.22 0.9584
## 82  68.62 69.61 0.9955
## 83  68.58 69.56 0.9788
## 84  69.33 70.32 0.9955
## 85  68.52 69.52 1.0021
## 86  68.39 69.32 0.9281
## 87  68.52 69.52 0.9962
## 88  69.25 70.18 0.9257
## 89  68.63 69.52 0.8916
## 90  68.48 69.43 0.9496
## 91  68.52 69.50 0.9772
## 92  68.07 69.17 1.0958
## 93  68.63 69.56 0.9284
## 94  68.72 69.56 0.8479
## 95  68.64 69.53 0.8919
## 96  69.02 69.95 0.9347
## 97  69.02 69.99 0.9732
## 98  69.06 70.04 0.9778
## 99  68.26 69.26 0.9985
## 100 68.29 69.24 0.9414

indx <- (tint$low <= 69) & (tint$up >= 69)
sum(indx)

## [1] 86

Thus,


Error in eval(expr, envir, enclos) : object 'index' not found

intervals contain the true mean.

Robustness. The assumption for small sample confidence intervals (t-intervals) is that the population is normally distributed. What happens if this assumption is violated?

Generate observations from an exponential distribution with mean 25, and compute 90% confidence intervals. (a) Generate 1000 observations from this distribution, and then make a histogram and a normal probability plot. Describe your results.

mean = 25
dta <- rexp(1000, rate = 1/mean)
hist(dta, nclass = 20, col = "skyblue", main = "random numbers from exponential dist.")

plot of chunk unnamed-chunk-11


qqnorm(dta)
qqline(dta)

plot of chunk unnamed-chunk-11

(b) Create an empty matrix and label the first 5 columns

xx <- matrix(NA, nrow = 1000, ncol = 5)
colnames(xx) <- c("xbar", "stdev", "lcl", "ucl", "Cover true mean?")

mean = 25
dta <- replicate(1000, rexp(10, rate = 1/mean))
xbar <- apply(dta, 2, mean)
stdev <- apply(dta, 2, sd)

Compute the lower and upper ends of the t-intervals.

tint <- matrix(NA, nrow = dim(dta)[2], ncol = 2)
for (i in 1:dim(dta)[2]) {
    temp <- t.test(dta[, i], conf.level = 0.9)
    tint[i, ] <- temp$conf.int
}
colnames(tint) <- c("lcl", "ucl")

Store the information:

xx[, 1] <- xbar
xx[, 2] <- stdev
xx[, 3] <- tint[, 1]
xx[, 4] <- tint[, 2]

(g) How many samples include the true mean? Store this in the column cover

xx[, 5] <- (xx[, 3] < mean) & (xx[, 4] > mean)
cover <- xx[, 5]
tint <- cbind(tint, cover)
tint <- data.frame(tint)

Compute the sum of this column.

sum(xx[, 5])

## [1] 852

What do you find? Is this what you expected? How does this compare to a coverage of confidence interval for a normal distribution such as heights in the section Confidence Intervals I if it would have been done 1000 times?

Fewer than 90% of my intervals contained the true mean - which is what I expected to happen from the exponential distribution, however the result was a better estimation than I expected. Closer to 80% of the intervals (852) covered the true mean. This is still relatively close to 90% of the intervals, but this is most likely because 1000 is such a high number. If we had used n=100 as in Confidence Intervals I, the result may not be as close. This lack of continuity is also easily seen on the QQ plot.

If Confidence Intervals I would have been done 1000 times, I would expect about 900 of the intervals to contain the true mean because the sample was normally distributed. Redo this simulation with simms=1000

heights <- replicate(1000, rnorm(9, mean = 69, sd = 3))
tint <- matrix(NA, nrow = dim(heights)[2], ncol = 2)
for (i in 1:dim(heights)[2]) {
    temp <- t.test(heights[, i], conf.level = 0.9)
    tint[i, ] <- temp$conf.int
}
colnames(tint) <- c("low", "up")
width <- apply(tint, 1, diff)
tint <- cbind(tint, width)
tint <- data.frame(tint)
indx <- (tint$low <= 69) & (tint$up >= 69)
sum(indx)

## [1] 891

As is shown, 891 intervals cover the true mean - closer to 90%.