(a)Generate 9 random numbers, normally distributed with mean 80 and standard deviation 5, and repeat this 100 times:
dta <- replicate(100, rnorm(9, mean = 80, sd = 5))
dim(dta)
## [1] 9 100
(b) Calculate the sample means:
xbar <- apply(dta, 2, mean)
Compare the distribution of sample means with the population distribution:
par(mfrow = c(2, 1))
hist(dta[1, ], nclass = 20, col = "lightpink", xla = "values for 100 subjects",
main = "population measurements", xlim = c(60, 100))
hist(xbar, nclass = 20, col = "skyblue", xlab = "means for samples of size 9",
main = "sample means", xlim = c(60, 100))
par(mfrow = c(1, 1))
Generally, the population distribution seem to follow a normal distribution more closely than the sample means, as seen in the graph. Additionally, the population distribution covers a wider range of data.
heights <- replicate(100, rnorm(9, mean = 69, sd = 3))
(b) Calculate 90 % confidence intervals for each row:
tint <- matrix(NA, nrow = dim(heights)[2], ncol = 2)
for (i in 1:dim(heights)[2]) {
temp <- t.test(heights[, i], conf.level = 0.9)
tint[i, ] <- temp$conf.int
}
colnames(tint) <- c("low", "up")
What is the width of each confidence interval?
width <- apply(tint, 1, diff)
tint <- cbind(tint, width)
tint <- data.frame(tint)
tint
## low up width
## 1 67.83 70.90 3.071
## 2 67.84 72.02 4.172
## 3 66.70 69.23 2.534
## 4 66.71 71.02 4.305
## 5 65.57 68.94 3.370
## 6 69.09 72.84 3.758
## 7 68.43 71.56 3.122
## 8 66.88 69.73 2.856
## 9 65.09 69.82 4.735
## 10 68.05 72.35 4.298
## 11 67.32 72.33 5.012
## 12 66.37 69.89 3.525
## 13 64.71 69.46 4.750
## 14 67.91 71.09 3.188
## 15 68.01 70.85 2.843
## 16 66.69 68.73 2.035
## 17 65.03 68.63 3.600
## 18 66.79 69.71 2.912
## 19 66.20 70.74 4.535
## 20 67.10 69.94 2.839
## 21 67.70 70.93 3.232
## 22 67.67 71.05 3.383
## 23 66.00 69.77 3.773
## 24 66.73 69.79 3.052
## 25 68.16 72.26 4.097
## 26 66.35 70.57 4.215
## 27 67.08 72.04 4.964
## 28 68.33 71.91 3.579
## 29 67.55 69.00 1.450
## 30 68.09 70.99 2.905
## 31 66.94 70.91 3.968
## 32 68.48 70.45 1.970
## 33 67.33 70.91 3.588
## 34 67.14 70.82 3.675
## 35 66.20 71.63 5.428
## 36 66.09 70.41 4.316
## 37 66.42 71.09 4.665
## 38 69.59 72.44 2.847
## 39 67.30 70.90 3.604
## 40 66.58 69.72 3.136
## 41 67.18 69.98 2.794
## 42 67.81 71.03 3.216
## 43 68.00 70.77 2.772
## 44 66.23 69.20 2.968
## 45 68.17 71.32 3.149
## 46 68.15 70.85 2.695
## 47 66.33 70.65 4.312
## 48 67.48 70.11 2.626
## 49 65.47 69.59 4.121
## 50 67.58 72.16 4.585
## 51 65.90 70.57 4.671
## 52 65.79 68.77 2.987
## 53 67.76 70.24 2.477
## 54 68.08 72.94 4.860
## 55 67.88 71.79 3.912
## 56 66.49 70.21 3.714
## 57 67.72 70.71 2.995
## 58 66.36 69.98 3.622
## 59 66.21 68.71 2.499
## 60 66.36 71.14 4.780
## 61 67.37 72.50 5.132
## 62 67.01 70.28 3.269
## 63 68.99 73.95 4.956
## 64 67.07 71.08 4.010
## 65 64.21 69.64 5.429
## 66 68.76 71.96 3.206
## 67 65.96 70.13 4.172
## 68 67.66 70.82 3.159
## 69 66.62 71.94 5.318
## 70 65.55 71.20 5.653
## 71 67.23 70.73 3.500
## 72 66.81 69.54 2.726
## 73 66.25 69.66 3.419
## 74 68.37 70.87 2.500
## 75 66.70 71.83 5.133
## 76 66.58 70.83 4.251
## 77 66.80 70.83 4.026
## 78 67.34 70.75 3.405
## 79 67.80 71.62 3.822
## 80 67.04 70.30 3.251
## 81 68.57 72.37 3.806
## 82 67.49 71.78 4.294
## 83 68.50 72.06 3.564
## 84 68.16 72.13 3.968
## 85 67.88 72.00 4.121
## 86 66.11 69.76 3.655
## 87 68.46 70.80 2.338
## 88 68.12 70.48 2.364
## 89 66.84 70.08 3.239
## 90 68.43 71.18 2.746
## 91 66.17 68.92 2.753
## 92 66.73 71.23 4.501
## 93 67.12 69.46 2.338
## 94 67.54 70.42 2.884
## 95 66.32 72.07 5.753
## 96 67.04 69.66 2.621
## 97 68.06 71.59 3.533
## 98 69.50 71.99 2.483
## 99 67.08 70.63 3.554
## 100 67.98 70.65 2.678
© How many intervals contain the true mean μ?
indx <- (tint$low <= 69) & (tint$up >= 69)
sum(indx)
## [1] 91
(d) How many intervals do you expect to contain μ?
Since the confidence level is 90% then of the 100 samples, I expect 90% of them (i.e. 90 samples to contain \( \mu \))
(e) Do all of the intervals have the same width? Why or why not?
No the intervals do not have the same width because the two endpoints of the interval involve a random variable (specifically the midpoint of region is randomly generated).
tint <- matrix(NA, nrow = dim(heights)[2], ncol = 2)
for (i in 1:dim(heights)[2]) {
temp <- t.test(heights[, i], conf.level = 0.95)
tint[i, ] <- temp$conf.int
}
colnames(tint) <- c("low", "up")
width <- apply(tint, 1, diff)
tint <- cbind(tint, width)
tint <- data.frame(tint)
tint
## low up width
## 1 67.46 71.27 3.809
## 2 67.34 72.52 5.174
## 3 66.39 69.54 3.142
## 4 66.20 71.53 5.339
## 5 65.17 69.35 4.179
## 6 68.64 73.29 4.660
## 7 68.06 71.93 3.872
## 8 66.54 70.08 3.542
## 9 64.52 70.39 5.872
## 10 67.53 72.86 5.330
## 11 66.72 72.94 6.216
## 12 65.94 70.32 4.371
## 13 64.14 70.03 5.891
## 14 67.52 71.48 3.953
## 15 67.67 71.19 3.525
## 16 66.45 68.97 2.523
## 17 64.59 69.06 4.464
## 18 66.44 70.05 3.612
## 19 65.66 71.28 5.624
## 20 66.76 70.28 3.521
## 21 67.31 71.32 4.008
## 22 67.26 71.46 4.196
## 23 65.55 70.23 4.678
## 24 66.37 70.15 3.785
## 25 67.67 72.75 5.081
## 26 65.85 71.08 5.227
## 27 66.48 72.64 6.156
## 28 67.90 72.34 4.439
## 29 67.38 69.18 1.798
## 30 67.74 71.34 3.602
## 31 66.47 71.39 4.921
## 32 68.25 70.69 2.443
## 33 66.89 71.34 4.450
## 34 66.70 71.26 4.558
## 35 65.55 72.28 6.732
## 36 65.57 70.93 5.352
## 37 65.86 71.65 5.785
## 38 69.25 72.78 3.531
## 39 66.87 71.34 4.469
## 40 66.21 70.09 3.888
## 41 66.85 70.31 3.465
## 42 67.42 71.41 3.988
## 43 67.67 71.11 3.437
## 44 65.88 69.56 3.681
## 45 67.79 71.70 3.906
## 46 67.83 71.17 3.342
## 47 65.82 71.16 5.347
## 48 67.17 70.43 3.256
## 49 64.98 70.09 5.110
## 50 67.03 72.71 5.685
## 51 65.34 71.13 5.793
## 52 65.43 69.13 3.704
## 53 67.47 70.54 3.072
## 54 67.49 73.52 6.027
## 55 67.41 72.26 4.851
## 56 66.05 70.65 4.606
## 57 67.36 71.07 3.715
## 58 65.93 70.42 4.492
## 59 65.91 69.01 3.099
## 60 65.79 71.72 5.928
## 61 66.76 73.12 6.364
## 62 66.62 70.67 4.054
## 63 68.40 74.54 6.146
## 64 66.59 71.56 4.972
## 65 63.56 70.29 6.732
## 66 68.37 72.35 3.975
## 67 65.46 70.64 5.173
## 68 67.28 71.20 3.918
## 69 65.98 72.58 6.594
## 70 64.87 71.88 7.010
## 71 66.81 71.15 4.340
## 72 66.49 69.87 3.380
## 73 65.84 70.07 4.239
## 74 68.07 71.17 3.101
## 75 66.08 72.45 6.366
## 76 66.07 71.34 5.271
## 77 66.32 71.31 4.993
## 78 66.94 71.16 4.223
## 79 67.34 72.08 4.739
## 80 66.65 70.69 4.031
## 81 68.11 72.83 4.720
## 82 66.97 72.30 5.325
## 83 68.07 72.49 4.419
## 84 67.68 72.60 4.920
## 85 67.39 72.50 5.110
## 86 65.67 70.20 4.532
## 87 68.18 71.08 2.899
## 88 67.83 70.76 2.931
## 89 66.45 70.47 4.017
## 90 68.10 71.51 3.406
## 91 65.84 69.25 3.413
## 92 66.19 71.77 5.582
## 93 66.84 69.74 2.899
## 94 67.19 70.77 3.576
## 95 65.63 72.76 7.134
## 96 66.73 69.98 3.250
## 97 67.63 72.01 4.381
## 98 69.21 72.29 3.079
## 99 66.65 71.06 4.408
## 100 67.65 70.98 3.321
indx <- (tint$low <= 69) & (tint$up >= 69)
sum(indx)
## [1] 97
I found that there were more intervals, and that the width was wider.
Confidence Intervals III. Suppose you took samples of size n = 100 instead of n = 9. Would you expect more or fewer 90% confidence intervals to cover 69? Would they be narrower or wider?
My Initial Answer: I expect the number of intervals to cover 69 to be roughly the same, but perhaps slightly more than 90 because each sample has more data points covered.
heights <- replicate(100, rnorm(100, mean = 69, sd = 3))
(b) Calculate 90 % confidence intervals for each row:
tint <- matrix(NA, nrow = dim(heights)[2], ncol = 2)
for (i in 1:dim(heights)[2]) {
temp <- t.test(heights[, i], conf.level = 0.9)
tint[i, ] <- temp$conf.int
}
colnames(tint) <- c("low", "up")
width <- apply(tint, 1, diff)
tint <- cbind(tint, width)
tint <- data.frame(tint)
tint
## low up width
## 1 68.72 69.66 0.9407
## 2 68.57 69.54 0.9726
## 3 68.17 69.12 0.9488
## 4 68.48 69.56 1.0829
## 5 68.11 69.14 1.0350
## 6 68.22 69.16 0.9335
## 7 68.67 69.71 1.0416
## 8 68.47 69.47 1.0085
## 9 69.01 70.11 1.0918
## 10 69.49 70.53 1.0395
## 11 68.32 69.36 1.0406
## 12 68.79 69.76 0.9716
## 13 68.85 69.78 0.9329
## 14 68.42 69.42 1.0031
## 15 68.20 69.30 1.0949
## 16 68.67 69.60 0.9284
## 17 68.66 69.69 1.0308
## 18 68.25 69.30 1.0459
## 19 68.52 69.50 0.9821
## 20 68.37 69.35 0.9800
## 21 68.20 69.15 0.9498
## 22 68.78 69.74 0.9651
## 23 67.93 69.06 1.1376
## 24 69.10 70.09 0.9907
## 25 68.97 69.88 0.9090
## 26 68.49 69.54 1.0470
## 27 68.37 69.41 1.0394
## 28 68.81 69.75 0.9423
## 29 68.14 69.13 0.9919
## 30 68.46 69.53 1.0710
## 31 68.73 69.61 0.8771
## 32 68.44 69.48 1.0314
## 33 68.41 69.34 0.9268
## 34 68.16 69.16 0.9953
## 35 68.62 69.60 0.9858
## 36 68.43 69.39 0.9637
## 37 68.36 69.31 0.9537
## 38 68.90 69.83 0.9374
## 39 68.44 69.44 1.0066
## 40 68.28 69.28 1.0002
## 41 68.73 69.67 0.9407
## 42 68.48 69.47 0.9858
## 43 68.37 69.42 1.0452
## 44 68.26 69.23 0.9744
## 45 68.38 69.51 1.1251
## 46 68.25 69.10 0.8494
## 47 68.42 69.57 1.1531
## 48 68.61 69.53 0.9123
## 49 68.99 70.11 1.1196
## 50 68.11 69.17 1.0670
## 51 68.37 69.36 0.9893
## 52 68.75 69.75 0.9939
## 53 68.51 69.47 0.9621
## 54 68.34 69.31 0.9664
## 55 68.19 69.19 1.0035
## 56 68.43 69.31 0.8749
## 57 68.46 69.40 0.9333
## 58 68.36 69.36 0.9956
## 59 68.30 69.23 0.9327
## 60 68.68 69.60 0.9254
## 61 68.48 69.41 0.9363
## 62 68.94 69.90 0.9625
## 63 68.76 69.79 1.0321
## 64 67.66 68.61 0.9494
## 65 67.82 68.81 0.9821
## 66 67.96 68.93 0.9637
## 67 68.58 69.66 1.0811
## 68 68.51 69.61 1.1012
## 69 67.91 68.94 1.0301
## 70 68.28 69.24 0.9659
## 71 68.53 69.54 1.0074
## 72 68.59 69.62 1.0258
## 73 68.46 69.45 0.9927
## 74 68.14 69.12 0.9724
## 75 68.28 69.29 1.0020
## 76 69.12 70.23 1.1103
## 77 68.32 69.40 1.0797
## 78 67.73 68.76 1.0318
## 79 68.87 69.95 1.0788
## 80 68.63 69.64 1.0117
## 81 68.26 69.22 0.9584
## 82 68.62 69.61 0.9955
## 83 68.58 69.56 0.9788
## 84 69.33 70.32 0.9955
## 85 68.52 69.52 1.0021
## 86 68.39 69.32 0.9281
## 87 68.52 69.52 0.9962
## 88 69.25 70.18 0.9257
## 89 68.63 69.52 0.8916
## 90 68.48 69.43 0.9496
## 91 68.52 69.50 0.9772
## 92 68.07 69.17 1.0958
## 93 68.63 69.56 0.9284
## 94 68.72 69.56 0.8479
## 95 68.64 69.53 0.8919
## 96 69.02 69.95 0.9347
## 97 69.02 69.99 0.9732
## 98 69.06 70.04 0.9778
## 99 68.26 69.26 0.9985
## 100 68.29 69.24 0.9414
indx <- (tint$low <= 69) & (tint$up >= 69)
sum(indx)
## [1] 86
Thus,
Error in eval(expr, envir, enclos) : object 'index' not found
intervals contain the true mean.
Generate observations from an exponential distribution with mean 25, and compute 90% confidence intervals. (a) Generate 1000 observations from this distribution, and then make a histogram and a normal probability plot. Describe your results.
mean = 25
dta <- rexp(1000, rate = 1/mean)
hist(dta, nclass = 20, col = "skyblue", main = "random numbers from exponential dist.")
qqnorm(dta)
qqline(dta)
(b) Create an empty matrix and label the first 5 columns
xx <- matrix(NA, nrow = 1000, ncol = 5)
colnames(xx) <- c("xbar", "stdev", "lcl", "ucl", "Cover true mean?")
© Now generate 1000 samples with 10 observations each (i.e. small samples) of random data from an exponential distribution with mean 25.
mean = 25
dta <- replicate(1000, rexp(10, rate = 1/mean))
xbar <- apply(dta, 2, mean)
stdev <- apply(dta, 2, sd)
Compute the lower and upper ends of the t-intervals.
tint <- matrix(NA, nrow = dim(dta)[2], ncol = 2)
for (i in 1:dim(dta)[2]) {
temp <- t.test(dta[, i], conf.level = 0.9)
tint[i, ] <- temp$conf.int
}
colnames(tint) <- c("lcl", "ucl")
Store the information:
xx[, 1] <- xbar
xx[, 2] <- stdev
xx[, 3] <- tint[, 1]
xx[, 4] <- tint[, 2]
(g) How many samples include the true mean? Store this in the column cover
xx[, 5] <- (xx[, 3] < mean) & (xx[, 4] > mean)
cover <- xx[, 5]
tint <- cbind(tint, cover)
tint <- data.frame(tint)
Compute the sum of this column.
sum(xx[, 5])
## [1] 852
What do you find? Is this what you expected? How does this compare to a coverage of confidence interval for a normal distribution such as heights in the section Confidence Intervals I if it would have been done 1000 times?
Fewer than 90% of my intervals contained the true mean - which is what I expected to happen from the exponential distribution, however the result was a better estimation than I expected. Closer to 80% of the intervals (852) covered the true mean. This is still relatively close to 90% of the intervals, but this is most likely because 1000 is such a high number. If we had used n=100 as in Confidence Intervals I, the result may not be as close. This lack of continuity is also easily seen on the QQ plot.
If Confidence Intervals I would have been done 1000 times, I would expect about 900 of the intervals to contain the true mean because the sample was normally distributed. Redo this simulation with simms=1000
heights <- replicate(1000, rnorm(9, mean = 69, sd = 3))
tint <- matrix(NA, nrow = dim(heights)[2], ncol = 2)
for (i in 1:dim(heights)[2]) {
temp <- t.test(heights[, i], conf.level = 0.9)
tint[i, ] <- temp$conf.int
}
colnames(tint) <- c("low", "up")
width <- apply(tint, 1, diff)
tint <- cbind(tint, width)
tint <- data.frame(tint)
indx <- (tint$low <= 69) & (tint$up >= 69)
sum(indx)
## [1] 891
As is shown, 891 intervals cover the true mean - closer to 90%.