QUESTION 65: The data file contains values for breast cancer mortality from 1950 to 1960 (y) and the adult white female population in 1960 (x) for 301 counties in North Carolina, South Carolina, and Georgia.
Reading in the Cancer Data
Cancer <- read.csv(url("http://statistics.csueastbay.edu/~jkerr/STAT65012/cancer.txt"), header=FALSE, col.names=c("Mortality","Female"))
N <- length(Cancer$Mortality)
graphics.off()
x11()
hist(Cancer$Mortality, freq=FALSE, main='Histogram of the Population Values for Cancer Mortality.')
What are the population variance and the standard deviation?
cat('(b), The Mortality Population mean is ', mean(Cancer$Mortality), 'whereas the Total is ', sum(Cancer$Mortality), 'with variance ', var(Cancer$Mortality), 'and standard deviation ', sd(Cancer$Mortality), '\n')
(b), The Mortality Population mean is 39.85714 whereas the Total is 11997 with variance 2598.736 and standard deviation 50.9778
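One detail worth keeping in mind about the figures above: R's var() and sd() use the divisor N - 1, so when they are applied to a complete population they return a value slightly larger than the population variance defined with divisor N. The sketch below illustrates the difference; pop.var is a hypothetical helper and the vector y is made up for illustration, not taken from the cancer data.

```r
# Hypothetical helper: population variance with divisor N.
# var() in R divides by N - 1, so rescale by (N - 1) / N.
pop.var <- function(y) {
  N <- length(y)
  var(y) * (N - 1) / N
}

# Made-up illustrative values, not the cancer data.
y <- c(1, 3, 5, 9, 12)
pop.var(y)        # variance with divisor N
var(y)            # variance with divisor N - 1
sqrt(pop.var(y))  # population standard deviation
```

With N = 301 the correction factor is 300/301, so the divisor-N variance corresponding to the reported 2598.736 would be roughly 2590.1; the difference is small but not zero.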
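# (c) 100 simple random samples of size 25; store each sample mean.
# The vector is named xbar.c rather than c to avoid masking base::c().
xbar.c = numeric(100)
for (i in 1:100){
xbar.c[i] = mean(sample(Cancer$Mortality, 25, replace=FALSE))
}
x11()
hist(xbar.c, freq=FALSE, main='Sampling Distribution of the Mean')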
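The spread of a simulated sampling distribution like the one above can be compared with the theoretical standard error of the mean under sampling without replacement, sqrt(sigma^2/n * (1 - (n-1)/(N-1))), where sigma^2 is the population variance with divisor N. The following is a minimal sketch under stated assumptions: the population pop is made up (a skewed stand-in for Cancer$Mortality), and the seed and the 2000 replications are arbitrary choices.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
N <- 301
n <- 25
# Made-up skewed population standing in for Cancer$Mortality.
pop <- rexp(N, rate = 1 / 40)

# Means of repeated simple random samples drawn without replacement.
xbar <- replicate(2000, mean(sample(pop, n, replace = FALSE)))

# Theoretical standard error with the finite population correction.
sigma2 <- mean((pop - mean(pop))^2)   # population variance, divisor N
se.theory <- sqrt(sigma2 / n * (1 - (n - 1) / (N - 1)))

c(simulated = sd(xbar), theory = se.theory)
```

The two numbers should agree closely, and the agreement generally improves as the number of replications grows.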
samp = sample(Cancer$Mortality, 25, replace=FALSE)
cat('(d), The mean estimate is ', mean(samp), 'and the estimate of the total is ', N*mean(samp), '\n')
(d), The mean estimate is 44.16 and the estimate of the total is 13292.16
cat('(e), the variance estimate is ', var(samp), 'and the estimate of the standard deviation is ', sd(samp), '\n')
(e), the variance estimate is 2772.057 and the estimate of the standard deviation is 52.65032
Do the intervals cover the population values?
sdxbar = sqrt(var(samp)/25*(1-25/301))
sdxbar
[1] 10.08329
cat('(f), \n','A CI for the mean is \n (', mean(samp) -1.96*sdxbar,',', mean(samp)+1.96*sdxbar, ') \n')
(f),
A CI for the mean is
( 24.39675 , 63.92325 )
cat('A CI for the total is \n (', 301*mean(samp) -1.96*301*sdxbar,',', 301*mean(samp)+1.96*301*sdxbar, ') \n')
A CI for the total is
( 7343.421 , 19240.9 )
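The interval calculation above can be wrapped in a small function so that the sample size and population size are explicit arguments. This is only a sketch: srs.ci is a hypothetical helper name, and the inputs are the summary statistics printed above for the sample of size 25, together with the population mean and total from part (b) for the coverage check.

```r
# Hypothetical helper: approximate 95% CIs for the population mean and total
# from a simple random sample of size n drawn without replacement from N units.
srs.ci <- function(xbar, s2, n, N, z = 1.96) {
  se <- sqrt(s2 / n * (1 - n / N))   # standard error with finite population correction
  list(se    = se,
       mean  = xbar + c(-1, 1) * z * se,
       total = N * (xbar + c(-1, 1) * z * se))
}

# Summary statistics reported above for the sample of size 25.
ci <- srs.ci(xbar = 44.16, s2 = 2772.057, n = 25, N = 301)
ci$mean
ci$total

# Do the intervals cover the population values from part (b)?
ci$mean[1]  <= 39.85714 && 39.85714 <= ci$mean[2]
ci$total[1] <= 11997    && 11997    <= ci$total[2]
```

Passing n as an argument makes it harder to reuse a stale sample size when the same calculation is repeated for a larger sample.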
samp = sample(Cancer$Mortality, 100, replace=FALSE)
samp
[1] 6 11 12 13 37 34 51 3 59 9 17 0 35 236 46
[16] 267 1 91 33 20 69 47 145 6 13 13 3 60 37 11
[31] 66 5 72 12 27 4 41 90 8 244 29 24 11 24 27
[46] 23 15 55 17 30 14 88 15 16 77 20 117 73 11 36
[61] 3 12 4 9 27 32 42 30 10 26 16 41 4 17 8
[76] 163 45 21 18 7 15 10 12 66 30 37 5 127 1 23
[91] 5 12 11 103 12 16 63 167 70 27
cat('(g), \n the estimate of the mean is ', mean(samp), 'and the estimate of the total is ', N*mean(samp), '\n')
(g),
the estimate of the mean is 40.23 and the estimate of the total is 12109.23
cat('The variance estimate is ', var(samp), 'and the estimate of the standard deviation is ', sd(samp), '\n')
The variance estimate is 2562.906 and the estimate of the standard deviation is 50.62515
# The sample size is now 100, so both the divisor and the finite population correction use 100.
sdxbar = sqrt(var(samp)/100*(1-100/301))
sdxbar
[1] 4.136958
cat('A CI for the mean is \n (', mean(samp) -1.96*sdxbar,',', mean(samp)+1.96*sdxbar, ') \n')
A CI for the mean is
( 32.12156 , 48.33844 )
cat('A CI for the total is \n (', 301*mean(samp) -1.96*301*sdxbar,',', 301*mean(samp)+1.96*301*sdxbar, ') \n')
A CI for the total is
( 9668.59 , 14549.87 )
d = numeric(100)
for (i in 1:100){
j = sample(1:N, 25 , replace=FALSE)
d[i] = mean(Cancer$Mortality[j])/mean(Cancer$Female[j])*mean(Cancer$Female)
}
d
[1] 38.15240 40.89760 43.95836 40.58202 39.41910 38.41363
[7] 39.33495 39.32256 41.92111 40.31226 42.99267 41.21637
[13] 42.60476 43.68683 35.95339 41.37712 39.32311 41.03023
[19] 42.34080 35.62929 32.80578 44.05012 44.47511 38.74844
[25] 40.15742 42.20169 43.51888 38.88297 38.51299 37.08890
[31] 39.70523 40.74819 39.55563 38.25036 39.49280 42.02282
[37] 42.19977 39.36714 38.04093 38.99459 36.27532 37.67923
[43] 41.59181 39.44719 38.14552 39.09272 43.16982 43.59040
[49] 40.80089 40.28861 38.74162 40.98429 41.96745 40.39247
[55] 38.81621 40.18486 38.95019 41.59613 41.40039 39.43862
[61] 35.63055 38.67316 43.71671 42.50469 41.03244 40.78019
[67] 38.80527 40.10426 35.68050 38.91007 40.20016 39.35046
[73] 37.27795 40.40142 45.52198 44.12525 36.97069 39.55491
[79] 39.26207 34.83981 40.98078 39.12503 39.34630 44.57584
[85] 37.85070 36.68134 34.66507 40.98935 38.11079 40.84179
[91] 39.84139 40.76611 40.94539 41.95381 39.55925 37.51121
[97] 41.57982 41.99816 36.61487 40.63078
cat('The new sampling distribution appears much less variable than that of part (c) \n')
The new sampling distribution appears much less variable than that of part (c)
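The reduction in variability can be checked numerically by computing the simple mean and the ratio estimate on the same samples and comparing their standard deviations. The sketch below does this on a made-up population in which y is roughly proportional to x; the vectors x and y, the helper ratio.est, the seed, and the 2000 replications are all assumptions for illustration and do not come from the cancer data.

```r
set.seed(2)                                   # arbitrary seed
N <- 301
n <- 25
# Made-up population: y roughly proportional to x,
# standing in for (Cancer$Female, Cancer$Mortality).
x <- rexp(N, rate = 1 / 10000) + 500
y <- 0.004 * x + rnorm(N, sd = 5)

# Hypothetical helper: ratio estimate of the population mean of y.
ratio.est <- function(idx) mean(y[idx]) / mean(x[idx]) * mean(x)

# Use the same samples for both estimators so the comparison is paired.
idx.list <- replicate(2000, sample(1:N, n, replace = FALSE), simplify = FALSE)
simple <- sapply(idx.list, function(idx) mean(y[idx]))
ratio  <- sapply(idx.list, ratio.est)

c(sd.simple = sd(simple), sd.ratio = sd(ratio), true.mean = mean(y))
```

The gain from the ratio estimator depends on how strongly y is related to x; when the relationship is weak, the ratio estimate need not be less variable than the simple mean.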
j = sample(1:N, 25 , replace=FALSE)
j
[1] 110 257 115 279 45 66 187 265 42 121 289 53 260 155 136
[16] 191 237 10 50 100 148 185 249 1 131
ratio.mean = mean(Cancer$Mortality[j])/mean(Cancer$Female[j])*mean(Cancer$Female)
ratio.mean
[1] 38.02448
ratio.total = mean(Cancer$Mortality[j])/mean(Cancer$Female[j])*mean(Cancer$Female)*N
ratio.total
[1] 11445.37
partd.mean = mean(Cancer$Mortality[j])
partd.mean
[1] 34.72
partd.total = partd.mean*N
partd.total
[1] 10450.72
cat('The estimates are close, but both ratio-based estimates are larger than their part (d) counterparts and are therefore closer to the population values \n')
The estimates are close, but both ratio-based estimates are larger than their part (d) counterparts and are therefore closer to the population values