Use the data described in Example 5.10.
5.10. A forester wants to estimate the total number of farm acres planted with trees in a state. Because the number of acres of trees varies considerably with the size of the farm, he decides to stratify on farm size. The 240 farms in the state are placed in one of four categories according to size. A stratified random sample of 40 farms, selected by using proportional allocation, yields the results shown in the accompanying table on the number of acres planted in trees. Estimate the total number of acres of trees on farms in the state, and place a bound on the error of estimation. Graph the data on an appropriate plot and comment on the variation as we move from stratum I to stratum IV.
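Proportional allocation gives each stratum a share of the sample equal to its share of the population, \(n_i = nN_i/N\). A small check of this, assuming the stratum sizes \(N_i = 86, 72, 52, 30\) that are used as the finite population correction in the code later in this note (the names `N.i`, `n.exact`, and `n.i` are introduced here for illustration):

```r
# Stratum population sizes N_i (assumed from the fpc values used later in this note)
N.i <- c(StratI = 86, StratII = 72, StratIII = 52, StratIV = 30)
n <- 40                          # total sample size stated in the exercise

# Proportional allocation: n_i = n * N_i / N, rounded to whole farms
n.exact <- n * N.i / sum(N.i)
n.i <- round(n.exact)

rbind(n.exact, n.i)              # exact and rounded allocations side by side
sum(n.i)                         # should reproduce the total sample size n
```

The rounded allocations agree with the number of non-missing observations in each column of the data table.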
setwd("D:\\cycu\\111-1\\sampling\\R note\\ch5")
data.raw<-read.csv("EXERCISE5.10.csv")
data.raw
## StratI StratII StratIII StratIV
## 1 97 125 142 167
## 2 67 155 256 655
## 3 42 67 310 220
## 4 125 96 440 540
## 5 25 256 495 780
## 6 92 47 510 NA
## 7 105 310 320 NA
## 8 86 236 396 NA
## 9 27 220 196 NA
## 10 43 352 NA NA
## 11 45 142 NA NA
## 12 59 190 NA NA
## 13 53 NA NA NA
## 14 21 NA NA NA
colnames(data.raw)
## [1] "StratI" "StratII" "StratIII" "StratIV"
stratum <- rep(colnames(data.raw),each=dim(data.raw)[1])
y <- c(as.matrix(data.raw))
fpc.st<-rep(c(86,72,52,30),each=14) # Ni
n.st<-rep(c(14,12,9,5),each=14) # ni
weight.st<-fpc.st/n.st # sampling weights
data<-data.frame(stratum, y, fpc.st, n.st, weight.st)[!is.na(y),] # drop the NA padding rows
data
## stratum y fpc.st n.st weight.st
## 1 StratI 97 86 14 6.142857
## 2 StratI 67 86 14 6.142857
## 3 StratI 42 86 14 6.142857
## 4 StratI 125 86 14 6.142857
## 5 StratI 25 86 14 6.142857
## 6 StratI 92 86 14 6.142857
## 7 StratI 105 86 14 6.142857
## 8 StratI 86 86 14 6.142857
## 9 StratI 27 86 14 6.142857
## 10 StratI 43 86 14 6.142857
## 11 StratI 45 86 14 6.142857
## 12 StratI 59 86 14 6.142857
## 13 StratI 53 86 14 6.142857
## 14 StratI 21 86 14 6.142857
## 15 StratII 125 72 12 6.000000
## 16 StratII 155 72 12 6.000000
## 17 StratII 67 72 12 6.000000
## 18 StratII 96 72 12 6.000000
## 19 StratII 256 72 12 6.000000
## 20 StratII 47 72 12 6.000000
## 21 StratII 310 72 12 6.000000
## 22 StratII 236 72 12 6.000000
## 23 StratII 220 72 12 6.000000
## 24 StratII 352 72 12 6.000000
## 25 StratII 142 72 12 6.000000
## 26 StratII 190 72 12 6.000000
## 29 StratIII 142 52 9 5.777778
## 30 StratIII 256 52 9 5.777778
## 31 StratIII 310 52 9 5.777778
## 32 StratIII 440 52 9 5.777778
## 33 StratIII 495 52 9 5.777778
## 34 StratIII 510 52 9 5.777778
## 35 StratIII 320 52 9 5.777778
## 36 StratIII 396 52 9 5.777778
## 37 StratIII 196 52 9 5.777778
## 43 StratIV 167 30 5 6.000000
## 44 StratIV 655 30 5 6.000000
## 45 StratIV 220 30 5 6.000000
## 46 StratIV 540 30 5 6.000000
## 47 StratIV 780 30 5 6.000000
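The reshaping above hard-codes the number of rows (14) and the stratum sample sizes. As a hedged alternative, base R's `stack()` converts the wide table to long form, and the sample sizes can then be counted from the data. The sketch below rebuilds `data.raw` inline so that it runs without the CSV file; `pad`, `N.i`, `n.i`, and `long` are names introduced here for illustration only.

```r
# Rebuild the wide table by hand, padding the shorter strata with NA
pad <- function(x, len = 14) c(x, rep(NA, len - length(x)))
data.raw <- data.frame(
  StratI   = c(97, 67, 42, 125, 25, 92, 105, 86, 27, 43, 45, 59, 53, 21),
  StratII  = pad(c(125, 155, 67, 96, 256, 47, 310, 236, 220, 352, 142, 190)),
  StratIII = pad(c(142, 256, 310, 440, 495, 510, 320, 396, 196)),
  StratIV  = pad(c(167, 655, 220, 540, 780))
)
N.i <- c(StratI = 86, StratII = 72, StratIII = 52, StratIV = 30)  # stratum sizes

# Wide -> long, then drop the NA padding
long <- na.omit(stack(data.raw))
names(long) <- c("y", "stratum")
long$stratum <- as.character(long$stratum)

# Sample sizes counted from the data rather than typed in
n.i <- table(long$stratum)

# Attach N_i, n_i and the sampling weight N_i / n_i to every observation
long$fpc.st    <- unname(N.i[long$stratum])
long$n.st      <- as.vector(n.i[long$stratum])
long$weight.st <- long$fpc.st / long$n.st

n.i
sum(long$weight.st)   # the weights should add up to the population size N
```

Looking up \(N_i\) and \(n_i\) by stratum name avoids relying on the row order of the padded table.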
boxplot(y~stratum, data=data, horizontal=TRUE, xlab="acres planted in trees")
Both the sample mean and the sample variance increase from stratum I to stratum IV: the larger the farm, the more acres are planted with trees and the more that number varies from farm to farm. The boxplot shows no outliers.
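The statement about the means and variances can be checked numerically. A minimal sketch in base R, with the sample values typed in from the table above so that it runs on its own (`acres` and `summary.st` are names introduced here):

```r
# Sample values by stratum, copied from the data table above
acres <- list(
  StratI   = c(97, 67, 42, 125, 25, 92, 105, 86, 27, 43, 45, 59, 53, 21),
  StratII  = c(125, 155, 67, 96, 256, 47, 310, 236, 220, 352, 142, 190),
  StratIII = c(142, 256, 310, 440, 495, 510, 320, 396, 196),
  StratIV  = c(167, 655, 220, 540, 780)
)

# Sample size, mean, variance and standard deviation for each stratum
summary.st <- data.frame(
  n    = sapply(acres, length),
  mean = sapply(acres, mean),
  var  = sapply(acres, var),
  sd   = sapply(acres, sd)
)
round(summary.st, 1)

# TRUE if the statistic increases strictly from stratum I to stratum IV
all(diff(summary.st$mean) > 0)
all(diff(summary.st$var) > 0)
```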
library(survey)
st.design<-svydesign(id=~1, strata=~stratum, weights=~weight.st, fpc=~fpc.st, data=data)
(r.st<-svytotal(~y, st.design, na.rm=TRUE))
## total SE
## y 50506 4331.6
(B.st<-2*SE(r.st))
## y
## y 8663.124
Thus, \(\hat{\tau}_{st}=50506\), and the bound on the error of estimation is \(B=2\sqrt{\hat{V}(\hat{\tau}_{st})}=8663.124\).
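As a cross-check on the `survey` output, the same numbers follow from the usual stratified-sampling formulas, \(\hat{\tau}_{st}=\sum_i N_i\bar{y}_i\) and \(\hat{V}(\hat{\tau}_{st})=\sum_i N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i}\). The base R sketch below assumes those formulas and re-enters the sample by hand; the names `acres`, `N.i`, `tau.hat`, and `V.hat` are introduced here and are not part of the original analysis.

```r
# Sample values by stratum, copied from the data table above
acres <- list(
  StratI   = c(97, 67, 42, 125, 25, 92, 105, 86, 27, 43, 45, 59, 53, 21),
  StratII  = c(125, 155, 67, 96, 256, 47, 310, 236, 220, 352, 142, 190),
  StratIII = c(142, 256, 310, 440, 495, 510, 320, 396, 196),
  StratIV  = c(167, 655, 220, 540, 780)
)
N.i <- c(86, 72, 52, 30)          # stratum population sizes N_i

n.i    <- sapply(acres, length)   # stratum sample sizes n_i
ybar.i <- sapply(acres, mean)     # stratum sample means
s2.i   <- sapply(acres, var)      # stratum sample variances s_i^2

# Estimated total: sum of N_i * ybar_i
tau.hat <- sum(N.i * ybar.i)

# Estimated variance, including the finite population correction (1 - n_i/N_i)
V.hat <- sum(N.i^2 * (1 - n.i / N.i) * s2.i / n.i)

# Bound on the error of estimation: two standard errors
B <- 2 * sqrt(V.hat)

c(total = tau.hat, SE = sqrt(V.hat), bound = B)
```

The hand calculation should agree with `svytotal()` up to rounding, which confirms that the design object applies the finite population correction in each stratum.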
Estimate the total \(\hat{\tau}_i\) for each stratum.
svyby(~y, ~stratum, st.design, svytotal)
## stratum y se
## StratI StratI 5448.714 688.5024
## StratII StratII 13176.000 1805.4101
## StratIII StratIII 17708.889 2042.6564
## StratIV StratIV 14172.000 3294.9120
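Because the strata are sampled independently, the stratum totals add up to \(\hat{\tau}_{st}\) and the stratum variances add up to \(\hat{V}(\hat{\tau}_{st})\). A short check using the values printed above (the vectors `tau.i` and `se.i` are typed in by hand here), together with a per-stratum bound taken as two standard errors:

```r
# Stratum totals and standard errors, copied from the svyby() output above
tau.i <- c(StratI = 5448.714, StratII = 13176.000,
           StratIII = 17708.889, StratIV = 14172.000)
se.i  <- c(StratI = 688.5024, StratII = 1805.4101,
           StratIII = 2042.6564, StratIV = 3294.9120)

sum(tau.i)            # should reproduce the overall estimated total
sqrt(sum(se.i^2))     # should reproduce the overall standard error

# Bound on the error of estimation for each stratum total
B.i <- 2 * se.i
B.i

# Share of the overall variance contributed by each stratum
var.share <- se.i^2 / sum(se.i^2)
round(var.share, 3)
```

Stratum IV, with only five sampled farms and the largest spread, accounts for more than half of the estimated variance of the total.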