Stratified Random Sampling

Use the data described in Example 5.10.

5.10. A forester wants to estimate the total number of farm acres planted with trees for a state. Because the number of acres of trees varies considerably with the size of the farm, he decides to stratify on farm size. Each of the 240 farms in the state is placed in one of four categories according to size. A stratified random sample of 40 farms, selected by using proportional allocation, yields the results shown in the accompanying table on number of acres planted in trees. Estimate the total number of acres of trees on farms in the state, and place a bound on the error of estimation. Graph the data on an appropriate plot and comment on the variation as we move from stratum I to stratum IV.

setwd("D:\\cycu\\111-1\\sampling\\R note\\ch5")
data.raw<-read.csv("EXERCISE5.10.csv")
data.raw
##    StratI StratII StratIII StratIV
## 1      97     125      142     167
## 2      67     155      256     655
## 3      42      67      310     220
## 4     125      96      440     540
## 5      25     256      495     780
## 6      92      47      510      NA
## 7     105     310      320      NA
## 8      86     236      396      NA
## 9      27     220      196      NA
## 10     43     352       NA      NA
## 11     45     142       NA      NA
## 12     59     190       NA      NA
## 13     53      NA       NA      NA
## 14     21      NA       NA      NA
colnames(data.raw)
## [1] "StratI"   "StratII"  "StratIII" "StratIV"
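# Stack the four stratum columns into long format: one row per sampled farm
stratum <- rep(colnames(data.raw), each = nrow(data.raw))
y <- c(as.matrix(data.raw))
fpc.st <- rep(c(86, 72, 52, 30), each = nrow(data.raw))  # N_i: number of farms in each stratum
n.st <- rep(c(14, 12, 9, 5), each = nrow(data.raw))      # n_i: sample size in each stratum
weight.st <- fpc.st / n.st                                # sampling weights N_i / n_i
# Drop the NA padding rows left by the unequal column lengths
data <- data.frame(stratum, y, fpc.st, n.st, weight.st)[!is.na(y), ]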
data
##     stratum   y fpc.st n.st weight.st
## 1    StratI  97     86   14  6.142857
## 2    StratI  67     86   14  6.142857
## 3    StratI  42     86   14  6.142857
## 4    StratI 125     86   14  6.142857
## 5    StratI  25     86   14  6.142857
## 6    StratI  92     86   14  6.142857
## 7    StratI 105     86   14  6.142857
## 8    StratI  86     86   14  6.142857
## 9    StratI  27     86   14  6.142857
## 10   StratI  43     86   14  6.142857
## 11   StratI  45     86   14  6.142857
## 12   StratI  59     86   14  6.142857
## 13   StratI  53     86   14  6.142857
## 14   StratI  21     86   14  6.142857
## 15  StratII 125     72   12  6.000000
## 16  StratII 155     72   12  6.000000
## 17  StratII  67     72   12  6.000000
## 18  StratII  96     72   12  6.000000
## 19  StratII 256     72   12  6.000000
## 20  StratII  47     72   12  6.000000
## 21  StratII 310     72   12  6.000000
## 22  StratII 236     72   12  6.000000
## 23  StratII 220     72   12  6.000000
## 24  StratII 352     72   12  6.000000
## 25  StratII 142     72   12  6.000000
## 26  StratII 190     72   12  6.000000
## 29 StratIII 142     52    9  5.777778
## 30 StratIII 256     52    9  5.777778
## 31 StratIII 310     52    9  5.777778
## 32 StratIII 440     52    9  5.777778
## 33 StratIII 495     52    9  5.777778
## 34 StratIII 510     52    9  5.777778
## 35 StratIII 320     52    9  5.777778
## 36 StratIII 396     52    9  5.777778
## 37 StratIII 196     52    9  5.777778
## 43  StratIV 167     30    5  6.000000
## 44  StratIV 655     30    5  6.000000
## 45  StratIV 220     30    5  6.000000
## 46  StratIV 540     30    5  6.000000
## 47  StratIV 780     30    5  6.000000
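
The stratum sample sizes and weights used above can be checked against the proportional allocation rule, \(n_i = n N_i / N\). The following is a minimal sketch, assuming the stratum sizes 86, 72, 52 and 30 from the code above; the variable names `N_i`, `n_total` and `n_prop` are illustrative only and are not part of the original analysis.

```r
# Stratum sizes N_i as used in the code above (they sum to the 240 farms)
N_i <- c(StratI = 86, StratII = 72, StratIII = 52, StratIV = 30)
n_total <- 40  # overall sample size stated in the exercise

# Proportional allocation: n_i = n * N_i / N, rounded to whole farms
n_prop <- round(n_total * N_i / sum(N_i))
n_prop

# Sampling weights N_i / n_i; close to N / n = 6 in every stratum
N_i / n_prop
```

Rounding gives 14, 12, 9 and 5 farms, which matches `n.st`, and the weights differ from 6 only because the allocation has to be rounded to whole farms.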
boxplot(y ~ stratum, data = data, horizontal = TRUE, xlab = "acres planted in trees")

Both the sample mean and the sample variance increase from stratum I to stratum IV: larger farms tend to have more acres planted in trees, and the planted acreage varies more among them. The boxplot shows no outliers.
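
The pattern read off the boxplot can also be checked numerically. The sketch below recomputes the per-stratum sample size, mean and standard deviation from the values in the data listing; the object names `y_by_stratum` and `summary_by_stratum` are hypothetical helpers introduced only for this illustration.

```r
# Sample values copied from the data listing earlier in this section
y_by_stratum <- list(
  StratI   = c(97, 67, 42, 125, 25, 92, 105, 86, 27, 43, 45, 59, 53, 21),
  StratII  = c(125, 155, 67, 96, 256, 47, 310, 236, 220, 352, 142, 190),
  StratIII = c(142, 256, 310, 440, 495, 510, 320, 396, 196),
  StratIV  = c(167, 655, 220, 540, 780)
)

# One row per stratum: sample size, sample mean, sample standard deviation
summary_by_stratum <- data.frame(
  n    = sapply(y_by_stratum, length),
  mean = sapply(y_by_stratum, mean),
  sd   = sapply(y_by_stratum, sd)
)
summary_by_stratum
```

Both the `mean` and `sd` columns should rise steadily from StratI to StratIV, which is the same ordering the boxplot suggests.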

library(survey)
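# Stratified design: one-stage sampling within strata, with finite population correction
st.design <- svydesign(ids = ~1, strata = ~stratum, weights = ~weight.st, fpc = ~fpc.st, data = data)
(r.st <- svytotal(~y, st.design, na.rm = TRUE))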
##   total     SE
## y 50506 4331.6
(B.st<-2*SE(r.st))
##          y
## y 8663.124

Thus, \(\hat{\tau}_{st}=50506\) acres, and the bound on the error of estimation is \(B=2\sqrt{\hat{V}(\hat{\tau}_{st})}=8663.124\) acres.
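
For comparison, the same quantities can be reproduced without the survey package from the usual stratified sampling formulas, \(\hat{\tau}_{st}=\sum_i N_i\bar{y}_i\) and \(\hat{V}(\hat{\tau}_{st})=\sum_i N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i}\). This is a minimal sketch under the assumption that these textbook formulas are what the design above corresponds to; all object names (`y_by_stratum`, `N_i`, `tau_hat`, `V_hat`, `B`) are illustrative.

```r
# Sample values copied from the data listing earlier in this section
y_by_stratum <- list(
  StratI   = c(97, 67, 42, 125, 25, 92, 105, 86, 27, 43, 45, 59, 53, 21),
  StratII  = c(125, 155, 67, 96, 256, 47, 310, 236, 220, 352, 142, 190),
  StratIII = c(142, 256, 310, 440, 495, 510, 320, 396, 196),
  StratIV  = c(167, 655, 220, 540, 780)
)
N_i <- c(StratI = 86, StratII = 72, StratIII = 52, StratIV = 30)  # stratum sizes

n_i    <- sapply(y_by_stratum, length)  # stratum sample sizes
ybar_i <- sapply(y_by_stratum, mean)    # stratum sample means
s2_i   <- sapply(y_by_stratum, var)     # stratum sample variances

# Estimated total, its estimated variance (with fpc), and the bound B = 2 * SE
tau_hat <- sum(N_i * ybar_i)
V_hat   <- sum(N_i^2 * (1 - n_i / N_i) * s2_i / n_i)
B       <- 2 * sqrt(V_hat)

round(c(tau_hat = tau_hat, SE = sqrt(V_hat), B = B), 1)
```

The hand computation should agree with the `svytotal` output above up to rounding.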

Estimate the total \(\hat{\tau}_i\) for each stratum.

svyby(~y, ~stratum, st.design, svytotal)
##           stratum         y        se
## StratI     StratI  5448.714  688.5024
## StratII   StratII 13176.000 1805.4101
## StratIII StratIII 17708.889 2042.6564
## StratIV   StratIV 14172.000 3294.9120
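
As a consistency check, the stratum totals should add up to the overall estimate, and because the strata are sampled independently, the squared standard errors should add up to the overall variance. The sketch below uses the numbers copied from the `svyby` output above; `tau_i` and `se_i` are names chosen here for illustration.

```r
# Stratum totals and standard errors copied from the svyby output above
tau_i <- c(StratI = 5448.714, StratII = 13176.000,
           StratIII = 17708.889, StratIV = 14172.000)
se_i  <- c(StratI = 688.5024, StratII = 1805.4101,
           StratIII = 2042.6564, StratIV = 3294.9120)

# Overall total is the sum of the stratum totals
tau_st <- sum(tau_i)

# Variances add across independent strata, so the SEs combine in quadrature
se_st <- sqrt(sum(se_i^2))

round(c(total = tau_st, SE = se_st, B = 2 * se_st), 1)
```

Stratum IV contributes by far the largest share of the variance even though it contains the fewest farms, which is consistent with the wide spread seen in its boxplot.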