\(survey\) is a well-known R package for survey sampling.
Install the \(survey\) package, and then load the package in R.
#install.packages("survey") # after running this commend, you will have "survey" installed in your R
library(survey) # load the package
“4.40: An auditor detects that a certain firm is regularly overstating the dollar amounts of inventories because of delays in recording withdrawals. The auditor wants to estimate the total overstated amount on 1000 listed items by obtaining exact inventory amounts on a random sample of 15 items and comparing these exact figures with the recorded amounts.Estimate the total overstated amount on the 1000 types of items and place a bound on the error of estimation.”
\(N=1000\), \(n=15\), parameter \(\tau = \sum_{i=1}^n u_i\), where \(u_i\) the overstated amount on the \(i\)th items.
setwd("D:\\course\\111-1\\Survey Sampling\\Rnote\\Rpub\\sampling")
data.40<-read.csv("EXERCISE4_40.csv")
data.40
## Item Audited Recorded Overstat
## 1 1 175 210 35
## 2 2 295 305 10
## 3 3 68 91 23
## 4 4 74 82 8
## 5 5 128 140 12
## 6 6 241 250 9
## 7 7 362 384 22
## 8 8 72 80 8
## 9 9 59 82 23
## 10 10 112 140 28
## 11 11 118 124 6
## 12 12 210 230 20
## 13 13 240 260 20
## 14 14 123 147 24
## 15 15 96 108 12
\(ids\): \(ids=\)~\(1\) means selecting samples from whole population (no stratification)
\(fpc\): finite population correction. \(fpc=\)~\(x\) says that input variable \(x\) contain the population size. \(x\) is a vector of length n. Use \(~\) when \(x\) is in the supplied data set.
\(data\): data frame to look up. Must include the variable of interest.
N<-1000
n<-15
N.v<-rep(N, n) # the population size for the n samples
design1 <- svydesign(ids = ~1, fpc = N.v, data=data.40) # finite population
w = rep(N/n,n)
design2 <- svydesign(ids = ~1, weight =w, data=data.40) # when N >> n
\(svymean(x, design)\): estimators for the population mean
\(svytotal(x, design)\): estimators for the population total
\(conf(object, level = 0.95)\): the confidence interval
\(SE(object)\): extracts standard errors from an object
r.total.1<-svytotal(~Overstat, design1)
r.total.1
## total SE
## Overstat 17333 2222.8
As a result, \(\hat{\tau}=17333, \hat{V}(\hat{\tau})=(2222.8)^2\). Check the results by using the formula in textbook: \[\hat{\tau}=N\bar{y},\] \[\hat{V}(\hat{\tau})=N^2(1-n/N)(s^2/n)\]
mean(data.40$Overstat)*N #tau hat
## [1] 17333.33
sd.total<-sd(data.40$Overstat)
N^2*(1-n/N)*(sd.total^2/n) #V hat
## [1] 4940635
SE(r.total.1)^2
## Overstat
## Overstat 4940635
The bound on the error of \(\hat{\tau}\):
2*SE(r.total.1)
## Overstat
## Overstat 4445.508
Therefore, the estimator of the total overstated amount on the 1000 types of items is \(\hat{\tau} = \$ 17333\). The bound on the error of \(\hat{\tau}\) is \(\$\) 4445.5.
Since 1000 >> 15, we can also ignore the fpc.
r.total.2<-svytotal(~Overstat, design2)
r.total.2
## total SE
## Overstat 17333 2239.6
2*SE(r.total.2)
## Overstat
## Overstat 4479.229