Load the brain weight data
data <- read.csv("D:\\course\\111-1\\Survey Sampling\\Rnote\\Rpub\\mammals.csv")
dim(data)
## [1] 68 2
head(data)
## species brain_wt
## 1 African Elephant 5712.0
## 2 African giant pouched rat 6.6
## 3 Arctic Fox 44.5
## 4 Arctic ground squirrel 5.7
## 5 Asian elephant 4603.0
## 6 Baboon 179.5
number of simple random samples (m) = 50; sample size (n) = 5
set.seed(12345)
bw<-data[,2] # brain weight data
mu<-mean(bw) # population mean
n<-5 # sample size
N<-length(bw) # population size
y<-sample(bw,n,replace=F) # simple random sample of size n
m<-50 # numbers of simple random samples
index.cover<-rep(0,m) # if ith interval cover mu
## 1st simple random sample
mu_hat<-mean(y) # estimator of population mean = sample mean
s2<-sum((y-mu_hat)^2)/(n-1) # sample variance
V_hat<-(1-n/N)*s2/n # estimator of V (V is the variance of the sample mean)
ci_L<-mu_hat-2*sqrt(V_hat) # lower bound of the confidence interval (use bound of error = 2 sd.)
ci_U<-mu_hat+2*sqrt(V_hat) # upper bound of the confidence interval (use bound of error = 2 sd.)
# check if the confidence interval cover the population mean (1:cover, 0:not cover)
if(ci_L<=mu && ci_U>=mu){
index.cover[1]<-1
}
# plot the 1st simple random sample results and set the environment of the plot
plot(mu_hat,1,xlim=c(-1500,4000),ylim=c(0,50),xlab="Intervals",ylab="Sample number",main="Figure 4.1")
segments(ci_L,1,ci_U,1)
abline(v=mu,col=2,lty=2)
## 2nd - 50th simple random sample, repeat the calculation in 1st simple random sample
i=2
for(i in 2:m){
y<-sample(bw,n,replace=F) #sample of size n
mu_hat<-mean(y)
s2<-sum((y-mu_hat)^2)/(n-1)
V_hat<-(1-n/N)*s2/n
ci_L<-mu_hat-2*sqrt(V_hat)
ci_U<-mu_hat+2*sqrt(V_hat)
if(ci_L<=mu && ci_U>=mu){
index.cover[i]<-1
}
points(mu_hat,i)
segments(ci_L,i,ci_U,i)
}
Number of simple random samples whose confidence interval cover the population mean \(\mu\):
sum(index.cover)
## [1] 23
The coverage rage:
mean(index.cover)
## [1] 0.46
Only 23 of the intervals (46%) cover the true population mean of 394.5; from the figure above, many of the intervals are too short and lie to far to the left. It may cause by the right skew of the original data set.
If we use the log-transformed data: \(ln\)(brain weight)
bw_ln<-log(data[,2]) # ln(brain weight)
mu<-mean(bw_ln)
n<-5
N<-length(bw_ln)
y<-sample(bw_ln,n,replace=F)
m<-50
index.cover<-rep(0,m)
## 1st simple random sample
mu_hat<-mean(y)
s2<-sum((y-mu_hat)^2)/(n-1)
V_hat<-(1-n/N)*s2/n
ci_L<-mu_hat-2*sqrt(V_hat)
ci_U<-mu_hat+2*sqrt(V_hat)
# check if the confidence interval cover the population mean (1:cover, 0:not cover)
if(ci_L<=mu && ci_U>=mu){
index.cover[1]<-1
}
# plot the 1st simple random sample results and set the environment of the plot
plot(mu_hat,1,xlim=c(-2.5,9),ylim=c(0,50),xlab="Intervals",ylab="Sample number",main="Figure 4.2: ln(brain weight)")
segments(ci_L,1,ci_U,1)
abline(v=mu,col=2,lty=2)
## 2nd - 50th simple random sample, repeat the calculation in 1st simple random sample
i=2
for(i in 2:m){
y<-sample(bw_ln,n,replace=F)
mu_hat<-mean(y)
s2<-sum((y-mu_hat)^2)/(n-1)
V_hat<-(1-n/N)*s2/n
ci_L<-mu_hat-2*sqrt(V_hat)
ci_U<-mu_hat+2*sqrt(V_hat)
if(ci_L<=mu && ci_U>=mu){
index.cover[i]<-1
}
points(mu_hat,i)
segments(ci_L,i,ci_U,i)
}
sum(index.cover)
## [1] 44
mean(index.cover)
## [1] 0.88
After log-transformed, 44 of the calculated intervals (88%) cover the population mean 2.98. The intervals are also more uniform in length.
the 2sd CI will not work well unless there is reasonable assurance that the sample means being studied have sampling distributions that are not too far from normal. In this case, the sample size is too small to be affected by the right skew samples. Therefore, the distribution of the sample mean will also be right skew and be far from normal. (as shown in Figure 3.4)