Objective : Use R as a calculator & Physical Notes.

Ch 4 : Simple Random Sample

Ex. 4.1)

List all possible simple random samples of size n = 2 that can be selected from the population {0, 1, 2, 3, 4}. Calculate s2 for the population and V(y) for the sample.

N <- 5
n <- 2
sample_space <- choose(N,n) # 10 total comb. of n = 2
sample_space

## [1] 10

P <- 0:4
mu <- (1/5)*sum(P)
mu

## [1] 2

sigma2 <- (1/5)*sum((P-mu)^2)
FPC <- (N-n)/(N-1)
V_ybar <- (sigma2/n)*FPC
V_ybar

## [1] 0.75

Ex. 4.14)

State park officials were interested in the proportion of campers who consider the camp- site spacing adequate in a particular campground. They decided to take a simple random sample of n = 30 from the first N = 300 camping parties that visited the campground. Let yi = 0 if the head of the ith party sampled does not think the campsite spacing is ad- equateandyi = 1ifhedoes(i =1,2,…,30).Usethedataintheaccompanyingtable to estimate p, the proportion of campers who consider the campsite spacing adequate. Place a bound on the error of estimation.

rm(list=ls())
N <- 300
n <- 30
sm_yi <- 25
phat <- sm_yi/n; round(phat, 3)

## [1] 0.833

phat_c <- 1-phat
FPC <- (1-n/N)
Vhat_phat <- ((phat*phat_c)*(FPC))/(n-1); round(Vhat_phat, 3)

## [1] 0.004

Z_c <- 2 # est. 1.96
B <- Z_c*sqrt(Vhat_phat)
round(B, 3)

## [1] 0.131

Ex. 4.15)

Use the data in Exercise 4.14 to determine the sample size required to estimate p with a bound on the error of estimation of magnitude B = 0.05.

B <- .05
D <- (B^2)/4
n <- (N*phat*phat_c)/((N-1)*D+(phat*phat_c))
ceiling(n) # round to nearest integer

## [1] 128

Ex. 4.16)

A simple random sample of 100 water meters within a community is monitored to esti- mate the average daily water consumption per household over a specified dry spell. The sample mean and sample variance are found to be y = 12.5 and s2 = 1252. If we assume that there are N = 10,000 households within the community, estimate m, the true mean daily consumption.

rm(list=ls())
ybar <- 12.5
s2 <- 1252
N <-10000
n <- 100
FPC <- 1-n/N
Vhat_ybar <- (s2/n)*FPC; round(sqrt(Vhat_ybar), 3)

## [1] 3.521

Z_c <- 2 # est. 1.96
B <- Z_c*sqrt(Vhat_ybar); round(B, 3)

## [1] 7.041

Ex. 4.17)

Using data in Exercise 4.16, estimate the total number of gallons of water, t, used daily during the dry spell. Place a bound on the error of estimation.

tau_hat <- N*ybar; round(tau_hat, 3)

## [1] 125000

Vhat_tau <- (N^2)*Vhat_ybar; round(sqrt(Vhat_tau), 3)

## [1] 35206.25

B <- 2*sqrt(Vhat_tau); round(B, 3)

## [1] 70412.5

Ex. 4.19)

A dentist was interested in the effectiveness of a new toothpaste. A group of N = 1000 schoolchildren participated in a study. Prestudy records showed there was an average of 2.2 cavities every six months for the group. After three months of the study, the dentist sampled n = 10 children to determine how they were progressing on the new toothpaste. Using the data in the accompanying table, estimate the mean number of cavities for the entire group and place a bound on the error of estimation.

rm(list=ls())
N <-1000
y <- c(0,4,2,3,2,0,3,4,1,1)
hist(y)

df <- data.frame(y=y); df

ybar <- mean(y); ybar

## [1] 2

n <- nrow(df)
s2 <- sum((y-ybar)^2)/(n-1)
FPC <- 1-n/N
Vhat_ybar <- (s2/n)*FPC; round( sqrt(Vhat_ybar), 3)

## [1] 0.469

B <- 2*sqrt(Vhat_ybar)
round(B, 3)

## [1] 0.938

Ex. 4.20)

The Fish and Game Department of a particular state was concerned about the direction of its future hunting programs. To provide for a greater potential for future hunting, the department wanted to determine the proportion of hunters seeking any type of game bird. A simple random sample of n = 1000 of the N = 99,000 licensed hunters was obtained. Suppose 430 indicated that they hunted game birds. Estimate p, the proportion of licensed hunters seeking game birds. Place a bound on the error of estimation.

rm(list=ls())
n <- 1000
N <- 99000
FPC <- 1-n/N
sm_yi <- 430
phat <- sm_yi/n; round(phat, 3)

## [1] 0.43

qhat <- 1-phat
Vhat_phat <- (phat*qhat*FPC)/(n-1)
B <- 2*sqrt(Vhat_phat); round(B, 3)

## [1] 0.031

Ex. 4.21)

Using the data in Exercise 4.20, determine the sample size the department must obtain to estimate the proportion of game bird hunters, given a bound on the error of estimation of magnitude B = 0.02.

B <- .02
D <- (B^2)/4
n <- (N*phat*qhat)/((N-1)*D+phat*qhat)
ceiling(n)

## [1] 2392

Ex. 4.22)

A company auditor was interested in estimating the total number of travel vouchers that were incorrectly filed. In a simple random sample of n = 50 vouchers taken from a group of N = 250, 20 were filed incorrectly. Estimate the total number of vouchers from the N = 250 that have been filed incorrectly and place a bound on the error of estima- tion. [Hint: If p is the population proportion of incorrect vouchers, then Np is the total number of incorrect vouchers. An estimator of Np is NpN , which has an estimated vari- ance given by N2VN (pN ).]

rm(list=ls())
N <- 250
n <- 50
sm_y <- 20
phat <- sm_y/n
qhat <- 1-phat
tau_hat <- N*phat; round(tau_hat, 3)

## [1] 100

FPC <- 1 - n/N
Vhat_phat <- (phat*qhat*FPC)/(n-1)
Vhat_tau <- (N^2)*Vhat_phat
B <- 2*sqrt(Vhat_tau); round(B, 3)

## [1] 31.298

Ex. 4.48)

A marketing research firm estimates the proportion of potential customers preferring a certain brand of lipstick by “randomly” selecting 100 women who come by their booth in a shopping mall. Of the 100 sampled, 65 women stated a preference for brand A.

How would you estimate the true proportion of women preferring brand A, with a bound on the error of estimation?

\[ \hat{p}=\frac{1}{n}\sum_{i\in S}{y_i} \]

\[ B =Z_c*\sqrt{\hat{V}_{\hat{p}}} \]

\[ Z_c \approx2; \hat{V}_{\hat{p}}=\text{FPC}\frac{\hat{p}\hat{q}}{n-1} \]

\[ \text{FPC}=1-\frac{n}{N} \]

What is the target population in this study?

“potential customers preferring a certain brand of lipstick” – which is a poorly defined target population

Did the marketing research firm select a simple random sample?

No, this is a convenient sample – “selecting 100 women who come by their booth in a shopping mall” – self selection bias.

What additional problems do you see with this type of sampling?

not everyone is a customer. This is not a probability sample–cant apply probabalistic methods to a non-prob. samp. – further they will have been biased by the entertainment of those there

Ch 5 : Stratified Random Sampling

Ex. 5.1)

A chain of department stores is interested in estimating the proportion of accounts receivable that are delinquent. The chain consists of four stores. To reduce the cost of sampling, stratified random sampling is used, with each store as a stratum. Because no information on population proportions is available before sampling, proportional allocation is used. From the accompanying table, estimate p, the proportion of delinquent accounts for the chain, and place a bound on the error of estimation.

rm(list=ls())
N1 <- 65
N2 <- 42
N3 <- 93
N4 <- 25

n1 <- 14
n2 <- 9
n3 <- 21
n4 <- 6

numb_delinquent_accounts <- c(4,2,8,1)
Ni <- c(N1, N2, N3, N4)
N <- sum(Ni); N

## [1] 225

ni <- c(n1,n2,n3,n4)
pi <- numb_delinquent_accounts/ni; round(pi, 3)

## [1] 0.286 0.222 0.381 0.167

qi <- 1-pi

phat <- (1/N)*sum(pi*Ni); round(phat, 3)

## [1] 0.3

FPCi <- (1 - ni/Ni); round(FPCi, 3)

## [1] 0.785 0.786 0.774 0.760

Vhat_pst <- (1/N^2)*sum((pi*qi*FPCi*Ni^2)/(ni-1))

Z_c <- 2

B <- 2*sqrt(Vhat_pst); round(B, 3)

## [1] 0.117

Ch 4 : Ratio, Reg. & Difference est.

Ex. 6.1)

\[ \text{Lec. Formulation : } \]

\[ \hat{\text{V}}(\hat{\tau_y})=\tau_x^2 \hat{\text{V}}(r) \\ \hat{\text{V}}(r)= \text{FPC}*\frac{1}{\mu_x^2}\frac{s_r^2}{n}\\ s^2_r=\frac{1}{n-1}\sum_{i\in S}(y_i - rx_i)\\ \text{FPC}=\frac{N-n}{N-1}\approx\frac{N-n}{N}\approx 1-\frac{n}{N} \]

Calculation

x <- c(.3, .5, .4, .9, .7, .2, .6, .5, .8, .4, .8, .6)
y <- c(6, 9, 7, 19, 15, 5,12, 9, 20, 9, 18, 13)
T_x <- 75
sm_y <- sum(y)
sm_x <- sum(x)
df <- data.frame(x=x, y=y)
n <- 12
N <- 250
r <- sm_y/sm_x
T_y <- r*T_x
round(T_y, 2) # estimation

## [1] 1589.55

FPC <- 1-n/N
sr2 <- (1/(n-1))*sum((y-r*x)^2)
mu_x <- T_x/N
V_r <- FPC*(1/mu_x^2)*(sr2/n)
V_Ty <- V_r * T_x^2
B <- 2*sqrt(V_Ty)
round(B, 2) # Bound

## [1] 186.32

ci_Ty <- T_y + c(-1, 1)*B
ci_Ty # conf intvl

## [1] 1403.235 1775.870

Ex. 6.2)

Calculation

T_y2 <- mean(y)*N
round(T_y2, 2) # est

## [1] 2958.33

V_Ty2 <-(var(y)/n)*FPC*N^2
B2 <- 2*sqrt(V_Ty2)
round(B2, 2) # bound

## [1] 730.13

# plt
plot(x,y)
abline(a=0, b = r, col = "green")

ci_Ty2 <- T_y2 + c(-1, 1)*B2
ci_Ty2 # conf intvl

## [1] 2228.199 3688.468

Comparison :

tab <- data.frame(
  Estimate = c("T_y", "V_Ty", "B","T_y2", "V_Ty2", "B2"),
  Value = c(T_y, V_Ty, B, T_y2, V_Ty2, B2)
)

tab

Given the origin-intercept, with slope r model good fit, it is clear that our ratio est is very likely better. Further given the fact that our est. bounds are dramatically smaller with our ratio est.

Further notice :

\[ N\bar{y} =\tau_x*r \iff \bar{y}=r\mu_x \\ \text{ or,}\\ N\bar{y} =\tau_x*r \iff \bar{x}=\mu_x \]

proof :

\[ N\bar{y} =\tau_xr \\ N\bar{y} =(N\mu_x)r \\ \boxed{\bar{y}=\mu_xr} \\ \bar{y}=\mu_x \frac{\bar{y}}{\bar{x}} \\ 1=\mu_x \frac{1}{\bar{x}}\\ \boxed{\bar{x}=\mu_x} \]

ex. 6.3)

rm(list=ls()) #del gloabal env obj

x <- c("25,100", "32,200", "29,600", "35,000", "34,400", "26,500", "28,700", "28,200", "34,600", "32,700", "31,500", "30,600", "27,700", "28,500")
x <- gsub(",", "", x) |> as.numeric() # rep "," with empty space, then make it a num-vec

y <- c(3800, 5100, 4200, 6200, 5800, 4100, 3900, 3600, 3800, 4100, 4500, 5100, 4200, 4000)

df <- data.frame(x=x, y=y)
df

plot(x,y)

Recall :

\[ \hat{\text{V}}(r)= \text{FPC}*\frac{1}{\mu_x^2}\frac{s_r^2}{n}\\ s^2_r=\frac{1}{n-1}\sum_{i\in S}(y_i - rx_i)\\ \text{FPC}=\frac{N-n}{N-1}\approx\frac{N-n}{N}\approx 1-\frac{n}{N} \]

Calculation

n <- 14
N <- 150
r <- mean(y)/mean(x)
round(r, 3)

## [1] 0.147

FPC <- 1-n/N 
Sr2 <- (1/(n-1))*sum((y-r*x)^2)
V_r <- (Sr2/n)*(1/mean(x)^2)*FPC #est. of mu_x
B <- 2*sqrt(V_r)
round(B, 4)

## [1] 0.0102

ci <- r + c(-1,1)*B
ci

## [1] 0.1364745 0.1569654

ex. 6.4)

recall :

rm(list=ls())

x <- c(550, 720, 1500, 1020, 620, 980, 928, 1200, 1350, 1750, 670, 729, 1530)
y <- c(610, 780, 1600, 1030, 600, 1050, 977, 1440, 1570, 2210, 980, 865, 1710)
df <- data.frame(x=x, y=y)
df

plot(x,y)

Calculation

T_x <- 128200
n <- 13
N <- 123
FPC <- 1-n/N
mu_x <- T_x/N

r <- sum(y)/sum(x)
T_y <- T_x * r 
round(T_y, 2) # est

## [1] 145943.8

y_hat <- r*x
sr2 <- (1/(n-1))*(sum((y-y_hat)^2))
V_r <- (sr2/n)*FPC*(1/mu_x^2)
V_Ty <- V_r*T_x^2
B <- 2*sqrt(V_Ty)
round(B, 2) # slightly off 7353.67

## [1] 7353.67

ex. 6.5)

Calculation

mu_y <- mu_x*r
round(mu_y, 2)

## [1] 1186.53

V_muy <- (mu_x^2)*V_r
B <- 2*sqrt(V_muy)
round(B, 2)

## [1] 59.79

ex. 6.6)

rm(list=ls())
x <- c(14.3, 15.7, 17.8, 17.5, 13.2, 18.8, 17.6, 14.3, 14.9, 17.9, 19.2)
y <- c(15.2, 16.1, 18.1, 17.6, 14.5, 19.4, 17.5, 14.1, 15.2, 18.1, 19.5)
N <- 763
n <- 11
mu_x <- 17.2

calculation

r <- sum(y)/sum(x)
r

## [1] 1.022627

plot(x,y)
abline(a=0, b=r, col = "red") # pretty decent fit

mu_y <- r*mu_x
round(mu_y, 2)

## [1] 17.59

FPC <- 1-n/N
sr2 <- (1/(n-1))*sum((y-r*x)^2)

V_r <- (sr2/n)*FPC*(1/mu_x^2)
V_muy <- (sr2/n)*FPC #notice they cancel out (mu_x^2)
B <- 2*sqrt(V_muy)
round(B, 3)

## [1] 0.271

Applied Sampling Practice Catelog

Isaiah C. Mireles

Ch 4 : Simple Random Sample

Ex. 4.1)

Ex. 4.14)

Ex. 4.15)

Ex. 4.16)

Ex. 4.17)

Ex. 4.19)

Ex. 4.20)

Ex. 4.21)

Ex. 4.22)

Ex. 4.48)

Ch 5 : Stratified Random Sampling

Ex. 5.1)

Ch 4 : Ratio, Reg. & Difference est.

Ex. 6.1)

Calculation

Ex. 6.2)

Calculation

Comparison :

ex. 6.3)

Calculation

ex. 6.4)

Calculation

ex. 6.5)

Calculation

ex. 6.6)

calculation

ex. 6.5)

Ch 7 : Systematic Sampling

Ch 8 : Cluster Sampling

Ch 9 : Two-Stage Cluster Sampling

Ch 11 : Supplemental Topics

Ch 12 : Summary