Objective : Use R as a calculator & Physical Notes.
List all possible simple random samples of size n = 2 that can be selected from the population {0, 1, 2, 3, 4}. Calculate s2 for the population and V(y) for the sample.
N <- 5
n <- 2
sample_space <- choose(N,n) # 10 total comb. of n = 2
sample_space
## [1] 10
P <- 0:4
mu <- (1/5)*sum(P)
mu
## [1] 2
sigma2 <- (1/5)*sum((P-mu)^2)
FPC <- (N-n)/(N-1)
V_ybar <- (sigma2/n)*FPC
V_ybar
## [1] 0.75
State park officials were interested in the proportion of campers who consider the camp- site spacing adequate in a particular campground. They decided to take a simple random sample of n = 30 from the first N = 300 camping parties that visited the campground. Let yi = 0 if the head of the ith party sampled does not think the campsite spacing is ad- equateandyi = 1ifhedoes(i =1,2,…,30).Usethedataintheaccompanyingtable to estimate p, the proportion of campers who consider the campsite spacing adequate. Place a bound on the error of estimation.
rm(list=ls())
N <- 300
n <- 30
sm_yi <- 25
phat <- sm_yi/n; round(phat, 3)
## [1] 0.833
phat_c <- 1-phat
FPC <- (1-n/N)
Vhat_phat <- ((phat*phat_c)*(FPC))/(n-1); round(Vhat_phat, 3)
## [1] 0.004
Z_c <- 2 # est. 1.96
B <- Z_c*sqrt(Vhat_phat)
round(B, 3)
## [1] 0.131
Use the data in Exercise 4.14 to determine the sample size required to estimate p with a bound on the error of estimation of magnitude B = 0.05.
B <- .05
D <- (B^2)/4
n <- (N*phat*phat_c)/((N-1)*D+(phat*phat_c))
ceiling(n) # round to nearest integer
## [1] 128
A simple random sample of 100 water meters within a community is monitored to esti- mate the average daily water consumption per household over a specified dry spell. The sample mean and sample variance are found to be y = 12.5 and s2 = 1252. If we assume that there are N = 10,000 households within the community, estimate m, the true mean daily consumption.
rm(list=ls())
ybar <- 12.5
s2 <- 1252
N <-10000
n <- 100
FPC <- 1-n/N
Vhat_ybar <- (s2/n)*FPC; round(sqrt(Vhat_ybar), 3)
## [1] 3.521
Z_c <- 2 # est. 1.96
B <- Z_c*sqrt(Vhat_ybar); round(B, 3)
## [1] 7.041
Using data in Exercise 4.16, estimate the total number of gallons of water, t, used daily during the dry spell. Place a bound on the error of estimation.
tau_hat <- N*ybar; round(tau_hat, 3)
## [1] 125000
Vhat_tau <- (N^2)*Vhat_ybar; round(sqrt(Vhat_tau), 3)
## [1] 35206.25
B <- 2*sqrt(Vhat_tau); round(B, 3)
## [1] 70412.5
A dentist was interested in the effectiveness of a new toothpaste. A group of N = 1000 schoolchildren participated in a study. Prestudy records showed there was an average of 2.2 cavities every six months for the group. After three months of the study, the dentist sampled n = 10 children to determine how they were progressing on the new toothpaste. Using the data in the accompanying table, estimate the mean number of cavities for the entire group and place a bound on the error of estimation.
rm(list=ls())
N <-1000
y <- c(0,4,2,3,2,0,3,4,1,1)
hist(y)
df <- data.frame(y=y); df
ybar <- mean(y); ybar
## [1] 2
n <- nrow(df)
s2 <- sum((y-ybar)^2)/(n-1)
FPC <- 1-n/N
Vhat_ybar <- (s2/n)*FPC; round( sqrt(Vhat_ybar), 3)
## [1] 0.469
B <- 2*sqrt(Vhat_ybar)
round(B, 3)
## [1] 0.938
The Fish and Game Department of a particular state was concerned about the direction of its future hunting programs. To provide for a greater potential for future hunting, the department wanted to determine the proportion of hunters seeking any type of game bird. A simple random sample of n = 1000 of the N = 99,000 licensed hunters was obtained. Suppose 430 indicated that they hunted game birds. Estimate p, the proportion of licensed hunters seeking game birds. Place a bound on the error of estimation.
rm(list=ls())
n <- 1000
N <- 99000
FPC <- 1-n/N
sm_yi <- 430
phat <- sm_yi/n; round(phat, 3)
## [1] 0.43
qhat <- 1-phat
Vhat_phat <- (phat*qhat*FPC)/(n-1)
B <- 2*sqrt(Vhat_phat); round(B, 3)
## [1] 0.031
Using the data in Exercise 4.20, determine the sample size the department must obtain to estimate the proportion of game bird hunters, given a bound on the error of estimation of magnitude B = 0.02.
B <- .02
D <- (B^2)/4
n <- (N*phat*qhat)/((N-1)*D+phat*qhat)
ceiling(n)
## [1] 2392
A company auditor was interested in estimating the total number of travel vouchers that were incorrectly filed. In a simple random sample of n = 50 vouchers taken from a group of N = 250, 20 were filed incorrectly. Estimate the total number of vouchers from the N = 250 that have been filed incorrectly and place a bound on the error of estima- tion. [Hint: If p is the population proportion of incorrect vouchers, then Np is the total number of incorrect vouchers. An estimator of Np is NpN , which has an estimated vari- ance given by N2VN (pN ).]
rm(list=ls())
N <- 250
n <- 50
sm_y <- 20
phat <- sm_y/n
qhat <- 1-phat
tau_hat <- N*phat; round(tau_hat, 3)
## [1] 100
FPC <- 1 - n/N
Vhat_phat <- (phat*qhat*FPC)/(n-1)
Vhat_tau <- (N^2)*Vhat_phat
B <- 2*sqrt(Vhat_tau); round(B, 3)
## [1] 31.298
A marketing research firm estimates the proportion of potential customers preferring a certain brand of lipstick by “randomly” selecting 100 women who come by their booth in a shopping mall. Of the 100 sampled, 65 women stated a preference for brand A.
\[ \hat{p}=\frac{1}{n}\sum_{i\in S}{y_i} \]
\[ B =Z_c*\sqrt{\hat{V}_{\hat{p}}} \]
\[ Z_c \approx2; \hat{V}_{\hat{p}}=\text{FPC}\frac{\hat{p}\hat{q}}{n-1} \]
\[ \text{FPC}=1-\frac{n}{N} \]
“potential customers preferring a certain brand of lipstick” – which is a poorly defined target population
No, this is a convenient sample – “selecting 100 women who come by their booth in a shopping mall” – self selection bias.
not everyone is a customer. This is not a probability sample–cant apply probabalistic methods to a non-prob. samp. – further they will have been biased by the entertainment of those there
A chain of department stores is interested in estimating the proportion of accounts receivable that are delinquent. The chain consists of four stores. To reduce the cost of sampling, stratified random sampling is used, with each store as a stratum. Because no information on population proportions is available before sampling, proportional allocation is used. From the accompanying table, estimate p, the proportion of delinquent accounts for the chain, and place a bound on the error of estimation.
rm(list=ls())
N1 <- 65
N2 <- 42
N3 <- 93
N4 <- 25
n1 <- 14
n2 <- 9
n3 <- 21
n4 <- 6
numb_delinquent_accounts <- c(4,2,8,1)
Ni <- c(N1, N2, N3, N4)
N <- sum(Ni); N
## [1] 225
ni <- c(n1,n2,n3,n4)
pi <- numb_delinquent_accounts/ni; round(pi, 3)
## [1] 0.286 0.222 0.381 0.167
qi <- 1-pi
phat <- (1/N)*sum(pi*Ni); round(phat, 3)
## [1] 0.3
FPCi <- (1 - ni/Ni); round(FPCi, 3)
## [1] 0.785 0.786 0.774 0.760
Vhat_pst <- (1/N^2)*sum((pi*qi*FPCi*Ni^2)/(ni-1))
Z_c <- 2
B <- 2*sqrt(Vhat_pst); round(B, 3)
## [1] 0.117
\[ \text{Lec. Formulation : } \]
\[ \hat{\text{V}}(\hat{\tau_y})=\tau_x^2 \hat{\text{V}}(r) \\ \hat{\text{V}}(r)= \text{FPC}*\frac{1}{\mu_x^2}\frac{s_r^2}{n}\\ s^2_r=\frac{1}{n-1}\sum_{i\in S}(y_i - rx_i)\\ \text{FPC}=\frac{N-n}{N-1}\approx\frac{N-n}{N}\approx 1-\frac{n}{N} \]
x <- c(.3, .5, .4, .9, .7, .2, .6, .5, .8, .4, .8, .6)
y <- c(6, 9, 7, 19, 15, 5,12, 9, 20, 9, 18, 13)
T_x <- 75
sm_y <- sum(y)
sm_x <- sum(x)
df <- data.frame(x=x, y=y)
n <- 12
N <- 250
r <- sm_y/sm_x
T_y <- r*T_x
round(T_y, 2) # estimation
## [1] 1589.55
FPC <- 1-n/N
sr2 <- (1/(n-1))*sum((y-r*x)^2)
mu_x <- T_x/N
V_r <- FPC*(1/mu_x^2)*(sr2/n)
V_Ty <- V_r * T_x^2
B <- 2*sqrt(V_Ty)
round(B, 2) # Bound
## [1] 186.32
ci_Ty <- T_y + c(-1, 1)*B
ci_Ty # conf intvl
## [1] 1403.235 1775.870
T_y2 <- mean(y)*N
round(T_y2, 2) # est
## [1] 2958.33
V_Ty2 <-(var(y)/n)*FPC*N^2
B2 <- 2*sqrt(V_Ty2)
round(B2, 2) # bound
## [1] 730.13
# plt
plot(x,y)
abline(a=0, b = r, col = "green")
ci_Ty2 <- T_y2 + c(-1, 1)*B2
ci_Ty2 # conf intvl
## [1] 2228.199 3688.468
tab <- data.frame(
Estimate = c("T_y", "V_Ty", "B","T_y2", "V_Ty2", "B2"),
Value = c(T_y, V_Ty, B, T_y2, V_Ty2, B2)
)
tab
Given the origin-intercept, with slope r model good fit, it is clear that our ratio est is very likely better. Further given the fact that our est. bounds are dramatically smaller with our ratio est.
Further notice :
\[ N\bar{y} =\tau_x*r \iff \bar{y}=r\mu_x \\ \text{ or,}\\ N\bar{y} =\tau_x*r \iff \bar{x}=\mu_x \]
proof :
\[ N\bar{y} =\tau_xr \\ N\bar{y} =(N\mu_x)r \\ \boxed{\bar{y}=\mu_xr} \\ \bar{y}=\mu_x \frac{\bar{y}}{\bar{x}} \\ 1=\mu_x \frac{1}{\bar{x}}\\ \boxed{\bar{x}=\mu_x} \]
rm(list=ls()) #del gloabal env obj
x <- c("25,100", "32,200", "29,600", "35,000", "34,400", "26,500", "28,700", "28,200", "34,600", "32,700", "31,500", "30,600", "27,700", "28,500")
x <- gsub(",", "", x) |> as.numeric() # rep "," with empty space, then make it a num-vec
y <- c(3800, 5100, 4200, 6200, 5800, 4100, 3900, 3600, 3800, 4100, 4500, 5100, 4200, 4000)
df <- data.frame(x=x, y=y)
df
plot(x,y)
Recall :
\[ \hat{\text{V}}(r)= \text{FPC}*\frac{1}{\mu_x^2}\frac{s_r^2}{n}\\ s^2_r=\frac{1}{n-1}\sum_{i\in S}(y_i - rx_i)\\ \text{FPC}=\frac{N-n}{N-1}\approx\frac{N-n}{N}\approx 1-\frac{n}{N} \]
n <- 14
N <- 150
r <- mean(y)/mean(x)
round(r, 3)
## [1] 0.147
FPC <- 1-n/N
Sr2 <- (1/(n-1))*sum((y-r*x)^2)
V_r <- (Sr2/n)*(1/mean(x)^2)*FPC #est. of mu_x
B <- 2*sqrt(V_r)
round(B, 4)
## [1] 0.0102
ci <- r + c(-1,1)*B
ci
## [1] 0.1364745 0.1569654
recall :
\[ \hat{\text{V}}(\hat{\tau_y})=\tau_x^2 \hat{\text{V}}(r) \\ \hat{\text{V}}(r)= \text{FPC}*\frac{1}{\mu_x^2}\frac{s_r^2}{n}\\ s^2_r=\frac{1}{n-1}\sum_{i\in S}(y_i - rx_i)\\ \text{FPC}=\frac{N-n}{N-1}\approx\frac{N-n}{N}\approx 1-\frac{n}{N} \]
rm(list=ls())
x <- c(550, 720, 1500, 1020, 620, 980, 928, 1200, 1350, 1750, 670, 729, 1530)
y <- c(610, 780, 1600, 1030, 600, 1050, 977, 1440, 1570, 2210, 980, 865, 1710)
df <- data.frame(x=x, y=y)
df
plot(x,y)
T_x <- 128200
n <- 13
N <- 123
FPC <- 1-n/N
mu_x <- T_x/N
r <- sum(y)/sum(x)
T_y <- T_x * r
round(T_y, 2) # est
## [1] 145943.8
y_hat <- r*x
sr2 <- (1/(n-1))*(sum((y-y_hat)^2))
V_r <- (sr2/n)*FPC*(1/mu_x^2)
V_Ty <- V_r*T_x^2
B <- 2*sqrt(V_Ty)
round(B, 2) # slightly off 7353.67
## [1] 7353.67
mu_y <- mu_x*r
round(mu_y, 2)
## [1] 1186.53
V_muy <- (mu_x^2)*V_r
B <- 2*sqrt(V_muy)
round(B, 2)
## [1] 59.79
rm(list=ls())
x <- c(14.3, 15.7, 17.8, 17.5, 13.2, 18.8, 17.6, 14.3, 14.9, 17.9, 19.2)
y <- c(15.2, 16.1, 18.1, 17.6, 14.5, 19.4, 17.5, 14.1, 15.2, 18.1, 19.5)
N <- 763
n <- 11
mu_x <- 17.2
r <- sum(y)/sum(x)
r
## [1] 1.022627
plot(x,y)
abline(a=0, b=r, col = "red") # pretty decent fit
mu_y <- r*mu_x
round(mu_y, 2)
## [1] 17.59
FPC <- 1-n/N
sr2 <- (1/(n-1))*sum((y-r*x)^2)
V_r <- (sr2/n)*FPC*(1/mu_x^2)
V_muy <- (sr2/n)*FPC #notice they cancel out (mu_x^2)
B <- 2*sqrt(V_muy)
round(B, 3)
## [1] 0.271
(Aside : Notice we skip Ch10 as it is not discussed in the course)