Let W = women’s height and M = men’s height. From the information given in the question, we know the following relationship:
\[\begin{align} W = 0.92\cdot M \end{align}\]
If we plot this equation (relationship), we will have the following figure:
## [1] The linear correlation coefficient is 1
We can see that all the data points lie on a single straight line with a positive slope (0.92), so there is a perfect positive linear correlation between women’s height and men’s height, and the linear correlation coefficient is \(r=1\).
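A minimal sketch of this result in R. The men's heights below are made-up illustrative values, not data from the question; any set of heights that are not all equal should give the same correlation.

```r
# Hypothetical men's heights in inches (illustrative values only)
M <- c(64, 66, 68, 70, 72, 74)

# Women's heights follow the stated relationship W = 0.92 * M
W <- 0.92 * M

# An exact linear relationship with positive slope gives r = 1
r <- cor(M, W)
print(r)
```

Because the slope 0.92 is positive, the correlation is +1; a negative multiplier would give -1 instead.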
We first want to test if there is any significant difference between the percentage in 1970 and in 2005:
\[\begin{align} H_0: p_{1970} - p_{2005} = 0 \\ H_a: p_{1970} - p_{2005} \ne 0 \end{align}\]
We were also given
\(\hat{p}_{1970} = 59\%\), \(\hat{p}_{2005} = 35\%\), and \(n_{1970}=n_{2005}=1000\)
As a result, the pooled (common) proportion is \(\hat{p}_C = \frac{1000\cdot 0.59 + 1000\cdot 0.35}{1000+1000}=0.47\)
Because the two samples are independent and \(n_{1970}\cdot\hat{p}_{1970}\), \(n_{1970}\cdot\left(1-\hat{p}_{1970}\right)\), \(n_{2005}\cdot\hat{p}_{2005}\), and \(n_{2005}\cdot\left(1-\hat{p}_{2005}\right)\) are all at least \(10\), we can perform a 2-sample proportion z test of the hypotheses stated above.
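The sample-size conditions can be checked directly, as in the sketch below. The object names (`n`, `phat`, `counts`, `conditions_met`) are introduced here for illustration and are not part of the original solution.

```r
# Sample sizes and sample proportions given in the question
n    <- c(y1970 = 1000, y2005 = 1000)
phat <- c(y1970 = 0.59, y2005 = 0.35)

# Expected numbers of "successes" and "failures" in each sample
counts <- rbind(successes = n * phat,
                failures  = n * (1 - phat))
print(counts)

# The z test is reasonable only if every count is at least 10
conditions_met <- all(counts >= 10)
print(conditions_met)
```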
\[\begin{align} z &= \frac{\hat{p}_{1970}-\hat{p}_{2005}-0}{\sqrt{\hat{p}_C \cdot \left(1-\hat{p}_C\right)\cdot\left(\frac{1}{n_{1970}} + \frac{1}{n_{2005}}\right)}} \\ &= \frac{0.59-0.35-0}{\sqrt{0.47\cdot(1-0.47)\cdot\left(\frac{1}{1000}+\frac{1}{1000}\right)}} \\ &= 10.7525 \end{align}\]
Since \(z=10.7525\) is far greater than \(3\), the p-value is \(< 0.001\), so the difference is hard to explain by chance.
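As a rough sketch of how the p-value follows from the z score, the two-sided tail area under the standard normal curve can be computed with `pnorm`. The variable names are illustrative, and the z value is copied from the calculation above.

```r
# z score from the calculation above
z <- 10.7525

# Two-sided p-value: area in both tails beyond |z|
p_value <- 2 * pnorm(-abs(z))
print(p_value)

# Compare against the 0.001 threshold
print(p_value < 0.001)
```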
# Sample sizes and sample proportions from the question
# (the identifiers say 1975, but they hold the 1970 sample)
n1975 <- n2005 <- 1000
phat1975 <- .59
phat2005 <- .35
# Two-sample z statistic for proportions, using the pooled standard error
z.prop = function(x1,x2,n1,n2){
  numerator = (x1/n1) - (x2/n2)
  p.common = (x1+x2) / (n1+n2)
  denominator = sqrt(p.common * (1-p.common) * (1/n1 + 1/n2))
  z.prop.ris = numerator / denominator
  return(list(p_common = p.common,Diff_Percentage=numerator,
              SE=denominator,z_score = z.prop.ris))
}
z.prop(n1975*phat1975, n2005*phat2005, n1975, n2005)
## $p_common
## [1] 0.47
##
## $Diff_Percentage
## [1] 0.24
##
## $SE
## [1] 0.02232039
##
## $z_score
## [1] 10.7525
# Outline of the region -3 <= z <= 3 under the standard normal curve
cord.x <- c(-3,seq(-3,3,0.01),3)
cord.y <- c(0,dnorm(seq(-3,3,0.01)),0)
curve(dnorm(x,0,1),xlim=c(-11,11),
      main='Standard Normal',
      xlab="z",
      ylab="Density")
axis(side=1, at=c(-11:11))
# Mark the observed z score in red
abline(v=z.prop(n1975*phat1975, n2005*phat2005, n1975, n2005)$z_score, col="red")
# Shade the region within 3 standard errors of zero
polygon(cord.x,cord.y,col='skyblue')
The observed difference of 24 percentage points is also large in absolute terms, so the difference is practically significant as well as statistically significant.
From the question, the total numbers of responses on form A and form B are:
\[\begin{align} n_A = 112 + 84 = 196 \\ n_B = 84 + 17 = 101 \end{align}\]
Then, for form A and form B, the proportions who favored radiation are:
\[\begin{align} \hat{p}_A = \frac{84}{196} \\ \hat{p}_B = \frac{17}{101} \end{align}\]
As a result, the pooled (common) proportion is \(\hat{p}_C = \frac{84 + 17}{196+101}=\frac{101}{297}\)
Now we want to test
\[\begin{align} H_0: p_{A} - p_{B} = 0 \\ H_a: p_{A} - p_{B} \ne 0 \end{align}\]
Because the two samples are independent and \(n_{A}\cdot\hat{p}_{A}\), \(n_{A}\cdot\left(1-\hat{p}_{A}\right)\), \(n_{B}\cdot\hat{p}_{B}\), and \(n_{B}\cdot\left(1-\hat{p}_{B}\right)\) are all at least \(10\), we can perform a 2-sample proportion z test of the hypotheses stated above.
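For the two forms, the four quantities in the condition are simply the observed cell counts, so the check can be sketched as a small 2 x 2 table. The object names are illustrative only.

```r
# Observed counts by form (columns) and preferred treatment (rows)
counts <- rbind(radiation = c(A = 84,  B = 17),
                surgery   = c(A = 112, B = 84))
print(counts)

# Column totals should reproduce n_A = 196 and n_B = 101
print(colSums(counts))

# Every cell must be at least 10 for the z test
conditions_met <- all(counts >= 10)
print(conditions_met)
```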
\[\begin{align} z &= \frac{\hat{p}_{A}-\hat{p}_{B}-0}{\sqrt{\hat{p}_C \cdot \left(1-\hat{p}_C\right)\cdot\left(\frac{1}{n_{A}} + \frac{1}{n_{B}}\right)}} \\ &= \frac{\frac{84}{196}-\frac{17}{101}-0}{\sqrt{\frac{101}{297}\cdot(1-\frac{101}{297})\cdot\left(\frac{1}{196}+\frac{1}{101}\right)}} \\ &= 4.485147 \end{align}\]
Since \(z=4.485147\) is well above \(3\), the two-sided p-value is \(< 0.001\), so the difference is hard to explain by chance.
## $p_common
## [1] 0.3400673
##
## $Diff_Percentage
## [1] 0.2602546
##
## $SE
## [1] 0.05802589
##
## $z_score
## [1] 4.485147
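As a cross-check on the z score above, base R's `prop.test` can be run on the same counts. This is a sketch, not the method used in the original solution: with `correct = FALSE` (no continuity correction), its chi-squared statistic for two groups is the square of the pooled z statistic, so the square root recovers the magnitude of z (the sign is lost).

```r
# Counts favoring radiation and total responses on forms A and B
res <- prop.test(x = c(84, 17), n = c(196, 101), correct = FALSE)

# Square root of the chi-squared statistic gives |z|
z_check <- sqrt(unname(res$statistic))
print(z_check)

# Two-sided p-value reported by prop.test
print(res$p.value)
```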
# Counts favoring radiation and total responses for each form
xa <- 84; na <- 196
xb <- 17; nb <- 101
# Outline of the region -3 <= z <= 3 under the standard normal curve
cord.x <- c(-3,seq(-3,3,0.01),3)
cord.y <- c(0,dnorm(seq(-3,3,0.01)),0)
curve(dnorm(x,0,1),xlim=c(-5,5),
      main='Standard Normal',
      xlab="z",
      ylab="Density")
axis(side=1, at=c(-5:5))
# Mark the observed z score in red
abline(v=z.prop(xa, xb, na, nb)$z_score, col="red")
# Shade the region within 3 standard errors of zero
polygon(cord.x,cord.y,col='skyblue')
Note: the solution states that \(z=5\) because the percentages were rounded before \(z\) was calculated.
Note 2: the solution also uses the percentages who favored surgery, but the hint in the question implies that we should use the percentages who favored radiation.
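The choice between surgery and radiation only affects the sign of the test statistic, which the sketch below illustrates. The helper `pooled_z` is a hypothetical name introduced here; it applies the same pooled formula as the z test above.

```r
# Pooled two-sample z statistic for proportions (illustrative helper)
pooled_z <- function(x1, x2, n1, n2) {
  p_pooled <- (x1 + x2) / (n1 + n2)
  se <- sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
  (x1 / n1 - x2 / n2) / se
}

# Using the counts who favored radiation on forms A and B
z_radiation <- pooled_z(84, 17, 196, 101)

# Using the counts who favored surgery on forms A and B
z_surgery <- pooled_z(112, 84, 196, 101)

print(c(radiation = z_radiation, surgery = z_surgery))
```

The two z scores have the same magnitude and opposite signs, so the two-sided p-value and the conclusion are the same either way.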