INFERENCE ON PROPORTIONS
- Suppose we are interested in investigating the cognitive abilities of children weighing less than 1500 grams at birth. Although their birth weights are extremely low, many of these children exhibit normal growth patterns during the first year of life. A small group does not. These children suffer from perinatal growth failure, a condition that prevents them from developing properly. One indicator of perinatal growth failure is that during the first several months of life, the infant has a head circumference measurement that is far below normal. We would like to examine the relationship between perinatal growth failure and subsequent cognitive ability. In particular, we wish to estimate the proportion of children suffering from this condition who, when they reach 8 years of age, have intelligence quotient (IQ) scores that are below 70. In the general population, IQ scores are scaled to have mean 100; a score less than 70 suggests a deficiency in cognitive ability. To estimate the proportion of children with IQs in this range, a random sample of 33 infants with perinatal growth failure was chosen. At the age of 8, eight children have scores below 70.
- Find a point estimate for the true proportion p.
p̂ = 8/33 = 0.242
- Construct and interpret a 99% confidence interval for p after first checking that the normal approximation is appropriate.
np̂ = 33(0.242) = 8 ≥ 5
n(1 − p̂) = 33(1 − 0.242) = 25 ≥ 5
np̂(1 − p̂) = 33(0.242)(0.758) = 6.06 > 5
Both the number of children with IQ below 70 and the number without are at least 5, so the normal approximation is appropriate here.
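99% CI: p̂ ± 2.575 √[p̂(1 − p̂)/n] = 0.242 ± 2.575 √[0.242(1 − 0.242)/33]
= (0.050, 0.434)
We are 99% confident that the true proportion of children with perinatal growth failure who have IQ scores below 70 at age 8 lies between 5.0% and 43.4%.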
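The interval can be reproduced in R. This is a minimal sketch of the normal-approximation (Wald) interval; the variable names are illustrative, and because it uses unrounded values of p̂ and the critical value, the endpoints may differ from the hand calculation in the third decimal place.

```r
# Wald (normal-approximation) 99% confidence interval for one proportion
x <- 8       # children with IQ below 70
n <- 33      # sample size
p_hat <- x / n

z_crit <- qnorm(1 - 0.01 / 2)            # two-sided 99% critical value, about 2.576
se <- sqrt(p_hat * (1 - p_hat) / n)      # estimated standard error of p_hat

ci <- c(lower = p_hat - z_crit * se,
        upper = p_hat + z_crit * se)
print(round(ci, 3))
```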
- Although we do not know the true value of p for this population, we do know that 3.2% of the children who exhibited normal growth in the perinatal period have IQ scores below 70 when they reach school age. We would like to know whether this is also true of the children who suffered from perinatal growth failure. Since we are concerned with deviations that could occur in either direction, conduct a two-sided test at the 0.01 level of significance.
See below
- What are your null and alternative hypotheses?
Null: p = p0=0.032
Alternative: p ≠ p0= 0.032
- What is the value of your test statistic?
Z = (p̂ − p0) / √[p0(1 − p0)/n] = (0.242 − 0.032) / √[0.032(1 − 0.032)/33]
Z = 0.210/0.03064
Z = 6.85
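- What is the p-value of your test?
# z was entered as 6.789 here; the hand calculation above gives 6.85. The p-value is far below 0.01 either way.
2*pnorm(6.789, lower.tail=FALSE)
## [1] 1.129134e-11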
- What do you conclude?
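Because the p-value is far below 0.01, we reject the null hypothesis at the 0.01 level of significance. The proportion of children with perinatal growth failure who have an IQ below 70 at age 8 (estimated at 24.2%) differs from the 3.2% observed among children with normal growth in the perinatal period.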
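The test can also be carried out from the raw counts. The sketch below assumes the same normal-approximation test as the hand calculation; the variable names are illustrative. It uses the unrounded p̂ = 8/33, so the statistic comes out slightly larger than the hand value.

```r
# One-sample z-test for a proportion (normal approximation)
x  <- 8        # children with IQ below 70
n  <- 33       # sample size
p0 <- 0.032    # proportion among children with normal perinatal growth

p_hat <- x / n
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)        # standard error computed under H0
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value

print(c(z = z, p_value = p_value))
reject_h0 <- p_value < 0.01                        # decision at the 0.01 level
```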
INFERENCE FOR CONTINGENCY TABLES (2 × 2 TABLES)
- Use the chi-square test to evaluate the null hypothesis that the population proportions of students who drove while drinking are the same in the two calendar years.
p1 = the proportion of students who drove while drinking in 1984
p2 = the proportion of students who drove while drinking in 1987
The null and alternative hypotheses are: H0: p1 = p2 versus HA: p1 ≠ p2
The observed table, with row totals n1 and n2, column totals m1 and m2, and grand total N:
          Drove while drinking   Did not      Total
1984      O11 = 1250             O12 = 1387   n1 = 2637
1987      O21 = 991              O22 = 1666   n2 = 2657
Total     m1 = 2241              m2 = 3053    N = 5294
The expected counts (row total × column total / N, rounded to whole numbers) are:
E11 = (2637 × 2241)/5294 = 1116
E12 = (2637 × 3053)/5294 = 1521
E21 = (2657 × 2241)/5294 = 1125
E22 = (2657 × 3053)/5294 = 1532
∴ X^2 = (1250 − 1116)^2/1116 + (1387 − 1521)^2/1521 + (991 − 1125)^2/1125 + (1666 − 1532)^2/1532 = 55.5765
The p-value is
pchisq(55.58, 1, lower.tail=F)
## [1] 8.973193e-14
X^2 has (2 − 1)(2 − 1) = 1 degree of freedom, so the critical value at the 0.05 level is 3.84. Since 55.5765 > 3.84 (equivalently, the p-value is below 0.05), we reject the null hypothesis.
The proportion of students who drove while drinking differed between 1984 and 1987.
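The chi-square calculation can be checked in R. This is a sketch under two assumptions: the rows are the calendar years and the first column is "drove while drinking", and no continuity correction is applied, matching the hand calculation. The dimension labels are illustrative. Because the expected counts are not rounded here, the statistic comes out slightly smaller than 55.58 (about 55.4).

```r
# Observed 2 x 2 table: rows = year, columns = drove while drinking (assumed layout)
obs <- matrix(c(1250, 1387,
                 991, 1666),
              nrow = 2, byrow = TRUE,
              dimnames = list(year = c("1984", "1987"),
                              drove_drinking = c("yes", "no")))

# Expected counts under H0: (row total x column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic, no continuity correction
x2 <- sum((obs - expected)^2 / expected)
p_value <- pchisq(x2, df = 1, lower.tail = FALSE)

# Cross-check against the built-in test, with the Yates correction switched off
builtin <- chisq.test(obs, correct = FALSE)
print(c(by_hand = x2, builtin = unname(builtin$statistic), p_value = p_value))
```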
- What are the null and alternative hypotheses?
H0: p1 = p2
HA: p1 does not equal p2
- Calculate the test statistic
X^2 = (1250 − 1116)^2/1116 + (1387 − 1521)^2/1521 + (991 − 1125)^2/1125 + (1666 − 1532)^2/1532 = 55.5765
- Report the p-value
pchisq(55.58, 1, lower.tail=F)
## [1] 8.973193e-14
- What do you conclude about the behavior of college students?
There is a difference between the proportion of students who drove while drinking in 1984 and 1987.
- Again test the null hypothesis that the proportions of students who drove while drinking are identical for the two calendar years. This time, use the method based on the normal approximation to the binomial distribution that was presented in Section 14.6. Do you reach the same conclusion?
H0: p1 = p2 (the proportion of students who drove while drinking was the same in both calendar years)
HA: p1 ≠ p2 (the proportion was not the same in both calendar years)
p̂1 (1984) = 1250/2637 = 0.474; p̂2 (1987) = 991/2657 = 0.373
p̂ = (n1p̂1 + n2p̂2)/(n1 + n2) = (1250 + 991)/5294 = 0.423
q̂ = 1 − p̂ = 0.577
Z = (0.474 − 0.373) / √{0.423(0.577)[(1/2637) + (1/2657)]} = 0.101/0.01358 = 7.44
The area under the standard normal curve to the right of 7.44 is less than 0.0001, so the two-sided p-value is well below 0.05 and we reject the null hypothesis. The same conclusion is drawn: the population proportions of students who drove while drinking differ between 1984 and 1987. As a check, Z^2 = 7.44^2 ≈ 55.4, close to the chi-square statistic above.
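A short R sketch of the pooled two-sample z statistic, using the counts given above (variable names are illustrative). For a 2 × 2 table, the square of this statistic should equal the Pearson chi-square statistic computed without a continuity correction, which gives a useful consistency check.

```r
# Two-sample z-test for equality of proportions (pooled standard error)
x1 <- 1250; n1 <- 2637    # 1984: drove while drinking / total
x2 <- 991;  n2 <- 2657    # 1987: drove while drinking / total

p1 <- x1 / n1
p2 <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)                            # pooled estimate under H0

se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)) # note the square root
z <- (p1 - p2) / se_pool
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

print(c(z = z, z_squared = z^2, p_value = p_value))
```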
- Construct a 95% confidence interval for the true difference in population proportions.
(p̂1 − p̂2) ± 1.96 √[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2]
= (0.474 − 0.373) ± 1.96 √[0.474(1 − 0.474)/2637 + 0.373(1 − 0.373)/2657]
= (0.075, 0.127)
- Does the 95% confidence interval contain the value 0? Would you have expected that it would?
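The 95% confidence interval does not contain 0. This is expected: the hypothesis tests above rejected H0: p1 = p2 at the 0.05 level, so 0 is not a plausible value for the difference p1 − p2.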
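The interval and the check for 0 can be sketched in R as follows. Unlike the test statistic, the interval uses the unpooled standard error, because it does not assume p1 = p2. Variable names are illustrative.

```r
# 95% confidence interval for the difference in proportions (unpooled standard error)
n1 <- 2637; n2 <- 2657
p1 <- 1250 / n1     # 1984
p2 <- 991 / n2      # 1987

se_diff <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci <- (p1 - p2) + c(-1, 1) * qnorm(0.975) * se_diff

print(round(ci, 3))
contains_zero <- ci[1] <= 0 && ci[2] >= 0   # FALSE when 0 lies outside the interval
print(contains_zero)
```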
- In France, a study was conducted to investigate potential risk factors for ectopic pregnancy. Of the 279 women who had experienced ectopic pregnancy, 28 had suffered from pelvic inflammatory disease. Of the 279 women who had not, 6 had suffered from pelvic inflammatory disease.
- Construct a 2 × 2 contingency table for these data.
See attached Excel document.
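The same table can be built directly in R from the counts stated in the problem. This is a sketch: the "no pelvic inflammatory disease" cells are obtained by subtraction from the group sizes of 279, and the dimension labels are illustrative.

```r
# 2 x 2 table: rows = ectopic pregnancy, columns = pelvic inflammatory disease (PID)
tab <- as.table(matrix(c(28, 279 - 28,
                          6, 279 - 6),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(ectopic = c("yes", "no"),
                                       pid     = c("yes", "no"))))

# Display the table with row and column totals
print(addmargins(tab))
```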
- We are interested in examining data from a study (discussed in the text) that investigates the effect of carbon monoxide exposure on patients with coronary artery disease, where baseline measurements of pulmonary function were examined across medical centers. Another characteristic that you might wish to investigate is age. The relevant measurements are saved on the course website in a data set called cad.dta. Values of age are saved under the variable name age and indicators of center are saved under center.
- What is the overall mean age (i.e., what is x̄)?
x̄ = (n1x̄1 + n2x̄2 + n3x̄3)/(n1 + n2 + n3)
= [22(62.55) + 18(63.28) + 23(60.83)]/(22 + 18 + 23)
= 62.13
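The overall mean is a weighted mean of the center means, with the center sizes as weights. A minimal R sketch from the summary values, with illustrative variable names:

```r
# Overall (grand) mean age as a size-weighted mean of the three center means
n_group    <- c(22, 18, 23)             # patients per center
mean_group <- c(62.55, 63.28, 60.83)    # mean age per center

grand_mean <- weighted.mean(mean_group, w = n_group)
print(round(grand_mean, 2))
```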
- How many centers (groups)?
k = 3, therefore there are 3 centers.
- What is the mean age and size of each group? (Stata: tab center, sum(age))
Group one mean: 62.55 size =22
Group two mean: 63.28 size =18
Group three mean: 60.83 size= 23
- Examine the histogram and boxplots of age for each center. Why is ANOVA an appropriate method for analyzing this data?
cad <- read.csv("cad.csv")
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.3
ggplot(cad, aes(x=age)) + geom_histogram(fill = "blue", colour = "green") + facet_grid(center ~.)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#box plot
age <- cad$age
center <- cad$center
boxplot(age ~ center)

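ANOVA is appropriate here because we are comparing the mean of a continuous variable (age) across more than two independent groups (three centers). It tests whether there is any significant difference among the group means in a single test, rather than through repeated two-sample t-tests that would inflate the type I error rate.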
- What are the null and alternative hypotheses? What is alpha?
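H0: μ1 = μ2 = μ3
HA: At least one of the three group means differs from the others
α = 0.05 for the overall F-test. For follow-up pairwise comparisons, the Bonferroni-adjusted level would be α* = 0.05/C(3, 2) = 0.05/3 = 0.017.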
- Calculate the estimate of the within-group variance? BY HAND/R
s_W^2 = [(22 − 1)(8.67^2) + (18 − 1)(7.79^2) + (23 − 1)(8.00^2)]/(22 + 18 + 23 − 3) = 66.97
- Calculate the estimate of the between-groups variance? BY HAND/R
s_B^2 = [22(62.55 − 62.13)^2 + 18(63.28 − 62.13)^2 + 23(60.83 − 62.13)^2]/(3 − 1) = 33.28
- Calculate the value of the F-test statistic? BY HAND/R
F = s_B^2/s_W^2
F = 33.28/66.97 = 0.4969
- What distribution does the test statistic follow? Give the degrees of freedom for an F distribution
k = 3
n = 22 + 18 + 23 = 63
Under the null hypothesis, the test statistic follows an F distribution with k − 1 = 3 − 1 = 2 and n − k = 63 − 3 = 60 degrees of freedom: F(2, 60).
At the 0.01 level of significance, the critical value of F(2, 60) is 4.98. Since 0.4969 < 4.98, p > 0.01.
- What is the p-value of the test?
pf(.4969,2,60,lower.tail=FALSE)
## [1] 0.6108953
p = 0.611
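The by-hand ANOVA quantities can be recomputed in R from the group sizes, means, and standard deviations used above. This is a sketch working from summary statistics only; the variable names are illustrative.

```r
# One-way ANOVA from summary statistics (sizes, means, standard deviations per center)
n_group    <- c(22, 18, 23)
mean_group <- c(62.55, 63.28, 60.83)
sd_group   <- c(8.67, 7.79, 8.00)

k <- length(n_group)     # number of groups
n <- sum(n_group)        # total sample size
grand_mean <- sum(n_group * mean_group) / n

# Within-groups variance: pooled variance, weighted by (n_i - 1)
s2_w <- sum((n_group - 1) * sd_group^2) / (n - k)

# Between-groups variance: spread of group means around the grand mean
s2_b <- sum(n_group * (mean_group - grand_mean)^2) / (k - 1)

f_stat  <- s2_b / s2_w
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
f_crit  <- qf(0.99, df1 = k - 1, df2 = n - k)   # critical value at the 0.01 level

print(round(c(s2_w = s2_w, s2_b = s2_b, F = f_stat,
              p_value = p_value, f_crit = f_crit), 4))
```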
- Draw a conclusion for the test
The null hypothesis cannot be rejected (p = 0.611). There is no evidence that the mean ages of patients differ among the three centers.
- Confirm your answers in R. Show the code and the corresponding output please
See above
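For a direct confirmation from the raw data, a sketch along the following lines could be used. It assumes cad.csv is in the working directory and contains the columns age and center, as in the plotting code above; the helper name anova_by_center is hypothetical. The file is read only if it exists, so no output is shown here.

```r
# Fit a one-way ANOVA of age on center and return the ANOVA table.
# `anova_by_center` is an illustrative helper, not part of any package.
anova_by_center <- function(df) {
  fit <- aov(age ~ factor(center), data = df)   # treat center as a grouping factor
  summary(fit)[[1]]                             # Df, Sum Sq, Mean Sq, F value, Pr(>F)
}

# Assumes cad.csv has columns `age` and `center`
if (file.exists("cad.csv")) {
  cad <- read.csv("cad.csv")
  print(anova_by_center(cad))
}
```

The "Mean Sq" column of the resulting table corresponds to the between-groups and within-groups variance estimates, and "F value" to their ratio.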