table(ncbirths$lowbirthweight,ncbirths$habit)
##
##           nonsmoker smoker
##   low            92     18
##   not low       781    108
Question 1:
Based on the above, what is our parameter of interest? What would be a point estimate of this parameter of interest?
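Parameter of Interest = \(p_{1}\) - \(p_{2}\), the difference in the population proportion of babies born with low birth weight between non-smoking mothers (\(p_{1}\)) and smoking mothers (\(p_{2}\)).
Point Estimate = \(\hat{p_{1}}\) - \(\hat{p_{2}}\) = (low birth weight babies of non-smoking mothers / total non-smoking mothers) - (low birth weight babies of smoking mothers / total smoking mothers) = 92/873 - 18/126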
Compute a 90% confidence interval for the difference in proportion of babies born with low birth weight between non-smoking mothers and smoking mothers.
prop.test(table(ncbirths$habit, ncbirths$lowbirthweight),conf.level = .90, correct=FALSE)
##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: table(ncbirths$habit, ncbirths$lowbirthweight)
## X-squared = 1.578, df = 1, p-value = 0.2091
## alternative hypothesis: two.sided
## 90 percent confidence interval:
## -0.09152407 0.01657725
## sample estimates:
## prop 1 prop 2
## 0.1053837 0.1428571
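The same interval can be reproduced without the ncbirths data frame by passing counts to prop.test directly. The sketch below assumes the counts read off the first table (92 of 873 non-smokers, 18 of 126 smokers); the variable names are illustrative and not part of the lab.

```r
# Counts of low-birth-weight babies and group sizes, read off the table
# at the top (illustrative names; nonsmoker first, smoker second).
low_counts  <- c(nonsmoker = 92, smoker = 18)
group_sizes <- c(nonsmoker = 92 + 781, smoker = 18 + 108)

# Two-sample test of proportions from raw counts, no continuity correction.
res <- prop.test(x = low_counts, n = group_sizes,
                 conf.level = 0.90, correct = FALSE)

res$estimate   # sample proportion of low birth weight in each group
res$conf.int   # 90% interval for p1 - p2 (nonsmoker minus smoker)
```

Listing non-smokers first keeps the sign of the difference the same as in the output above.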
Question 2:
Using the data, compute the following:
p1 = 92/873
p1
## [1] 0.1053837
p2 = 18/126
p2
## [1] 0.1428571
p_hat = p1 - p2
p_hat
## [1] -0.03747341
z = qnorm(.95, 0, 1)
z
## [1] 1.644854
Question 3:
Check the assumptions for the sampling distribution of \(\hat{p_{1}}\) - \(\hat{p_{2}}\) to be normal. In other words, check the conditions necessary to construct a confidence interval for \(p_{1}\) - \(p_{2}\). Recall, these conditions are (1) independence within groups, (2) independence between groups, and (3) success-failure condition in BOTH groups.
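(1) Independence within groups: yes, it's a random sample.
(2) Independence between groups: yes, it's a random sample, and each mother belongs to only one of the two groups.
(3) Success-failure condition: yes, \(n_{1}\hat{p_{1}} \ge 10\) and \(n_{1}(1-\hat{p_{1}}) \ge 10\), AND \(n_{2}\hat{p_{2}} \ge 10\) and \(n_{2}(1-\hat{p_{2}}) \ge 10\).
Sample 1 (non-smokers): 92 \(\ge\) 10 and 781 \(\ge\) 10.
Sample 2 (smokers): 18 \(\ge\) 10 and 108 \(\ge\) 10.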
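Because \(n\hat{p}\) and \(n(1-\hat{p})\) are just the observed counts of low and not-low births in each group, the success-failure check can be sketched in a few lines. The matrix below is typed in by hand from the first table, and the names are illustrative.

```r
# Success-failure check: every group needs at least 10 observed
# successes (low) and 10 observed failures (not low).
counts <- rbind(
  nonsmoker = c(low = 92, not_low = 781),
  smoker    = c(low = 18, not_low = 108)
)
threshold <- 10  # cutoff stated in the question

meets_condition <- counts >= threshold  # element-wise comparison
meets_condition
all(meets_condition)                    # TRUE only if all four cells pass
```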
Question 4:
Calculate the standard error for the sampling distribution of \(\hat{p_{1}}\) - \(\hat{p_{2}}\) . Then, compute the 90% confidence interval for \(p_{1}\) - \(p_{2}\).
se = sqrt((p1*(1-p1)/873) + (p2*(1-p2)/126))
se
## [1] 0.03286047
p_hat + z * se
## [1] 0.01657725
p_hat - z * se
## [1] -0.09152407
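The steps of Questions 2 and 4 can be bundled into one small function. This is a minimal sketch: `diff_prop_ci` is a hypothetical helper written for illustration, not a function from any package, and it assumes the unpooled standard error used above.

```r
# Hypothetical helper (not part of the lab): normal-approximation
# confidence interval for p1 - p2 from counts and group sizes.
diff_prop_ci <- function(x1, n1, x2, n2, conf_level = 0.90) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled SE
  z  <- qnorm(1 - (1 - conf_level) / 2)                 # critical value
  est <- p1 - p2
  c(lower = est - z * se, estimate = est, upper = est + z * se)
}

# Non-smokers: 92 low of 873; smokers: 18 low of 126.
ci <- diff_prop_ci(x1 = 92, n1 = 873, x2 = 18, n2 = 126)
ci
```

With `conf_level = 0.90` the critical value is `qnorm(0.95)`, the same z used in Question 2.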
Question 5:
Interpret the confidence interval you computed in Question 4 given the context of the data.
Now suppose we’d like to formally test if there is a difference between the proportion of babies born with low birth weight to non-smoking and smoking mothers. We will conduct our hypothesis test using a significance level of \(\alpha\) = 0.1.
Question 6:
State the null and alternative hypotheses, if we are interested in comparing the proportion of babies born with low birth weight between non-smoking and smoking mothers.
Question 7:
Compute the pooled proportion of babies born with low birth weight between non-smoking and smoking mothers. Explain why we use a pooled proportion.
pooled = ((18+92)/(126+873))
pooled
## [1] 0.1101101
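As a brief note on why pooling is the usual choice here, assuming the null hypothesis states \(p_{1}\) = \(p_{2}\): under that hypothesis both groups share one common proportion, so the two samples are combined to estimate it.

\[
\hat{p}_{pooled} = \frac{x_{1} + x_{2}}{n_{1} + n_{2}} = \frac{92 + 18}{873 + 126} = \frac{110}{999} \approx 0.1101
\]

Here \(x_{1}\) and \(x_{2}\) denote the low birth weight counts and \(n_{1}\) and \(n_{2}\) the group sizes for non-smoking and smoking mothers; this notation is introduced for illustration.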
Question 8:
Using the pooled proportion computed in Question 7, check the conditions necessary to use the normal distribution to perform a hypothesis test. Show all your work.
pooled * 873
## [1] 96.12613
(1- pooled) * 873
## [1] 776.8739
pooled * 126
## [1] 13.87387
(1- pooled) * 126
## [1] 112.1261
Question 9:
se1 = sqrt((pooled*(1-pooled)/873) + (pooled*(1-pooled)/126))
se1
## [1] 0.02983129
z1 = (p_hat - 0)/se1
z1
## [1] -1.256178
pvalue = 2 * pnorm(-abs(z1), mean = 0, sd = 1)
pvalue
## [1] 0.2090516
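As a cross-check against the prop.test output earlier in this lab, the squared z statistic should reproduce the reported X-squared value (1.578), and doubling the lower-tail probability should reproduce the two-sided p-value (0.2091). The sketch below recomputes both from the raw counts; the variable names are illustrative.

```r
# Pooled two-proportion z-test from counts (illustrative names).
x <- c(92, 18)     # low-birth-weight counts: nonsmoker, smoker
n <- c(873, 126)   # group sizes: nonsmoker, smoker

p_pool  <- sum(x) / sum(n)                            # pooled proportion
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))   # pooled SE
z_stat  <- (x[1] / n[1] - x[2] / n[2]) / se_pool

z_stat^2                                 # compare with X-squared
p_two_sided <- 2 * pnorm(-abs(z_stat))   # both tails
p_two_sided                              # compare with prop.test p-value
```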
Question 10:
Provide an appropriate visualization for your data. (Look at the Week 2 slides).
EXTRA CREDIT (2 points): Use the ggplot2 or plotly R packages to create visualizations. You will need to look up how to do this (you may refer to the R demo posted in the Week 3 module).
ANSWER
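library(ggplot2)
# Bars show the share of low vs. not low birth weight within each habit group;
# rows with a missing habit are dropped first.
ggplot(data = subset(ncbirths, !is.na(habit)), aes(x = habit, fill = lowbirthweight)) +
geom_bar(position = "fill") +
labs(y = "proportion")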
Question 11:
Exercise 6.19 in the OpenIntro 4th edition textbook (page 225).