Question 1: What is our parameter of interest? Answer: The parameter of interest is the difference in population proportions of babies born with low birth weight: the proportion among mothers who are non-smokers (p1) minus the proportion among mothers who are smokers (p2), that is, p1 - p2.
What would be a point estimate of this parameter of interest? Answer: \(\hat{p}_{1} - \hat{p}_{2}\) = 92/873 - 18/126 = 0.105 - 0.143 = -0.037 (computed from the unrounded proportions). The point estimate is -0.037.
Notation: in the answers below, p.hat1 and p.hat2 stand for \(\hat{p}_{1}\) and \(\hat{p}_{2}\).
load("C:/Users/Maisey/Downloads/ncbirths.rda")
summary(ncbirths$lowbirthweight)
##     low not low
##     111     889
summary(ncbirths$habit)
## nonsmoker    smoker      NA's
##       873       126         1
table(ncbirths$lowbirthweight, ncbirths$habit)
##
##           nonsmoker smoker
##   low            92     18
##   not low       781    108
table(ncbirths$habit, ncbirths$lowbirthweight)
##
##             low not low
##   nonsmoker  92     781
##   smoker     18     108
Question 2: Using the data, compute the following: A) The sample proportion of babies born with low birth weight among non-smoking women (\(\hat{p}_{1}\)).
Answer: 92/873 = 0.105
B) The sample proportion of babies born with low birth weight among smoking women (\(\hat{p}_{2}\)). Answer: 18/126 = 0.143
C) The point estimate for \(p_{1} - p_{2}\), the difference in population proportions of babies born with low birth weight between non-smoking and smoking women. Answer: 0.1054 - 0.1429 = -0.037
D) The z* needed for a 90% confidence interval. Answer: z* is the 95th percentile of the standard normal distribution (5% in each tail), so z* = 1.645.
qnorm(0.95, mean = 0, sd = 1)
## [1] 1.644854
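The quantities in Question 2 can also be reproduced directly from the counts in the two-way table. The following is a minimal sketch; the variable names (`low_non`, `n_non`, and so on) are illustrative and do not come from the dataset.

```r
# Counts copied from the two-way table of habit by lowbirthweight.
low_non <- 92; n_non <- 873   # low-birth-weight babies / total, non-smokers
low_smk <- 18; n_smk <- 126   # low-birth-weight babies / total, smokers

p_hat1 <- low_non / n_non      # sample proportion, non-smokers
p_hat2 <- low_smk / n_smk      # sample proportion, smokers
point_est <- p_hat1 - p_hat2   # point estimate of p1 - p2

# A 90% interval leaves 5% in each tail, so z* is the 95th percentile.
z_star <- qnorm(0.95)

round(c(p_hat1 = p_hat1, p_hat2 = p_hat2, diff = point_est, z_star = z_star), 3)
```

Rounded to three decimals, this gives 0.105, 0.143, -0.037, and 1.645.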
Question 3: Check the assumptions for the sampling distribution of \(\hat{p}_{1} - \hat{p}_{2}\) to be normal. In other words, check the conditions necessary to construct a confidence interval for \(p_{1} - p_{2}\). Recall, these conditions are (1) independence within groups, (2) independence between groups, and (3) success-failure condition in BOTH groups.
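Answer: (1) Independence within groups: assuming the births in ncbirths are a random sample, the observations within each group are independent [TRUE]. (2) Independence between groups: the non-smoking and smoking mothers are different women, so the two groups are independent of each other [TRUE]. (3) Success-failure, checked with the sample proportions (the pooled proportion is only needed for the hypothesis test):
Group 1 (non-smokers): Success: 873(92/873) = 92 ≥ 10 [TRUE]; Failure: 873(1-(92/873)) = 781 ≥ 10 [TRUE]
Group 2 (smokers): Success: 126(18/126) = 18 ≥ 10 [TRUE]; Failure: 126(1-(18/126)) = 108 ≥ 10 [TRUE]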
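The success-failure check for the confidence interval can be sketched in R as follows. The counts are typed in from the two-way table, and the object names are illustrative.

```r
# Observed low-birth-weight counts and group sizes, from the two-way table.
counts <- c(nonsmoker = 92, smoker = 18)
n <- c(nonsmoker = 873, smoker = 126)

successes <- counts       # n * p.hat in each group
failures <- n - counts    # n * (1 - p.hat) in each group

# The condition holds when every count is at least 10.
all(successes >= 10, failures >= 10)
```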
Question 4: Calculate the standard error for the sampling distribution of \(\hat{p}_{1} - \hat{p}_{2}\). Then, compute the 90% confidence interval for \(p_{1} - p_{2}\).
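Answer: SE = sqrt(p.hat1(1-p.hat1)/873 + p.hat2(1-p.hat2)/126) = sqrt(0.105(0.895)/873 + 0.143(0.857)/126) = 0.0329. 90% CI: (p.hat1 - p.hat2) ± z* × SE = -0.0375 ± 1.645 × 0.0329 = -0.0375 ± 0.0541, or approximately (-0.092, 0.017)
Question 5: Interpret the confidence interval you computed in Question 4 given the context of the data.
Answer: We are 90% confident that the true difference in the proportion of low-birth-weight babies between mothers who are non-smokers and mothers who are smokers (non-smokers minus smokers) is between -0.092 and 0.017. In other words, the proportion among non-smokers is somewhere between 9.2 percentage points lower and 1.7 percentage points higher than among smokers. Because the interval contains 0, no difference is a plausible value.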
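As a check on the arithmetic, the unpooled standard error and the 90% interval can be computed from the table counts. This is a sketch with illustrative variable names, not output from the original analysis.

```r
# Sample proportions from the two-way table counts.
p1 <- 92 / 873    # non-smokers
p2 <- 18 / 126    # smokers

# Unpooled standard error: each group keeps its own sample proportion.
se <- sqrt(p1 * (1 - p1) / 873 + p2 * (1 - p2) / 126)

# 90% confidence interval: point estimate plus or minus z* times SE.
z_star <- qnorm(0.95)
ci <- (p1 - p2) + c(-1, 1) * z_star * se

round(c(se = se, lower = ci[1], upper = ci[2]), 3)
```

With these counts the standard error is about 0.033 and the interval is about (-0.092, 0.017).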
Question 6: State the null and alternative hypotheses, if we are interested in comparing the proportion of babies born with low birth weight between non-smoking and smoking mothers.
State the hypotheses in words and with statistical notation. Answer: H0: p1 - p2 = 0; HA: p1 - p2 ≠ 0. Null hypothesis: the proportion of babies born with low birth weight is the same for non-smoking and smoking mothers. Alternative hypothesis: the proportion of babies born with low birth weight differs between non-smoking and smoking mothers.
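Why is the null rather than the alternative hypothesis a statement of equality? Answer: The null hypothesis is the claim about the population that we assume to be true until the data show it to be implausible beyond a reasonable doubt. Stating it as an equality gives a single specific value (p1 - p2 = 0) from which we can work out the sampling distribution of the statistic and compute a p-value. The alternative is the competing claim about the population, and it covers a range of values. Because conclusions rest on probability, we can never speak in terms of absolute certainties.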
Question 7: Compute the pooled proportion of babies born with low birth weight between non-smoking and smoking mothers. Explain why we use a pooled proportion.
Answer: p.hat pooled = (92+18)/(873+126) = 110/999 = 0.110. The null hypothesis says the two population proportions are equal, so under the null the best estimate of that common proportion combines both samples. In the hypothesis test, we use the pooled proportion to check the success-failure condition and to estimate the standard error, which in turn is used to calculate the z-test statistic.
Question 8: Using the pooled proportion computed in Question 7, check the conditions necessary to use the normal distribution to perform a hypothesis test. Show all your work.
Answer: p.hat pooled = (92+18)/(873+126) = 0.110
Group 1 (non-smokers): Success: 0.110 × 873 = 96 ≥ 10 [TRUE]; Failure: (1-0.110) × 873 = 777 ≥ 10 [TRUE]
Group 2 (smokers): Success: 0.110 × 126 = 14 ≥ 10 [TRUE]; Failure: (1-0.110) × 126 = 112 ≥ 10 [TRUE]
Point Estimate = -0.037, SE = 0.030, Z-score = -1.26, P-Value = 0.208
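The pooled success-failure check can be sketched the same way. The expected counts use the pooled proportion in place of each group's own sample proportion; the object names are illustrative.

```r
# Group sizes from the two-way table.
n <- c(nonsmoker = 873, smoker = 126)

# Pooled proportion: all low-birth-weight babies over all births with a known habit.
p_pool <- (92 + 18) / sum(n)

# Expected successes and failures in each group under the null hypothesis.
expected_low <- n * p_pool
expected_not_low <- n * (1 - p_pool)

round(rbind(expected_low, expected_not_low), 1)
all(expected_low >= 10, expected_not_low >= 10)
```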
Question 9: a. Compute the standard error using the pooled proportion computed in Question 7.
Answer: p.hat pooled = 0.110; SE = sqrt(p.pooled(1-p.pooled)/873 + p.pooled(1-p.pooled)/126) = sqrt(0.110(0.890)/873 + 0.110(0.890)/126) = 0.0298
Answer: Z-score = (point estimate - null value)/SE = (-0.0375 - 0)/0.0298 = -1.26
Answer: p-value = 0.208 #See code chunk below
Answer: The p-value is 0.208. If there truly were no difference between the two groups, there would be about a 21% chance of seeing a difference in sample proportions at least as extreme as the one observed.
Answer: At a significance level of 0.1, the p-value (0.208) exceeds 0.1, so we fail to reject the null hypothesis. The data do not provide sufficient evidence that the proportion of babies with a low birth weight born to mothers who are non-smokers differs from the proportion born to mothers who are smokers.
round(2*pnorm(-1.26, mean = 0, sd = 1), 3)
## [1] 0.208
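The whole test can be reproduced from the counts, and base R's `prop.test()` offers a cross-check: with `correct = FALSE` it reports a chi-squared statistic equal to the square of the pooled z statistic. This is a sketch with illustrative variable names.

```r
# Counts of low-birth-weight babies and group sizes (non-smokers, smokers).
x <- c(92, 18)
n <- c(873, 126)

p_hat <- x / n
p_pool <- sum(x) / sum(n)

# Pooled standard error, z statistic, and two-sided p-value.
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))
z <- (p_hat[1] - p_hat[2] - 0) / se_pool
p_value <- 2 * pnorm(-abs(z))

# Cross-check without the continuity correction.
pt <- prop.test(x = x, n = n, correct = FALSE)

round(c(se = se_pool, z = z, p_value = p_value, prop_test_p = pt$p.value), 3)
```

Using unrounded inputs, this gives z of about -1.26 and a two-sided p-value of about 0.21, so the conclusion at the 0.1 level is to fail to reject the null hypothesis.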
Question 10: Provide an appropriate visualization for your data. (Look at the Week 2 slides).
table(ncbirths$habit, ncbirths$lowbirthweight)
##
##             low not low
##   nonsmoker  92     781
##   smoker     18     108
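# Stacked bars of the proportion of low / not low births within each smoking habit
barplot(prop.table(table(ncbirths$lowbirthweight, ncbirths$habit), margin = 2), legend.text = TRUE, xlab = "Mother's smoking habit", ylab = "Proportion of births", main = "Birth weight by smoking habit")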
Question 11: Exercise 6.19 in the OpenIntro 4th edition textbook (page 225).
6.19 Gender and color preference. A study asked 1,924 male and 3,666 female undergraduate college students their favorite color. A 95% confidence interval for the difference between the proportions of males and females whose favorite color is black (\(p_{male} - p_{female}\)) was calculated to be (0.02, 0.06). Based on this information, determine if the following statements are true or false, and explain your reasoning for each statement you identify as false.
a) We are 95% confident that the true proportion of males whose favorite color is black is 2% lower to 6% higher than the true proportion of females whose favorite color is black.
b) We are 95% confident that the true proportion of males whose favorite color is black is 2% to 6% higher than the true proportion of females whose favorite color is black.
c) 95% of random samples will produce 95% confidence intervals that include the true difference between the population proportions of males and females whose favorite color is black.
d) We can conclude that there is a significant difference between the proportions of males and females whose favorite color is black and that the difference between the two sample proportions is too large to plausibly be due to chance.
e) The 95% confidence interval for (\(p_{female} - p_{male}\)) cannot be calculated with only the information given in this exercise.
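Answer: a) False - the entire confidence interval is above 0, so it says the proportion for males is 2% to 6% higher, not 2% lower. b) True c) True d) True - the interval does not contain 0. e) False - reversing the order of subtraction only flips the signs and swaps the endpoints, so the interval for (pfemale - pmale) is (-0.06, -0.02).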
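The reasoning for part e) can be shown in two lines of R: negating an interval and reversing its endpoints gives the interval for the opposite order of subtraction. The variable names are illustrative.

```r
# 95% CI for p_male - p_female, as given in the exercise.
ci_male_minus_female <- c(0.02, 0.06)

# Negate both endpoints, then reverse them so the lower bound comes first.
ci_female_minus_male <- rev(-ci_male_minus_female)
ci_female_minus_male
```

This yields the interval from -0.06 to -0.02.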