Question 1:
Based on the above, what is our parameter of interest? What would be a point estimate of this parameter of interest?
• lowbirthweight: whether the baby was classified as low birth weight (low) or not (not low)
• habit: status of the mother as a nonsmoker or smoker
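Our parameter of interest is the difference p1 - p2, where p1 is the proportion of nonsmoker mothers with a low birth weight baby and p2 is the proportion of smoker mothers with a low birth weight baby. A point estimate of this parameter is the difference between the two sample proportions, \(\hat{p_{1}}\) - \(\hat{p_{2}}\).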
Question 2:
Using the data, compute the following:
1) $\hat{p_{1}}$ (nonsmokers): p.hat1 = 0.105
2) $\hat{p_{2}}$ (smokers): p.hat2 = 0.142
3) Point estimate ($\hat{p_{1}}$ - $\hat{p_{2}}$) = -0.037
4) The z* needed for a 90% confidence interval.
z* for 90% is 1.645
##
###select the variables lowbirthweight, habit where habit=="smoker"
Smoker<-subset(ncbirths,habit=="smoker",select=c(lowbirthweight, habit))
###select the variables lowbirthweight, habit where habit=="nonsmoker"
nonsmoker<-subset(ncbirths,habit=="nonsmoker",select=c(lowbirthweight, habit))
##count how many observations are in each group (group 1 = nonsmoker, group 2 = smoker)
##The na.omit() function removes the missing values from the variable
n2<-length(na.omit(Smoker$lowbirthweight))
n1<-length(na.omit(nonsmoker$lowbirthweight))
##count how many babies in each group have low birth weight
k2<-length(which(Smoker$lowbirthweight== "low"))
k1<-length(which(nonsmoker$lowbirthweight== "low"))
##calculate the sample proportion
p.hat1<-k1/n1
p.hat2<-k2/n2
##difference between p1 and p2
p.est<-p.hat1-p.hat2
p.est
## [1] -0.03747341
##calculate z*
##find z* for a 90% confidence level (5% in each tail)
z<-qnorm(.95)
z
## [1] 1.644854
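The point estimate and critical value can also be reproduced from raw counts alone. The sketch below is only illustrative: the counts (92 of 873 nonsmokers, 18 of 126 smokers) are assumed values, chosen because they reproduce the output above, and `z_star` is a hypothetical helper name.

```r
# Assumed counts (not taken from the report; chosen to match its output)
n1 <- 873; k1 <- 92   # nonsmokers: group size, low-birth-weight babies
n2 <- 126; k2 <- 18   # smokers: group size, low-birth-weight babies

# Sample proportions and their difference (nonsmoker minus smoker)
p.hat1 <- k1 / n1
p.hat2 <- k2 / n2
p.est  <- p.hat1 - p.hat2

# Critical value for a two-sided interval at confidence level `conf`
z_star <- function(conf) qnorm(1 - (1 - conf) / 2)

round(p.est, 3)
z_star(0.90)
```

Writing the critical value as a function of the confidence level shows why `qnorm(.95)` rather than `qnorm(.90)` is used for a 90% interval: 5% of the probability is left in each tail.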
Question 3:
Check the assumptions for the sampling distribution of \(\hat{p_{1}}\) - \(\hat{p_{2}}\) to be normal. In other words, check the conditions necessary to construct a confidence interval for \(p_{1} - p_{2}\). Recall, these conditions are (1) independence within groups, (2) independence between groups, and (3) success-failure condition in BOTH groups.
1) independence within groups
We assume the non-smoking and smoking mothers come from a random sample, so the observations for non-smoking mothers are independent of each other, and the observations for smoking mothers are independent of each other.
2) independence between groups
The samples of non-smoking and smoking mothers are independent of each other because the mothers were sampled at random.
3) success-failure condition in BOTH groups
Both groups meet the success-failure condition: each group has at least 10 successes (low birth weight) and at least 10 failures (not low), so all four checks below return TRUE.
##
##checking the conditions for p1 and p2
##mothers who are non-smoker
NSmokerSus<-(n1*p.hat1)>=10
NSmokerF<-(n1*(1-p.hat1))>=10
NSmokerSus
## [1] TRUE
NSmokerF
## [1] TRUE
##mothers who are smokers
SmokerSus<-(n2*p.hat2)>=10
SmokerF<-(n2*(1-p.hat2))>=10
SmokerSus
## [1] TRUE
SmokerF
## [1] TRUE
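The four checks above follow the same pattern, so they can be wrapped in one small function. This is a sketch rather than part of the original analysis: `success_failure` is a hypothetical helper, and the counts are assumed values that match the proportions reported earlier.

```r
# Hypothetical helper: are there at least `threshold` expected
# successes and failures in a group of size n with proportion p?
success_failure <- function(n, p, threshold = 10) {
  c(successes = n * p, failures = n * (1 - p)) >= threshold
}

# Assumed counts (chosen to reproduce the reported proportions)
nonsmoker_check <- success_failure(n = 873, p = 92 / 873)
smoker_check    <- success_failure(n = 126, p = 18 / 126)

nonsmoker_check
smoker_check

# The condition must hold in BOTH groups
all(nonsmoker_check, smoker_check)
```

The smoker group is the one to watch, since it is much smaller; with these assumed counts it has 18 successes, which still clears the threshold of 10.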
Question 4:
Calculate the standard error for the sampling distribution of \(\hat{p_{1}}\) - \(\hat{p_{2}}\). Then, compute the 90% confidence interval for \(p_{1} - p_{2}\).
The standard error is approximately 0.033, and the 90% confidence interval is (-0.092, 0.017).
##
## estimate the standard error.
SE=sqrt((p.hat1*(1-p.hat1)/n1)+(p.hat2*(1-p.hat2)/n2))
##Calculate the confidence interval (CI1 = upper bound, CI2 = lower bound)
CI1<-p.est+(z*SE)
CI2<-p.est-(z*SE)
CI1
## [1] 0.01657725
CI2
## [1] -0.09152407
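The interval above is the point estimate plus or minus z* times the unpooled standard error. A minimal sketch of that calculation as a reusable function follows; `two_prop_ci` is a hypothetical name, and the counts passed to it are assumed values that reproduce the reported output.

```r
# Hypothetical helper: confidence interval for p1 - p2 using the
# unpooled standard error sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)
two_prop_ci <- function(k1, n1, k2, n2, conf = 0.90) {
  p1  <- k1 / n1
  p2  <- k2 / n2
  se  <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z   <- qnorm(1 - (1 - conf) / 2)
  est <- p1 - p2
  c(lower = est - z * se, upper = est + z * se)
}

# Assumed counts: 92 of 873 nonsmokers, 18 of 126 smokers
ci <- two_prop_ci(k1 = 92, n1 = 873, k2 = 18, n2 = 126)
round(ci, 3)
```

Raising `conf` widens the interval, because a larger z* multiplies the same standard error.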
Question 5:
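Interpret the confidence interval you computed in Question 4 given the context of the data.
We are 90% confident that the difference between the proportion of non-smoker mothers with a low birth weight baby and the proportion of smoker mothers with a low birth weight baby (non-smoker minus smoker) is between -0.092 and 0.017. Because the interval contains 0, the data are consistent with no difference between the two groups.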
Question 6:
State the null and alternative hypotheses, if we are interested in comparing the proportion of babies born with low birth weight between non-smoking and smoking mothers.
\(H_{0}\): pNSmoker - pSmoker = 0
\(H_{A}\): pNSmoker - pSmoker \(\neq\) 0
Why is the null rather than the alternative hypothesis a statement of equality?
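The null hypothesis has to be specific: it must state a single value for the parameter (here, a difference of 0) so that we can work out the sampling distribution of the statistic under the null and compute a p-value. The alternative covers a range of values, so it cannot play that role.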
Question 7:
Compute the pooled proportion of babies born with low birth weight between non-smoking and smoking mothers. Explain why we use a pooled proportion.
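Under the null hypothesis the two proportions are equal, so the best estimate of that common proportion combines both groups into a single pooled proportion. We use it for the success-failure check and for the standard error of the hypothesis test. The pooled proportion is about 0.110.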
##calculate the pooled proportion
ppooled=(k1+k2)/(n1+n2)
ppooled
## [1] 0.1101101
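One way to read the pooled proportion is as a weighted average of the two sample proportions, with the group sizes as weights. The sketch below illustrates this with assumed counts (92 of 873 nonsmokers, 18 of 126 smokers) that reproduce the value printed above; they are not taken from the report.

```r
# Assumed counts (chosen to match the reported pooled proportion)
n1 <- 873; k1 <- 92   # nonsmokers
n2 <- 126; k2 <- 18   # smokers

# Pooled proportion: all successes over all observations
ppooled <- (k1 + k2) / (n1 + n2)

# Equivalent form: sample-size-weighted average of the two proportions
p.hat1 <- k1 / n1
p.hat2 <- k2 / n2
weighted <- (n1 * p.hat1 + n2 * p.hat2) / (n1 + n2)

ppooled
all.equal(ppooled, weighted)
```

Because the nonsmoker group is about seven times larger in this example, the pooled value sits much closer to the nonsmoker proportion than to the smoker proportion.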
Question 8:
Using the pooled proportion computed in Question 7, check the conditions necessary to use the normal distribution to perform a hypothesis test. Show all your work.
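Using the pooled proportion, both groups meet the success-failure condition: n*ppooled and n*(1-ppooled) are at least 10 in each group. The independence conditions are the same as in Question 3.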
##checking the pooled success-failure conditions
##mothers who are non-smoker
PooNSmokSus<-(n1*ppooled)>=10
PooNSmokF<-(n1*(1-ppooled))>=10
PooNSmokSus
## [1] TRUE
PooNSmokF
## [1] TRUE
##mothers who are smokers
PooSmokSus<-(n2*ppooled)>=10
PooSmokF<-(n2*(1-ppooled))>=10
PooSmokSus
## [1] TRUE
PooSmokF
## [1] TRUE
Question 9:
a. Compute the standard error using the pooled proportion computed in Question 7.
SE = 0.0298
b. Calculate your Z-statistic/test statistic.
Z-statistic = -1.256
c. Compute the associated p-value.
p-value = 0.209
d. Report your conclusion from the hypothesis test based on the given significance level above and include the confidence interval and p-value. State your conclusion in the context of the data.
The p-value of 0.209 means that there is a 20.9% chance of seeing our observed sample statistic or one more extreme if there truly was no difference between the two groups. At α = .01, we fail to reject the null hypothesis: there is not enough evidence in our data to suggest that the proportion of low birth weight babies among non-smoker mothers differs from the proportion among smoker mothers. The 90% confidence interval for the difference, (-0.092, 0.017), contains 0, which is consistent with this conclusion.
e. Define what the p-value means in context.
In this context, the p-value is the probability of observing a difference in sample proportions at least as extreme as -0.037, in either direction, if the true proportions of low birth weight babies were the same for non-smoking and smoking mothers.
The p-value describes the strength of the evidence against the null hypothesis and in support of the alternative hypothesis: the smaller the p-value, the stronger the evidence.
##Calculating the standard error - pooled proportion
SEPoo=sqrt((ppooled*(1-ppooled)/n1)+(ppooled*(1-ppooled)/n2))
##calculating the z-statistic or test statistic
zest<-(p.est-0)/SEPoo
zest
## [1] -1.256178
##calculating the two-sided P-value (using -abs() keeps it valid if zest is positive)
PValue<-2*(pnorm(-abs(zest),mean=0, sd=1))
PValue
## [1] 0.2090515
##A P-value less than the significance level means we reject the null hypothesis
##Calculate the confidence interval for p1-p2
##(point estimate +/- z* times the unpooled standard error from Question 4)
CIUpper<-p.est+(z*SE)
CILower<-p.est-(z*SE)
CIUpper
## [1] 0.01657725
CILower
## [1] -0.09152407
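The test statistic and p-value can be cross-checked against base R's `prop.test()`. With `correct = FALSE` it runs the same pooled two-proportion test, reporting a chi-squared statistic equal to the square of the z-statistic. The counts below are assumed values that reproduce the reported results, not figures taken from the report.

```r
# Assumed counts: 92 of 873 nonsmokers, 18 of 126 smokers
k <- c(92, 18)
n <- c(873, 126)

# Manual pooled z-test for H0: p1 - p2 = 0
p.hat   <- k / n
ppooled <- sum(k) / sum(n)
se.pool <- sqrt(ppooled * (1 - ppooled) * sum(1 / n))
z.stat  <- (p.hat[1] - p.hat[2]) / se.pool
p.value <- 2 * pnorm(-abs(z.stat))

# Same test via prop.test(), without the continuity correction
res <- prop.test(x = k, n = n, correct = FALSE)

c(z = z.stat, p = p.value)
c(z.from.chisq = sqrt(unname(res$statistic)), p = res$p.value)
```

The two p-values agree, and the square root of the chi-squared statistic equals the absolute value of z, which gives an independent check on the hand calculation.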
Question 10:
Provide an appropriate visualization for your data. (Look at the Week 2 slides).
EXTRA CREDIT (2 points): Use the ggplot2() or plot_ly R packages to create visualizations. You will need to look up how to do this (you may refer to the R demo posted in the Week 3 module).
##Visualization of percentages - Low birth weight babies x smoker/non-smoker mother
library(plotly)
Mothers <- c("Smoker mother", "Non-Smoker mother")
##percentage of normal-weight and low-weight babies in each group
NWeight <- c((n2-k2)/n2*100,
(n1-k1)/n1*100)
LWeight <- c(k2/n2*100,
k1/n1*100)
data <- data.frame(Mothers, NWeight, LWeight)
fig <- plot_ly(data, x = ~Mothers, y = ~NWeight, type = 'bar', name = 'Babies-Normal weight')
fig <- fig %>% add_trace(y = ~LWeight, name = 'Babies-Low weight')
##a single layout() call, so the '%' axis title is not overwritten by a later one
fig <- fig %>% layout(title = "% Low birth weight babies x smoker/non-smoker mother",
barmode = 'stack',
xaxis = list(title = ""),
yaxis = list(title = '%'))
##fig <- plot_ly(data, x = ~x, y = ~y, type = 'bar',
## text = y, textposition = 'auto',
## marker = list(color = 'rgb(158,202,225)',
## line = list(color = 'rgb(8,48,107)', width = 1.5)))
fig
library(plotly)
x <- c("Smoker mother", "Non-Smoker mother")
##keep the counts numeric so the bar heights are plotted on a numeric axis
y <- c(n2-k2, n1-k1)
y2 <- c(k2, k1)
data <- data.frame(x, y, y2)
fig <- data %>% plot_ly()
fig <- fig %>% add_trace(x = ~x, y = ~y, type = 'bar',name = 'Babies-Normal weight',
text = ~y, textposition = 'auto',
marker = list(color = 'rgb(158,245,225)',
line = list(color = 'rgb(8,48,107)', width = 1.5)))
fig <- fig %>% add_trace(x = ~x, y = ~y2, type = 'bar',
text = ~y2, textposition = 'auto',name = 'Babies-Low weight',
marker = list(color = 'rgb(58,200,225)',
line = list(color = 'rgb(8,48,107)', width = 1.5)))
fig <- fig %>% layout(title = "Low birth weight babies x smoker/non-smoker mother",
barmode = 'group',
xaxis = list(title = ""),
yaxis = list(title = ""))
fig
Question 11:
Exercise 6.19 in the OpenIntro 4th edition textbook (page 225).
False - "2% lower to 6% higher" doesn't make sense.
True
True
True
False - that interval just swaps p female and p male.