R Code Week 6

Question 1:

Based on the above, what is our parameter of interest? What would be a point estimate of this parameter of interest?

• lowbirthweight: whether the baby was classified as low birth weight (low) or not (not low)
• habit: status of the mother as a nonsmoker or smoker

Our parameter of interest is the difference p_1-p_2, where p_1 is the population proportion of low birth weight babies among nonsmoking mothers and p_2 is the same proportion among smoking mothers. A point estimate of this parameter is the difference in sample proportions, $\hat{p_{1}}$-$\hat{p_{2}}$.

Question 2:

Using the data, compute the following:

The sample proportion of babies born with low birth weight among non-smoking women ($\hat{p_{1}}$)

$\hat{p_{1}}$ = p.hat1 = 0.105

The sample proportion of babies born with low birth weight among smoking women ($\hat{p_{2}}$)

$\hat{p_{2}}$ = p.hat2 = 0.142

The point estimate for p_1-p_2, the difference in population proportions of babies born with low birth weight between smoking and non-smoking women.

Point estimate = -0.037

The z* needed for a 90% confidence interval.

z* for 90% is 1.645

##
##ncbirths is assumed to be loaded already (for example, from the openintro package)
##select the variables lowbirthweight and habit where habit=="smoker"
Smoker<-subset(ncbirths,habit=="smoker",select=c(lowbirthweight, habit))
##select the variables lowbirthweight and habit where habit=="nonsmoker"
nonsmoker<-subset(ncbirths,habit=="nonsmoker",select=c(lowbirthweight, habit))
##count how many observations are in each group
##na.omit() drops missing values, so only babies with a recorded weight class are counted
n2<-length(na.omit(Smoker$lowbirthweight))
n1<-length(na.omit(nonsmoker$lowbirthweight))

##count how many babies in each group are classified as low birth weight
k2<-length(which(Smoker$lowbirthweight== "low"))
k1<-length(which(nonsmoker$lowbirthweight== "low"))

##calculate the sample proportion
p.hat1<-k1/n1
p.hat2<-k2/n2
##difference between p1 and p2
p.est<-p.hat1-p.hat2
p.est

## [1] -0.03747341

##find z* for a 90% confidence interval (5% in each tail, so use the 0.95 quantile)
z<-qnorm(.95)
z

## [1] 1.644854
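The same quantities can be reproduced from counts alone. The sketch below is a minimal illustration, not the code used above: the four counts are assumptions (they are not printed in this write-up and were chosen to be consistent with its reported outputs), and `conf.level` is a hypothetical name introduced to show where the 0.95 in `qnorm(.95)` comes from.

```r
# Assumed counts, NOT printed in the write-up; chosen to be consistent with its outputs
n1 <- 873; k1 <- 92   # nonsmoking mothers: group size, low birth weight babies
n2 <- 126; k2 <- 18   # smoking mothers: group size, low birth weight babies

# Sample proportions and their difference (nonsmoker minus smoker)
p.hat1 <- k1 / n1
p.hat2 <- k2 / n2
p.est <- p.hat1 - p.hat2

# z* for a two-sided interval: leave (1 - conf.level)/2 in each tail
conf.level <- 0.90    # hypothetical name for the confidence level
z <- qnorm(1 - (1 - conf.level) / 2)

round(c(p.hat1 = p.hat1, p.hat2 = p.hat2, p.est = p.est, z = z), 3)
```

Changing `conf.level` to 0.95 or 0.99 gives the z* for those intervals without having to work out the quantile by hand.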

Question 3:

Check the assumptions for the sampling distribution of $\hat{p_{1}}$-$\hat{p_{2}}$ to be normal. In other words, check the conditions necessary to construct a confidence interval for p_1-p_2. Recall, these conditions are (1) independence within groups, (2) independence between groups, and (3) success-failure condition in BOTH groups.

1) independence within groups

We assume the nonsmoking and smoking mothers are each a random sample, so the observations for nonsmoking mothers are independent of each other, and the observations for smoking mothers are independent of each other.

2) independence between groups

The nonsmoking and smoking mothers are independent of each other because the mothers come from a random sample, so the outcome for a mother in one group tells us nothing about the outcome for a mother in the other group.

3) success-failure condition in BOTH groups

Both groups meet the success-failure condition: each of the four checks below (at least 10 successes and at least 10 failures in each group) returns TRUE.

##
##checking the conditions for p1 and p2
##mothers who are non-smoker

NSmokerSus<-(n1*p.hat1)>=10
NSmokerF<-(n1*(1-p.hat1))>=10
NSmokerSus

## [1] TRUE

NSmokerF

## [1] TRUE

##mothers who are smokers
SmokerSus<-(n2*p.hat2)>=10
SmokerF<-(n2*(1-p.hat2))>=10
SmokerSus

## [1] TRUE

SmokerF

## [1] TRUE
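Written out, the success-failure condition checked above requires, for each group $i$ (1 = nonsmoker, 2 = smoker):

$$
n_{i}\hat{p_{i}} \geq 10 \quad \text{and} \quad n_{i}(1-\hat{p_{i}}) \geq 10, \qquad i = 1, 2
$$

Because $\hat{p_{i}} = k_{i}/n_{i}$, the first product is simply the number of low birth weight babies in the group (k1 or k2 in the code), and the second is the number of babies in the group that are not low birth weight.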

Question 4:

Calculate the standard error for the sampling distribution of $\hat{p_{1}}$-$\hat{p_{2}}$. Then, compute the 90% confidence interval for p_1-p_2.

The standard error is approximately 0.0329, and the 90% confidence interval is (-0.092, 0.017).

##
## estimate the standard error.
SE=sqrt((p.hat1*(1-p.hat1)/n1)+(p.hat2*(1-p.hat2)/n2))


##Calculate the confidence interval (CI1 is the upper bound, CI2 is the lower bound)
CI1<-p.est+(z*SE)
CI2<-p.est-(z*SE)
CI1

## [1] 0.01657725

CI2

## [1] -0.09152407
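For reference, the calculation above follows the usual formulas for a difference of two proportions, using the unpooled standard error:

$$
SE = \sqrt{\frac{\hat{p_{1}}(1-\hat{p_{1}})}{n_{1}} + \frac{\hat{p_{2}}(1-\hat{p_{2}})}{n_{2}}}, \qquad (\hat{p_{1}}-\hat{p_{2}}) \pm z^{*} \times SE
$$

With the values printed above, the interval is centered at -0.0375 and its half-width is (0.0166 - (-0.0915))/2 ≈ 0.0541, which is z* × SE with z* = 1.645.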

Question 5:

Interpret the confidence interval you computed in Question 4 given the context of the data.

We are 90% confident that the difference between the proportion of low birth weight babies among nonsmoking mothers and the proportion among smoking mothers (nonsmoker minus smoker) is between -0.092 and 0.017. Because this interval contains 0, the data are consistent with no difference between the two groups.

Question 6:

State the null and alternative hypotheses, if we are interested in comparing the proportion of babies born with low birth weight between non-smoking and smoking mothers.

State the hypotheses in words and with statistical notation.

Null hypothesis: the proportion of low birth weight babies among nonsmoking mothers is equal to the proportion of low birth weight babies among smoking mothers.

Alternative hypothesis: the proportion of low birth weight babies among nonsmoking mothers is not equal to the proportion of low birth weight babies among smoking mothers.

$H_{0}$: pNSmoker - pSmoker = 0

$H_{A}$: pNSmoker - pSmoker $\neq$ 0

Why is the null rather than the alternative hypothesis a statement of equality?

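The null hypothesis has to be specific: it must name a single value for the parameter (here, a difference of 0) so that we can work out what the sampling distribution of the test statistic looks like if it is true. The alternative covers a whole range of values, so it cannot be used this way.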

Question 7:

Compute the pooled proportion of babies born with low birth weight between non-smoking and smoking mothers. Explain why we use a pooled proportion.

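Under the null hypothesis the two population proportions are equal (p_1 = p_2), so both samples estimate the same proportion. Pooling the two groups gives the best estimate of that common proportion, which we then use to check the success-failure condition and to compute the standard error for the test. The pooled proportion is 0.110.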

##calculate the pooled proportion
ppooled=(k1+k2)/(n1+n2)
ppooled

## [1] 0.1101101
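The pooled proportion treats both groups as one sample, which is the natural estimate when the null hypothesis says the two groups share a single proportion:

$$
\hat{p}_{pooled} = \frac{k_{1}+k_{2}}{n_{1}+n_{2}} = \frac{n_{1}\hat{p_{1}} + n_{2}\hat{p_{2}}}{n_{1}+n_{2}}
$$

Here k1 and k2 are the low birth weight counts and n1 and n2 are the group sizes from the code. The second form shows that the pooled proportion is a weighted average of the two sample proportions, so it always lies between them (0.105 < 0.110 < 0.142).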

Question 8:

Using the pooled proportion computed in Question 7, check the conditions necessary to use the normal distribution to perform a hypothesis test. Show all your work.

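Independence within and between groups holds for the same reasons as in Question 3. For the success-failure condition, we use the pooled proportion to compute the expected number of successes and failures in each group. All four checks below return TRUE, so both groups have at least 10 of each.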

##checking the pooled success-failure conditions
##mothers who are non-smoker

PooNSmokSus<-(n1*ppooled)>=10
PooNSmokF<-(n1*(1-ppooled))>=10
PooNSmokSus

## [1] TRUE

PooNSmokF

## [1] TRUE

##mothers who are smokers
PooSmokSus<-(n2*ppooled)>=10
PooSmokF<-(n2*(1-ppooled))>=10
PooSmokSus

## [1] TRUE

PooSmokF

## [1] TRUE

Question 9:

a. Compute the standard error using the pooled proportion computed in Question 7.

SE = 0.0298

b. Calculate your Z-statistic/test statistic.

Z-statistic = -1.256

c. Compute the associated p-value.

p-value = 0.209

d. Report your conclusion from the hypothesis test based on the given significance level above and include the confidence interval and p-value. State your conclusion in the context of the data.

The p-value of 0.209 means that, if there truly were no difference between the two groups, there would be a 20.9% chance of seeing a difference in sample proportions at least as extreme as the one we observed. At α = .01, the p-value is larger than the significance level, so we fail to reject the null hypothesis: our data do not provide sufficient evidence that the proportion of low birth weight babies differs between nonsmoking and smoking mothers. The 90% confidence interval from Question 4 contains 0, which agrees with this conclusion.

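e. Define what the p-value means in context.

In context, the p-value is the probability of observing a difference in sample proportions of low birth weight babies at least as large in magnitude as ours (about 0.037), assuming the true proportions are the same for nonsmoking and smoking mothers.

The p-value describes the strength of the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence in favor of the alternative hypothesis.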
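The three numbers reported in parts a to c are linked by the following formulas, where $\hat{p}$ is the pooled proportion from Question 7 and $\Phi$ is the standard normal CDF (`pnorm` in R):

$$
SE_{pooled} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_{1}}+\frac{1}{n_{2}}\right)}, \qquad
z = \frac{(\hat{p_{1}}-\hat{p_{2}}) - 0}{SE_{pooled}}, \qquad
\text{p-value} = 2\,\Phi(-|z|)
$$

Plugging in the rounded values reported above gives z ≈ -0.0375 / 0.0298 ≈ -1.26, and a two-sided p-value of about 0.21.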

##Calculating the standard error - pooled proportion
SEPoo=sqrt((ppooled*(1-ppooled)/n1)+(ppooled*(1-ppooled)/n2))

##calculating the z-statistic or test statistic
zest<-(p.est-0)/SEPoo
zest

## [1] -1.256178

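##calculating the two-sided P-value
##-abs() moves zest to the lower tail, so the formula also works when zest is positive
PValue<-2*(pnorm(-abs(zest),mean=0, sd=1))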
PValue

## [1] 0.2090515

##A P-value less than the significance level means we reject the null hypothesis

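##Calculate the confidence interval
##The interval is centered on the point estimate (not the P-value) and uses z* with
##the unpooled standard error, so it reproduces the interval from Question 4
CIPoo1<-p.est+(z*SE)
CIPoo2<-p.est-(z*SE)
CIPoo1

## [1] 0.01657725

CIPoo2

## [1] -0.09152407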
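As a cross-check, base R's `prop.test()` can run the same two-proportion test. The sketch below is illustrative rather than the method used above: the counts are assumptions (not printed in this write-up; chosen to be consistent with its outputs), and `two_prop_z` is a hypothetical helper name. With `correct = FALSE`, `prop.test()` reports a chi-squared statistic equal to z squared and the same two-sided p-value.

```r
# Assumed counts, NOT printed in the write-up; chosen to be consistent with its outputs
n1 <- 873; k1 <- 92   # nonsmoking mothers: group size, low birth weight babies
n2 <- 126; k2 <- 18   # smoking mothers: group size, low birth weight babies

# Hypothetical helper: two-sided z-test for H0: p1 - p2 = 0 using the pooled SE
two_prop_z <- function(k1, n1, k2, n2) {
  p.pool <- (k1 + k2) / (n1 + n2)
  se.pool <- sqrt(p.pool * (1 - p.pool) * (1 / n1 + 1 / n2))
  z <- (k1 / n1 - k2 / n2) / se.pool
  list(z = z, p.value = 2 * pnorm(-abs(z)))
}

manual <- two_prop_z(k1, n1, k2, n2)

# Built-in test without continuity correction; conf.level sets the interval level
builtin <- prop.test(x = c(k1, k2), n = c(n1, n2),
                     conf.level = 0.90, correct = FALSE)

manual$z^2                          # compare with builtin$statistic
c(manual$p.value, builtin$p.value)  # the two p-values should agree
builtin$conf.int                    # unpooled 90% interval for p1 - p2
```

Note that `prop.test()` uses the pooled proportion for the test but the unpooled standard error for its confidence interval, which mirrors the split between Question 4 and Question 9.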

Question 10:

Provide an appropriate visualization for your data. (Look at the Week 2 slides).

EXTRA CREDIT (2 points): Use the ggplot2() or plot_ly R packages to create visualizations. You will need to look up how to do this (you may refer to the R demo posted in the Week 3 module).

##Visualization of quantity - Low birth weight babies x smoker/non-smoker mother

library(plotly)

Mothers <- c("Smoker mother", "Non-Smoker mother")
NWeith <- c((n2-k2)/n2*100, 
            (n1-k1)/n1*100)
LWeith <- c(k2/n2*100, 
            k1/n1*100)
data <- data.frame(Mothers, NWeith, LWeith)

fig <- plot_ly(data, x = ~Mothers, y = ~NWeith, type = 'bar', name = 'Babies-Normal weight')

fig <- fig %>% add_trace(y = ~LWeith, name = 'Babies-Low weight')

##set barmode and the axis titles in a single layout() call so the '%' y-axis title is kept
fig <- fig %>% layout(title = "% Low birth weight babies x smoker/non-smoker mother",
         barmode = 'stack',
         xaxis = list(title = ""),
         yaxis = list(title = '%'))
##fig <- plot_ly(data, x = ~x, y = ~y, type = 'bar',
##             text = y, textposition = 'auto',
##             marker = list(color = 'rgb(158,202,225)',
##                           line = list(color = 'rgb(8,48,107)', width = 1.5)))

fig

library(plotly)

x <- c("Smoker mother", "Non-Smoker mother")
##keep the counts numeric so the y axis is a numeric scale (format() would turn them into strings)
y <- c(n2-k2, n1-k1)
y2 <- c(k2, k1)
text <- c('Babies-Normal weight', 'Babies-Low weight')
data <- data.frame(x, y, y2, text)

fig <- data %>% plot_ly()
fig <- fig %>% add_trace(x = ~x, y = ~y, type = 'bar',name = 'Babies-Normal weight',
             text = y, textposition = 'auto',
             marker = list(color = 'rgb(158,245,225)',
                           line = list(color = 'rgb(8,48,107)', width = 1.5)))
fig <- fig %>% add_trace(x = ~x, y = ~y2, type = 'bar',
            text = y2, textposition = 'auto',name = 'Babies-Low weight',
            marker = list(color = 'rgb(58,200,225)',
                          line = list(color = 'rgb(8,48,107)', width = 1.5)))
fig <- fig %>% layout(title = "Low birth weight babies x smoker/non-smoker mother",
         barmode = 'group',
         xaxis = list(title = ""),
         yaxis = list(title = ""))

fig
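The extra-credit prompt also mentions ggplot2. A minimal sketch of the stacked percentage chart in ggplot2 might look like the following; it is an alternative illustration, not part of the original answer. The counts are assumptions (not printed in this write-up; chosen to be consistent with its outputs), and the data frame `births` and its column names are hypothetical.

```r
library(ggplot2)

# Assumed counts, NOT printed in the write-up; chosen to be consistent with its outputs
births <- data.frame(
  habit  = rep(c("Non-Smoker mother", "Smoker mother"), each = 2),
  weight = rep(c("Low weight", "Normal weight"), times = 2),
  count  = c(92, 873 - 92, 18, 126 - 18)
)

# Convert counts to percentages within each smoking group
births$percent <- 100 * births$count / ave(births$count, births$habit, FUN = sum)

# Stacked bars: one bar per group, split by weight class
fig <- ggplot(births, aes(x = habit, y = percent, fill = weight)) +
  geom_col() +
  labs(title = "% Low birth weight babies x smoker/non-smoker mother",
       x = "", y = "%", fill = "")

fig
```

Plotting percentages rather than raw counts keeps the two bars comparable even though the nonsmoker group is much larger than the smoker group.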

Question 11:

Exercise 6.19 in the OpenIntro 4th edition textbook (page 225).

False - 2% lower to 6% higher doesn’t make sense.
True
True
True
False - it just swaps p female with p male.

Andreya Kuerten

3/18/2021