Homework7

1. Provide a table AND a visualization for this data. What is the observed count for females who were using nicotine gum at the 1st annual visit? The observed count for females who were using nicotine gum at the 1st annual visit is 387.

mytable<-table(lhs$AGENDER,lhs$AV1GUM)
mytable

##    
##        0    1
##   F 1708  387
##   M 2942  551

####Step 1b: Create and save the table of the two variables & view it
mytable2<-matrix(c(387,2095-387,551,3493-551),#specifying the cell values
                 nrow=2,#specifying the number of rows
                 ncol=2,#specifying the number of columns
                 byrow=TRUE,#create the matrix by rows
                 dimnames=list(c("Female", "Male"),
                               c("Used nicotine gum", "Did not use nicotine gum")))
mytable2

##        Used nicotine gum Did not use nicotine gum
## Female               387                     1708
## Male                 551                     2942

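#View(mytable2)
# side-by-side bars for each sex within each gum-use group, with a legend
barplot(mytable2, beside = TRUE, legend.text = TRUE, ylab = "Count")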

2. Write what the null and alternative hypotheses are in the context of the question.

H0: There is no association between sex and nicotine gum use at the 1st annual visit; the proportion using nicotine gum is the same for males and females. Any difference we saw in our data was simply due to random chance.

H1: There is an association between sex and nicotine gum use at the 1st annual visit; the proportion using nicotine gum differs between males and females.

3. Choose a significance level and justify why you chose this significance level. What is the test statistic and the degrees of freedom from the Chi-squared test of independence?

We have a random sample.

We meet the large-sample condition: every expected count is at least 5 (the smallest is 351.67).

The observations are independent.

The Chi-squared test statistic (X-squared) = 6.8252

The degrees of freedom from the Chi-squared test of independence = 1

What is the resulting p-value from that test? p-value = 0.008988
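
As a cross-check on the reported values, the statistic, degrees of freedom, and p-value can be recomputed by hand. This is only a sketch: the matrix `observed` and the `*_by_hand` names are introduced here for illustration, with the counts copied from the table in question 1.

```r
# Observed counts copied from the table above (rows: sex, columns: AV1GUM).
observed <- matrix(c(1708, 387,
                     2942, 551),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("F", "M"), c("0", "1")))

# Expected counts under independence, as reported by chisq.test().
expected <- chisq.test(observed, correct = FALSE)$expected

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
x2_by_hand <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df_by_hand <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper-tail area of the chi-squared distribution beyond the statistic.
p_by_hand <- pchisq(x2_by_hand, df = df_by_hand, lower.tail = FALSE)

round(c(statistic = x2_by_hand, df = df_by_hand, p.value = p_by_hand), 4)
```

The hand computation should agree with the `chisq.test()` output up to rounding.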

State your conclusion in the context of this question. If an association was found, consider whether you can make a causal statement about the association and state your conclusions accordingly.

Let's set our significance level at 0.05, the conventional choice.

p-value = P(X-squared > 6.8252) = 0.008988, which is below 0.05, so we have strong evidence to reject the null hypothesis.

0.008988 is a small value. It is highly improbable that we would see data like ours, or more extreme, if there truly was no relationship between sex and nicotine gum use at the 1st annual visit.

The probability of observing our χ2 statistic, or one more extreme, if the null hypothesis of no association is true is below our significance level. Thus, we have sufficient evidence to conclude that there is an association between sex and nicotine gum use at the 1st annual visit (387 of 2095 females, about 18.5%, used gum versus 551 of 3493 males, about 15.8%): p-value 0.009 < significance level 0.05, so we reject the null hypothesis. Because sex cannot be randomly assigned, we cannot make a causal statement; we can only say that the two variables are associated.

mychi.test<-chisq.test(mytable, correct=FALSE)
mychi.test

## 
##  Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 6.8252, df = 1, p-value = 0.008988

#View(mychi.test$expected)
mychi.test$expected

##    
##            0        1
##   F 1743.334 351.6661
##   M 2906.666 586.3339

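# computing p_value: upper-tail area beyond the observed test statistic
# (the first argument must be the statistic, not the p-value)
signif(unname(pchisq(mychi.test$statistic, df = 1, lower.tail = FALSE)), 4)

## [1] 0.008988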

4. What is the expected count for females who were using nicotine gum at the 1st annual visit?

351.6661

Why are we interested in the expected counts (think about how this step relates to the null hypothesis and the process of testing theories)?

The expected counts for each cell are the counts we would see if there was no relationship between the two variables under study, that is, if the null hypothesis were true. Comparing the observed counts with the expected counts shows how far our data are from what the null hypothesis predicts. Here 387 females were using nicotine gum versus about 351.67 expected, and the chi-squared statistic summarizes these gaps across all four cells.

mychi.test$expected

##    
##            0        1
##   F 1743.334 351.6661
##   M 2906.666 586.3339
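
The expected count for a single cell can also be worked out directly as row total times column total divided by the grand total. A minimal sketch, using the observed counts from question 1; the variable names are introduced here only for illustration.

```r
# Totals taken from the observed table above.
female_total <- 1708 + 387                 # all females
gum_total    <- 387 + 551                  # all gum users (AV1GUM = 1)
n            <- 1708 + 387 + 2942 + 551    # everyone in the table

# Expected count under independence = row total * column total / n.
expected_female_gum <- female_total * gum_total / n
expected_female_gum

# Gap between what we observed and what the null hypothesis predicts.
observed_female_gum <- 387
observed_female_gum - expected_female_gum
```

The result should match the F row, column 1 entry of `mychi.test$expected`.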

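5. Does your data meet the conditions to use the Chi-square test? Explain why or why not. Yes. The sample is random, the observations are independent, and every expected count is well above 5 (the smallest is 351.67). The p-value from the Chi-square test (0.008988) is close to the p-value from Fisher’s exact test (0.009644) because both tests use the same table and the sample is large.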

What is p-value from Fisher’s exact test? p-value = 0.009644

fisher.test(mytable)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  mytable
## p-value = 0.009644
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.7148097 0.9565290
## sample estimates:
## odds ratio 
##  0.8266191

6. What does the sampling distribution show us (the spread of our data or the spread of possible sample statistics)? It shows the spread of possible sample statistics: the sampling distribution describes the range of values a statistic can take across repeated samples from the same population.

7. Can we observe the true sampling distribution? Why or why not?

We cannot observe the true sampling distribution, because we only have one sample.

To assess how sample statistics vary, you need to look at the sampling distribution.
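
Although the true sampling distribution cannot be observed from one sample, it can be imitated by simulation. The sketch below assumes a hypothetical population proportion and sample size (both are made-up values, not taken from the study) and draws many samples to see how p_hat varies.

```r
set.seed(1)                 # for reproducibility
p_true <- 0.17              # hypothetical population proportion (assumption)
n      <- 500               # hypothetical sample size (assumption)
reps   <- 10000             # number of simulated samples

# Each draw is the number of "successes" in one sample; divide by n for p_hat.
p_hats <- rbinom(reps, size = n, prob = p_true) / n

# Spread of the simulated sample proportions ...
sd(p_hats)

# ... compared with the theoretical standard error sqrt(p * (1 - p) / n).
sqrt(p_true * (1 - p_true) / n)
```

Calling `hist(p_hats)` afterwards draws the simulated sampling distribution, which should look roughly bell-shaped and centered near `p_true`.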

8. What sampling distribution are we interested in when we conduct a hypothesis test? Why is this?

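The sampling distribution of the sample statistic (p_hat) under the assumption that the null hypothesis is true, because the p-value measures how unusual our observed statistic would be if H0 were true.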

9. If the central limit theorem conditions are met, are we saying that our data is normal or that the sampling distribution is normal?

We are saying that the sampling distribution is approximately normal, not the data.

10. Why do we check the CLT conditions and compute the standard error by plugging in p_hat when constructing confidence intervals, but by plugging in p0 (the null value) when doing hypothesis testing?

When constructing a confidence interval we do not assume any value for the population proportion, so we plug in p_hat, our point estimate.

When doing hypothesis testing, all calculations are done assuming the null hypothesis is true, so we plug in p0, the value the null hypothesis claims for the population proportion.
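
The two standard errors can be compared side by side. In this sketch the counts come from the gum-use table in question 1, while the null value `p0 = 0.15` is a hypothetical number chosen only for illustration and does not come from the assignment.

```r
# x and n come from the gum-use table; p0 is a hypothetical null value.
x  <- 387 + 551             # gum users at the 1st annual visit
n  <- 5588                  # total participants in the table
p0 <- 0.15                  # hypothetical null value (assumption)

p_hat <- x / n

# Confidence interval: no value of p is assumed, so plug in p_hat.
se_ci <- sqrt(p_hat * (1 - p_hat) / n)
ci    <- p_hat + c(-1, 1) * qnorm(0.975) * se_ci

# Hypothesis test: computed as if H0 were true, so plug in p0.
se_test <- sqrt(p0 * (1 - p0) / n)
z       <- (p_hat - p0) / se_test
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

c(se_ci = se_ci, se_test = se_test)
```

The two standard errors differ only in which proportion is plugged in, which is the distinction the question is asking about.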