lhs<-read.csv(file =file.choose(), header = TRUE)
mytable<-table(lhs$AGENDER,lhs$AV1GUM)
mytable
##    
##        0    1
##   F 1708  387
##   M 2942  551
mytable2<-matrix(c(387,2095-387,551,3493-551),
     nrow=2,
     ncol=2,
     byrow=TRUE,
     dimnames=list(c("Female", "Male"),
                   c("Used nicotine gum", "Did not use nicotine gum")))
mytable2
##        Used nicotine gum Did not use nicotine gum
## Female               387                     1708
## Male                 551                     2942
mychi.test<-chisq.test(mytable, correct=FALSE)
mychi.test
## 
##  Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 6.8252, df = 1, p-value = 0.008988
mychi.test$expected
##    
##            0        1
##   F 1743.334 351.6661
##   M 2906.666 586.3339
fisher.test(mytable)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  mytable
## p-value = 0.009644
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.7148097 0.9565290
## sample estimates:
## odds ratio 
##  0.8266191

1. Provide a table AND a visualization for this data. What is the observed count for females who were using nicotine gum at the 1st annual visit?

The observed count is 387 (row F, column 1 of the table).

mytable<-table(lhs$AGENDER,lhs$AV1GUM)
mytable
##    
##        0    1
##   F 1708  387
##   M 2942  551
library(ggplot2)

dat<-data.frame(table(lhs$AGENDER, lhs$AV1GUM))
names(dat)<-c("Gender", "AV1GUM", "Count")
ggplot(data = dat, aes(x = Gender, y = Count, fill = AV1GUM)) +
  geom_bar(stat = "identity") +
  labs(fill = "Nicotine gum use at 1st annual visit (0 = no, 1 = yes)")

2. Write what the null and alternative hypotheses are in the context of the question. Is there a relationship between nicotine gum use and Gender?

Null Hypothesis = the proportion of females using nicotine gum equals the proportion of males using nicotine gum: p.female - p.male = 0

Alternative Hypothesis = the proportion of females using nicotine gum differs from the proportion of males using nicotine gum: p.female - p.male != 0

Here p.female and p.male are the population proportions. Hypotheses are statements about population parameters, not about the sample estimates phat.female and phat.male.
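The sample estimates of these two proportions can be read straight off the table. The following is a minimal sketch, assuming the counts printed in the table above; the names `used_gum`, `group_n`, and `p_hat` are illustrative and not part of the original analysis.

```r
# Counts copied from the table above (column 1 = using gum, column 0 = not)
used_gum <- c(Female = 387, Male = 551)
group_n  <- c(Female = 387 + 1708, Male = 551 + 2942)

# Sample proportion using nicotine gum in each group
p_hat <- used_gum / group_n
round(p_hat, 4)

# Observed difference, to be compared with the null value of 0
p_diff <- unname(p_hat["Female"] - p_hat["Male"])
p_diff
```

The hypothesis test asks whether a difference this large could plausibly arise by chance if the two population proportions were equal.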

3. Choose a significance level and justify why you chose this significance level.

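I choose a significance level of 0.02. It is stricter than the conventional 0.05, so the chance of wrongly rejecting a true null hypothesis (a Type I error) is limited to 2%.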

What is the test statistic and the degrees of freedom from the Chi-squared test of independence? Test statistic: X-squared = 6.8252, df = 1

What is the resulting p-value from that test? p-value = 0.008988

State your conclusion in the context of this question. If an association was found, consider whether you can make a causal statement about the association and state your conclusions accordingly.

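The p-value, 0.008988, is the probability of seeing data at least as extreme as ours if the null hypothesis were true. My significance level is 0.02. Because 0.008988 < 0.02, I reject the null hypothesis. There is evidence of a relationship between gender and nicotine gum use at the 1st annual visit: 387/2095 = 18.5% of females were using the gum, compared with 551/3493 = 15.8% of males. I cannot make a causal statement, because gender cannot be randomly assigned, so the association may reflect other differences between the two groups.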

mychi.test<-chisq.test(mytable, correct=FALSE)
mychi.test
## 
##  Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 6.8252, df = 1, p-value = 0.008988
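For a 2x2 table, the chi-squared test without continuity correction is equivalent to a two-sided test for a difference between two proportions, which connects the statistic above to hypotheses stated in terms of proportions. A minimal sketch, assuming the counts printed in the table; `gum_users`, `group_size`, and `two_prop` are illustrative names.

```r
# Counts copied from the table above: gum users and group sizes by gender
gum_users  <- c(Female = 387, Male = 551)
group_size <- c(Female = 2095, Male = 3493)

# Two-sample test of equal proportions, without continuity correction,
# to match chisq.test(mytable, correct = FALSE)
two_prop <- prop.test(x = gum_users, n = group_size, correct = FALSE)

two_prop$statistic   # X-squared statistic
two_prop$parameter   # degrees of freedom
two_prop$p.value     # two-sided p-value
two_prop$conf.int    # 95% CI for the female minus male difference
```

The confidence interval is a useful extra here: it reports the size of the difference, not only whether it differs from zero.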

4. What is the expected count for females who were using nicotine gum at the 1st annual visit? Why are we interested in the expected counts (think about how this step relates to the null hypothesis and the process of testing theories)?

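The expected count is 351.6661 (row F, column 1 of the expected table). We are interested in the expected counts because they are the counts we would see if the null hypothesis were true, that is, if gum use were independent of gender. The chi-squared statistic measures how far the observed counts fall from these expected counts.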

expected <-mychi.test$expected
expected
##    
##            0        1
##   F 1743.334 351.6661
##   M 2906.666 586.3339
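Under independence, each expected count is (row total x column total) / grand total, and the chi-squared statistic sums (observed - expected)^2 / expected over the four cells. The sketch below reproduces both by hand, assuming the observed counts printed earlier; `observed`, `expected_by_hand`, and `chi_sq` are illustrative names.

```r
# Observed counts copied from the table above (rows: gender, columns: AV1GUM)
observed <- matrix(c(1708, 387,
                     2942, 551),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("F", "M"), c("0", "1")))

# Expected count for each cell: row total * column total / grand total
expected_by_hand <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected_by_hand

# Pearson chi-squared statistic: squared gaps scaled by the expected counts
chi_sq <- sum((observed - expected_by_hand)^2 / expected_by_hand)
chi_sq

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
pchisq(chi_sq, df = df, lower.tail = FALSE)
```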

5. Does your data meet the conditions to use the Chi-square test? Explain why or why not. What is the p-value from Fisher’s exact test?

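Yes. Every expected cell count is far above 5 (the smallest is 351.6661), and each participant falls in exactly one cell, so the observations can be treated as independent, assuming the participants were sampled independently of one another. The p-value from Fisher’s exact test is 0.009644.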

fisher<-fisher.test(mytable)
fisher
## 
##  Fisher's Exact Test for Count Data
## 
## data:  mytable
## p-value = 0.009644
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.7148097 0.9565290
## sample estimates:
## odds ratio 
##  0.8266191
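The expected-count condition can also be checked in code rather than by eye. A minimal sketch, assuming the counts printed earlier and the common rule of thumb that every expected count should be at least 5; `observed` and `expected_counts` are illustrative names.

```r
# Observed counts copied from the table above
observed <- matrix(c(1708, 387,
                     2942, 551),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("F", "M"), c("0", "1")))

# Expected counts under independence, taken from the chi-squared test object
expected_counts <- chisq.test(observed, correct = FALSE)$expected

# Rule-of-thumb check: every expected count should be at least 5
min(expected_counts)
all(expected_counts >= 5)

# Sample odds ratio, for comparison with the Fisher estimate
# (fisher.test reports a conditional estimate, so the two differ slightly)
sample_or <- (observed["F", "0"] * observed["M", "1"]) /
             (observed["F", "1"] * observed["M", "0"])
sample_or
```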

CONCEPTUAL QUESTIONS (confidence intervals, sampling distributions and inference for a single proportion)

Assume we are discussing the sampling distribution for a sample proportion.

6. What does the sampling distribution show us (the spread of our data or the spread of possible sample statistics)? It shows us the spread of possible sample statistics.

7. Can we observe the true sampling distribution? Why or why not? No. We would need to take every possible sample of the same size from the population and compute the statistic for each one, and we only have a single sample.

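8. What sampling distribution are we interested in when we conduct a hypothesis test? Why is this? The sampling distribution of the sample proportion under the null hypothesis, centered at p0. The p-value is computed assuming the null hypothesis is true, so this distribution tells us how unusual our observed statistic would be if p = p0.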

9. If the central limit theorem conditions are met, are we saying that our data is normal or that the sampling distribution is normal? The sampling distribution is approximately normal. The data themselves are binary (yes/no), so they are not normal.
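Although the true sampling distribution cannot be observed, it can be approximated by simulation when a population proportion is assumed. The sketch below draws many samples and compares the resulting sample proportions with the normal approximation; the values of `p0`, `n`, and `n_sim` are hypothetical and chosen only for illustration.

```r
set.seed(42)             # for reproducibility

p0    <- 0.17            # hypothetical population proportion
n     <- 500             # hypothetical sample size
n_sim <- 10000           # number of simulated samples

# Each simulated sample contributes one sample proportion
p_hats <- rbinom(n_sim, size = n, prob = p0) / n

# Centre and spread of the simulated sampling distribution
mean(p_hats)
sd(p_hats)
se_clt <- sqrt(p0 * (1 - p0) / n)   # standard error from the CLT formula
se_clt

# Compare simulated quantiles with those of the normal approximation
probs <- c(0.025, 0.5, 0.975)
rbind(simulated = quantile(p_hats, probs),
      normal    = qnorm(probs, mean = p0, sd = se_clt))
```

Each individual observation is still a 0 or a 1; only the distribution of the sample proportions is bell-shaped.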

10. Why do we check the CLT conditions and compute the standard error by plugging in p-hat when constructing confidence intervals, but by plugging in p0 (the null value) when doing hypothesis testing?

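For a confidence interval we use p-hat because there is no hypothesized value of the population proportion: p-hat is known from the sample, and it is our best estimate of where the true population parameter lies. For a hypothesis test we use p0 because the test is carried out assuming the null hypothesis is true, so the sampling distribution, its standard error, and the CLT conditions are all computed with the population proportion set to p0.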
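The two standard errors can be put side by side for a single proportion. This is a minimal sketch, assuming the female counts from the table earlier (387 gum users out of 2095) and a hypothetical null value `p0 = 0.20` that does not come from the original analysis.

```r
x <- 387                 # females using gum, from the table earlier
n <- 2095                # total females
p_hat <- x / n
p0 <- 0.20               # hypothetical null value, for illustration only

# Confidence interval: no null value is assumed, so plug in p_hat
se_ci <- sqrt(p_hat * (1 - p_hat) / n)
ci_95 <- p_hat + c(-1, 1) * qnorm(0.975) * se_ci
ci_95

# Hypothesis test: computed as if the null were true, so plug in p0
se_test <- sqrt(p0 * (1 - p0) / n)
z_stat  <- (p_hat - p0) / se_test
p_value <- 2 * pnorm(-abs(z_stat))   # two-sided p-value
c(z = z_stat, p_value = p_value)

# CLT conditions checked with the matching proportion in each case
c(n * p_hat, n * (1 - p_hat)) >= 10   # for the confidence interval
c(n * p0,    n * (1 - p0))    >= 10   # for the hypothesis test
```

The two standard errors differ only because the proportion plugged in differs; the formula is the same in both cases.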