Chi-Squared Test and Fisher’s Exact Test in R

The goal of this assignment is to learn how to
• obtain observed counts and expected counts in a 2x2 table, and
• obtain inferential results for comparing two categorical variables (e.g., Chi-squared test of independence).

Question 1

Provide a table AND a visualization for this data. What is the observed count for females who were using nicotine gum at the 1st annual visit?

         Used nicotine gum   Did not use nicotine gum
Female                 387                       1708
Male                   551                       2942

The observed count of females who were using nicotine gum at the 1st annual visit is 387.

##
####Step 1a: Create and save the table of the two variables & view it
mytable<-table(lhs$AGENDER,lhs$AV1GUM)
####View the table that you just created
mytable
##    
##        0    1
##   F 1708  387
##   M 2942  551
####Step 1b: Alternatively, build the same table by hand from the cell counts & view it
mytable2<-matrix(
c(387,2095-387,551,3493-551), #specifying the cell values
nrow=2, #specifying the number of rows
ncol=2, #specifying the number of columns
byrow=TRUE, #create the matrix by rows
dimnames=list(c("Female", "Male"),
c("Used nicotine gum", "Did not use nicotine gum")))
####View the table that you just created
mytable2
##        Used nicotine gum Did not use nicotine gum
## Female               387                     1708
## Male                 551                     2942
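
Question 1 also asks for a visualization, which the code above does not produce. A minimal sketch using base R's barplot(), assuming the counts in the table above. The object names gum and gum_prop are illustrative and not part of the original assignment:

```r
# Rebuild the 2x2 table from the observed counts (names are illustrative)
gum <- matrix(
  c(387, 1708, 551, 2942),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Female", "Male"),
                  c("Used nicotine gum", "Did not use nicotine gum")))

# Proportion within each sex (each row sums to 1)
gum_prop <- prop.table(gum, margin = 1)

# Side-by-side bars: one group per sex, one bar per gum-use category
barplot(t(gum_prop), beside = TRUE, legend.text = TRUE,
        ylab = "Proportion within sex",
        main = "Nicotine gum use at the 1st annual visit, by sex")
```

Plotting row proportions rather than raw counts keeps the comparison fair, since there are more males (3493) than females (2095) in the sample.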

Question 2

Write what the null and alternative hypotheses are in the context of the question.

Question 3

Choose a significance level and justify why you chose this significance level.

What is the test statistic and the degrees of freedom from the Chi-squared test of independence?

What is the resulting p-value from that test?

State your conclusion in the context of this question. If an association was found, consider whether you can make a causal statement about the association and state your conclusions accordingly.

## Steps for Chi-Squared Test

##Step 1: Prepare: Create your two-way table. Choose your significance level. Define your hypotheses.
##significance level is 0.01
##table is mytable
##hypotheses are (p = proportion using nicotine gum at the 1st annual visit)
##$H_{0}$: $p_{Males} - p_{Females} = 0$
##$H_{A}$: $p_{Males} - p_{Females} \neq 0$

##Step 2: Check: Check the assumptions (independence; expected counts all ≥5). You will need to compute the expected counts here.
##(run these only after mychi.test has been created in Step 3)
##Sus1<-mychi.test$expected[1]>=5
##Sus2<-mychi.test$expected[2]>=5
##Sus3<-mychi.test$expected[3]>=5
##Sus4<-mychi.test$expected[4]>=5

##Step 3: Calculate the chi-squared test statistic. Compute the associated p-value. Compare the p-value to the significance level.

mychi.test<-chisq.test(mytable, correct=FALSE)
####View the test results
mychi.test
## 
##  Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 6.8252, df = 1, p-value = 0.008988
##to see what values are saved in the chisq.test
names(mychi.test)
## [1] "statistic" "parameter" "p.value"   "method"    "data.name" "observed" 
## [7] "expected"  "residuals" "stdres"
##To obtain expected counts for the table (the component is named "expected", see names() above)
mychi.test$expected
#p-value
mychi.test$p.value
## [1] 0.008988105
##Step 4: Make a conclusion based on the p-value and significance level. State your conclusion in the context of the data. 
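
The statistic and p-value reported in Step 3 can be reproduced by hand, which makes clear what chisq.test() is doing. The following is a sketch only: it re-enters the counts printed for mytable (assuming column 0 means "did not use" and column 1 means "used"), and the names obs, expd, x2 and p_val are illustrative:

```r
# Observed counts as printed for mytable (columns: 0 = no gum, 1 = gum)
obs <- matrix(c(1708, 387, 2942, 551), nrow = 2, byrow = TRUE,
              dimnames = list(c("F", "M"), c("0", "1")))

# Expected counts under independence: (row total * column total) / n
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic without continuity correction, matching correct=FALSE
x2 <- sum((obs - expd)^2 / expd)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper-tail probability of the chi-squared distribution
p_val <- pchisq(x2, df = df, lower.tail = FALSE)

c(statistic = x2, df = df, p.value = p_val)
```

The result should agree with the chisq.test() output above up to rounding.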

Question 4:

What is the expected count for females who were using nicotine gum at the 1st annual visit?

Why are we interested in the expected counts (think about how this step relates to the null hypothesis and the process of testing theories)?
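
Under the null hypothesis of independence, the expected count for a cell is its row total times its column total, divided by the grand total. A worked version for the Female / used-gum cell, using only the counts from the table in Question 1:

$$
E_{\text{Female, gum}} = \frac{(387 + 1708)(387 + 551)}{387 + 1708 + 551 + 2942} = \frac{2095 \times 938}{5588} \approx 351.67
$$

The observed count of 387 is higher than this expected value. The Chi-squared statistic measures how large such gaps are across all four cells.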

Question 5

Does your data meet the conditions to use the Chi-square test? Explain why or why not.

What is the p-value from Fisher’s exact test?

##Step 2: Check: Check the assumptions (independence; expected counts all ≥5). You will need to compute the expected counts here.
##Each SusK is TRUE when the expected count in cell K is at least 5
Sus1<-mychi.test$expected[1]>=5
Sus2<-mychi.test$expected[2]>=5
Sus3<-mychi.test$expected[3]>=5
Sus4<-mychi.test$expected[4]>=5
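
The four cell-by-cell comparisons can also be written as a single check. This is a sketch that re-enters the counts printed for mytable so it runs on its own; obs and expd are illustrative names:

```r
# Observed counts as printed for mytable
obs <- matrix(c(1708, 387, 2942, 551), nrow = 2, byrow = TRUE,
              dimnames = list(c("F", "M"), c("0", "1")))

# Expected counts taken from the fitted test object
expd <- chisq.test(obs, correct = FALSE)$expected

# TRUE only if every cell has an expected count of at least 5
all(expd >= 5)

# Smallest expected count, to see how far the table is from the threshold
min(expd)
```

Printing min(expd) alongside the logical result shows by how much the condition is met, not just whether it is met.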

##Fisher’s Exact test - alternative to the Chi-square test when the conditions are not met.
##To carry out Fisher’s Exact test, use the fisher.test() function, specifying the name of the object
##that contains the 2x2 table (e.g., mytable, mytable2):

fisher.test(mytable)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  mytable
## p-value = 0.009644
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.7148097 0.9565290
## sample estimates:
## odds ratio 
##  0.8266191
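
The odds ratio in this output is close to, but not the same as, the simple cross-product ratio from the table, because fisher.test() reports a conditional maximum likelihood estimate. A sketch comparing the two, again re-entering the counts printed for mytable (obs, or_sample and ft are illustrative names):

```r
# Observed counts as printed for mytable (columns: 0 = no gum, 1 = gum)
obs <- matrix(c(1708, 387, 2942, 551), nrow = 2, byrow = TRUE,
              dimnames = list(c("F", "M"), c("0", "1")))

# Sample (cross-product) odds ratio: (a * d) / (b * c)
or_sample <- (obs["F", "0"] * obs["M", "1"]) / (obs["F", "1"] * obs["M", "0"])

# fisher.test() reports a conditional maximum likelihood estimate instead
ft <- fisher.test(obs)

c(sample = or_sample, conditional = unname(ft$estimate))
```

With this row and column order, and assuming 1 codes gum use, the ratio compares the odds of gum use for males with those for females, so a value below 1 means lower odds for males.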

CONCEPTUAL QUESTIONS (confidence intervals, sampling distributions and inference for a single proportion)

Assume we are discussing the sampling distribution for a sample proportion.

Question 6:

What does the sampling distribution show us (the spread of our data or the spread of possible sample statistics)?

Question 7:

Can we observe the true sampling distribution? Why or why not?

Question 8:

What sampling distribution are we interested in when we conduct a hypothesis test? Why is this?

Question 9:

If the central limit theorem conditions are met, are we saying that our data is normal or that the sampling distribution is normal?

The sampling distribution is approximately normal. The data themselves are categorical (yes/no), so they cannot be normal.
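
A small simulation can illustrate this distinction. The sketch below uses made-up values (a hypothetical population proportion of 0.17 and sample size of 500, neither taken from the assignment data) and draws many samples to approximate the sampling distribution of the sample proportion:

```r
set.seed(1)        # for reproducibility
p_true <- 0.17     # hypothetical population proportion
n      <- 500      # hypothetical sample size
reps   <- 10000    # number of simulated samples

# Each draw is the number of successes in one sample; divide by n for p-hat
p_hat <- rbinom(reps, size = n, prob = p_true) / n

# Centre and spread of the simulated sampling distribution
mean(p_hat)
sd(p_hat)
sqrt(p_true * (1 - p_true) / n)   # theoretical standard error

# The histogram of p-hat is roughly bell-shaped although each observation is 0/1
hist(p_hat, breaks = 30, main = "Simulated sampling distribution of p-hat")
```

Each individual observation is still a 0 or a 1. Only the collection of sample proportions takes on the bell shape.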

Question 10:

Why do we check the CLT conditions and compute the standard error by plugging in $\hat{p}$ when constructing confidence intervals, but by plugging in $p_{0}$ (the null value) when doing hypothesis testing?
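
The two standard errors referred to in Question 10 differ only in which proportion is plugged in. Written out, with $n$ the sample size:

$$
SE_{\text{CI}} = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}, \qquad SE_{\text{test}} = \sqrt{\frac{p_{0}\,(1-p_{0})}{n}}
$$

A hypothesis test is carried out assuming the null hypothesis is true, so $p_{0}$ is the natural value to use there. A confidence interval makes no such assumption, so the sample estimate $\hat{p}$ is used instead.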