The Chi Square Test for Independence literally tests whether P(AB) is “close enough” to being equal to P(A) x P(B). For problem 6.29 in OpenStats, evaluate whether P(AB)=P(A)xP(B) strictly holds by calculating the marginal probabilities for each outcome and multiplying them together then comparing them to the joint probabilities. Should you expect strict equality to hold? Why or why not? Then conduct the Chi Square test using R. Post your solutions and code (screen shots are fine).

##Problem 6.29 Offshore drilling, Part I. A 2010 survey asked 827 randomly sampled registered voters in California “Do you support? Or do you oppose? Drilling for oil and natural gas off the Coast of California? Or do you not know enough to say?” Below is the distribution of responses, separated based on whether or not the respondent graduated from college.

                 College Grad
                 Yes     No
Support          154     132
Oppose           180     126
Do not know      104     131
Total            438     389

## Let A represent college graduate, B supporting drilling
df <- data.frame(Yes= c(154, 180, 104), No= c(132, 126, 131))
df
##   Yes  No
## 1 154 132
## 2 180 126
## 3 104 131
A <- (438/827)
NotA <- 1-A
B <- ((154+132)/827)
NotB <- 1-B

##Joint Probability comparisons
#P(AB)
PofAB <- A*B
JointAB <- 154/827
JointAB
## [1] 0.1862152
PofAB
## [1] 0.1831594
#P(AB')
PofaANDNotb <- A*NotB
JointANotB <- ((180+104)/827)
PofaANDNotb
## [1] 0.3464658
JointANotB
## [1] 0.3434099
#P(A'B)
PofNotaANDb <- NotA*B
JointNotAB <- (132/827)
PofNotaANDb
## [1] 0.1626689
JointNotAB
## [1] 0.1596131
#P(A'B')
PofNotaANDNotb <- NotA*NotB
JointNotANotB <- ((126+131)/827)
PofNotaANDNotb
## [1] 0.3077059
JointNotANotB
## [1] 0.3107618
##These are the same calculations as above, except this time I am keeping Oppose and Do Not Know as separate categories instead of combining them into "not B".
NotB1 <- (180+126)/827 ##Oppose
NotB2 <- (104+131)/827 ##Do Not Know

#P(A and Oppose)
PofaANDNotb1 <- A*NotB1
JointANotB1 <- ((180)/827)
PofaANDNotb1
## [1] 0.1959677
JointANotB1
## [1] 0.2176542
#P(A and Do Not Know)
PofaANDNotb2 <- A*NotB2
JointANotB2 <- ((104)/827)
PofaANDNotb2
## [1] 0.1504981
JointANotB2
## [1] 0.1257557
#P(A' and Oppose)
PofNotaANDNotb1 <- NotA*NotB1
JointNotANotB1 <- ((126)/827)
PofNotaANDNotb1
## [1] 0.1740444
JointNotANotB1
## [1] 0.1523579
#P(A' and Do Not Know)
PofNotaANDNotb2 <- NotA*NotB2
JointNotANotB2 <- ((131)/827)
PofNotaANDNotb2
## [1] 0.1336615
JointNotANotB2
## [1] 0.1584039
##Chi Square Test
CHI <- chisq.test(df)
CHI
## 
##  Pearson's Chi-squared test
## 
## data:  df
## X-squared = 11.461, df = 2, p-value = 0.003246

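Strict equality does not hold. When I multiplied the marginal probabilities of being a college graduate (or not) and supporting drilling (or not), each product differed from the corresponding joint probability by about 0.003. For example, the joint P(AB) is 0.1862 while P(A)xP(B) is 0.1832. The values are close: rounded to 2 decimal places half of the pairs match, and rounded to 1 decimal place all of them do. I would not expect strict equality even if the two variables really were independent, because these are sample proportions from 827 voters and sampling variability alone will move them off the exact product. The gaps grew when I separated the categories “Oppose” and “Do Not Know”: the joint probabilities and the products then differ by about 0.02. Perhaps combining “Oppose” and “Do Not Know” into a single “not support” group hides most of the relationship between being a college graduate and opinion on drilling. The chi-square test on the full table gives a p-value of 0.003246, well below the usual 0.05 level, so we reject the null hypothesis of independence: there is statistically significant evidence that being a college graduate and opinion on drilling are not independent.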
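The cell-by-cell comparison above can also be done in one pass. The following is a minimal sketch and not part of the original solution: the names `counts`, `joint`, `indep`, `expected`, `stat`, and `p_value` are my own, and it assumes the same 3 x 2 table of counts from Problem 6.29. It uses base R (`prop.table`, `rowSums`, `colSums`, `outer`, `pchisq`) to compare every joint probability with the product of its marginals, then rebuilds the Pearson statistic by hand from the expected counts.

```r
# Counts from Problem 6.29: rows are opinions, columns are college graduate status
counts <- matrix(c(154, 180, 104, 132, 126, 131), nrow = 3,
                 dimnames = list(Opinion = c("Support", "Oppose", "DoNotKnow"),
                                 CollegeGrad = c("Yes", "No")))

# Observed joint probabilities: each cell divided by the grand total
joint <- prop.table(counts)

# Products of the marginal probabilities, P(opinion) x P(graduate status)
indep <- outer(rowSums(joint), colSums(joint))

# Under strict equality every difference would be exactly zero
round(joint - indep, 4)

# Expected counts under independence and the Pearson chi-square statistic
expected <- indep * sum(counts)
stat <- sum((counts - expected)^2 / expected)
p_value <- pchisq(stat, df = (nrow(counts) - 1) * (ncol(counts) - 1),
                  lower.tail = FALSE)
stat
p_value
```

The hand-built `stat` and `p_value` should agree with what `chisq.test` reports for the full table, since no continuity correction is applied to tables larger than 2 x 2.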

## Not really part of the discussion. I was just curious to see what happens if I take out the "Do Not Know" row.
## Let A represent college graduate, B supporting drilling
df <- data.frame(Yes= c(154, 180), No= c(132, 126))
df
##   Yes  No
## 1 154 132
## 2 180 126
A <- ((154+180)/592)
NotA <- 1-A
B <- ((154+132)/592)
NotB <- 1-B

##Joint Probability comparisons
#P(AB)
PofAB <- A*B
JointAB <- 154/592
JointAB
## [1] 0.2601351
PofAB
## [1] 0.2725644
#P(AB')
PofaANDNotb <- A*NotB
JointANotB <- ((180)/592)
PofaANDNotb
## [1] 0.2916248
JointANotB
## [1] 0.3040541
#P(A'B)
PofNotaANDb <- NotA*B
JointNotAB <- (132/592)
PofNotaANDb
## [1] 0.2105437
JointNotAB
## [1] 0.222973
#P(A'B')
PofNotaANDNotb <- NotA*NotB
JointNotANotB <- ((126)/592)
PofNotaANDNotb
## [1] 0.2252671
JointNotANotB
## [1] 0.2128378
CHI <- chisq.test(df)
CHI
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  df
## X-squared = 1.294, df = 1, p-value = 0.2553
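
The output above shows that `chisq.test` applied Yates' continuity correction, which R does by default for 2 x 2 tables, so this statistic was not computed in quite the same way as the one for the 3 x 2 table. As a sketch for comparison (the names `df2`, `corrected`, and `uncorrected` are my own), the `correct = FALSE` argument turns the correction off:

```r
# Same 2 x 2 table as above: Support and Oppose rows only
df2 <- data.frame(Yes = c(154, 180), No = c(132, 126))

# Default behaviour for a 2 x 2 table: Yates' continuity correction applied
corrected <- chisq.test(df2)

# Plain Pearson statistic, without the continuity correction
uncorrected <- chisq.test(df2, correct = FALSE)

corrected$statistic
uncorrected$statistic
uncorrected$p.value
```

The uncorrected statistic is the one that equals the sum of (observed - expected)^2 / expected over the four cells, and it is the larger of the two because the correction shrinks each deviation before squaring.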