## Independence

Let X and Y be random variables. Then X and Y are independent if for any sets A and B,

\[P(X \in A , Y \in B) = P(X \in A)P(Y \in B).\]

Two events A and B are independent if

\[P(A \cap B) = P(A)P(B)\]

Equivalently, provided $P(A) > 0$, two events _A_ and _B_ are _independent_ if and only if

\[P(B|A) = P(B)\]
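
The conditional form follows from the product form in one step. Assuming $P(A) > 0$ so that the conditional probability is defined, and substituting $P(A \cap B) = P(A)P(B)$ into the definition of $P(B|A)$:

\[P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A)P(B)}{P(A)} = P(B).\]

Reading the same chain from right to left (multiplying $P(B|A) = P(B)$ through by $P(A)$) recovers the product form.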

## Example

Suppose we take a random sample of $n = 100$ online adults and find 71 use Facebook, 18 use Twitter, and 15 use both.


```
##        Facebook
## Twitter Yes No Sum
##     Yes  15  3  18
##     No   56 26  82
##     Sum  71 29 100
```

1. Are using Facebook and using Twitter mutually exclusive?
2. Are Facebooking and tweeting independent?

## Partial Solution

Let _A_ = respondent uses Facebook and _B_ = respondent uses Twitter.

We know that $P(A) = \frac{71}{100}=0.71$, $P(B) = \frac{18}{100}=0.18$, and $P(A\cap B)= \frac{15}{100}=0.15$.

Since $P(A \cap B)\neq 0$, the events _A_ and _B_ are not mutually exclusive.

Recall that _A_ and _B_ are independent if $P(A \cap B)=P(A)P(B)$.  Since $P(A \cap B) = 0.15 \neq P(A)P(B)=0.71\times 0.18 = 0.1278,$ the events _A_ and _B_ are not independent.
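
The arithmetic in this solution can be checked with a few lines of R. This is only a sketch: the variable names (`p_A`, `p_B`, `p_AB`) are illustrative and not part of the original slides, and the counts are taken from the sample described in the example.

```r
# Sample counts from the example (n = 100 online adults)
n    <- 100
p_A  <- 71 / n   # P(A): uses Facebook
p_B  <- 18 / n   # P(B): uses Twitter
p_AB <- 15 / n   # P(A and B): uses both

# Mutually exclusive events would have P(A and B) = 0
p_AB == 0

# Independence would require P(A and B) = P(A) * P(B)
p_A * p_B
isTRUE(all.equal(p_AB, p_A * p_B))

# Equivalent check with the conditional form: compare P(B | A) with P(B)
p_AB / p_A
p_B
```

Both checks compare sample proportions only; whether the gap between 0.15 and 0.1278 is larger than chance alone would produce is a separate question.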

## Are Facebook usage and Twitter usage Independent?

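What do we expect in each cell if Facebook usage and Twitter usage are independent? Under independence, the expected count in a cell is (row total × column total) / $n$; for example, $18 \times 71 / 100 = 12.78$ for the Yes/Yes cell.

Observed counts:

```
##        Facebook
## Twitter Yes No
##     Yes  15  3
##     No   56 26
```

Expected counts under independence:

```
##        Facebook
## Twitter   Yes    No
##     Yes 12.78  5.22
##     No  58.22 23.78
```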
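
One way to reproduce the expected counts is to take the outer product of the row and column totals and divide by the sample size. The sketch below assumes the observed counts shown above; the object names `obs` and `expected` are illustrative rather than taken from the slides.

```r
# Observed counts (rows: Twitter, columns: Facebook)
obs <- matrix(c(15, 3, 56, 26), nrow = 2, byrow = TRUE,
              dimnames = list(Twitter = c("Yes", "No"),
                              Facebook = c("Yes", "No")))

# Expected count in each cell = (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
expected
```

The expected table has the same margins as the observed one (row totals 18 and 82, column totals 71 and 29), which is a quick sanity check on the calculation.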

## Test Statistic

$$TS = \sum_{\text{all cells}}\frac{(O - E)^2}{E}$$
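
As a rough check, Pearson's statistic, the sum over all cells of $(O - E)^2 / E$, can be computed by hand from the observed counts. The names `obs`, `E`, and `TS` below are illustrative assumptions, not objects defined in the slides.

```r
# Observed counts (rows: Twitter Yes/No, columns: Facebook Yes/No)
obs <- matrix(c(15, 3, 56, 26), nrow = 2, byrow = TRUE)

# Expected counts under independence
E <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson's chi-squared statistic: squared deviations scaled by expected counts
TS <- sum((obs - E)^2 / E)
TS
```

The squaring matters: the unsquared deviations $O - E$ always sum to zero within each row and column, so they cannot measure departure from independence on their own.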
  
  * Create a tidy data set.


```r
on <- c(15, 3, 56, 26)
ONL <- matrix(data = on, nrow = 2, byrow = TRUE)
dimnames(ONL) <- list(Twitter = c("Yes", "No"), Facebook = c("Yes", "No"))
ONLT <- as.table(ONL)
ONLTDF <- as.data.frame(ONLT)
TDF <- vcdExtra::expand.dft(ONLTDF)
dim(TDF)
```

```
## [1] 100   2
```

## Tidy Data

## Data

```r
# Cross-tabulate the tidy data back into a 2 x 2 contingency table
T1 <- xtabs(~Twitter + Facebook, data = TDF)
T1
```

```
##        Facebook
## Twitter No Yes
##     No  26  56
##     Yes  3  15
```

```r
# Pearson's chi-squared test without the continuity correction
chisq.test(T1, correct = FALSE)
```

```
## 
##  Pearson's Chi-squared test
## 
## data:  T1
## X-squared = 1.6217, df = 1, p-value = 0.2029
```

## Graph Code

```r
library(tidyverse)
# Stacked bars scaled to 1: fraction of Facebook users within each Twitter group
TDF %>% 
  ggplot(aes(x = Twitter, fill = Facebook)) + 
  geom_bar(position = "fill") + 
  labs(y = "Fraction")
```

## Graph

<img src="Probability_Slides_files/figure-html/unnamed-chunk-7-1.png" width="672" />

## Randomization Test

```r
# Observed value of the test statistic
obs.stat <- chisq.test(T1, correct = FALSE)$statistic
obs.stat
```

```
## X-squared 
##  1.621673
```

```r
set.seed(123)
sims <- 10^4 - 1
ts <- numeric(sims)
for(i in 1:sims){
  # Shuffle the Facebook labels to break any association with Twitter
  ts[i] <- chisq.test(xtabs(~Twitter + sample(Facebook), data = TDF), correct = FALSE)$statistic
}
# Proportion of statistics at least as large as the observed one,
# counting the observed statistic itself
pvalue <- (sum(ts >= obs.stat) + 1)/(sims + 1)
pvalue
```

```
## [1] 0.2546
```

## Randomization Distribution Code

```r
library(ggplot2)
# Density of the simulated statistics, with the chi-squared (df = 1) density in red
# and the observed statistic marked by the dashed vertical line
ggplot(data = data.frame(x = ts), aes(x = x)) + 
  geom_density(fill = "peachpuff", adjust = 3) + 
  theme_bw() + 
  stat_function(fun = dchisq, args = list(df = 1), color = "red") + 
  geom_vline(xintercept = obs.stat, linetype = "dashed", color = "lightblue") + 
  xlim(0, 8)
```

## Randomization Distribution

<img src="Probability_Slides_files/figure-html/unnamed-chunk-10-1.png" width="672" />
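
For comparison with the randomization p-value, the asymptotic p-value is the upper-tail area of a chi-squared distribution with 1 degree of freedom beyond the observed statistic. The sketch below hard-codes the observed value reported earlier; the names `obs_stat` and `p_asymptotic` are illustrative and not from the slides.

```r
# Observed Pearson statistic, as reported by chisq.test() above
obs_stat <- 1.621673

# Upper-tail probability under the chi-squared distribution with df = 1
p_asymptotic <- pchisq(obs_stat, df = 1, lower.tail = FALSE)
p_asymptotic
```

This should agree, up to rounding, with the p-value of 0.2029 printed by `chisq.test`. The randomization p-value (0.2546) differs somewhat, but both approaches give no strong evidence against independence.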