Let X and Y be random variables. Then X and Y are independent if for any sets A and B,
\[P(X \in A , Y \in B) = P(X \in A)P(Y \in B).\]
Two events A and B are independent if
\[P(A \cap B) = P(A)P(B)\]
Equivalently, when $P(A) > 0$, two events _A_ and _B_ are _independent_ if and only if
\[P(B|A) = P(B)\]
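To see why the conditional form agrees with the product rule, assume $P(A) > 0$ and apply the definition of conditional probability, $P(B|A) = P(A \cap B)/P(A)$. If the product rule holds, then
\[P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A)P(B)}{P(A)} = P(B).\]
Conversely, multiplying $P(B|A) = P(B)$ through by $P(A)$ recovers $P(A \cap B) = P(A)P(B)$.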
## Example
Suppose we take a random sample of $n = 100$ online adults and find 71 use Facebook, 18 use Twitter, and 15 use both.
```
##        Facebook
## Twitter Yes  No Sum
##     Yes  15   3  18
##     No   56  26  82
##     Sum  71  29 100
```
1. Are using Facebook and using Twitter mutually exclusive?
1. Are Facebooking and tweeting independent?
## Partial Solution
Let _A_ = respondent uses Facebook and _B_ = respondent uses Twitter.
We know that $P(A) = \frac{71}{100}=0.71$, $P(B) = \frac{18}{100}=0.18$, and $P(A\cap B)= \frac{15}{100}=0.15$.
Since $P(A \cap B)\neq 0$, the events _A_ and _B_ are not mutually exclusive.
Recall that _A_ and _B_ are independent if $P(A \cap B)=P(A)P(B)$. Since $P(A \cap B) = 0.15 \neq P(A)P(B)=0.71\times 0.18 = 0.1278$, the events _A_ and _B_ are not independent in this sample.
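The arithmetic in this solution can be checked with a few lines of R. This is a minimal sketch using only the counts given in the example; the variable names (`p_A`, `p_B`, `p_AB`, `p_B_given_A`) are illustrative and not part of the original slides.

```r
# Counts taken from the example: n = 100, 71 Facebook, 18 Twitter, 15 both
n <- 100
p_A  <- 71 / n   # P(A): uses Facebook
p_B  <- 18 / n   # P(B): uses Twitter
p_AB <- 15 / n   # P(A and B): uses both

# Mutually exclusive events would have P(A and B) = 0
p_AB == 0

# Independent events would satisfy P(A and B) = P(A) * P(B)
p_A * p_B
isTRUE(all.equal(p_AB, p_A * p_B))

# Conditional form: compare P(B | A) with P(B)
p_B_given_A <- p_AB / p_A
c(p_B_given_A = p_B_given_A, p_B = p_B)
```

Both comparisons return `FALSE`, matching the conclusions above, and the conditional probability $P(B|A) = 15/71$ differs from $P(B) = 0.18$.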
## Are Facebook usage and Twitter usage Independent?
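What do we expect in each cell if Facebook and Twitter usage are independent? The observed counts are shown first, followed by the expected counts, each computed as (row total $\times$ column total)$/n$.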
```
##        Facebook
## Twitter Yes No
##     Yes  15  3
##     No   56 26
```
```
##        Facebook
## Twitter   Yes    No
##     Yes 12.78  5.22
##     No  58.22 23.78
```
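One way to obtain the expected counts is to multiply each row total by each column total and divide by the sample size, which is what independence implies for every cell. The sketch below assumes the observed table is stored in a matrix called `obs` (a name chosen here for illustration).

```r
# Observed counts, rows = Twitter (Yes, No), columns = Facebook (Yes, No)
obs <- matrix(c(15, 3, 56, 26), nrow = 2, byrow = TRUE,
              dimnames = list(Twitter = c("Yes", "No"),
                              Facebook = c("Yes", "No")))

# Expected count under independence: row total * column total / n
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
expected
```

For example, the Twitter = Yes, Facebook = Yes cell is $18 \times 71 / 100 = 12.78$.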
## Test Statistic
$$TS = \sum_{\text{all cells}}\frac{(O - E)^2}{E}$$
where $O$ is the observed count and $E$ is the expected count under independence.
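As a sketch of the computation, Pearson's statistic sums the squared difference between observed and expected counts, scaled by the expected count, over all four cells. The names `obs`, `expected`, and `ts_by_hand` are assumptions introduced for this illustration.

```r
# Observed counts, rows = Twitter (Yes, No), columns = Facebook (Yes, No)
obs <- matrix(c(15, 3, 56, 26), nrow = 2, byrow = TRUE,
              dimnames = list(Twitter = c("Yes", "No"),
                              Facebook = c("Yes", "No")))

# Expected counts under independence
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson's statistic: sum over all cells of (O - E)^2 / E
ts_by_hand <- sum((obs - expected)^2 / expected)
ts_by_hand
```

The value should agree with the `X-squared` reported by `chisq.test(..., correct = FALSE)` for the same table.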
* Create a tidy data set.
```r
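# Observed counts, entered row by row (Twitter Yes, then Twitter No)
on <- c(15, 3, 56, 26)
ONL <- matrix(data = on, nrow = 2, byrow = TRUE)
dimnames(ONL) <- list(Twitter = c("Yes", "No"), Facebook = c("Yes", "No"))
ONLT <- as.table(ONL)
# Long format: one row per cell, with a Freq column
ONLTDF <- as.data.frame(ONLT)
# Expand to one row per respondent
TDF <- vcdExtra::expand.dft(ONLTDF)
dim(TDF)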
```
```
## [1] 100 2
```
## Data
```r
T1 <- xtabs(~Twitter + Facebook, data = TDF)
T1
```
```
##        Facebook
## Twitter No Yes
##     No  26  56
##     Yes  3  15
```
```r
chisq.test(T1, correct = FALSE)
```
```
##
## Pearson's Chi-squared test
##
## data: T1
## X-squared = 1.6217, df = 1, p-value = 0.2029
```
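The reported p-value is the upper-tail area of a chi-squared distribution with $(2-1)(2-1) = 1$ degree of freedom beyond the observed statistic. A minimal sketch, using the statistic printed above and an illustrative variable name `p_by_hand`:

```r
# Upper-tail probability of a chi-squared(1) variable beyond the observed value
p_by_hand <- pchisq(1.6217, df = 1, lower.tail = FALSE)
p_by_hand
```

This should reproduce the `p-value = 0.2029` shown in the output, up to rounding.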
## Graph Code
```r
library(tidyverse)
TDF %>%
  ggplot(aes(x = Twitter, fill = Facebook)) +
  geom_bar(position = "fill") +
  labs(y = "Fraction")
```
## Graph
<img src="Probability_Slides_files/figure-html/unnamed-chunk-7-1.png" width="672" />
## Randomization Test
```r
obs.stat <- chisq.test(T1, correct = FALSE)$statistic
obs.stat
```
```
## X-squared
## 1.621673
```
```r
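set.seed(123)
sims <- 10^4 - 1
ts <- numeric(sims)
for(i in 1:sims){
  # Shuffle Facebook labels to break any association with Twitter
  ts[i] <- chisq.test(xtabs(~Twitter + sample(Facebook), data = TDF),
                      correct = FALSE)$statistic
}
# The + 1 counts the observed statistic as one of the permutations
pvalue <- (sum(ts >= obs.stat) + 1)/(sims + 1)
pvalue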
```
```
## [1] 0.2546
```
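As a cross-check on the loop above, base R's `chisq.test()` can produce a Monte Carlo p-value itself through its `simulate.p.value` and `B` arguments. It samples tables with the observed margins held fixed rather than shuffling labels, so this is a related technique and not the slides' own method; the result should be close to, but not identical to, the value above. The matrix `obs` and the object `mc` are names assumed for this sketch.

```r
# Observed counts, rows = Twitter (Yes, No), columns = Facebook (Yes, No)
obs <- matrix(c(15, 3, 56, 26), nrow = 2, byrow = TRUE,
              dimnames = list(Twitter = c("Yes", "No"),
                              Facebook = c("Yes", "No")))

set.seed(123)
# B = number of simulated tables; no continuity correction is applied
mc <- chisq.test(obs, correct = FALSE, simulate.p.value = TRUE, B = 9999)
mc$statistic
mc$p.value
```

Like the loop, this computes the p-value as (number of simulated statistics at least as large as the observed one + 1) / (B + 1).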
## Randomization Distribution Code
```r
library(ggplot2)
ggplot(data = data.frame(x = ts), aes(x = x)) +
  geom_density(fill = "peachpuff", adjust = 3) +
  theme_bw() +
  stat_function(fun = dchisq, args = list(df = 1), color = "red") +
  geom_vline(xintercept = obs.stat, linetype = "dashed",
             color = "lightblue") +
  xlim(0, 8)
```
## Randomization Distribution
<img src="Probability_Slides_files/figure-html/unnamed-chunk-10-1.png" width="672" />