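set.seed(123)                        # make the simulation reproducible
N = 9
X = runif(10000, 1, N)               # X ~ Uniform(1, N)
Y = rnorm(10000, (N+1)/2, (N+1)/2)   # Y ~ Normal(mean = (N+1)/2, sd = (N+1)/2)
x = median(X)                        # x: sample median of X
y = quantile(Y, 0.25)[[1]]           # y: sample first quartile of Y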
Probability that X > x given that X > y (x is the median of X, about 5; y is the first quartile of Y)
paste0('P(X>x | X>y) = ',mean(X[X>y]>x))
## [1] "P(X>x | X>y) = 0.546567555749891"
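As a sanity check on the simulated value, the same conditional probability can be computed from the Uniform(1, N) distribution function. This is a minimal sketch that assumes y < x, which should hold here since y is a lower quartile of Y and x is the median of X; `theory` and `empirical` are names introduced only for illustration.

```r
set.seed(123)
N <- 9
X <- runif(10000, 1, N)
Y <- rnorm(10000, (N + 1) / 2, (N + 1) / 2)
x <- median(X)
y <- quantile(Y, 0.25)[[1]]

# Assuming y < x, the event {X > x and X > y} is just {X > x},
# so P(X > x | X > y) = P(X > x) / P(X > y).
theory <- punif(x, 1, N, lower.tail = FALSE) / punif(y, 1, N, lower.tail = FALSE)

# Same quantity estimated from the sample, as in the line above.
empirical <- mean(X[X > y] > x)

c(theory = theory, empirical = empirical)
```

The two numbers should agree to within sampling error for 10,000 draws.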
Probability that X > x and Y > y
X and Y are generated independently, so this is just P(X>x)P(Y>y). Since x is the median of X and y is the first quartile of Y, this should be about 0.5 × 0.75 = 0.375. Let’s check if that is what we get:
paste0('P(X>x, Y>y) = ',mean(X>x)*mean(Y>y))
## [1] "P(X>x, Y>y) = 0.375"
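One caveat: `mean(X>x)*mean(Y>y)` multiplies two sample proportions that are essentially fixed by construction, because x and y are sample quantiles of these same 10,000 draws. It returns 0.375 whether or not X and Y are independent. A more direct check, sketched here, estimates the joint probability from the paired samples and compares it with that product; `joint` and `product` are illustrative names.

```r
set.seed(123)
N <- 9
X <- runif(10000, 1, N)
Y <- rnorm(10000, (N + 1) / 2, (N + 1) / 2)
x <- median(X)
y <- quantile(Y, 0.25)[[1]]

# Empirical joint probability: fraction of pairs with both events true.
joint <- mean(X > x & Y > y)

# Product of the marginal proportions (0.5 * 0.75 by construction).
product <- mean(X > x) * mean(Y > y)

c(joint = joint, product = product)
```

The contingency table later in these notes reports 3756 of the 10,000 pairs in this cell, which corresponds to a joint estimate of 0.3756.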
Probability that X < x given that X > y
paste0('P(X<x | X>y) = ',mean(X[X>y]<x))
## [1] "P(X<x | X>y) = 0.453432444250109"
I realized after looking at the next part that I had misunderstood the question. I left the following in since I had already done it. It just checks whether the joint probability matches the product of the marginal probabilities at different values of x and y.
percent_list = c(0.2,0.4,0.6,0.8)
x_list = round(quantile(X,percent_list), 2)
y_list = round(quantile(Y,percent_list), 2)
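# every (x, y) combination of the chosen quantiles
prob_table = expand.grid(x_list,y_list)
names(prob_table) = c('x','y')
# product of the marginal proportions, P(X>x)P(Y>y)
prob_table$marginal = apply(prob_table, 1, function(pt) round(mean(X>pt['x'])*mean(Y>pt['y']),3))
# joint proportion, P(X>x, Y>y)
prob_table$joint = apply(prob_table, 1, function(pt) round(mean(X>pt['x'] & Y>pt['y']), 3))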
prob_table
Looks pretty close to me!
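To put a number on "pretty close", the largest gap between the joint estimate and the product of the marginals can be computed over the whole grid. This is only a sketch: it uses the unrounded quantiles rather than the two-decimal values above, and `grid`, `joint`, `product` and `max_gap` are names introduced for illustration.

```r
set.seed(123)
N <- 9
X <- runif(10000, 1, N)
Y <- rnorm(10000, (N + 1) / 2, (N + 1) / 2)

percent_list <- c(0.2, 0.4, 0.6, 0.8)
grid <- expand.grid(
  x = unname(quantile(X, percent_list)),
  y = unname(quantile(Y, percent_list))
)

# Joint proportion and product of marginal proportions at each grid point.
joint <- mapply(function(a, b) mean(X > a & Y > b), grid$x, grid$y)
product <- mapply(function(a, b) mean(X > a) * mean(Y > b), grid$x, grid$y)

# Largest absolute disagreement across the 16 grid points.
max_gap <- max(abs(joint - product))
max_gap
```

With 10,000 independent draws, a gap of around one percentage point or less is what sampling noise alone would produce.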
I think this is actually what I was supposed to do:
cont_table = matrix(
  c(
    sum(X>x & Y>y),
    sum(X<=x & Y>y),
    sum(X>x & Y<=y),
    sum(X<=x & Y<=y)
  ), nrow = 2, ncol = 2, byrow = TRUE,
  dimnames = list(c('Ytrue','Yfalse'), c('Xtrue','Xfalse'))
)
cont_table
##        Xtrue Xfalse
## Ytrue   3756   3744
## Yfalse  1244   1256
fisher.test(cont_table)
##
## Fisher's Exact Test for Count Data
##
## data: cont_table
## p-value = 0.7995
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.9242273 1.1100187
## sample estimates:
## odds ratio
## 1.012883
chisq.test(cont_table)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: cont_table
## X-squared = 0.064533, df = 1, p-value = 0.7995
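Both p-values are far above any conventional significance level, so neither test rejects independence of the events X > x and Y > y. To make that concrete, the sketch below rebuilds the table from the counts printed above and pulls out the expected counts under independence. The 0.05 threshold and the names `alpha`, `chi` and `fis` are assumptions for illustration, not something the exercise specifies.

```r
# Contingency table rebuilt from the counts printed above.
cont_table <- matrix(
  c(3756, 3744,
    1244, 1256),
  nrow = 2, byrow = TRUE,
  dimnames = list(c('Ytrue', 'Yfalse'), c('Xtrue', 'Xfalse'))
)

alpha <- 0.05  # assumed significance level

chi <- chisq.test(cont_table)   # Yates-corrected by default for 2x2 tables
fis <- fisher.test(cont_table)

# Counts implied by independence: row total * column total / n.
chi$expected

# TRUE means "fail to reject independence" at the assumed level.
c(chisq = chi$p.value, fisher = fis$p.value) > alpha
```

Each observed count differs from its expected count by only 6, which is why the chi-squared statistic is so small.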