Question 3

  1. Run cor.test() on the correlation between “area” and “perm” in the rock data set and interpret the results. Note that you will have to use the “$” accessor to get at each of the two variables (like this: rock$area). Make sure that you interpret both the confidence interval and the p-value that is generated by cor.test().
cor.test(rock$area,rock$perm)
## 
##  Pearson's product-moment correlation
## 
## data:  rock$area and rock$perm
## t = -2.9305, df = 46, p-value = 0.005254
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6118206 -0.1267915
## sample estimates:
##       cor 
## -0.396637

Analysis:

The 95% confidence interval for rho runs from -.61 to -.13. This is a fairly narrow range that does not include 0, so if this interval is one of the roughly 95% that capture the true value, the population correlation is negative. This conclusion is reinforced by the p-value of .0052, which is well below the alpha of .05, so we reject the null hypothesis that the true correlation equals 0. The sample estimate shows that rock area and rock permeability have a moderate inverse correlation of -.40.
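
As a quick sketch, the interval, p-value, and point estimate quoted above can also be pulled out of the object that cor.test() returns (these are standard components of an htest object; the name ctOut is just illustrative):
ctOut <- cor.test(rock$area, rock$perm) # same test as above, stored for reuse
ctOut$conf.int # 95% confidence interval for rho
ctOut$p.value  # p-value for the null hypothesis that rho = 0
ctOut$estimate # sample correlation coefficient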

Question 4

  1. Create a copy of the bfCorTest() custom function presented in this chapter. Don’t forget to “source” it (meaning that you have to run the code that defines the function one time to make R aware of it). Conduct a Bayesian analysis of the correlation between “area” and “perm” in the rock data set.
#install.packages("BayesFactor")
library(BayesFactor)
## Loading required package: coda
## Loading required package: Matrix
## ************
## Welcome to BayesFactor 0.9.12-4.2. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
## 
## Type BFManual() to open the manual.
## ************
bfCorTest <- function (x, y) # Get r from BayesFactor
{
  zx <- scale(x) # Standardize X
  zy <- scale(y) # Standardize Y
  zData <- data.frame(x=zx, rhoNot0=zy) # Put in a data frame
  bfOut <- generalTestBF(x ~ rhoNot0, data=zData) # linear coefficient
  mcmcOut <- posterior(bfOut, iterations=10000) # posterior samples
  print(summary(mcmcOut[,'rhoNot0'])) # Show the HDI for r
  return(bfOut) # Return Bayes factor object
}
bfCorTest(rock$area, rock$perm)
## 
## Iterations = 1:10000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean             SD       Naive SE Time-series SE 
##      -0.348443       0.137369       0.001374       0.001532 
## 
## 2. Quantiles for each variable:
## 
##     2.5%      25%      50%      75%    97.5% 
## -0.62030 -0.44033 -0.34858 -0.25577 -0.07943
## Bayes factor analysis
## --------------
## [1] rhoNot0 : 8.072781 ±0%
## 
## Against denominator:
##   Intercept only 
## ---
## Bayes factor type: BFlinearModel, JZS

Analysis:

The output of the Bayesian method indicates a posterior mean for rho of -.35, slightly smaller in magnitude than the -.40 point estimate produced by cor.test(). The 95% highest density interval (HDI) is slightly wider than the frequentist confidence interval, running from -.62 to -.08, but it can be read directly as the range that most likely contains the true correlation. The Bayes factor for the rhoNot0 term is about 8:1 in favor of the alternative hypothesis (a nonzero correlation) over the null.
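
As a sketch of how the interval bounds and a plain-number Bayes factor could be obtained directly, the steps inside bfCorTest() can be re-run at the top level (the exact values will wobble slightly from run to run because of the MCMC sampling; the object names here are just illustrative):
zData <- data.frame(x = scale(rock$area), rhoNot0 = scale(rock$perm)) # same standardization as bfCorTest()
bfOut <- generalTestBF(x ~ rhoNot0, data = zData) # Bayes factor for the linear coefficient
mcmcOut <- posterior(bfOut, iterations = 10000)   # posterior samples
quantile(mcmcOut[, 'rhoNot0'], probs = c(0.025, 0.975)) # boundaries of the 95% interval for rho
extractBF(bfOut)$bf # the Bayes factor as a single number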

Question 8

  1. Not unexpectedly, there is a data set in R that contains these data. The data set is called UCBAdmissions and you can access the department mentioned above like this: UCBAdmissions[ , ,1]. Make sure you put two commas before the 1: this is a three dimensional contingency table that we are subsetting down to two dimensions. Run chisq.test() on this subset of the data set and make sense of the results.
ucbDF <- UCBAdmissions[,,1]
ucbDF
##           Gender
## Admit      Male Female
##   Admitted  512     89
##   Rejected  313     19
chisq.test(ucbDF,correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  ucbDF
## X-squared = 17.248, df = 1, p-value = 3.28e-05

Analysis:

This chi-squared test evaluates a 2 x 2 table of admission decisions by gender for one department at UC Berkeley. The reported chi-squared value is 17.25 with 1 degree of freedom. The degrees of freedom reflect how many cell counts are free to vary once the margins are fixed; for a 2 x 2 table that number is (2 - 1) x (2 - 1) = 1. The extremely low p-value of 3.28e-05 leads us to reject the hypothesis that the two factors are independent: in this department, male applicants were rejected at a higher rate than female applicants.
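
A quick sketch of the admission rates behind that conclusion: prop.table() with margin = 2 converts the counts to proportions within each gender column.
prop.table(ucbDF, margin = 2) # proportion admitted and rejected within each gender
In this department, roughly 62% of male applicants (512 of 825) and 82% of female applicants (89 of 108) were admitted.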

Question 9

  1. Use contingencyTableBF() to conduct a Bayes factor analysis on the UCB admissions data. Report and interpret the Bayes factor.
ctUCOUT <- contingencyTableBF(ucbDF, sampleType = 'poisson')
ctUCOUT
## Bayes factor analysis
## --------------
## [1] Non-indep. (a=1) : 1111.64 ±0%
## 
## Against denominator:
##   Null, independence, a = 1 
## ---
## Bayes factor type: BFcontingencyTable, poisson

Analysis:

The Bayes factor further supports the conclusion that the two factors are not independent of one another. The odds of roughly 1112:1 in favor of the non-independence model over the independence (null) model constitute very strong evidence for the alternative hypothesis.
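
If a plain numeric version of that Bayes factor is needed for reporting, a minimal sketch using the extractBF() helper from the BayesFactor package:
extractBF(ctUCOUT)$bf # the Bayes factor as a single number (about 1112 here)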

Question 10

  1. Using the UCBA data, run contingencyTableBF() with posterior sampling. Use the results to calculate a 95% HDI of the difference in proportions between the columns.
cnPostOut <- contingencyTableBF(ucbDF, sampleType = 'poisson', posterior = TRUE, iterations = 10000)
summary(cnPostOut)
## 
## Iterations = 1:10000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##               Mean     SD Naive SE Time-series SE
## lambda[1,1] 510.62 22.674  0.22674        0.22262
## lambda[2,1] 312.22 17.637  0.17637        0.17637
## lambda[1,2]  89.54  9.529  0.09529        0.09529
## lambda[2,2]  19.96  4.465  0.04465        0.04465
## 
## 2. Quantiles for each variable:
## 
##               2.5%    25%    50%    75%  97.5%
## lambda[1,1] 468.40 494.91 510.05 525.87 556.39
## lambda[2,1] 278.48 300.16 311.89 323.91 347.98
## lambda[1,2]  71.95  83.01  89.13  95.63 109.52
## lambda[2,2]  12.14  16.77  19.62  22.77  29.71
maleProp <- cnPostOut[,'lambda[1,1]']/cnPostOut[,'lambda[2,1]'] # admitted males divided by rejected males
femaleProp <- cnPostOut[,'lambda[2,1]']/cnPostOut[,'lambda[2,2]'] # rejected males divided by rejected females
diffPop <- maleProp - femaleProp
hist(diffPop)
abline(v=quantile(diffPop,c(0.025)), col='black') # lower boundary of the 95% interval
abline(v=quantile(diffPop,c(0.975)), col='black') # upper boundary of the 95% interval

mean(diffPop)
## [1] -14.83673

Analysis:

Each of the lambda values represents a cell of the original table (admitted males, rejected males, admitted females, rejected females). The posterior means are close to the observed counts, and the 95% HDIs give a range for each cell: admitted males run from about 468 to 556, rejected males from about 278 to 348, admitted females from about 72 to 110, and rejected females from about 12 to 30. The mean of diffPop as computed above is -14.8; note, however, that maleProp and femaleProp are ratios of lambda values (and femaleProp as written divides rejected males by rejected females), so this quantity is not the difference in admission proportions that the question asks for. A corrected sketch of that calculation follows.
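
A minimal sketch of the proportion-difference calculation, reusing the cnPostOut samples drawn above (the exact numbers will vary from run to run, and the names malePropAdm, femalePropAdm, and diffProp are just illustrative):
malePropAdm <- cnPostOut[,'lambda[1,1]'] / (cnPostOut[,'lambda[1,1]'] + cnPostOut[,'lambda[2,1]']) # admitted males / all males
femalePropAdm <- cnPostOut[,'lambda[1,2]'] / (cnPostOut[,'lambda[1,2]'] + cnPostOut[,'lambda[2,2]']) # admitted females / all females
diffProp <- malePropAdm - femalePropAdm # difference in admission proportions (male minus female)
mean(diffProp) # posterior mean of the difference
quantile(diffProp, probs = c(0.025, 0.975)) # boundaries of the 95% credible interval

Given the observed admission rates of roughly 62% for males and 82% for females in this department, the posterior mean of diffProp should land somewhere around -0.20, and, consistent with the chi-squared and Bayes factor results above, the 95% interval would be expected to exclude 0.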