Biserial correlation

Introduction

Correlations were introduced in a previous post (http://rpubs.com/juanhklopper/assumptions_for_parametric_tests). In a biserial correlation, one of the two variables is dichotomous, i.e. its sample space contains only two possible values. There are two types of biserial correlation, depending on the nature of the dichotomous variable. In point-biserial correlation the dichotomous variable is truly discrete. In biserial correlation the two categories are assumed to reflect an underlying continuum.

This distinction in the dichotomous variable can be subtle. A frequently used example of the former (point-biserial) is the existence of a disease. It is either present or it is not. As for the latter (biserial), consider the results of an examination. A student might pass or fail, but that distinction usually has a numerical underpinning. A student who passes might have scraped by or might have done really well.
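To make the examination case concrete, the sketch below simulates a set of exam marks and dichotomizes them. Everything in it is an illustrative assumption rather than part of the original example: the names marks and passed, the pass mark of 50, and the mean and standard deviation of the simulated marks.

```r
set.seed(1)  # arbitrary seed, only so that the sketch is repeatable

# Hypothetical exam marks on a continuous scale (assumed mean 60, sd 12)
marks <- rnorm(100, mean = 60, sd = 12)

# Dichotomize at an assumed pass mark of 50: 1 = pass, 0 = fail
passed <- as.integer(marks >= 50)

# The pass category hides a continuum: passing marks span a wide range
range(marks[passed == 1])
table(passed)
```

A variable such as passed is the kind of dichotomy for which biserial correlation is intended, because a continuous quantity lies behind the two categories.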

Point-biserial correlation

As mentioned, this requires the categorical variable to be truly discrete. In the example below, two variables are created. The first, yos, refers to years of smoking. The second, disease, refers to the presence of a certain disease, with 0 indicating absence and 1 indicating presence. Both are sampled at random and independently of each other, so no real association between them is expected.

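# No seed is set, so the values (and the output below) change on every run
yos <- sample(1:30, 100, replace = TRUE)      # years of smoking, 1 to 30
disease <- sample(0:1, 100, replace = TRUE)   # 0 = absent, 1 = present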

The calculation uses the cor.test() function with its default Pearson method.

cor.test(yos, disease)

## 
##  Pearson's product-moment correlation
## 
## data:  yos and disease
## t = 0.85316, df = 98, p-value = 0.3957
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1124508  0.2776001
## sample estimates:
##        cor 
## 0.08586381

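This is the same calculation as used for Pearson correlation! The point-biserial coefficient is simply the Pearson coefficient computed with the dichotomous variable coded as 0 and 1. The direction (sign) of the correlation coefficient is based purely on the coding of the categorical variable, i.e. which category is coded as 0 and which is coded as 1.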
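As a check on this equivalence, the sketch below compares cor() with the textbook point-biserial formula, r_pb = (M1 - M0) / s_n * sqrt(p * q), and shows the effect of swapping the coding. It is a minimal illustration under stated assumptions: the names x and d and the seed are hypothetical, and the data are freshly simulated, so the values will not match the output shown above.

```r
set.seed(42)  # arbitrary seed; the data here are simulated for illustration
x <- sample(1:30, 100, replace = TRUE)  # continuous-like variable
d <- sample(0:1, 100, replace = TRUE)   # dichotomous variable coded 0/1

# Pearson correlation applied directly to the 0/1 coded variable
r_pearson <- cor(x, d)

# Point-biserial formula: (M1 - M0) / s_n * sqrt(p * q),
# where s_n is the standard deviation with denominator n (not n - 1)
m1 <- mean(x[d == 1])
m0 <- mean(x[d == 0])
p <- mean(d)   # proportion coded 1
q <- 1 - p     # proportion coded 0
s_n <- sqrt(mean((x - mean(x))^2))
r_pb <- (m1 - m0) / s_n * sqrt(p * q)

all.equal(r_pearson, r_pb)  # the two agree up to floating-point error

# Swapping the 0/1 coding flips the sign but leaves the magnitude unchanged
cor(x, 1 - d)
```

The formula also makes the role of the coding visible: exchanging the labels exchanges M1 and M0, which reverses the sign of the difference in means.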

Biserial correlation

Calculating the biserial correlation requires the proportion of observations that fall in each of the two categories. The code snippet below shows how these are obtained using the table() and prop.table() functions.

prop.table(table(disease))

## disease
##    0    1 
## 0.53 0.47

These proportions are used to convert the point-biserial coefficient into a biserial coefficient. The conversion need not be done by hand, though. The polyserial() function from the polycor package calculates it directly.

library(polycor)
polyserial(yos, disease)

## [1] 0.1077252
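For readers who want to see the conversion itself, the sketch below applies the classical formula r_b = r_pb * sqrt(p * q) / y by hand, where y is the height of the standard normal density at the point that divides the distribution into areas p and q. The numbers are copied from the output above. The variable names are illustrative, and the comparison assumes that polyserial() is left at its default quick two-step estimate rather than the maximum-likelihood one.

```r
# Values reported earlier in this post
r_pb <- 0.08586381  # point-biserial correlation from cor.test()
p <- 0.47           # proportion with disease == 1
q <- 0.53           # proportion with disease == 0

# Height of the standard normal density at the point that splits
# the distribution into area q below and area p above
y <- dnorm(qnorm(q))

# Classical conversion from point-biserial to biserial correlation
r_b <- r_pb * sqrt(p * q) / y
r_b
```

With these inputs the result is approximately 0.1077, in line with the polyserial() output. Because sqrt(p * q) / y is always greater than 1, the biserial coefficient is larger in magnitude than the point-biserial coefficient.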

Dr Juan H Klopper
