Lecture 8: Are A and B connected?

Eamonn Mallon
25/09/2019

Today's tests

  • Pearson's or Spearman's rank correlation (correlating two variables)
  • chi-squared test (testing for independence in contingency tables)

Correlation

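  • the correlation coefficient r is defined in terms of the variance of x, the variance of y and the covariance of x and y
  • covariance: the way the two variables vary together
  • \( r =\frac{cov(x,y)}{\sqrt{s_x^2s_y^2}} \)
  • after some algebra, this simplifies to:
  • \( r =\frac{SSXY}{\sqrt{SSX \cdot SSY}} \), where SSX and SSY are the sums of squares of x and y, and SSXY is the sum of cross-products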
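
The two formulas above can be checked numerically. The following is a minimal sketch in R; the vectors x and y are made-up illustrative values, not the lecture's data.

```r
# Illustrative data only (not from the lecture's data set)
x <- c(1.2, 2.4, 3.1, 4.8, 5.0, 6.7)
y <- c(2.0, 2.9, 3.9, 5.1, 4.8, 7.2)

# r from the covariance and the two variances
r_cov <- cov(x, y) / sqrt(var(x) * var(y))

# r from the sums of squares and the sum of cross-products
ssx  <- sum((x - mean(x))^2)
ssy  <- sum((y - mean(y))^2)
ssxy <- sum((x - mean(x)) * (y - mean(y)))
r_ss <- ssxy / sqrt(ssx * ssy)

# Both versions should agree with R's built-in cor()
all.equal(r_cov, cor(x, y))
all.equal(r_ss, cor(x, y))
```

The (n - 1) divisors in the variances and the covariance cancel, which is why the sums-of-squares version gives the same value.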

Pearson's product-moment correlation (parametric)

  • \( r =\frac{n\sum x_iy_i-\sum x_i\sum y_i}{\sqrt{n \sum x_i^2 - (\sum x_i)^2} \sqrt{n \sum y_i^2 - (\sum y_i)^2}} \)
  • Assumptions
    • both variables should be normally distributed
    • linearity (a straight-line relationship between the two variables)
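
The raw-sums formula above can be typed into R almost term for term. This is a sketch on made-up illustrative values (x and y are assumptions, not the lecture's data), compared with the built-in cor().

```r
# Illustrative data only (not from the lecture's data set)
x <- c(1.2, 2.4, 3.1, 4.8, 5.0, 6.7)
y <- c(2.0, 2.9, 3.9, 5.1, 4.8, 7.2)
n <- length(x)

# Numerator: n * sum(x_i * y_i) - sum(x_i) * sum(y_i)
num <- n * sum(x * y) - sum(x) * sum(y)

# Denominator: product of the two square-root terms
den <- sqrt(n * sum(x^2) - sum(x)^2) * sqrt(n * sum(y^2) - sum(y)^2)

r_raw <- num / den

# Should match R's built-in Pearson correlation
all.equal(r_raw, cor(x, y))
```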

R code for a Pearson's product-moment correlation (parametric)

cor.test(babies$gestation, babies$bwt)

    Pearson's product-moment correlation

data:  babies$gestation and babies$bwt
t = 15.609, df = 1221, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3600303 0.4535398
sample estimates:
     cor 
0.407854 

The length of gestation correlates positively with a baby's weight (Pearson: r = 0.408, t = 15.609, df = 1221, p < 0.0001)
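
The t and df in the output follow from r and the sample size: df = n - 2 and t = r√(n - 2) / √(1 - r²). As a rough check, this sketch plugs in the values copied from the output above.

```r
# Values copied from the cor.test() output above
r  <- 0.407854
df <- 1221          # df = n - 2, so n = 1223

# t statistic for testing whether the true correlation is 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
t_stat              # about 15.61, matching the output up to rounding of r

# Two-sided p-value from the t distribution
p_val <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
p_val < 2.2e-16
```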

Spearman rank correlation (non-parametric)

  • Based on ranks
  • monotonic (not necessarily linear)
  • \( \rho = 1 -\frac{6\sum d_i^2}{n(n^2-1)} \)
  • \( d_i \) is the difference between the rank of \( x_i \) and the rank of \( y_i \) (the formula is exact when there are no tied ranks)
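
The rank-difference formula can be tried on a small example. In this sketch the vectors x and y are made-up illustrative values, chosen so that neither has ties; the result is compared with R's built-in Spearman correlation.

```r
# Illustrative data only, chosen so that neither variable has tied values
x <- c(10, 20, 35, 40, 55, 70)
y <- c(1.1, 0.9, 2.5, 2.0, 3.8, 4.4)
n <- length(x)

# d_i: difference between the rank of x_i and the rank of y_i
d <- rank(x) - rank(y)

# rho from the formula above
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Equivalent view: Pearson's r computed on the ranks
rho_ranks <- cor(rank(x), rank(y))

all.equal(rho_formula, cor(x, y, method = "spearman"))
all.equal(rho_ranks, cor(x, y, method = "spearman"))
```

Because only the ranks enter the calculation, any monotonic transformation of x or y (for example taking logs) leaves rho unchanged.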

R code for a Spearman rank correlation (non-parametric)

cor.test(babies$gestation, babies$bwt, method = "spearman")

    Spearman's rank correlation rho

data:  babies$gestation and babies$bwt
S = 181438572, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.4048838 

The length of gestation correlates positively with a baby's weight (Spearman: rho = 0.405, n = 1223, p < 0.0001)
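
The S in the output plays the role of \( \sum d_i^2 \) in the formula for rho (it equals that sum exactly when there are no ties). As a sketch, rho can be recovered from the reported S, assuming n = 1223 as implied by df = 1221 in the Pearson output.

```r
# Values copied from the cor.test() output above
S <- 181438572
n <- 1223            # assumption: n = df + 2 from the Pearson output

# rho = 1 - 6 * S / (n * (n^2 - 1))
rho <- 1 - 6 * S / (n * (n^2 - 1))
rho                  # close to the reported 0.4048838
```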

Correlation does not mean causation

Chi-squared contingency table

                 Blue eyes   Brown eyes   Row totals
Fair hair           38          11           49
Dark hair           14          51           65
Column totals       52          62          114

  • the data are counts (number of leaves, number of patients who die, etc.)
  • Is there an association between hair colour and eye colour?
  • Calculate the expected values under the null hypothesis (H0: there is no association between hair colour and eye colour)
  • expected = (row total × column total) / grand total
  • \( \chi^2 = \sum\frac{(O-E)^2}{E} \)
  • d.f. = (r - 1) × (c - 1), where r is the number of rows and c the number of columns
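
The steps above can be followed by hand in R. This is a minimal sketch using the counts from the table; the object names (observed, expected, chi_sq) are illustrative. It applies the plain formula with no continuity correction, so the statistic is a little larger than the one chisq.test() reports by default for a 2 × 2 table.

```r
# Observed counts from the hair/eye colour table
observed <- matrix(c(38, 14, 11, 51), nrow = 2,
                   dimnames = list(hair = c("Fair", "Dark"),
                                   eyes = c("Blue", "Brown")))

# expected = (row total x column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected

# Chi-squared statistic: sum of (O - E)^2 / E over all four cells
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) x (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value from the upper tail of the chi-squared distribution
p_val <- pchisq(chi_sq, df = df, lower.tail = FALSE)
c(chi_sq = chi_sq, df = df, p = p_val)
```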

R code for a chi-squared contingency table

# matrix() fills column by column: blue eyes (38, 14), then brown eyes (11, 51)
count <- matrix(c(38,14,11,51), nrow=2)
chisq.test(count)

    Pearson's Chi-squared test with Yates' continuity correction

data:  count
X-squared = 33.112, df = 1, p-value = 8.7e-09

Hair colour is associated with eye colour (\( \chi^2 \) = 33.112, d.f. = 1, p = \( 8.7 \times 10^{-9} \))
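
The output above reports Yates' continuity correction, which chisq.test() applies by default to 2 × 2 tables. The sketch below shows what the correction does and how to switch it off; the object names other than count are illustrative.

```r
count <- matrix(c(38, 14, 11, 51), nrow = 2)

# Default for a 2 x 2 table: Yates' continuity correction is applied
with_yates <- chisq.test(count)

# Expected counts under the null hypothesis, as computed by R
e <- with_yates$expected

# The correction subtracts 0.5 from each |O - E| before squaring
yates_by_hand <- sum((abs(count - e) - 0.5)^2 / e)
all.equal(yates_by_hand, unname(with_yates$statistic))

# correct = FALSE gives the plain sum of (O - E)^2 / E
no_yates <- chisq.test(count, correct = FALSE)
no_yates$statistic
```

The corrected statistic is always the smaller of the two, so the default is the more conservative test.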

What you learned to do in the last couple of sessions

  • Are men and women different heights?
  • Is height correlated with weight?
  • Is hair colour associated with eye colour?

  • Almost any biological question takes one of these three forms. Can you think of one that doesn't? At a simple level, there is very little you can't ask with the tools you now have.

Next semester

  • Regression: how y changes as x changes
  • ANOVA: tests whether the means of more than two samples differ
  • Preview of linear models: one test to rule them all.