Lecture: Is my data normal?

Eamonn Mallon
03/09/2020

Is my data normal?

  • Parametric tests (which assume normal data) are more powerful than non-parametric tests
  • So, you should use parametric tests if you can
  • The simplest and best way of checking whether your data is normal is by plotting it and looking

Is my data normal?

[Figures: two plots]

Is my data normal?

  • A slight wrinkle: it's not the data itself that has to be normal, but rather the residuals.
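As a rough illustration of "checking by looking", here is a minimal sketch that draws a text histogram using only the Python standard library. The function names (`bin_counts`, `text_histogram`) and the two simulated samples are assumptions made for this example; they are not the data behind the plots in the lecture.

```python
import random

def bin_counts(values, bins=10):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would index one past the end, so clamp it.
        counts[min(int((v - lo) / step), bins - 1)] += 1
    return counts

def text_histogram(values, bins=10, width=40):
    """Return one bar of '#' characters per bin, scaled to the tallest bin."""
    counts = bin_counts(values, bins)
    tallest = max(counts)
    return ["#" * round(width * c / tallest) for c in counts]

# Simulated samples (assumed for illustration only).
rng = random.Random(1)
normal_like = [rng.gauss(50, 10) for _ in range(1000)]
right_skewed = [rng.expovariate(1 / 10) for _ in range(1000)]

for name, sample in [("normal-like", normal_like), ("right-skewed", right_skewed)]:
    print(name)
    print("\n".join(text_histogram(sample)))
```

The normal-like sample should print a roughly symmetric hump, while the right-skewed sample piles up in the first bins and trails off. In practice you would apply the same look to the residuals rather than to the raw data.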

A slight detour: Residuals

[Figure: residuals]
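A residual is the difference between an observed value and the value the model predicts for it. The sketch below assumes the simplest possible model, where each group is predicted by its own mean. The group names and numbers are made up for illustration.

```python
from statistics import mean

# Hypothetical measurements for two groups (made-up numbers).
groups = {
    "control": [4.1, 5.0, 4.6, 5.3, 4.8],
    "treated": [9.2, 10.1, 9.7, 10.4, 9.9],
}

def residuals(values):
    """Residual = observed value minus predicted value.
    Here the prediction is simply the group mean (an assumed model)."""
    centre = mean(values)
    return [v - centre for v in values]

all_residuals = []
for name, values in groups.items():
    group_residuals = residuals(values)
    all_residuals.extend(group_residuals)
    print(name, [round(r, 2) for r in group_residuals])
```

Pooled together, the raw values form two separate clumps (one per group), which would not look normal. The residuals form a single clump centred on zero, and it is this clump whose shape matters for a parametric test.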

What to do if your data isn't normal?

  • Transform it
  • Use non-parametric tests

Transforming data

  • Applying a mathematical function to make the data/residuals fit a normal distribution
  • What! Surely that's dodgy?
    • Is converting feet into metres?
    • You are just changing the scale on which the data is measured.
  • There are lots of transformations, but we'll look at the log transformation

Log transformation

[Figure: log transformation]

  • A log transformation stretches out the left-hand side (smaller values) of the distribution and squashes in the right-hand side (larger values). This is useful where the data set has a long tail to the right (right-skewed data). It only works for values greater than zero.
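To see the stretching and squashing in numbers, the sketch below simulates right-skewed data and compares its skewness before and after taking logs. The `skewness` helper and the simulated sample are assumptions for this example; a skewness near zero suggests a roughly symmetric distribution, and a large positive value suggests a long right tail.

```python
import math
import random
from statistics import mean, pstdev

def skewness(values):
    """Skewness: positive means a long right tail, near zero means symmetric."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

# Simulated right-skewed data (assumed for illustration; all values are > 0,
# which the log requires).
rng = random.Random(42)
raw = [rng.lognormvariate(0, 1) for _ in range(2000)]
logged = [math.log(v) for v in raw]

print("skewness before log:", round(skewness(raw), 2))
print("skewness after log: ", round(skewness(logged), 2))
```

The log does not reorder the values: the smallest stays the smallest and the largest stays the largest. Only the spacing between them changes, in the same way that switching from feet to metres changes the numbers but not the measurements.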

Log transformation

[Figures: log transformation, two plots]

Non-parametric tests

  • Usually based on ranks
  • Why is that less powerful?
    • Think about the values 5, 10, 1000
    • As ranks, they become 1, 2, 3, so the information about how far apart the values are is thrown away
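The loss of information from ranking can be sketched in a few lines. The `ranks` function here is a simplified helper written for this example (it assumes there are no tied values); it is not how any particular statistics package computes ranks.

```python
def ranks(values):
    """Rank values from smallest (1) to largest.
    Simplifying assumption: no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

print(ranks([5, 10, 1000]))
print(ranks([5, 10, 11]))
```

Both lists produce the same ranks, even though 1000 is far from 10 and 11 is not. A rank-based test cannot tell the two data sets apart, which is one way to see why it has less power than a parametric test when the parametric assumptions hold.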

Next Week

The wait is over: let's do some statistical tests, including

  • t-test
  • Wilcoxon's test
  • Two types of correlations
  • chi-squared test