Lecture 7 Statistics: The Next Generation
Eamonn Mallon
31/01/2020
BS2004: Contemporary Techniques in Biological Data Analysis
- Optional 15 credit second year course
- 11 x 3-hour sessions plus 6 help sessions
- Second semester
Do I not know all the stats?
- What we taught you is a good basis
- If your experiments are simple, the tests you know will be fine
- More complex designs need multiway ANOVAs (with interactions), nested ANOVAs, ANCOVAs, etc.
- You could learn these piecemeal as required or
BS2004: Contemporary Techniques in Biological Data Analysis
- Model formulae
- General and generalised linear models
- Geometrical approach
Model formulae
- 50 male squirrels' weight, 50 female squirrels' weight
- Does the weight of the squirrel depend on its sex?
- Model formula: WEIGHT=SEX
- In R:
WEIGHT~SEX
- ~ means “depends on” (Dependent variable LHS, Independent variables RHS)
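As a minimal sketch of how this formula is used, the squirrel question could be fitted as below. The weights are simulated (the means and standard deviation are made-up values, not real squirrel data), and the data frame name `squirrels` is only an illustration.

```r
# Simulated (made-up) weights in grams for 50 female and 50 male squirrels
set.seed(1)
squirrels <- data.frame(
  SEX = factor(rep(c("female", "male"), each = 50)),
  WEIGHT = c(rnorm(50, mean = 320, sd = 25),   # females (assumed values)
             rnorm(50, mean = 340, sd = 25))   # males (assumed values)
)

# WEIGHT depends on SEX: dependent variable left of ~, independent variable right
model <- lm(WEIGHT ~ SEX, data = squirrels)
summary(model)
```

The `SEXmale` row of the summary estimates how much the male mean differs from the female mean.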
General and generalised linear models
- t-tests, ANOVA, ANCOVA, and regressions are types of General linear models
- The difference between general and generalised linear models is simply how error is handled
- General linear models assume errors are independent and follow a normal distribution
- Generalised linear models can use a wide range of error distributions
- i.e. your data doesn't have to be normal (bye bye non-parametric tests)
- lm is the R command for General linear models
- glm is the R command for Generalised linear models
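The sketch below illustrates both commands on simulated data. The group labels, means and the choice of a Poisson family for counts are assumptions made for the example. It also checks that a classical equal-variance t-test and `lm` give the same p-value, which is one way to see that a t-test is a General linear model.

```r
set.seed(2)
# Hypothetical data: two groups, one continuous response and one count response
group <- factor(rep(c("A", "B"), each = 20))
size  <- rnorm(40, mean = ifelse(group == "A", 10, 12), sd = 2)
count <- rpois(40, lambda = ifelse(group == "A", 3, 6))

# General linear model: errors assumed normal
fit_lm <- lm(size ~ group)

# The same comparison as a classical t-test with equal variances
fit_t <- t.test(size ~ group, var.equal = TRUE)

# Generalised linear model: Poisson errors for count data
fit_glm <- glm(count ~ group, family = poisson)

# p-value for the group effect from lm, and from the t-test
p_lm <- summary(fit_lm)$coefficients["groupB", "Pr(>|t|)"]
p_t  <- fit_t$p.value
all.equal(p_lm, p_t)
```

Only the `family` argument changes between the normal and the Poisson analysis; the model formula stays the same.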
Geometrical approach
- You can just learn to do tests and not know how they work; dangerous and unsatisfying
- The way to explain why a test works is to give the mathematical proof
- This maths isn't important in the everyday use of stats
- The maths is not accessible to most users
- So in BS2004 we are going to use a different approach
Geometrical approach

- Any three points in n-dimensional space can be represented in 2 dimensions
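One way to convince yourself of this is to take three arbitrary points in 30 dimensions and redraw them in a plane without changing any of the distances between them. The sketch below uses classical multidimensional scaling (`cmdscale`) to do the flattening; that is a choice made for this example, not a method used in the lecture.

```r
set.seed(3)
n <- 30

# Three arbitrary points in 30-dimensional space (one point per row)
pts <- matrix(rnorm(3 * n), nrow = 3)
d_high <- dist(pts)              # pairwise distances in 30 dimensions

# Find 2 coordinates per point that reproduce those distances
flat <- cmdscale(d_high, k = 2)
d_flat <- dist(flat)             # pairwise distances in the plane

# Compare the three distances before and after flattening
all.equal(as.vector(d_high), as.vector(d_flat))
```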
A geometrical representation of an ANOVA
- Remember back to the ANOVA lecture
- SSY = SSE + SSA
- Imagine we have 30 data points of yield (3 levels of fertiliser with 10 replicates each)
- In 30-dimensional space, each point is represented by 30 coordinates
- Point Y represents the data,
- so the 30 coordinates describing this point are the 30 measurements of yield
- Point M represents the grand mean,
- so the 30 coordinates describing this point all have the same value (the grand mean)
- Point F represents the treatment means,
- so of the 30 coordinates describing this point, the first ten equal the treatment A mean, the next ten the treatment B mean, and the last ten the treatment C mean
A geometrical representation of an ANOVA
Pythagoras' theorem

\[ d_1^2=d_2^2+d_3^2 \]
or
\[ SSY =SSE + SSA \]
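The relationship can be checked numerically. The sketch below builds points Y, M and F for simulated yields (the fertiliser means and standard deviation are made-up values) and treats each sum of squares as a squared distance between two points. Reading the two equations side by side, d_1 is taken to be the distance from Y to M, d_2 from Y to F, and d_3 from F to M. The name `Fm` is used for point F to avoid clashing with R's shorthand `F` for `FALSE`.

```r
set.seed(4)
# Hypothetical yields: 3 fertiliser levels (A, B, C), 10 replicates each
fertiliser <- factor(rep(c("A", "B", "C"), each = 10))
yield <- rnorm(30, mean = rep(c(5, 6, 7), each = 10), sd = 1)

Y  <- yield                       # point Y: the 30 measurements
M  <- rep(mean(yield), 30)        # point M: grand mean repeated 30 times
Fm <- ave(yield, fertiliser)      # point F: each value replaced by its treatment mean

SSY <- sum((Y - M)^2)             # squared distance from Y to M
SSE <- sum((Y - Fm)^2)            # squared distance from Y to F
SSA <- sum((Fm - M)^2)            # squared distance from F to M

# Pythagoras: SSY = SSE + SSA
all.equal(SSY, SSE + SSA)

# The right angle is at F: the dot product of the two shorter sides is (numerically) zero
right_angle <- sum((Y - Fm) * (Fm - M))

# The same sums of squares appear in the ANOVA table
tab <- anova(lm(yield ~ fertiliser))
tab
```

The `Sum Sq` column of the table holds SSA (fertiliser row) and SSE (Residuals row).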