SCIE807 Advanced Ecological Data Analysis

February 26, 2019

Welcome to SCIE807 Advanced Ecological Data Analysis

What to expect from this course?

Exponential increase in stats and R skills
Coding till your eyes are bleeding! 😜

Learning outcomes:

Evaluate appropriate quantitative analysis methods
Critically analyse ecological data using statistical software
Apply skills to experimental design and data presentation
Present work at the appropriate academic standard

Week 1

Timetable

Types of variables

Linear models and their specifications

Analysis of variance and linear regression are closely related linear models, and both can be specified with the same function (lm) using formula notation. The formula uses the '~' symbol (tilde), which reads 'as a function of' and connects the left-hand side (LHS) with the right-hand side (RHS), i.e. it links the response variable to the explanatory variable(s). Multiple explanatory variables are separated by '+' signs.

lm(y ~ x, data = ...)

lm(y ~ x1 + x2, data = ...)
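As a minimal sketch of this notation, the calls below fit a one-predictor and a two-predictor model to R's built-in mtcars data set. The data set and the variables (mpg, wt, hp) are assumptions chosen for illustration and are not part of the course material.

```r
## Illustration only: mtcars is a built-in data set chosen for this sketch
data(mtcars)

## One explanatory variable: fuel economy as a function of weight
m1 <- lm(mpg ~ wt, data = mtcars)

## Two explanatory variables, separated by '+'
m2 <- lm(mpg ~ wt + hp, data = mtcars)

## Each model has an intercept plus one coefficient per term
names(coef(m1))
names(coef(m2))
```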

Interactive effects of the explanatory variables on the response, so-called interactions, are specified using the colon ':'. To save some typing, the asterisk '*' shorthand includes all main effects and their interactions.

lm(y ~ x1 + x2 + x1:x2, data = ...)

## Can be written as:
lm(y ~ x1 * x2, data = ...)

## ...even with more than two explanatory variables
lm(y ~ x1 * x2 * x3, data = ...)
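To check that the two notations really describe the same model, the sketch below fits both forms and compares the coefficients. It again assumes the built-in mtcars data set, with mpg, wt and hp standing in for y, x1 and x2.

```r
## Illustration only: mtcars and its variables are assumptions for this sketch
data(mtcars)

## Main effects and interaction written out in full
m_long <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

## The same model using the '*' shorthand
m_short <- lm(mpg ~ wt * hp, data = mtcars)

## Both fits contain the same terms and the same estimates
names(coef(m_short))
all.equal(coef(m_long), coef(m_short))
```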

The principle of test statistics

A test statistic (t-value, F-value, …) can be regarded as a signal-to-noise ratio. The higher the ratio, the easier it becomes to detect a true signal. For example, the t-statistic of a two-sample t-test boils down to the difference in group means (signal) divided by the standard error of that difference, computed from the pooled standard deviation (noise).
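The signal-to-noise idea can be sketched by computing a two-sample t-statistic by hand and comparing it with t.test. The choice of the built-in PlantGrowth data and of the groups ctrl and trt2 is an assumption made for illustration; the equal-variance (pooled) form of the test is assumed throughout.

```r
## Illustration only: two groups from the built-in PlantGrowth data set
data(PlantGrowth)
ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]
trt2 <- PlantGrowth$weight[PlantGrowth$group == "trt2"]

## Signal: difference in group means
signal <- mean(trt2) - mean(ctrl)

## Noise: standard error of the difference, based on the pooled variance
n1 <- length(ctrl)
n2 <- length(trt2)
pooled_var <- ((n1 - 1) * var(ctrl) + (n2 - 1) * var(trt2)) / (n1 + n2 - 2)
noise <- sqrt(pooled_var * (1 / n1 + 1 / n2))

## Test statistic: signal divided by noise
t_manual <- signal / noise

## The same statistic from the built-in equal-variance t-test
t_builtin <- unname(t.test(trt2, ctrl, var.equal = TRUE)$statistic)
all.equal(t_manual, t_builtin)
```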

Analysis of variance (ANOVA)

## Using the built-in data set PlantGrowth
data(PlantGrowth)
str(PlantGrowth)

## 'data.frame':    30 obs. of  2 variables:
##  $ weight: num  4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
##  $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...

summary(PlantGrowth)

##      weight       group   
##  Min.   :3.590   ctrl:10  
##  1st Qu.:4.550   trt1:10  
##  Median :5.155   trt2:10  
##  Mean   :5.073            
##  3rd Qu.:5.530            
##  Max.   :6.310

Always carry out sanity checks (str, summary) and plot your data before you get carried away with statistical modelling. What do you conclude looking at the boxplot below?
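A boxplot of this kind can be drawn with the formula notation introduced earlier. The call below is a sketch rather than the code behind the original slide, and the axis labels are assumptions.

```r
## Plant weight by treatment group; axis labels are assumed for this sketch
data(PlantGrowth)
boxplot(weight ~ group, data = PlantGrowth,
        xlab = "Group", ylab = "Dried weight")
```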

If we carry out an ANOVA with the aov function, the summary reports a single overall P-value for the predictor variable. This tells us only whether the predictor had a statistically significant effect overall; it does not tell us where the differences lie, i.e. which levels of the predictor differ significantly from each other. All levels could differ significantly from one another, but a single large difference between any two factor levels can be enough to make the overall effect significant.

##             Df Sum Sq Mean Sq F value Pr(>F)  
## group        2  3.766  1.8832   4.846 0.0159 *
## Residuals   27 10.492  0.3886                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
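The table above is consistent with a one-way ANOVA of weight on group in the PlantGrowth data. The slide does not show the call itself, so the following is an assumed reconstruction; the object names fit_aov and fit_lm are hypothetical.

```r
## Assumed reconstruction of the call behind the ANOVA table
data(PlantGrowth)
fit_aov <- aov(weight ~ group, data = PlantGrowth)
summary(fit_aov)

## The same model specified with lm; anova() reports the same F test
fit_lm <- lm(weight ~ group, data = PlantGrowth)
anova(fit_lm)
```

Fitting the model both ways illustrates the earlier point that ANOVA and linear regression are closely related linear models sharing one formula notation.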
