Seminar Week 7

class: center, middle, inverse, title-slide

.title[
# Seminar Week 7
]
.subtitle[
## Recap of Statistical Modelling Techniques
]
.author[
### Ingmar Staude
]

---

# Recap of Statistical Models

What statistical tools do you know by now?

- t-test (compare two groups)  
- Simple linear regression (one predictor, one outcome)  
- ANOVA (compare more than two groups)  
- ANCOVA (compare groups while controlling for covariates)  
- Separate-slope model (different slopes per group)  
- Multiple regression (multiple predictors)

---
  
# Which model?
  
You want to know the relationship between predator abundance and prey population size.

```r
lm(prey_abundance ~ predator_abundance)
```

---
  
# Which model?
  
Are plant communities in novel ecosystems less diverse on average than in historical reference ecosystems?  
--

```r
t.test(novel, reference, alternative = "less")
```

```
## 
##  Welch Two Sample t-test
## 
## data:  novel and reference
## t = -6.3206, df = 37.082, p-value = 1.146e-07
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -3.974375
## sample estimates:
## mean of x mean of y 
##  12.42487  17.84623
```

---

# Alternative model
  
Can we answer the same question with a different model?

```r
model <- lm(diversity ~ ecosystem, data = d)
```

<small>

```r
summary(model)
```

```
## 
## Call:
## lm(formula = diversity ~ ecosystem, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3247 -1.8907 -0.0649  1.9166  4.9359 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         12.4249     0.6065  20.486  < 2e-16 ***
## ecosystemReference   5.4214     0.8577   6.321 2.07e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.712 on 38 degrees of freedom
## Multiple R-squared:  0.5125, Adjusted R-squared:  0.4997 
## F-statistic: 39.95 on 1 and 38 DF,  p-value: 2.07e-07
```
</small>
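
The two outputs are not exactly the same test: `t.test()` above runs a one-sided Welch test, while `summary(model)` reports a two-sided test with pooled variance. The t statistics still agree (6.32 in absolute value), which is what one expects when both groups have the same size. The sketch below uses simulated data (the data frame `d_sim` and all its values are made up for illustration) to show that a pooled, two-sided t-test and the linear model agree on both t and p.

```r
# Simulated data: every value here is invented for illustration only
set.seed(1)
d_sim <- data.frame(
  ecosystem = factor(rep(c("Novel", "Reference"), each = 20)),
  diversity = c(rnorm(20, mean = 12, sd = 3), rnorm(20, mean = 18, sd = 3))
)

# Student's t-test: pooled variance, two-sided
tt <- t.test(diversity ~ ecosystem, data = d_sim, var.equal = TRUE)

# The same comparison as a linear model with a two-level factor
fit <- lm(diversity ~ ecosystem, data = d_sim)
coefs <- summary(fit)$coefficients

# Same t statistic (up to sign) and same two-sided p-value
abs(unname(tt$statistic))
abs(coefs["ecosystemReference", "t value"])
tt$p.value
coefs["ecosystemReference", "Pr(>|t|)"]
```

The sign flips because `t.test()` reports Novel minus Reference, whereas the model coefficient is Reference minus Novel.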

---

# Which model?
  
You want to test whether planting native wildflowers increases pollinator visits in gardens, whilst controlling for surrounding green space.

- Treatment = gardens with wildflower mix  
- Control = gardens without wildflower mix
- Plus surrounding green space as covariate

```r
lm(pollinator_visits ~ treatment + green_space)
```

---

# Which model?
  
Is the average time until a conservation policy shows measurable ecological effects longer than a legislative period?

```r
t.test(policy_delay, mu = 4)   # assuming 4 years as the legislative period
```

```
## 
##  One Sample t-test
## 
## data:  policy_delay
## t = 5.249, df = 19, p-value = 4.573e-05
## alternative hypothesis: true mean is not equal to 4
## 95 percent confidence interval:
##  5.372805 7.193690
## sample estimates:
## mean of x 
##  6.283248
```
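
The question asks whether the delay is *longer* than a legislative period, which is a directional hypothesis, while the output above is for the default two-sided alternative. A minimal sketch of the one-sided variant follows, using simulated delays (`policy_delay_sim` and its values are made up, not the seminar data).

```r
# Simulated delays in years: invented values for illustration only
set.seed(42)
policy_delay_sim <- rnorm(20, mean = 6, sd = 2)

# Default: two-sided alternative (mean != 4)
two_sided <- t.test(policy_delay_sim, mu = 4)

# Directional alternative matching "longer than" (mean > 4)
one_sided <- t.test(policy_delay_sim, mu = 4, alternative = "greater")

# The t statistic is identical; only the p-value changes
two_sided$statistic
one_sided$statistic
two_sided$p.value
one_sided$p.value
```

When the sample mean lies above 4, the one-sided p-value is half the two-sided one.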

---

# Alternative model
  
Can we answer the same question with a different model?

```r
model <- lm(policy_delay ~ 1)
summary(model)
```

```
## 
## Call:
## lm(formula = policy_delay ~ 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2165 -1.2704 -0.0433  0.8142  3.2906 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.283      0.435   14.45 1.07e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.945 on 19 degrees of freedom
```
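
Note that the intercept's t value here (14.45) tests the mean against 0, not against 4, so it differs from the t-test result (5.249). Re-centering by hand recovers it: (6.283 − 4) / 0.435 ≈ 5.25. One way to make the model answer the original question directly is to subtract the reference value from the response before fitting. The sketch below assumes simulated data (`policy_delay_sim` is made up).

```r
# Simulated delays in years: invented values for illustration only
set.seed(42)
policy_delay_sim <- rnorm(20, mean = 6, sd = 2)

# One-sample t-test against mu = 4
tt <- t.test(policy_delay_sim, mu = 4)

# Intercept-only model on the shifted response:
# the intercept now estimates (mean - 4) and is tested against 0
fit_shifted <- lm(I(policy_delay_sim - 4) ~ 1)
coefs <- summary(fit_shifted)$coefficients

# Same t statistic and two-sided p-value as the t-test
unname(tt$statistic)
coefs["(Intercept)", "t value"]
tt$p.value
coefs["(Intercept)", "Pr(>|t|)"]
```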

---

# Which model?
  
Does plant species richness influence ecosystem productivity *differently* under nutrient enrichment?

```r
lm(biomass ~ richness * treatment)
```
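
In R's formula syntax, `richness * treatment` expands to both main effects plus their interaction; the interaction coefficient is the difference in slopes between the groups. A small sketch with simulated data (the data frame `d_sim`, the level names, and the slopes are all assumptions for illustration):

```r
# Simulated data: invented values for illustration only
set.seed(7)
n <- 60
d_sim <- data.frame(
  richness  = runif(n, min = 1, max = 20),
  treatment = factor(rep(c("control", "enriched"), each = n / 2))
)

# Made-up true slopes: 2 under control, 3.5 under enrichment
slope <- ifelse(d_sim$treatment == "enriched", 3.5, 2)
d_sim$biomass <- 50 + slope * d_sim$richness + rnorm(n, sd = 5)

# Separate-slope model: shorthand and expanded form are equivalent
fit      <- lm(biomass ~ richness * treatment, data = d_sim)
fit_long <- lm(biomass ~ richness + treatment + richness:treatment, data = d_sim)

# "richness:treatmentenriched" estimates the slope difference
coef(fit)
```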

---

# Goodness of fit

You’re told: **SSE = 10,000**, **TSS = 100,000**.

- Compute R². What does it mean in plain language?
- Is that enough to claim the model is “good”? Why / why not?
- What does a good metric need to take into account?
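
As a starting point for discussion, the arithmetic can be sketched in R. R² follows from the two sums of squares on the slide. The adjusted R² helper is a hypothetical illustration: the sample size `n` and the number of predictors `p` are not given here, so the values used below are assumptions.

```r
sse <- 10000   # residual (error) sum of squares, from the slide
tss <- 100000  # total sum of squares, from the slide

# Share of the total variation that the model accounts for
r_squared <- 1 - sse / tss
r_squared  # 0.9

# Hypothetical helper: adjusted R-squared penalises extra predictors.
# n (observations) and p (predictors) are NOT given on the slide.
adjusted_r_squared <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

adjusted_r_squared(r_squared, n = 30, p = 5)   # few predictors
adjusted_r_squared(r_squared, n = 30, p = 25)  # many predictors
```

The same R² can look much less convincing once the number of parameters relative to the sample size is taken into account.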

---

# Dummy variables

A categorical variable has three groups (A, B, C). Write an ANOVA model for this predictor in math form using dummy variables.
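
One common way to write this, assuming A is the reference level, is y = b0 + b1·D_B + b2·D_C + error, where D_B and D_C are 1 for observations in group B or C respectively and 0 otherwise. The sketch below shows how R builds these dummy columns under its default treatment contrasts; the toy factor `group` is made up for illustration.

```r
# Toy factor with three groups, two observations each (illustrative only)
group <- factor(c("A", "A", "B", "B", "C", "C"))

# Design matrix under R's default treatment contrasts:
# an intercept column plus one 0/1 dummy per non-reference level
X <- model.matrix(~ group)
X

# Group A is the reference: both dummies are 0 for its rows
colnames(X)
```

With this coding, the intercept is the mean of group A, and each dummy coefficient is the difference between that group's mean and the mean of A.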

---