This script analyzes the bird behavior data collected by UCI E166L students. Observers walked towards birds at 1 m/s and noted the bird’s behavior (perching, or other active behaviors such as singing or foraging), the bird’s location relative to cover (concealed or open), the starting distance, and the flight initiation distance (FID), the distance at which the bird flew away.
Location: Birds that are out in the open are more vulnerable to detection by predators and will fly sooner (larger FID).
Behavior: Birds that are singing or foraging are gaining fitness and will fly later (smaller FID).
Starting distance: Birds that detect a predator from a long way off incur a cost of vigilance and will fly sooner (larger FID).
Read in the dataset. Then check the structure of the data and tally the number of observations in each group.
'data.frame': 252 obs. of 6 variables:
$ quarter : Factor w/ 2 levels "F2019","S2020": 1 1 1 1 1 1 1 1 1 1 ...
$ observer: Factor w/ 23 levels "adguerra","Angrybird",..: 17 17 17 17 17 17 17 17 17 17 ...
$ location: Factor w/ 2 levels "concealed","open": 2 2 1 2 1 2 2 2 2 2 ...
$ behavior: Factor w/ 2 levels "active","perching": 2 2 2 2 1 2 1 1 2 2 ...
$ start : num 41.2 32.2 41.3 79.7 14 ...
$ FID : num 20.2 24.8 31.5 33.5 5.6 ...
# A tibble: 4 x 3
location behavior n
<fct> <fct> <int>
1 concealed active 28
2 concealed perching 30
3 open active 89
4 open perching 105
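The structure and tally shown above are the kind of output that `str()` and dplyr's `count()` produce. The sketch below is an assumption about how that step might look, not the original code. `birds_demo` is a hypothetical five-row stand-in built from the first values printed by `str()` above, since the data file itself is not shown here.

```r
library(dplyr)

# Hypothetical stand-in: the first five rows as printed by str() above.
# The real `birds` data frame has 252 observations.
birds_demo <- data.frame(
  location = factor(c("open", "open", "concealed", "open", "concealed")),
  behavior = factor(c("perching", "perching", "perching", "perching", "active")),
  start    = c(41.2, 32.2, 41.3, 79.7, 14.0),
  FID      = c(20.2, 24.8, 31.5, 33.5, 5.6)
)

# Column types and the first few values of each variable
str(birds_demo)

# One row per location x behavior combination, with the group size in `n`
tally <- birds_demo %>% count(location, behavior)
tally
```

Note that `count()` drops combinations with no observations by default, so a group that was never observed would be missing from the tally rather than listed with `n = 0`.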
There are over three times as many birds observed in the open versus concealed. Do you think this is a representative sample of all the birds in the study areas?
Birds might be more visible from farther away if they are out in the open or moving. Did the starting distance (where the bird was spotted) vary depending on the bird’s location or behavior?
YOUR.DATA.SUMMARY <- birds %>%
  group_by(location, behavior) %>%
  summarize(n = n(),
            mean = mean(start),
            sd = sd(start),
            se = sd(start)/sqrt(n()))
YOUR.DATA.SUMMARY
# A tibble: 4 x 6
# Groups: location [2]
location behavior n mean sd se
<fct> <fct> <int> <dbl> <dbl> <dbl>
1 concealed active 28 19.7 10.1 1.92
2 concealed perching 30 22.4 13.9 2.53
3 open active 89 19.8 12.1 1.29
4 open perching 105 21.1 12.6 1.23
ggplot(birds, aes(y=start, x=location, fill=behavior)) +
  labs(y = "Starting distance (m)", x= "Location", fill="Behavior") +
  geom_boxplot() +
  geom_point(position=position_dodge(width=0.75))
It looks like the medians for each group were all about 20 m, but a few birds were spotted from much farther away (> 50 m).
Run an analysis of covariance (ANCOVA) testing for the effects of two categorical IVs (location, behavior) and one continuous IV (starting distance) on a continuous DV (FID).
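library(car)     # provides Anova() for type III sums of squares
library(pander)  # formats the ANOVA table
# We are using contr.treatment to make interpreting coefficients easier.
# You usually need contr.sum for type III sums of squares, but this model has
# no interaction terms, so the tests for start, location, and behavior do not
# depend on the contrast coding.
options(contrasts = c("contr.treatment", "contr.poly"))
bird.MODEL <- lm(FID ~ start + location + behavior, data=birds)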
Anova(bird.MODEL, type = 3) %>%
  pander(justify="right", digits=2)
|   | Sum Sq | Df | F value | Pr(>F) |
|---|---|---|---|---|
| (Intercept) | 113 | 1 | 4.4 | 0.038 |
| start | 6565 | 1 | 253 | 9.2e-40 |
| location | 80 | 1 | 3.1 | 0.08 |
| behavior | 38 | 1 | 1.5 | 0.23 |
| Residuals | 6431 | 248 | NA | NA |
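In a model with no interaction terms, each row of a table like this one tests a single term after accounting for all the others. For a term with one degree of freedom, that F value is simply the square of the coefficient's t value from `summary()`. The sketch below illustrates this on R's built-in `mtcars` data, which is only a stand-in and not the bird data, using base R's `drop1()` in place of `car::Anova()`; for a main-effects-only model the two give the same tests for each term.

```r
# Stand-in data: built-in mtcars, NOT the bird data.
# One continuous predictor (wt) and two two-level factors (am, vs),
# mirroring the shape of FID ~ start + location + behavior.
fit <- lm(mpg ~ wt + factor(am) + factor(vs), data = mtcars)

# Marginal F test for each term: drop it from the full model and compare
f_vals <- drop1(fit, test = "F")[-1, "F value"]

# t values for the same terms (first row is the intercept, so drop it)
t_vals <- summary(fit)$coefficients[-1, "t value"]

# For single-df terms the marginal F equals t squared
cbind(F = f_vals, t_squared = unname(t_vals^2))
```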
The ANCOVA shows that FID varies significantly with starting distance but not with behavior or location. But what are the directions and sizes of the effects?
To find the effect sizes and directions we can extract the estimated coefficients of the model.
(Intercept) start locationopen behaviorperching
1.9133157 0.4171241 -1.3384204 0.7837935
Here is an explanation of what each coefficient represents, the interpretation, whether the effect was statistically significant or not (see ANCOVA table above), and whether the direction of the effect (regardless of whether it was significant) matched our prediction.
|   | Value | Description | Interpretation | Significant | MatchedPrediction |
|---|---|---|---|---|---|
| (Intercept) | 1.91 | FID intercept for concealed & active birds (m) | | TRUE | NA |
| start | 0.42 | Slope of all lines (m FID / m starting distance) | Bird FID increased with starting distance | TRUE | TRUE |
| locationopen | -1.34 | Effect of open versus concealed (m) | Exposed birds flew later than concealed birds (FID decreased this many m) | FALSE | FALSE |
| behaviorperching | 0.78 | Effect of perching versus active (m) | Perching birds flew sooner than active birds (FID increased this many m) | FALSE | TRUE |
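To see how the coefficients combine, it can help to compute a predicted FID by hand. The sketch below copies the four estimates printed above into a named vector and evaluates the model for two kinds of bird. The 20 m starting distance is a hypothetical value chosen for illustration, roughly the typical starting distance in the summary table.

```r
# Coefficients copied from the model output above
b <- c("(Intercept)"    =  1.9133157,
       start            =  0.4171241,
       locationopen     = -1.3384204,
       behaviorperching =  0.7837935)

start_m <- 20  # hypothetical starting distance (m), chosen for illustration

# Reference group (concealed, active): intercept plus the slope term only
fid_concealed_active <- unname(b["(Intercept)"] + b["start"] * start_m)

# Open, perching bird: add both categorical offsets to the reference prediction
fid_open_perching <- unname(fid_concealed_active +
                            b["locationopen"] + b["behaviorperching"])

round(c(concealed_active = fid_concealed_active,
        open_perching    = fid_open_perching), 2)
# concealed_active: about 10.26 m; open_perching: about 9.70 m
```

The two predictions differ by only about half a metre, which is consistent with the location and behavior terms being non-significant next to the effect of starting distance.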
Why do you think FID increases with starting distance?
What is the R² of the ANCOVA model? This is the proportion of the variance in the dependent variable that is predictable from the independent variables.
[1] 0.5142093
The R² is fairly good for ecological data: the model accounts for about 51% of the variance in FID.
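The definition of R² can be checked directly: it is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this on the built-in `mtcars` data as a stand-in (not the bird data) and compares the hand calculation with the value that `summary()` reports for an `lm` fit. With the real model, `summary(bird.MODEL)$r.squared` would be the analogous call, assuming the multiple (unadjusted) R² is the one intended.

```r
# Stand-in data: built-in mtcars, NOT the bird data
fit <- lm(mpg ~ wt + factor(am) + factor(vs), data = mtcars)

rss <- sum(residuals(fit)^2)                    # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares around the mean

r2_manual  <- 1 - rss / tss                     # proportion of variance explained
r2_summary <- summary(fit)$r.squared            # value reported by summary()

all.equal(r2_manual, r2_summary)
```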
Make a scatter plot of FID versus starting distance, with point shape and linetype set by location and color set by behavior. Then add linear fits for each group. Add a black line that shows the FID if the bird flew away right as you started walking towards it (the maximum FID).
birds$FID.predict <- predict(bird.MODEL)
ggplot(birds, aes(x = start, y = FID, color = behavior,
                  shape=location, linetype=location)) +
  labs(y="Flight initiation distance (m)", x="Starting distance (m)",
       color="Behavior", shape="Location", linetype="Location") +
  geom_point(size=2)+
  geom_line(aes(y=FID.predict)) +
  geom_abline(slope=1, intercept = 0)+
  coord_cartesian(xlim=c(0,NA), ylim=c(0,NA)) +
  theme(legend.key.width=unit(3,"line"))