1 Overview

This script analyzes the bird behavior data collected by UCI E166L students. Observers walked towards birds at 1 m/s and noted the bird’s behavior (perching, or active behaviors such as singing or foraging), the bird’s location relative to cover (concealed or open), the starting distance, and the flight initiation distance (FID), which is the distance at which the bird flew away.

1.1 Predictions

  • Location: Birds that are out in the open are more vulnerable to detection by predators and will fly sooner (larger FID)

  • Behavior: Birds that are singing or foraging are gaining fitness and will fly later (smaller FID)

  • Starting distance: Birds that detect a predator from a long way off incur a cost of vigilance and will fly sooner (larger FID)

2 Read data

Read in the dataset. Then check the structure of the data and tally the number of observations in each group.

'data.frame':   252 obs. of  6 variables:
 $ quarter : Factor w/ 2 levels "F2019","S2020": 1 1 1 1 1 1 1 1 1 1 ...
 $ observer: Factor w/ 23 levels "adguerra","Angrybird",..: 17 17 17 17 17 17 17 17 17 17 ...
 $ location: Factor w/ 2 levels "concealed","open": 2 2 1 2 1 2 2 2 2 2 ...
 $ behavior: Factor w/ 2 levels "active","perching": 2 2 2 2 1 2 1 1 2 2 ...
 $ start   : num  41.2 32.2 41.3 79.7 14 ...
 $ FID     : num  20.2 24.8 31.5 33.5 5.6 ...
# A tibble: 4 x 3
  location  behavior     n
  <fct>     <fct>    <int>
1 concealed active      28
2 concealed perching    30
3 open      active      89
4 open      perching   105
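
A minimal sketch of how output like this can be produced in base R. The data file name is not given here, so the example does not read a file. Instead it rebuilds only the first five rows visible in the str() preview above, with values as rounded there. The object names `preview` and `tally` are assumptions made for illustration, and the counts it prints describe those five rows, not the full dataset.

```r
# First five rows as shown (rounded) in the str() preview; NOT the full dataset.
preview <- data.frame(
  location = factor(c("open", "open", "concealed", "open", "concealed"),
                    levels = c("concealed", "open")),
  behavior = factor(c("perching", "perching", "perching", "perching", "active"),
                    levels = c("active", "perching")),
  start    = c(41.2, 32.2, 41.3, 79.7, 14.0),
  FID      = c(20.2, 24.8, 31.5, 33.5, 5.6)
)

# Check the structure: column types and factor levels.
str(preview)

# Tally the observations in each location x behavior group.
tally <- as.data.frame(table(location = preview$location,
                             behavior = preview$behavior),
                       responseName = "n")
print(tally)
```

On the full data, the same two calls would apply after reading the file, for example with read.csv() and stringsAsFactors = TRUE.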

More than three times as many birds were observed in the open (194) as concealed (58). Do you think this is a representative sample of all the birds in the study areas?
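
The imbalance can be checked directly from the tally. This is a small sketch that copies the four group counts from the output above; the vector name `n` and its element names are chosen here for illustration.

```r
# Group counts copied from the tally above.
n <- c(concealed_active = 28, concealed_perching = 30,
       open_active = 89, open_perching = 105)

n_concealed <- sum(n[c("concealed_active", "concealed_perching")])  # 58
n_open      <- sum(n[c("open_active", "open_perching")])            # 194

# How many open birds were observed per concealed bird?
ratio <- n_open / n_concealed
round(ratio, 2)  # 3.34
```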

2.1 Starting distance

Birds might be more visible from farther away if they are out in the open or moving. Did the starting distance (where the bird was spotted) vary depending on the bird’s location or behavior?

# A tibble: 4 x 6
# Groups:   location [2]
  location  behavior     n  mean    sd    se
  <fct>     <fct>    <int> <dbl> <dbl> <dbl>
1 concealed active      28  19.7  10.1  1.92
2 concealed perching    30  22.4  13.9  2.53
3 open      active      89  19.8  12.1  1.29
4 open      perching   105  21.1  12.6  1.23

The group means were all about 20 m (19.7 to 22.4 m), but a few birds were spotted from much farther away (> 50 m).
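
The se column in the summary table is not defined in the text. Assuming it is the standard error of the mean, sd / sqrt(n), the sketch below recomputes it from the printed n and sd columns. Because sd is rounded in the table, small differences in the second decimal place are expected.

```r
# n and sd copied from the summary table (sd is rounded there).
n        <- c(28, 30, 89, 105)
sd_start <- c(10.1, 13.9, 12.1, 12.6)

# Standard error of the mean (assumed definition of the se column).
se <- sd_start / sqrt(n)
round(se, 2)

# Compare with the printed se column; differences reflect rounding of sd.
se_printed <- c(1.92, 2.53, 1.29, 1.23)
max(abs(se - se_printed))
```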

3 Statistics

Run an analysis of covariance (ANCOVA) testing for the effects of two categorical independent variables (location, behavior) and one continuous independent variable (starting distance) on a continuous dependent variable (FID).

Anova Table (Type III tests)
             Sum Sq   Df  F value   Pr(>F)
(Intercept)     113    1      4.4    0.038
start          6565    1    253    9.2e-40
location         80    1      3.1     0.08
behavior         38    1      1.5     0.23
Residuals      6431  248       NA       NA

The ANCOVA shows that FID varies significantly with starting distance (p < 0.001) but not with location (p = 0.08) or behavior (p = 0.23). But what are the directions and sizes of the effects?
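
To see where the F values and p-values in the ANCOVA table come from, the sketch below recomputes them from the printed sums of squares and degrees of freedom. The inputs are the rounded table entries, so the results agree with the table only to the precision shown there; the variable names are chosen here for illustration.

```r
# Sums of squares and degrees of freedom copied from the ANCOVA table.
ss      <- c(start = 6565, location = 80, behavior = 38)
df_term <- c(start = 1,    location = 1,  behavior = 1)
ss_res  <- 6431
df_res  <- 248

# Residual mean square: the denominator of every F ratio.
ms_res <- ss_res / df_res

# F = (SS_term / df_term) / MS_residual
f_value <- (ss / df_term) / ms_res

# The upper tail of the F distribution gives the p-value.
p_value <- pf(f_value, df1 = df_term, df2 = df_res, lower.tail = FALSE)

round(f_value, 1)
signif(p_value, 2)
```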

3.1 Effect sizes and directions

To find the effect sizes and directions we can extract the estimated coefficients of the model.

     (Intercept)            start     locationopen behaviorperching 
       1.9133157        0.4171241       -1.3384204        0.7837935 
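
The coefficient names show how R codes the two factors: the first level of each factor (concealed, active) is the baseline, and locationopen and behaviorperching are offsets from it. The sketch below builds the design matrix for the four groups and multiplies it by the coefficients above to get a predicted FID for each group. The starting distance of 20 m is an illustrative choice, close to the group means reported earlier, and the object names are assumptions.

```r
# Estimated coefficients copied from the output above.
beta <- c("(Intercept)"    = 1.9133157,
          start            = 0.4171241,
          locationopen     = -1.3384204,
          behaviorperching = 0.7837935)

# One row per location x behavior group, all at an illustrative start of 20 m.
groups <- expand.grid(
  location = factor(c("concealed", "open"), levels = c("concealed", "open")),
  behavior = factor(c("active", "perching"), levels = c("active", "perching"))
)
groups$start <- 20

# Treatment (dummy) coding: the first level of each factor is the baseline.
X <- model.matrix(~ start + location + behavior, data = groups)

# Predicted FID = design matrix times coefficient vector.
groups$FID_hat <- as.vector(X %*% beta[colnames(X)])
print(groups)
```

Because the model has no interaction terms, the four groups share the same slope and differ only by constant offsets.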

Here is an explanation of what each coefficient represents, the interpretation, whether the effect was statistically significant or not (see ANCOVA table above), and whether the direction of the effect (regardless of whether it was significant) matched our prediction.

Term              Value  Description                                       Interpretation                                                      Significant  MatchedPrediction
(Intercept)        1.91  FID intercept for concealed & active birds (m)    Predicted FID at a starting distance of 0 m                         TRUE         NA
start              0.42  Slope of all lines (m FID / m starting distance)  Bird FID increased with starting distance                           TRUE         TRUE
locationopen      -1.34  Effect of open versus concealed (m)               Exposed birds flew later than concealed birds (FID 1.34 m smaller)  FALSE        FALSE
behaviorperching   0.78  Effect of perching versus active (m)              Active birds flew later than perching birds (FID 0.78 m smaller)    FALSE        TRUE
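
The MatchedPrediction column can be derived by comparing the sign of each estimate with the sign implied by the predictions in section 1.1. The sketch below is one way to do that. The translation of each prediction into an expected sign is an interpretation made here, spelled out in the comments, and the vector names are chosen for illustration.

```r
# Estimated coefficients (rounded) from the table above.
estimate <- c(start = 0.42, locationopen = -1.34, behaviorperching = 0.78)

# Sign each prediction implies for the same coefficient:
#   start:            longer starting distance -> larger FID             (+)
#   locationopen:     open birds fly sooner than concealed birds         (+)
#   behaviorperching: active birds fly later, so perching FID is larger  (+)
predicted_sign <- c(start = 1, locationopen = 1, behaviorperching = 1)

# A prediction is matched when the estimate has the predicted sign.
matched <- sign(estimate) == predicted_sign[names(estimate)]
print(matched)
```

This check looks only at direction, so a prediction can be matched even when the effect is not statistically significant, as with behavior here.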

Why do you think FID increases with starting distance?

3.2 Coefficient of determination

What is the R² of the ANCOVA model? This is the proportion of the variance in the dependent variable that is predictable from the independent variables.

[1] 0.5142093

An R² of 0.51 means the model accounts for about half of the variance in FID, which is fairly good for ecological data.
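
Two related quantities follow from R² alone, given the sample size and the number of predictors. Neither is reported above; the sketch below is an optional check using the standard formulas for adjusted R² and the overall model F statistic, with n = 252 observations and p = 3 predictors taken from the earlier output.

```r
r2 <- 0.5142093   # R-squared reported above
n  <- 252         # number of observations
p  <- 3           # predictors: start, location, behavior

# Residual degrees of freedom; matches the Residuals row of the ANCOVA table.
df_res <- n - p - 1

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Overall F statistic for the whole model, implied by R-squared.
f_model <- (r2 / p) / ((1 - r2) / df_res)

round(adj_r2, 3)
round(f_model, 1)
```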