An exploration into the factors of Ovenbird count

Author

Kristina Busby, Lillian Stone, and Kathleen Strachota

library(tidyverse)

library(broom)
library(data.table)
library(performance)
library(car) 
library(rsample)
library(rstatix)

library(lubridate)
library(ggsci)
library(patchwork)

library(sjPlot)
library(sjmisc)
library(sjlabelled)

eBird_Data <- read.csv("/Users/kathleenstrachota/Desktop/Mount Holyoke/BIOSTATISTICS/Final/eBird_Data.csv")


longbird <- eBird_Data %>%
  pivot_longer(Nov_Oven:Mar_mins, names_to = 'name', values_to = 'value') %>%
  separate (name, into=c('month','var'), sep = '_') %>%
  pivot_wider(names_from = var, values_from = value) %>%
  drop_na('Oven')

longbird2<- longbird %>%
  filter(Oven!=0) %>%
  filter(mins!=0)

head(longbird2)

# A tibble: 6 × 17
  Site     Row_Labels LATITUDE LONGITUDE  Elev NDVI_diff FIPS  ISO2  ISO3     UN
  <chr>    <chr>         <dbl>     <dbl> <int>     <dbl> <chr> <chr> <chr> <int>
1 L1009484 L1009484       12.0     -86.2   454   -0.214  NU    NI    NIC     558
2 L1009484 L1009484       12.0     -86.2   454   -0.214  NU    NI    NIC     558
3 L1009551 L1009551       13.2     -86.1  1291   -0.0738 NU    NI    NIC     558
4 L1009551 L1009551       13.2     -86.1  1291   -0.0738 NU    NI    NIC     558
5 L1009551 L1009551       13.2     -86.1  1291   -0.0738 NU    NI    NIC     558
6 L1009551 L1009551       13.2     -86.1  1291   -0.0738 NU    NI    NIC     558
# ℹ 7 more variables: NAME <chr>, AREA <int>, POP2005 <int>, month <chr>,
#   Oven <int>, num <int>, mins <int>

Introduction

We had initial interests in analyzing climatic patterns and migratory birds, we found a data set connected to an article that discussed the migratory patterns of Ovenbirds across their breeding and non-breeding ranges. The research aimed to look at the population dynamics of this species across time, place, and season to inform future research in ecology, evolution and conservation of the Ovenbird. They stress that there is a lack of information on the migratory patterns of small songbirds and that the tracking data gained from experiments such as this, helps learn about the population dynamics which, in turn, helps conservation efforts. A portion of their data was available in an Excel spreadsheet. We chose columns that included information about the country and month in which bird counts were done by citizen scientists (many of which had low sample sizes), latitude and longitude, along with NDVI which was a proxy for ground cover (greening or browning). The article states that geolocators were used along with light-level to determine location. This helped the researchers map approximate migratory routes. They found that Ovenbirds migrate from Central America, up through the East Coast of the United States to Saskatchewan Canada and Maine.

The original data set had a lot of repeats so we converted the data from wide format to long format, which gave us a final table with 657 entries. We used a top down approach to find if there was any significance in any of the models we could run. Finding no statistical results nor ones that didn’t violate assumptions, we chose hypotheses to include in a detailed manner in this data report. Here, we will present the extent of our attempts and why we felt that the data yielded no results.

Our first hypothesis was to see if there was any effect of NDVI or month on mean bird count. Because month is associated with NDVI, as NDVI difference increases, we expect the average number of birds to follow the same trend. Second, we tested the effect of elevation in the US on mean bird count. We used longitude and latitude to analyze if there was a difference in NDVI difference across latitude and longitude. We expected that as latitude values approached the equator, there would be a higher NDVI difference. As for longitude, we expected there to be an NDVI difference that would vary across the same latitude, representative of different biomes across the world and even just across the Americas. Next, we did two ANOVAs. First, we looked at whether month mattered in the mean count of Ovenbirds. We expected to see that mean bird count varied by month. Second, we wanted to see if the country where the count was recorded had an effect on the month in which the counts took place. We ran more hypotheses than stated here but we are going to discuss them in our methods section, explaining why they didn’t work and how they violated assumptions. The results section is made up of hypotheses that we did further statistical tests on and as such were able to utilize a variety of different tests, that unfortunately, still yielded no results.

Hallworth, M, T., Scott S, T., Van Wilgenburg, S, L., Hobson, K, A., Marra, P, P. 2015. Migratory connectivity of a Neotropical migratory songbird revealed by archival light level geolocators. Ecological Applications (25)2: 336-347.

Definitions

NDVI_diff: NDVI Differences are NDVI values between November and March. Negative (-) values indicate browning, Positive values (+) indicate greening
Oven: is bird count, done by citizen scientists with clipboards
LONGITUDE: Longitude of checklist location (WGS84)
LATITUDE: Latitude of checklist location (WGS84)
NAME: Country
month: Month

Methods

Effects of NDVI and month on mean bird count

\(H_0:\) There is no effect of NDVI or month on mean bird count

\(H_A:\) Because month is associated with NDVI, as NDVI difference increases we expect for the average number of birds to follow the same trend

meancount <- longbird2 %>%
  group_by(Oven, NDVI_diff, month) %>%
  drop_na(Oven) %>%
  summarise(meancount=mean(Oven), sd=sd(Oven), n=n(), se=sd/sqrt(n)) %>%
  drop_na(sd)

`summarise()` has grouped output by 'Oven', 'NDVI_diff'. You can override using
the `.groups` argument.

lm1<-lm(Oven~NDVI_diff + month, data=meancount)
check_model(lm1)

Variable `Component` is not in your data frame :/

Here we created a linear model to study the effects of NDVI and month on mean count. We used a linear model as opposed to an ANOVA because we had multiple numerical values. We decided to use an additive model instead of an interactive one because the interactive model violated collinearity with a high VIF.

When running an assumption check, we found that all assumptions were at least mostly upheld. Reference lines were close to flat and horizontal and data were mostly conformed to said reference lines; there is nothing that stands out as particularly alarming.

Effect of elevation in the US on mean bird count

\(H_0:\) There is no significant effect of elevation in the United States on mean bird count

\(H_A:\) There is a significant effect of elevation in the United States on mean bird count.

longbird_US<-longbird2 %>%
filter(NAME=='United States')

longbirdmean_US<-longbird_US %>%
group_by(Elev) %>%
dplyr::summarize(mean=mean(Oven), error=sd(Oven), n=n(), se=error/sqrt(n))

lmUS <- lm(Oven~Elev, data=longbird_US)

ggplot(data=longbirdmean_US, aes(x=Elev, y=mean))+
  geom_point()+
  geom_errorbar(aes(x=Elev, ymin=mean-se, ymax=mean+se))+
  theme_bw()+
  labs(x='Elevation in the US', y='Mean Ovenbird count')+
  theme(plot.title=element_text(size=10))

Figure 1. Mean oven bird count by elevation in the US with error bars of one standard error

Here we created a linear model to study the effects of elevation in the United States on mean bird count. We used a linear model as opposed to an ANOVA because we had multiple numerical values. Originally, we had aimed to look at how elevation and country affected mean bird count but found very small sample sizes in all countries except for the United States, and thus changed our hypothesis. However, we found an abundance of sample sizes of 1 even within this data because the increments in elevation are so small that many values of elevation only have one sample (Figure 1). Therefore, we decided not to run any statistical analyses.

Effect of longitude on NDVI difference

\(H_0:\) There is no effect of longitude on NDVI difference

\(H_A:\) Across the same latitude, the NDVI difference would vary across longitude indicating different biomes. we would expect more negative values of NDVI difference in areas with a desert

longitude <- lm(NDVI_diff ~ LONGITUDE, longbird2)
check_model(longitude)

For this hypothesis we once again used a linear model to study the effects of longitude on NDVI difference because both of our variables were numeric.

After running the assumption check we did not find any large violations, though none of the assumptions were particularly followed either. The lines for linearity, homogeneity of variance, and normality of residuals are all slightly skewed and none of them are entirely flat or horizontal.

Effect of latitude on NDVI difference

\(H_0:\) There is no effect of latitude on NDVI difference

\(H_A:\) As the latitude approach the equator (lower degrees) we expect for there to be a higher NDVI difference

latitude <-lm(NDVI_diff ~ LATITUDE, longbird2)
check_model(latitude)

To study the effects of latitude on NDVI difference we used a linear model, using this approach because both values associated with this hypothesis were numerical and we wanted to study the relationship between the values.

According to the assumption check, the assumption of homogeneity of variance is not entirely violated but is close to being so. The reference line is considerably sloped and has data points unevenly clustered in different areas. The assumptions of linearity and normality of residuals are also once again not fully upheld but not fully violated either. The linearity reference line is not fully flat and horizontal and the dots do not perfectly fall along the normality of residuals reference line.

Interactive effects of latitude and longitude on NDVI difference

\(H_0:\) There is no interactive effect of latitude and longitude on NDVI

\(H_A:\) As latitude and longitude are correlated, the NDVI difference across the world will be affected as latitudes and longitudes change

lm(NDVI_diff ~ LONGITUDE*LATITUDE, longbird2)


Call:
lm(formula = NDVI_diff ~ LONGITUDE * LATITUDE, data = longbird2)

Coefficients:
       (Intercept)           LONGITUDE            LATITUDE  LONGITUDE:LATITUDE  
        -0.6872386          -0.0073710           0.0330498           0.0003969

ggplot(data = longbird2, aes(y = NDVI_diff, color = LONGITUDE, x = LATITUDE))+
  geom_point() +
  geom_smooth(method = "lm") +
theme_classic()+
  labs(x='NDVI difference', y='Latitude')+
  theme(plot.title=element_text(size=10))

`geom_smooth()` using formula = 'y ~ x'

Warning: The following aesthetics were dropped during statistical transformation: colour
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

check_model(lm(NDVI_diff ~ LONGITUDE*LATITUDE, longbird2))

Variable `Component` is not in your data frame :/

For this linear model we used the same numerical values as those used in hypotheses 3a and 3b to test the interactive effects of latitude and longitude on NDVI difference. Using the interactive model, we were able to see how latitude and longitude separately affected NDVI difference as well as their effects on each other.

We chose not to run statistical tests on this hypothesis because, when running the assumption test, we found a very high VIF, which violates noncollinearity. In addition, multiple of the reference lines associated with different assumptions are not entirely flat and horizontal.

1 way ANOVA seeing if month mattered in mean count of Oven birds

\(H_0:\) All means are the same

\(H_A:\) The mean bird count for one month is different

cm <- aov(Oven ~ month, longbird2)
check_model(cm)

shapiro.test(residuals(cm))


    Shapiro-Wilk normality test

data:  residuals(cm)
W = 0.48219, p-value < 2.2e-16

longbird2 %>%
  group_by(month) %>%
  identify_outliers(Oven)

# A tibble: 71 × 19
   month Site    Row_Labels LATITUDE LONGITUDE  Elev NDVI_diff FIPS  ISO2  ISO3 
   <chr> <chr>   <chr>         <dbl>     <dbl> <int>     <dbl> <chr> <chr> <chr>
 1 Dec   L10285… L1028554       18.0     -67.2     8   -0.117  RQ    PR    PRI  
 2 Dec   L10434… L1043414       18.5     -66.5     3   -0.0482 RQ    PR    PRI  
 3 Dec   L11909… L1190987       14.0     -87.0   784   -0.178  HO    HN    HND  
 4 Dec   L127407 L127407        26.4     -81.6    11    0.0266 US    US    USA  
 5 Dec   L12875… L1287546       16.1     -88.9    21   -0.0177 BH    BZ    BLZ  
 6 Dec   L13070… L1307043       18.4     -70.0     7   -0.144  DR    DO    DOM  
 7 Dec   L13384… L1338421       16.8     -93.1  1155   -0.242  MX    MX    MEX  
 8 Dec   L18110… L1811098       18.5     -89.6   277   -0.133  MX    MX    MEX  
 9 Dec   L18390… L1839062       20.9     -86.9     6   -0.0650 MX    MX    MEX  
10 Dec   L18396… L1839600       14.5     -87.1   770   -0.153  HO    HN    HND  
# ℹ 61 more rows
# ℹ 9 more variables: UN <int>, NAME <chr>, AREA <int>, POP2005 <int>,
#   Oven <int>, num <int>, mins <int>, is.outlier <lgl>, is.extreme <lgl>

For this hypothesis we used a 1-way ANOVA to see how mean bird count differs by month. We used an ANOVA test as opposed to a linear model because we wanted to compare means within the data, which is an ANOVA function.

When checking ANOVA assumptions, we found that we cannot assume normality of residuals (Shapiro-Wilk test p-value < 2.2e-16). In addition, there were multiple extreme outliers within the data and other assumptions such as homogeneity of variance were not perfectly upheld. We decided to run statistical analyses on this hypothesis despite the violations because we wanted to see what would happen and if the tests would yield any notable results.

2 way ANOVA with count by Country and month

\(H_0:\) There is no effect of country or month or the interaction between the two on the mean bird count

\(H_A:\) There is an effect of country or month or the interaction between the two on the mean bird count

morethan1 <- longbird2 %>%
  group_by(NAME) %>%
  filter(NAME == 'Bahamas'| NAME == 'Belize' | NAME == 'Cayman Islands'| NAME == 'Costa Rica'| NAME == 'Cuba'| NAME == 'Dominican Republic'| NAME == 'El Salvador'| NAME == 'Guatemala'| NAME == 'Honduras'| NAME == 'Jamaica'| NAME == 'Mexico'| NAME == 'Nicaragua'| NAME == 'Puerto Rico'| NAME == 'United States')
morethan1

# A tibble: 651 × 17
# Groups:   NAME [14]
   Site    Row_Labels LATITUDE LONGITUDE  Elev NDVI_diff FIPS  ISO2  ISO3     UN
   <chr>   <chr>         <dbl>     <dbl> <int>     <dbl> <chr> <chr> <chr> <int>
 1 L10094… L1009484       12.0     -86.2   454   -0.214  NU    NI    NIC     558
 2 L10094… L1009484       12.0     -86.2   454   -0.214  NU    NI    NIC     558
 3 L10095… L1009551       13.2     -86.1  1291   -0.0738 NU    NI    NIC     558
 4 L10095… L1009551       13.2     -86.1  1291   -0.0738 NU    NI    NIC     558
 5 L10095… L1009551       13.2     -86.1  1291   -0.0738 NU    NI    NIC     558
 6 L10095… L1009551       13.2     -86.1  1291   -0.0738 NU    NI    NIC     558
 7 L10139… L1013998       26.4     -81.6    10    0.0266 US    US    USA     840
 8 L10139… L1013998       26.4     -81.6    10    0.0266 US    US    USA     840
 9 L10139… L1013998       26.4     -81.6    10    0.0266 US    US    USA     840
10 L10139… L1013998       26.4     -81.6    10    0.0266 US    US    USA     840
# ℹ 641 more rows
# ℹ 7 more variables: NAME <chr>, AREA <int>, POP2005 <int>, month <chr>,
#   Oven <int>, num <int>, mins <int>

all <- aov(Oven ~ NAME*month, morethan1)

shapiro.test(residuals(aov(Oven ~ NAME*month, morethan1)))


    Shapiro-Wilk normality test

data:  residuals(aov(Oven ~ NAME * month, morethan1))
W = 0.63295, p-value < 2.2e-16

morethan1 %>%
  group_by(NAME, month) %>%
  identify_outliers(Oven)

# A tibble: 43 × 19
   NAME    month Site  Row_Labels LATITUDE LONGITUDE  Elev NDVI_diff FIPS  ISO2 
   <chr>   <chr> <chr> <chr>         <dbl>     <dbl> <int>     <dbl> <chr> <chr>
 1 Bahamas Dec   L660… L660619       25.1      -77.3    12   -0.0984 BF    BS   
 2 Bahamas Feb   L143… L1430887      24.1      -74.5     6   -0.109  BF    BS   
 3 Bahamas Jan   L158… L1588633      24.1      -74.5     8   -0.0846 BF    BS   
 4 Bahamas Mar   L111… L1114787      25.0      -77.3     7   -0.132  BF    BS   
 5 Belize  Feb   L280… L280864       17.5      -89.1   122   -0.0285 BH    BZ   
 6 Belize  Mar   L758… L758817       17.8      -88.7    14   -0.0709 BH    BZ   
 7 Cayman… Nov   L845… L845400       19.3      -81.2     7   -0.0935 CJ    KY   
 8 Costa … Dec   L183… L1830851      10.8      -85.4   550   -0.0856 CS    CR   
 9 Costa … Dec   L441… L441071       10.6      -83.5    17    0.118  CS    CR   
10 Costa … Nov   L447… L447761        9.79     -84.6    40    0.0591 CS    CR   
# ℹ 33 more rows
# ℹ 9 more variables: ISO3 <chr>, UN <int>, AREA <int>, POP2005 <int>,
#   Oven <int>, num <int>, mins <int>, is.outlier <lgl>, is.extreme <lgl>

We ran a 2-way ANOVA for this hypothesis in order to see if country and month had an affect on mean bird count as well as on each other. Instead of a linear model, we used an ANOVA because the majority of our values for this hypothesis were categorical instead of numerical.

Since we are interested in the mean count, we decided to filter the data set to include only countries that had a sample size larger than 3. When checking ANOVA assumptions, we found that we cannot assume normality of residuals (Shapiro-Wilk test p-value < 2.2e-16). In addition, there were multiple extreme outliers within the data. We decided to run statistical analyses on this hypothesis despite the violations because we wanted to see what would happen and if the tests would yield any notable results.

Results and Discussion

Linear Models

meancount <- longbird2 %>%
  group_by(Oven, NDVI_diff, month) %>%
  drop_na(Oven) %>%
  summarise(meancount=mean(Oven), sd=sd(Oven), n=n(), se=sd/sqrt(n)) %>%
  drop_na(sd)

`summarise()` has grouped output by 'Oven', 'NDVI_diff'. You can override using
the `.groups` argument.

lm1plot<-lm(Oven~NDVI_diff*month, data=meancount) %>%
  augment() %>%
  ggplot(aes(x=NDVI_diff, y=Oven, color=month)) +
  geom_point()+
  geom_jitter(aes(x=NDVI_diff, y=Oven, color=month), size = 1, alpha = 0.3)+
  theme_classic()+
  geom_smooth(method="lm", se=FALSE)+
   labs(x='NDVI difference', y='Mean Ovenbird count')+
  theme(plot.title=element_text(size=10)) +
  scale_color_npg()

lm1plot

`geom_smooth()` using formula = 'y ~ x'

Figure 2. Mean count of Ovenbirds by NDVI difference (negative values indicate browning and positive values indicate greening). The color of each line corresponds to a different month.

Table 1. Linear model summary table for effects of NDVI difference and month on mean bird count

summary(lm1)


Call:
lm(formula = Oven ~ NDVI_diff + month, data = meancount)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.0885 -0.4779 -0.2622  0.1328  4.4051 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.24699    0.27679   4.505 2.63e-05 ***
NDVI_diff   -1.22547    2.12401  -0.577    0.566    
monthFeb    -0.29069    0.33126  -0.878    0.383    
monthJan     0.21039    0.28521   0.738    0.463    
monthMar    -0.03428    0.30287  -0.113    0.910    
monthNov     0.49412    0.37728   1.310    0.195    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8066 on 69 degrees of freedom
Multiple R-squared:  0.09826,   Adjusted R-squared:  0.03292 
F-statistic: 1.504 on 5 and 69 DF,  p-value: 0.2

As seen in Table 1, the p-value for the model as a whole is 0.2 which is greater than our significance level (\(\alpha = 0.05\)) and only 2% of the variance of the data is described by this model (\(R^2 = 0.02\)). We fail to reject the null hypothesis that there is no effect of NDVI_diffor month on mean bird count. If there were statistically significant effects of month or NDVI difference on the mean number of birds seen, we would expect the trend lines to be more spread out rather than overlapping and to have more dramatic slope (Figure 2). We expected for there to be a difference in mean bird count across different months and that there would have been a more variable bird count in terms of the NDVI difference. At higher values of greening, higher NDVI differences, we thought to see more birds. Crossing this data hypothesis with the idea that the more green, the more likely the season is spring, summer or fall, we expected more birds to be counted in the months of March than November, but this was not the case. Since we do not see this, our results indicate that neither month nor difference in vegetation greenness has an effect on the mean number of birds seen.

longitude <- lm(NDVI_diff ~ LONGITUDE, longbird2)
ggplot(data = longbird2, aes(y = NDVI_diff, x = LONGITUDE))+
  geom_point() +
  geom_smooth(method = "lm") +
theme_classic()+
  labs(x='Longitude', y='NDVI difference')+
  theme(plot.title=element_text(size=10))

`geom_smooth()` using formula = 'y ~ x'

Figure 3. NDVI difference by longitude.

Table 2. Linear model summary table for effect of longitude on NDVI difference

tab_model(longitude)

	NDVI diff
Predictors	Estimates	CI	p
(Intercept)	-0.06	-0.12 – -0.00	0.049
LONGITUDE	0.00	-0.00 – 0.00	0.817
Observations	657
R² / R² adjusted	0.000 / -0.001

When we ran a linear model of NDVI difference by longitude, we got a p-value of 0.817. Since this is much higher than our significance threshold (\(\alpha = 0.05\)), we fail to reject the null hypothesis and conclude that there is no statistically significant effect of longitude on the difference in NDVI. This is consistent with the coefficient estimate that shows only a slight difference from zero (Table 2). The graph shows an almost horizontal trendline (Figure 3). These results indicate that longitude is not a good predictor of the difference in vegetation greenness. Across a consistent latitude, the NDVI difference would vary across longitudes, possibly having lower NDVI difference values at biomes such as a desert, where we would expect less greening overall (lower NDVI difference). But here we see that there is no particular trend followed, thus lacking correlation between longitude and the difference NDVI.

latitude <-lm(NDVI_diff ~ LATITUDE, longbird2)
ggplot(data = longbird2, aes(y = NDVI_diff, x = LATITUDE))+
  geom_point() +
  geom_smooth(method = "lm") +
theme_classic()+
  labs(x='Latitude', y='NDVI difference')+
  theme(plot.title=element_text(size=10))

`geom_smooth()` using formula = 'y ~ x'

Figure 4. NDVI difference by latitude.

Table 3. Linear model summary table for effect of latitude on NDVI difference

tab_model(latitude)

	NDVI diff
Predictors	Estimates	CI	p
(Intercept)	-0.08	-0.10 – -0.05	<0.001
LATITUDE	0.00	-0.00 – 0.00	0.538
Observations	657
R² / R² adjusted	0.001 / -0.001

After running the statistical analysis on our third hypothesis we got a p-value of 0.538. We therefore fail to reject our null hypothesis. We conclude that there is no statistically significant effect of latitude on the difference in NDVI which indicates that latitude is not a good predictor of the difference in vegetation greenness. This is consistent with the graph which shows an almost horizontal trendline indicating that there is no linear relationship between NDVI difference and latitude (Figure 4). We had thought that there would be an effect of latitude on NDVI difference because as we approach the equator, an area made up of more tropical biomes, there would be a higher NDVI difference value indicative of more greening. However, this does not seem to be the case.

ANOVA

b_mcountm <- longbird2 %>%
  group_by(month) %>%
  summarise(mean_count = mean(Oven), sd = sd(Oven), n = n(), se = sd/sqrt(n))

p1 <- ggplot(data = b_mcountm, aes(y = mean_count, x = month, color = month)) +
  geom_point() +
  geom_jitter(data = longbird2, aes(x = month, y = Oven), size = 0.3) +
  geom_errorbar(data = b_mcountm, aes(x = month, ymin = mean_count-se, ymax = mean_count+se), width = 0.2) +
  labs(y = "Mean Ovenbird count", x = "Month") +
  theme_bw() +
  scale_color_npg()

p1

Figure 5. Mean bird count by month with raw data behind. There are error bars of one standard error and each color represents a month.

p2 <- ggplot(data = b_mcountm, aes(y = mean_count, x = month, color = month)) +
  geom_point() +
  geom_errorbar(data = b_mcountm, aes(x = month, ymin = mean_count-se, ymax = mean_count+se), width = 0.2) +
  labs(y = "Mean Ovenbird count", x = "Month") +
  theme_bw() +
  scale_color_npg()
p2

Figure 6. Mean bird count by month with error bars of 1 standard error. Each color represents a different month.

Table 4. ANOVA summary table for effects of difference in mean count by month

summary(cm)

             Df Sum Sq Mean Sq F value Pr(>F)
month         4     58   14.38    1.25  0.288
Residuals   652   7500   11.50

The one-way ANOVA resulted in a p-value of 0.288. Therefore we can not reject the null as there is no significant difference of mean count of Ovenbirds between months. This result is consistent with the visual results we see in Figure 6: all error bars are overlapping. With this information, we cannot draw any conclusion regarding the average number of Ovenbirds seen based on what month it is. As our data spans only a few months out of the year, during the non-breeding season, it is hard to know what to expect. With the breeding season spanning April and May, we thought maybe to see more Ovenbirds in March, right before the breeding season, rather than in the winter months. This was not the case. With the data we had available, we could not determine a significant correlation between mean bird count and month.

longbirdmean_nm<-morethan1 %>%
group_by(NAME, month) %>%
dplyr::summarize(mean=mean(Oven), error=sd(Oven), n=n(), se=error/sqrt(n))

`summarise()` has grouped output by 'NAME'. You can override using the
`.groups` argument.

ggplot(longbirdmean_nm, aes(x=NAME, y=mean, color=month))+
  facet_grid(month ~ . )+
  geom_point()+
  theme_bw()+
  geom_errorbar(aes(x=NAME, ymin=mean-se, ymax=mean+se))+
  theme(axis.text.x=element_text(angle=90, vjust=0.5, size=14))+
  labs(x='Country', y='Mean Ovenbird Count')+
  theme(plot.title=element_text(size=10)) +
  scale_color_npg()

Figure 7. Mean bird count by country and month with error bars of one standard error.

Table 5. ANOVA summary table for effects of difference in mean count by month and country

summary(all)

             Df Sum Sq Mean Sq F value   Pr(>F)    
NAME         13    903   69.47   6.523 1.14e-11 ***
month         4     19    4.78   0.449    0.773    
NAME:month   47    383    8.15   0.766    0.872    
Residuals   586   6241   10.65                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Table 6. Tukey honestly significant difference test

stat <- as.data.frame(TukeyHSD(all)$NAME)
stat <- stat %>%
  arrange(`p adj`)
stat2 <- stat[c(1:15),]
stat2

                                   diff        lwr        upr        p adj
United States-Jamaica         -5.805926 -8.1277701 -3.4840818 5.021458e-10
Jamaica-Costa Rica             6.630000  3.8773587  9.3826413 5.024087e-10
Mexico-Jamaica                -5.298367 -7.7609507 -2.8357840 6.255477e-10
Jamaica-Guatemala              6.237143  3.2129440  9.2613417 1.413136e-09
Jamaica-Cuba                   5.393514  2.5480934  8.2389337 3.237259e-08
Jamaica-Belize                 4.965106  2.2444810  7.6857317 1.332033e-07
Jamaica-Bahamas                5.110769  2.2949219  7.9266165 1.623015e-07
Puerto Rico-Jamaica           -5.202581 -8.1569486 -2.2482127 4.658774e-07
Jamaica-Cayman Islands         6.022857  2.3540937  9.6916205 4.349089e-06
Jamaica-Dominican Republic     4.746667  1.7703980  7.7229354 1.006139e-05
Jamaica-El Salvador            6.480000  2.3676938 10.5923062 1.407224e-05
Jamaica-Honduras               4.780000  1.4828198  8.0771802 1.174439e-04
Nicaragua-Jamaica             -4.963333 -8.8231054 -1.1035613 1.428532e-03
Dominican Republic-Costa Rica  1.883333 -0.7189245  4.4855911 4.542056e-01
Costa Rica-Belize             -1.664894 -3.9704028  0.6406155 4.580858e-01

When we ran the ANOVA, we saw that the p-values for month and the interaction between country and month were greater than 0.05 but that country alone was significant (p-value = 1.11e-11). We ran a post-hoc Tukey honestly significant difference test to determine which countries had significantly different mean bird counts (Table 6). The graph shows no clear trend and most all error bars overlapping, however you can see that some countries seemed to, in general, have a higher mean bird count than others (Figure 7). We had hoped to see that over winter months, more birds would be seen in countries in the Southern Hemisphere, as that is when Ovenbirds travel down to Central America but this is not the case. We can therefore conclude that there is not a correlation between month and mean bird count, and that the small sample sizes we had showed little variation in mean bird count across months by country. But that some countries had significantly different average bird counts from other countries.

Conclusion

Despite our attempts, all of our tests either violated assumptions and/or had no statistical significance: we were never able to reject a null hypothesis. We performed a variety of linear models and ANOVAs, using a top-down approach. Attempting both interactive and additive effects, we were unable to yield or discuss any results, all of our expectations were contradicted although we did our best to explain what we thought and where the data fell short. If we had had better sample sizes we might have had a chance to better interpret our results.