library(tidyverse)
library(broom)
library(data.table)
library(performance)
library(car)
library(rsample)
library(rstatix)
library(lubridate)
library(ggsci)
library(patchwork)
library(sjPlot)
library(sjmisc)
library(sjlabelled)An exploration into the factors of Ovenbird count
eBird_Data <- read.csv("/Users/kathleenstrachota/Desktop/Mount Holyoke/BIOSTATISTICS/Final/eBird_Data.csv")
longbird <- eBird_Data %>%
pivot_longer(Nov_Oven:Mar_mins, names_to = 'name', values_to = 'value') %>%
separate (name, into=c('month','var'), sep = '_') %>%
pivot_wider(names_from = var, values_from = value) %>%
drop_na('Oven')
longbird2<- longbird %>%
filter(Oven!=0) %>%
filter(mins!=0)
head(longbird2)# A tibble: 6 × 17
Site Row_Labels LATITUDE LONGITUDE Elev NDVI_diff FIPS ISO2 ISO3 UN
<chr> <chr> <dbl> <dbl> <int> <dbl> <chr> <chr> <chr> <int>
1 L1009484 L1009484 12.0 -86.2 454 -0.214 NU NI NIC 558
2 L1009484 L1009484 12.0 -86.2 454 -0.214 NU NI NIC 558
3 L1009551 L1009551 13.2 -86.1 1291 -0.0738 NU NI NIC 558
4 L1009551 L1009551 13.2 -86.1 1291 -0.0738 NU NI NIC 558
5 L1009551 L1009551 13.2 -86.1 1291 -0.0738 NU NI NIC 558
6 L1009551 L1009551 13.2 -86.1 1291 -0.0738 NU NI NIC 558
# ℹ 7 more variables: NAME <chr>, AREA <int>, POP2005 <int>, month <chr>,
# Oven <int>, num <int>, mins <int>
Introduction
We had initial interests in analyzing climatic patterns and migratory birds, we found a data set connected to an article that discussed the migratory patterns of Ovenbirds across their breeding and non-breeding ranges. The research aimed to look at the population dynamics of this species across time, place, and season to inform future research in ecology, evolution and conservation of the Ovenbird. They stress that there is a lack of information on the migratory patterns of small songbirds and that the tracking data gained from experiments such as this, helps learn about the population dynamics which, in turn, helps conservation efforts. A portion of their data was available in an Excel spreadsheet. We chose columns that included information about the country and month in which bird counts were done by citizen scientists (many of which had low sample sizes), latitude and longitude, along with NDVI which was a proxy for ground cover (greening or browning). The article states that geolocators were used along with light-level to determine location. This helped the researchers map approximate migratory routes. They found that Ovenbirds migrate from Central America, up through the East Coast of the United States to Saskatchewan Canada and Maine.
The original data set had a lot of repeats so we converted the data from wide format to long format, which gave us a final table with 657 entries. We used a top down approach to find if there was any significance in any of the models we could run. Finding no statistical results nor ones that didn’t violate assumptions, we chose hypotheses to include in a detailed manner in this data report. Here, we will present the extent of our attempts and why we felt that the data yielded no results.
Our first hypothesis was to see if there was any effect of NDVI or month on mean bird count. Because month is associated with NDVI, as NDVI difference increases, we expect the average number of birds to follow the same trend. Second, we tested the effect of elevation in the US on mean bird count. We used longitude and latitude to analyze if there was a difference in NDVI difference across latitude and longitude. We expected that as latitude values approached the equator, there would be a higher NDVI difference. As for longitude, we expected there to be an NDVI difference that would vary across the same latitude, representative of different biomes across the world and even just across the Americas. Next, we did two ANOVAs. First, we looked at whether month mattered in the mean count of Ovenbirds. We expected to see that mean bird count varied by month. Second, we wanted to see if the country where the count was recorded had an effect on the month in which the counts took place. We ran more hypotheses than stated here but we are going to discuss them in our methods section, explaining why they didn’t work and how they violated assumptions. The results section is made up of hypotheses that we did further statistical tests on and as such were able to utilize a variety of different tests, that unfortunately, still yielded no results.
Hallworth, M, T., Scott S, T., Van Wilgenburg, S, L., Hobson, K, A., Marra, P, P. 2015. Migratory connectivity of a Neotropical migratory songbird revealed by archival light level geolocators. Ecological Applications (25)2: 336-347.
Definitions
NDVI_diff: NDVI Differences are NDVI values between November and March. Negative (-) values indicate browning, Positive values (+) indicate greeningOven: is bird count, done by citizen scientists with clipboardsLONGITUDE: Longitude of checklist location (WGS84)LATITUDE: Latitude of checklist location (WGS84)NAME: Countrymonth: Month
Methods
Effects of NDVI and month on mean bird count
\(H_0:\) There is no effect of NDVI or month on mean bird count
\(H_A:\) Because month is associated with NDVI, as NDVI difference increases we expect for the average number of birds to follow the same trend
meancount <- longbird2 %>%
group_by(Oven, NDVI_diff, month) %>%
drop_na(Oven) %>%
summarise(meancount=mean(Oven), sd=sd(Oven), n=n(), se=sd/sqrt(n)) %>%
drop_na(sd)`summarise()` has grouped output by 'Oven', 'NDVI_diff'. You can override using
the `.groups` argument.
lm1<-lm(Oven~NDVI_diff + month, data=meancount)
check_model(lm1)Variable `Component` is not in your data frame :/
Here we created a linear model to study the effects of NDVI and month on mean count. We used a linear model as opposed to an ANOVA because we had multiple numerical values. We decided to use an additive model instead of an interactive one because the interactive model violated collinearity with a high VIF.
When running an assumption check, we found that all assumptions were at least mostly upheld. Reference lines were close to flat and horizontal and data were mostly conformed to said reference lines; there is nothing that stands out as particularly alarming.
Effect of elevation in the US on mean bird count
\(H_0:\) There is no significant effect of elevation in the United States on mean bird count
\(H_A:\) There is a significant effect of elevation in the United States on mean bird count.
longbird_US<-longbird2 %>%
filter(NAME=='United States')
longbirdmean_US<-longbird_US %>%
group_by(Elev) %>%
dplyr::summarize(mean=mean(Oven), error=sd(Oven), n=n(), se=error/sqrt(n))
lmUS <- lm(Oven~Elev, data=longbird_US)
ggplot(data=longbirdmean_US, aes(x=Elev, y=mean))+
geom_point()+
geom_errorbar(aes(x=Elev, ymin=mean-se, ymax=mean+se))+
theme_bw()+
labs(x='Elevation in the US', y='Mean Ovenbird count')+
theme(plot.title=element_text(size=10))Figure 1. Mean oven bird count by elevation in the US with error bars of one standard error
Here we created a linear model to study the effects of elevation in the United States on mean bird count. We used a linear model as opposed to an ANOVA because we had multiple numerical values. Originally, we had aimed to look at how elevation and country affected mean bird count but found very small sample sizes in all countries except for the United States, and thus changed our hypothesis. However, we found an abundance of sample sizes of 1 even within this data because the increments in elevation are so small that many values of elevation only have one sample (Figure 1). Therefore, we decided not to run any statistical analyses.
Effect of longitude on NDVI difference
\(H_0:\) There is no effect of longitude on NDVI difference
\(H_A:\) Across the same latitude, the NDVI difference would vary across longitude indicating different biomes. we would expect more negative values of NDVI difference in areas with a desert
longitude <- lm(NDVI_diff ~ LONGITUDE, longbird2)
check_model(longitude)For this hypothesis we once again used a linear model to study the effects of longitude on NDVI difference because both of our variables were numeric.
After running the assumption check we did not find any large violations, though none of the assumptions were particularly followed either. The lines for linearity, homogeneity of variance, and normality of residuals are all slightly skewed and none of them are entirely flat or horizontal.
Effect of latitude on NDVI difference
\(H_0:\) There is no effect of latitude on NDVI difference
\(H_A:\) As the latitude approach the equator (lower degrees) we expect for there to be a higher NDVI difference
latitude <-lm(NDVI_diff ~ LATITUDE, longbird2)
check_model(latitude)To study the effects of latitude on NDVI difference we used a linear model, using this approach because both values associated with this hypothesis were numerical and we wanted to study the relationship between the values.
According to the assumption check, the assumption of homogeneity of variance is not entirely violated but is close to being so. The reference line is considerably sloped and has data points unevenly clustered in different areas. The assumptions of linearity and normality of residuals are also once again not fully upheld but not fully violated either. The linearity reference line is not fully flat and horizontal and the dots do not perfectly fall along the normality of residuals reference line.
Interactive effects of latitude and longitude on NDVI difference
\(H_0:\) There is no interactive effect of latitude and longitude on NDVI
\(H_A:\) As latitude and longitude are correlated, the NDVI difference across the world will be affected as latitudes and longitudes change
lm(NDVI_diff ~ LONGITUDE*LATITUDE, longbird2)
Call:
lm(formula = NDVI_diff ~ LONGITUDE * LATITUDE, data = longbird2)
Coefficients:
(Intercept) LONGITUDE LATITUDE LONGITUDE:LATITUDE
-0.6872386 -0.0073710 0.0330498 0.0003969
ggplot(data = longbird2, aes(y = NDVI_diff, color = LONGITUDE, x = LATITUDE))+
geom_point() +
geom_smooth(method = "lm") +
theme_classic()+
labs(x='NDVI difference', y='Latitude')+
theme(plot.title=element_text(size=10))`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: colour
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
check_model(lm(NDVI_diff ~ LONGITUDE*LATITUDE, longbird2))Variable `Component` is not in your data frame :/
For this linear model we used the same numerical values as those used in hypotheses 3a and 3b to test the interactive effects of latitude and longitude on NDVI difference. Using the interactive model, we were able to see how latitude and longitude separately affected NDVI difference as well as their effects on each other.
We chose not to run statistical tests on this hypothesis because, when running the assumption test, we found a very high VIF, which violates noncollinearity. In addition, multiple of the reference lines associated with different assumptions are not entirely flat and horizontal.
1 way ANOVA seeing if month mattered in mean count of Oven birds
\(H_0:\) All means are the same
\(H_A:\) The mean bird count for one month is different
cm <- aov(Oven ~ month, longbird2)
check_model(cm)shapiro.test(residuals(cm))
Shapiro-Wilk normality test
data: residuals(cm)
W = 0.48219, p-value < 2.2e-16
longbird2 %>%
group_by(month) %>%
identify_outliers(Oven)# A tibble: 71 × 19
month Site Row_Labels LATITUDE LONGITUDE Elev NDVI_diff FIPS ISO2 ISO3
<chr> <chr> <chr> <dbl> <dbl> <int> <dbl> <chr> <chr> <chr>
1 Dec L10285… L1028554 18.0 -67.2 8 -0.117 RQ PR PRI
2 Dec L10434… L1043414 18.5 -66.5 3 -0.0482 RQ PR PRI
3 Dec L11909… L1190987 14.0 -87.0 784 -0.178 HO HN HND
4 Dec L127407 L127407 26.4 -81.6 11 0.0266 US US USA
5 Dec L12875… L1287546 16.1 -88.9 21 -0.0177 BH BZ BLZ
6 Dec L13070… L1307043 18.4 -70.0 7 -0.144 DR DO DOM
7 Dec L13384… L1338421 16.8 -93.1 1155 -0.242 MX MX MEX
8 Dec L18110… L1811098 18.5 -89.6 277 -0.133 MX MX MEX
9 Dec L18390… L1839062 20.9 -86.9 6 -0.0650 MX MX MEX
10 Dec L18396… L1839600 14.5 -87.1 770 -0.153 HO HN HND
# ℹ 61 more rows
# ℹ 9 more variables: UN <int>, NAME <chr>, AREA <int>, POP2005 <int>,
# Oven <int>, num <int>, mins <int>, is.outlier <lgl>, is.extreme <lgl>
For this hypothesis we used a 1-way ANOVA to see how mean bird count differs by month. We used an ANOVA test as opposed to a linear model because we wanted to compare means within the data, which is an ANOVA function.
When checking ANOVA assumptions, we found that we cannot assume normality of residuals (Shapiro-Wilk test p-value < 2.2e-16). In addition, there were multiple extreme outliers within the data and other assumptions such as homogeneity of variance were not perfectly upheld. We decided to run statistical analyses on this hypothesis despite the violations because we wanted to see what would happen and if the tests would yield any notable results.
2 way ANOVA with count by Country and month
\(H_0:\) There is no effect of country or month or the interaction between the two on the mean bird count
\(H_A:\) There is an effect of country or month or the interaction between the two on the mean bird count
morethan1 <- longbird2 %>%
group_by(NAME) %>%
filter(NAME == 'Bahamas'| NAME == 'Belize' | NAME == 'Cayman Islands'| NAME == 'Costa Rica'| NAME == 'Cuba'| NAME == 'Dominican Republic'| NAME == 'El Salvador'| NAME == 'Guatemala'| NAME == 'Honduras'| NAME == 'Jamaica'| NAME == 'Mexico'| NAME == 'Nicaragua'| NAME == 'Puerto Rico'| NAME == 'United States')
morethan1# A tibble: 651 × 17
# Groups: NAME [14]
Site Row_Labels LATITUDE LONGITUDE Elev NDVI_diff FIPS ISO2 ISO3 UN
<chr> <chr> <dbl> <dbl> <int> <dbl> <chr> <chr> <chr> <int>
1 L10094… L1009484 12.0 -86.2 454 -0.214 NU NI NIC 558
2 L10094… L1009484 12.0 -86.2 454 -0.214 NU NI NIC 558
3 L10095… L1009551 13.2 -86.1 1291 -0.0738 NU NI NIC 558
4 L10095… L1009551 13.2 -86.1 1291 -0.0738 NU NI NIC 558
5 L10095… L1009551 13.2 -86.1 1291 -0.0738 NU NI NIC 558
6 L10095… L1009551 13.2 -86.1 1291 -0.0738 NU NI NIC 558
7 L10139… L1013998 26.4 -81.6 10 0.0266 US US USA 840
8 L10139… L1013998 26.4 -81.6 10 0.0266 US US USA 840
9 L10139… L1013998 26.4 -81.6 10 0.0266 US US USA 840
10 L10139… L1013998 26.4 -81.6 10 0.0266 US US USA 840
# ℹ 641 more rows
# ℹ 7 more variables: NAME <chr>, AREA <int>, POP2005 <int>, month <chr>,
# Oven <int>, num <int>, mins <int>
all <- aov(Oven ~ NAME*month, morethan1)
shapiro.test(residuals(aov(Oven ~ NAME*month, morethan1)))
Shapiro-Wilk normality test
data: residuals(aov(Oven ~ NAME * month, morethan1))
W = 0.63295, p-value < 2.2e-16
morethan1 %>%
group_by(NAME, month) %>%
identify_outliers(Oven)# A tibble: 43 × 19
NAME month Site Row_Labels LATITUDE LONGITUDE Elev NDVI_diff FIPS ISO2
<chr> <chr> <chr> <chr> <dbl> <dbl> <int> <dbl> <chr> <chr>
1 Bahamas Dec L660… L660619 25.1 -77.3 12 -0.0984 BF BS
2 Bahamas Feb L143… L1430887 24.1 -74.5 6 -0.109 BF BS
3 Bahamas Jan L158… L1588633 24.1 -74.5 8 -0.0846 BF BS
4 Bahamas Mar L111… L1114787 25.0 -77.3 7 -0.132 BF BS
5 Belize Feb L280… L280864 17.5 -89.1 122 -0.0285 BH BZ
6 Belize Mar L758… L758817 17.8 -88.7 14 -0.0709 BH BZ
7 Cayman… Nov L845… L845400 19.3 -81.2 7 -0.0935 CJ KY
8 Costa … Dec L183… L1830851 10.8 -85.4 550 -0.0856 CS CR
9 Costa … Dec L441… L441071 10.6 -83.5 17 0.118 CS CR
10 Costa … Nov L447… L447761 9.79 -84.6 40 0.0591 CS CR
# ℹ 33 more rows
# ℹ 9 more variables: ISO3 <chr>, UN <int>, AREA <int>, POP2005 <int>,
# Oven <int>, num <int>, mins <int>, is.outlier <lgl>, is.extreme <lgl>
We ran a 2-way ANOVA for this hypothesis in order to see if country and month had an affect on mean bird count as well as on each other. Instead of a linear model, we used an ANOVA because the majority of our values for this hypothesis were categorical instead of numerical.
Since we are interested in the mean count, we decided to filter the data set to include only countries that had a sample size larger than 3. When checking ANOVA assumptions, we found that we cannot assume normality of residuals (Shapiro-Wilk test p-value < 2.2e-16). In addition, there were multiple extreme outliers within the data. We decided to run statistical analyses on this hypothesis despite the violations because we wanted to see what would happen and if the tests would yield any notable results.
Results and Discussion
Linear Models
meancount <- longbird2 %>%
group_by(Oven, NDVI_diff, month) %>%
drop_na(Oven) %>%
summarise(meancount=mean(Oven), sd=sd(Oven), n=n(), se=sd/sqrt(n)) %>%
drop_na(sd)`summarise()` has grouped output by 'Oven', 'NDVI_diff'. You can override using
the `.groups` argument.
lm1plot<-lm(Oven~NDVI_diff*month, data=meancount) %>%
augment() %>%
ggplot(aes(x=NDVI_diff, y=Oven, color=month)) +
geom_point()+
geom_jitter(aes(x=NDVI_diff, y=Oven, color=month), size = 1, alpha = 0.3)+
theme_classic()+
geom_smooth(method="lm", se=FALSE)+
labs(x='NDVI difference', y='Mean Ovenbird count')+
theme(plot.title=element_text(size=10)) +
scale_color_npg()
lm1plot`geom_smooth()` using formula = 'y ~ x'
Figure 2. Mean count of Ovenbirds by NDVI difference (negative values indicate browning and positive values indicate greening). The color of each line corresponds to a different month.
Table 1. Linear model summary table for effects of NDVI difference and month on mean bird count
summary(lm1)
Call:
lm(formula = Oven ~ NDVI_diff + month, data = meancount)
Residuals:
Min 1Q Median 3Q Max
-1.0885 -0.4779 -0.2622 0.1328 4.4051
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.24699 0.27679 4.505 2.63e-05 ***
NDVI_diff -1.22547 2.12401 -0.577 0.566
monthFeb -0.29069 0.33126 -0.878 0.383
monthJan 0.21039 0.28521 0.738 0.463
monthMar -0.03428 0.30287 -0.113 0.910
monthNov 0.49412 0.37728 1.310 0.195
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8066 on 69 degrees of freedom
Multiple R-squared: 0.09826, Adjusted R-squared: 0.03292
F-statistic: 1.504 on 5 and 69 DF, p-value: 0.2
As seen in Table 1, the p-value for the model as a whole is 0.2 which is greater than our significance level (\(\alpha = 0.05\)) and only 2% of the variance of the data is described by this model (\(R^2 = 0.02\)). We fail to reject the null hypothesis that there is no effect of NDVI_diffor month on mean bird count. If there were statistically significant effects of month or NDVI difference on the mean number of birds seen, we would expect the trend lines to be more spread out rather than overlapping and to have more dramatic slope (Figure 2). We expected for there to be a difference in mean bird count across different months and that there would have been a more variable bird count in terms of the NDVI difference. At higher values of greening, higher NDVI differences, we thought to see more birds. Crossing this data hypothesis with the idea that the more green, the more likely the season is spring, summer or fall, we expected more birds to be counted in the months of March than November, but this was not the case. Since we do not see this, our results indicate that neither month nor difference in vegetation greenness has an effect on the mean number of birds seen.
longitude <- lm(NDVI_diff ~ LONGITUDE, longbird2)
ggplot(data = longbird2, aes(y = NDVI_diff, x = LONGITUDE))+
geom_point() +
geom_smooth(method = "lm") +
theme_classic()+
labs(x='Longitude', y='NDVI difference')+
theme(plot.title=element_text(size=10))`geom_smooth()` using formula = 'y ~ x'
Figure 3. NDVI difference by longitude.
Table 2. Linear model summary table for effect of longitude on NDVI difference
tab_model(longitude)| NDVI diff | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | -0.06 | -0.12 – -0.00 | 0.049 |
| LONGITUDE | 0.00 | -0.00 – 0.00 | 0.817 |
| Observations | 657 | ||
| R2 / R2 adjusted | 0.000 / -0.001 | ||
When we ran a linear model of NDVI difference by longitude, we got a p-value of 0.817. Since this is much higher than our significance threshold (\(\alpha = 0.05\)), we fail to reject the null hypothesis and conclude that there is no statistically significant effect of longitude on the difference in NDVI. This is consistent with the coefficient estimate that shows only a slight difference from zero (Table 2). The graph shows an almost horizontal trendline (Figure 3). These results indicate that longitude is not a good predictor of the difference in vegetation greenness. Across a consistent latitude, the NDVI difference would vary across longitudes, possibly having lower NDVI difference values at biomes such as a desert, where we would expect less greening overall (lower NDVI difference). But here we see that there is no particular trend followed, thus lacking correlation between longitude and the difference NDVI.
latitude <-lm(NDVI_diff ~ LATITUDE, longbird2)
ggplot(data = longbird2, aes(y = NDVI_diff, x = LATITUDE))+
geom_point() +
geom_smooth(method = "lm") +
theme_classic()+
labs(x='Latitude', y='NDVI difference')+
theme(plot.title=element_text(size=10))`geom_smooth()` using formula = 'y ~ x'
Figure 4. NDVI difference by latitude.
Table 3. Linear model summary table for effect of latitude on NDVI difference
tab_model(latitude)| NDVI diff | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | -0.08 | -0.10 – -0.05 | <0.001 |
| LATITUDE | 0.00 | -0.00 – 0.00 | 0.538 |
| Observations | 657 | ||
| R2 / R2 adjusted | 0.001 / -0.001 | ||
After running the statistical analysis on our third hypothesis we got a p-value of 0.538. We therefore fail to reject our null hypothesis. We conclude that there is no statistically significant effect of latitude on the difference in NDVI which indicates that latitude is not a good predictor of the difference in vegetation greenness. This is consistent with the graph which shows an almost horizontal trendline indicating that there is no linear relationship between NDVI difference and latitude (Figure 4). We had thought that there would be an effect of latitude on NDVI difference because as we approach the equator, an area made up of more tropical biomes, there would be a higher NDVI difference value indicative of more greening. However, this does not seem to be the case.
ANOVA
b_mcountm <- longbird2 %>%
group_by(month) %>%
summarise(mean_count = mean(Oven), sd = sd(Oven), n = n(), se = sd/sqrt(n))
p1 <- ggplot(data = b_mcountm, aes(y = mean_count, x = month, color = month)) +
geom_point() +
geom_jitter(data = longbird2, aes(x = month, y = Oven), size = 0.3) +
geom_errorbar(data = b_mcountm, aes(x = month, ymin = mean_count-se, ymax = mean_count+se), width = 0.2) +
labs(y = "Mean Ovenbird count", x = "Month") +
theme_bw() +
scale_color_npg()
p1Figure 5. Mean bird count by month with raw data behind. There are error bars of one standard error and each color represents a month.
p2 <- ggplot(data = b_mcountm, aes(y = mean_count, x = month, color = month)) +
geom_point() +
geom_errorbar(data = b_mcountm, aes(x = month, ymin = mean_count-se, ymax = mean_count+se), width = 0.2) +
labs(y = "Mean Ovenbird count", x = "Month") +
theme_bw() +
scale_color_npg()
p2Figure 6. Mean bird count by month with error bars of 1 standard error. Each color represents a different month.
Table 4. ANOVA summary table for effects of difference in mean count by month
summary(cm) Df Sum Sq Mean Sq F value Pr(>F)
month 4 58 14.38 1.25 0.288
Residuals 652 7500 11.50
The one-way ANOVA resulted in a p-value of 0.288. Therefore we can not reject the null as there is no significant difference of mean count of Ovenbirds between months. This result is consistent with the visual results we see in Figure 6: all error bars are overlapping. With this information, we cannot draw any conclusion regarding the average number of Ovenbirds seen based on what month it is. As our data spans only a few months out of the year, during the non-breeding season, it is hard to know what to expect. With the breeding season spanning April and May, we thought maybe to see more Ovenbirds in March, right before the breeding season, rather than in the winter months. This was not the case. With the data we had available, we could not determine a significant correlation between mean bird count and month.
longbirdmean_nm<-morethan1 %>%
group_by(NAME, month) %>%
dplyr::summarize(mean=mean(Oven), error=sd(Oven), n=n(), se=error/sqrt(n))`summarise()` has grouped output by 'NAME'. You can override using the
`.groups` argument.
ggplot(longbirdmean_nm, aes(x=NAME, y=mean, color=month))+
facet_grid(month ~ . )+
geom_point()+
theme_bw()+
geom_errorbar(aes(x=NAME, ymin=mean-se, ymax=mean+se))+
theme(axis.text.x=element_text(angle=90, vjust=0.5, size=14))+
labs(x='Country', y='Mean Ovenbird Count')+
theme(plot.title=element_text(size=10)) +
scale_color_npg()Figure 7. Mean bird count by country and month with error bars of one standard error.
Table 5. ANOVA summary table for effects of difference in mean count by month and country
summary(all) Df Sum Sq Mean Sq F value Pr(>F)
NAME 13 903 69.47 6.523 1.14e-11 ***
month 4 19 4.78 0.449 0.773
NAME:month 47 383 8.15 0.766 0.872
Residuals 586 6241 10.65
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Table 6. Tukey honestly significant difference test
stat <- as.data.frame(TukeyHSD(all)$NAME)
stat <- stat %>%
arrange(`p adj`)
stat2 <- stat[c(1:15),]
stat2 diff lwr upr p adj
United States-Jamaica -5.805926 -8.1277701 -3.4840818 5.021458e-10
Jamaica-Costa Rica 6.630000 3.8773587 9.3826413 5.024087e-10
Mexico-Jamaica -5.298367 -7.7609507 -2.8357840 6.255477e-10
Jamaica-Guatemala 6.237143 3.2129440 9.2613417 1.413136e-09
Jamaica-Cuba 5.393514 2.5480934 8.2389337 3.237259e-08
Jamaica-Belize 4.965106 2.2444810 7.6857317 1.332033e-07
Jamaica-Bahamas 5.110769 2.2949219 7.9266165 1.623015e-07
Puerto Rico-Jamaica -5.202581 -8.1569486 -2.2482127 4.658774e-07
Jamaica-Cayman Islands 6.022857 2.3540937 9.6916205 4.349089e-06
Jamaica-Dominican Republic 4.746667 1.7703980 7.7229354 1.006139e-05
Jamaica-El Salvador 6.480000 2.3676938 10.5923062 1.407224e-05
Jamaica-Honduras 4.780000 1.4828198 8.0771802 1.174439e-04
Nicaragua-Jamaica -4.963333 -8.8231054 -1.1035613 1.428532e-03
Dominican Republic-Costa Rica 1.883333 -0.7189245 4.4855911 4.542056e-01
Costa Rica-Belize -1.664894 -3.9704028 0.6406155 4.580858e-01
When we ran the ANOVA, we saw that the p-values for month and the interaction between country and month were greater than 0.05 but that country alone was significant (p-value = 1.11e-11). We ran a post-hoc Tukey honestly significant difference test to determine which countries had significantly different mean bird counts (Table 6). The graph shows no clear trend and most all error bars overlapping, however you can see that some countries seemed to, in general, have a higher mean bird count than others (Figure 7). We had hoped to see that over winter months, more birds would be seen in countries in the Southern Hemisphere, as that is when Ovenbirds travel down to Central America but this is not the case. We can therefore conclude that there is not a correlation between month and mean bird count, and that the small sample sizes we had showed little variation in mean bird count across months by country. But that some countries had significantly different average bird counts from other countries.
Conclusion
Despite our attempts, all of our tests either violated assumptions and/or had no statistical significance: we were never able to reject a null hypothesis. We performed a variety of linear models and ANOVAs, using a top-down approach. Attempting both interactive and additive effects, we were unable to yield or discuss any results, all of our expectations were contradicted although we did our best to explain what we thought and where the data fell short. If we had had better sample sizes we might have had a chance to better interpret our results.