Make sure the latest versions of R and RStudio are installed before starting this process. These are the R packages required to complete the data cleaning and documentation in RStudio.
Note: These packages are not bundled with RStudio and must be installed explicitly, either from the Packages tab or by running install.packages("package_name").
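Before installing anything, it can help to check which packages are still missing on the machine. The following is a minimal sketch, not the authors' setup code: `missing_packages` is a hypothetical helper, and the package names in the usage comment (ggplot2, data.table) are assumptions inferred from calls that appear later in this report.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Example usage (package names are assumptions; adjust to the real list):
# to_install <- missing_packages(c("ggplot2", "data.table"))
# if (length(to_install) > 0) install.packages(to_install)
```

Keeping the install call separate from the check avoids reinstalling packages that are already present.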
Next load the R packages:
Reshaped Avian Monitoring data
| Monitoring_Route | Date | eBird | eachCount | totalCount | H_S | variable | value | Weeks | WeekNoYear | Year | Month | MonthWithYear | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blue Trail | 2018-06-27 | N | 14 | 10 | Seen | GBHE | 2 | 2018-W26 | W26 | 2018 | 6 | 2018-06 | summer |
| Blue Trail | 2016-07-13 | N | 28 | 14 | Heard | MAWR | 1 | 2016-W28 | W28 | 2016 | 7 | 2016-07 | summer |
| Purple Trail | 2017-08-28 | N | 11 | 6 |  |  |  | 2017-W35 | W35 | 2017 | 8 | 2017-08 | summer |
| Blue Trail | 2016-07-30 | N | 15 | 5 | Seen | GBHE | 4 | 2016-W30 | W30 | 2016 | 7 | 2016-07 | summer |
| Blue Trail | 2017-03-13 | N | 12 | 4 | Seen | CAGO | 8 | 2017-W11 | W11 | 2017 | 3 | 2017-03 | spring |
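The date-derived columns in the table above (Weeks, WeekNoYear, Year, Month, MonthWithYear, season) can be sketched with base R date formatting. This is an illustration under assumptions: the week labels in the table match ISO 8601 week numbers for the sample dates, so `%V` and `%G` are used here, and `season_of` is a hypothetical helper that assumes meteorological seasons (March to May is spring, and so on). The original cleaning code may have used different rules.

```r
# Three dates taken from the sample rows above
dates <- as.Date(c("2018-06-27", "2016-07-13", "2017-03-13"))

# Hypothetical helper: map a month number to a season (assumed boundaries)
season_of <- function(month) {
  ifelse(month %in% 3:5, "spring",
  ifelse(month %in% 6:8, "summer",
  ifelse(month %in% 9:11, "autumn", "winter")))
}

derived <- data.frame(
  Date          = dates,
  Weeks         = format(dates, "%G-W%V"),   # ISO year and ISO week
  WeekNoYear    = format(dates, "W%V"),      # ISO week only
  Year          = as.integer(format(dates, "%Y")),
  Month         = as.integer(format(dates, "%m")),
  MonthWithYear = format(dates, "%Y-%m"),
  stringsAsFactors = FALSE
)
derived$season <- season_of(derived$Month)
print(derived)
```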
Aggregating by each species and week with no year
| WeekNoYear | variable | value |
|---|---|---|
| W14 | GBHE | 1 |
| W22 | WODU | 4 |
| W11 | CAGO | 11 |
| W26 | GBHE | 7 |
| W22 | WOTH | 3 |
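An aggregation of this shape can be sketched with base R `aggregate()`. The sketch assumes the weekly figure is the sum of `value` per week and species. The data frame `reshaped_toy` holds made-up rows for illustration only, not the monitoring data.

```r
# Toy rows for illustration only (not taken from the monitoring data set)
reshaped_toy <- data.frame(
  WeekNoYear = c("W26", "W28", "W30", "W11", "W26"),
  variable   = c("GBHE", "MAWR", "GBHE", "CAGO", "GBHE"),
  value      = c(2, 1, 4, 8, 5),
  stringsAsFactors = FALSE
)

# Sum the counts for every (week, species) pair, ignoring the year
weekly <- aggregate(value ~ WeekNoYear + variable, data = reshaped_toy, FUN = sum)
print(weekly)
```

Using the week label without the year pools the same calendar week across all monitoring years.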
Data from eBird
| Week_starting_on | species | Frequency | Total_checklists_submitted | Abundance | Birds_Per_Party_Hour | checklists_reporting_species | High_Count | checklists_reporting_species__1 | Totals | checklists_reporting_species__2 | Average_Count | checklists_reporting_species__3 | variable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 01-07 | Canada Goose | 37.442681 | 5670 | 50.3238095 | 385.820322 | 2137 | 4500 | 2293 | 304769 | 2187 | 139.354824 | 2187 | CAGO |
| 10-21 | Great Blue Heron | 23.639091 | 4666 | 0.5816545 | 4.145696 | 1128 | 74 | 1144 | 2820 | 1137 | 2.480211 | 1137 | GBHE |
| 07-21 | Wood Thrush | 9.742288 | 5122 | 0.1805935 | 2.445683 | 511 | 12 | 539 | 971 | 527 | 1.842505 | 527 | WOTH |
| 07-31 | Wood Duck | 16.662159 | 3697 | 1.6418718 | 10.848080 | 618 | 86 | 624 | 6171 | 618 | 9.985437 | 618 | WODU |
| 02-07 | Wood Thrush | 0.000000 | 7679 | 0.0000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0.000000 | 0 | WOTH |
Data Summary:
This plot shows the total count of each species in the CSV file. CAGO has the highest count and WOTH the lowest, which makes CAGO the most frequently observed species; GBHE has the second-highest count. MAWR and WODU have similar counts.
Data Summary:
a. CAGO (Canada Goose) and GBHE (Great Blue Heron) are of the same magnitude; MAWR (Marsh Wren), WOTH (Wood Thrush) and WODU (Wood Duck) form a second, smaller group in this sample.
b. 2017 has the largest sample, but 2016 and 2018 are not much smaller. The composition by year differs between species.
c. Across the five species, most observations are Seen. MAWR and WOTH have a larger proportion of Heard than Seen, and MAWR has no H&S observations.
d. Most of the sample comes from the Blue Trail, but most WOTH observations come from the Red Trail.
e. Each species has its own season in which it is observed.
With season
| variable | autumn | spring | summer | winter |
|---|---|---|---|---|
| CAGO | 22 | 63 | 1 | 31 |
| MAWR | 0 | 1 | 19 | 0 |
| GBHE | 1 | 18 | 85 | 0 |
| WODU | 5 | 3 | 16 | 0 |
| WOTH | 0 | 0 | 14 | 0 |
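A species-by-category table like this one can be sketched with `xtabs()`. The sketch assumes the cells are sums of `value`. If they are record counts instead, `table(obs_toy$variable, obs_toy$season)` would be the counterpart. The `obs_toy` rows are made up for illustration.

```r
# Toy rows for illustration only (not the monitoring data)
obs_toy <- data.frame(
  variable = c("CAGO", "CAGO", "GBHE", "GBHE", "WOTH"),
  season   = c("spring", "winter", "summer", "summer", "summer"),
  value    = c(8, 3, 2, 4, 1),
  stringsAsFactors = FALSE
)

# Rows are species, columns are seasons, cells are summed counts
by_season <- xtabs(value ~ variable + season, data = obs_toy)
print(by_season)
```

Swapping `season` for `Monitoring_Route`, `H_S` or `Year` in the formula would give the other breakdowns in the same layout.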
With Monitoring Route
| variable | Blue Trail | Green Trail | Purple Trail | Red Trail |
|---|---|---|---|---|
| CAGO | 105 | 6 | 2 | 4 |
| MAWR | 20 | 0 | 0 | 0 |
| GBHE | 103 | 1 | 0 | 0 |
| WODU | 24 | 0 | 0 | 0 |
| WOTH | 0 | 1 | 1 | 12 |
With H_S
| variable | H&S | Heard | Seen |
|---|---|---|---|
| CAGO | 2 | 12 | 103 |
| MAWR | 0 | 14 | 6 |
| GBHE | 10 | 10 | 84 |
| WODU | 6 | 0 | 18 |
| WOTH | 1 | 10 | 3 |
With Year
| variable | 2016 | 2017 | 2018 |
|---|---|---|---|
| CAGO | 24 | 72 | 21 |
| MAWR | 11 | 1 | 8 |
| GBHE | 30 | 33 | 41 |
| WODU | 13 | 9 | 2 |
| WOTH | 7 | 2 | 5 |
According to the descriptive tables, the distribution of each variable is not homogeneous across species in the sample. Each of the five species has its own pattern on each variable, so the relationships among the variables are worth exploring further.
The total number of observations for MAWR, WOTH and WODU is too small to estimate differences between groups reliably. We therefore run the ANOVA group-difference tests only on the two species CAGO and GBHE.
#Randomized Block Design
fit <- aov(value ~ H_S + Year + Monitoring_Route + season, aviaAllu[variable == "CAGO",])
anova(fit)
## Analysis of Variance Table
##
## Response: value
## Df Sum Sq Mean Sq F value Pr(>F)
## H_S 2 565.79 282.893 2.0096 0.2488
## Year 1 7.49 7.494 0.0532 0.8288
## Monitoring_Route 3 301.97 100.657 0.7150 0.5924
## season 3 86.87 28.956 0.2057 0.8876
## Residuals 4 563.10 140.774
As the result shows, we cannot reject the null hypothesis at the 5% level: CAGO's observation counts show no significant difference across the four variables.
#Randomized Block Design
fit <- aov(value ~ H_S + season + Year + Monitoring_Route, aviaAllu[variable == "GBHE",])
anova(fit)
## Analysis of Variance Table
##
## Response: value
## Df Sum Sq Mean Sq F value Pr(>F)
## H_S 2 414.00 207.000 3.7619 0.08733 .
## season 2 356.36 178.182 3.2382 0.11122
## Year 1 15.99 15.988 0.2906 0.60926
## Monitoring_Route 1 61.50 61.499 1.1177 0.33112
## Residuals 6 330.15 55.025
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Again, no variable has a p-value below 5% for GBHE. However, the fit and the alluvial diagram suggest that the independent variables interact.
# Two Way Factorial Design
fit <- aov(value ~ season * H_S, aviaAllu[variable == "GBHE",])
anova(fit)
## Analysis of Variance Table
##
## Response: value
## Df Sum Sq Mean Sq F value Pr(>F)
## season 2 265.94 132.971 3.2609 0.11002
## H_S 2 504.42 252.210 6.1850 0.03484 *
## season:H_S 2 162.97 81.485 1.9983 0.21622
## Residuals 6 244.67 40.778
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
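With the interaction term included, the H_S main effect is significant at the 5% level this time (p = 0.035), which suggests that the GBHE counts differ by observation method. The season:H_S interaction itself is not significant (p = 0.216), so this sample does not show that the effect of observation method changes with season.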
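The significance of each term can also be read off the ANOVA table programmatically. The sketch below fits the same kind of two-way factorial model on simulated counts; `toy`, `fit_toy` and the Poisson draws are assumptions for illustration and say nothing about the real GBHE data.

```r
set.seed(1)

# Simulated balanced design: 3 seasons x 3 observation methods x 2 replicates
toy <- expand.grid(
  season = c("spring", "summer", "autumn"),
  H_S    = c("Heard", "Seen", "H&S"),
  rep    = 1:2
)
toy$value <- rpois(nrow(toy), lambda = 5)

# Two-way factorial model with interaction, as in the text
fit_toy <- aov(value ~ season * H_S, data = toy)
tab <- anova(fit_toy)

# Named vector of p-values; the Residuals row has no p-value (NA)
p_values <- setNames(tab[["Pr(>F)"]], rownames(tab))
significant <- names(p_values)[!is.na(p_values) & p_values < 0.05]
print(p_values)
```

Checking the `season:H_S` entry separately from the main effects keeps the interaction and the main-effect conclusions apart.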
In this part, we compare the OWC bird data against eBird data, keeping the same five indicator species as in our sample: CAGO, MAWR, GBHE, WOTH and WODU. The comparison focuses on how the weekly observation counts of each species differ between the OWC data and the eBird data published on the eBird website. Because our access to eBird data is limited, we could only find some plots on the eBird website for these five indicator species. We then produce plots of the same species from the OWC bird data.
To allow a side-by-side comparison, we use week as the x-axis and observation count as the y-axis, matching the eBird plots. We can then analyse the total count of each species by week and compare the OWC data with the eBird data.
Based on the above plots, the eBird data set contains more data on the total count of each species. The OWC AvianMonitoring data set, by contrast, lacks observations for many species and weeks. Even after pooling the three years (2016, 2017 and 2018), not every week is covered. We therefore cannot make a clear comparison between the OWC and eBird data.
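The coverage gap can be made concrete by laying one species' weekly totals over a full calendar of week labels and counting the empty weeks. This is a minimal sketch: `fill_weeks` is a hypothetical helper, and the input uses only the five sample rows of the weekly aggregate shown earlier, so the resulting numbers describe those rows and not the full data set.

```r
# The five sample rows of the weekly aggregate shown earlier in this report
weekly_sample <- data.frame(
  WeekNoYear = c("W14", "W22", "W11", "W26", "W22"),
  variable   = c("GBHE", "WODU", "CAGO", "GBHE", "WOTH"),
  value      = c(1, 4, 11, 7, 3),
  stringsAsFactors = FALSE
)

all_weeks <- sprintf("W%02d", 1:53)  # W01 ... W53

# Hypothetical helper: one species' counts over all weeks, 0 where unobserved
fill_weeks <- function(df, species) {
  sub <- df[df$variable == species, ]
  counts <- setNames(numeric(length(all_weeks)), all_weeks)
  counts[sub$WeekNoYear] <- sub$value
  counts
}

gbhe <- fill_weeks(weekly_sample, "GBHE")
empty_weeks <- sum(gbhe == 0)  # weeks without a GBHE record in these rows
print(empty_weeks)
# barplot(gbhe, las = 2, main = "GBHE by week")  # optional plot
```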
Setting the working directory and reading the cleaned data:
setwd("C:/Users/indra/Desktop/DTD_Final/OldWomanCreek/Deliverables/FinalDemo/") # setting the working dir
eagle_raw_data <- read.csv("BaldEagle.csv") # reading the csv data
The variable ‘eagle_raw_data’ stores the whole data set from the CSV file. All later data manipulations are applied to ‘eagle_raw_data’.
NestStatus = eagle_raw_data$NestStatus # select NestStatus from raw data
NestStatus.freq = table(NestStatus)
colors = c("red", "yellow", "green", "violet", "orange", "blue", "pink", "cyan")
barplot(NestStatus.freq, col=colors)
theme_set(theme_bw())
neststatus_subset <- subset(eagle_raw_data, select = c("NestStatus", "Temp", "Wind.Velocity.mph.", "Date", "CloudCover"))
g <- ggplot(neststatus_subset, aes(x=Date,y=Temp))
g + geom_point(aes(col=NestStatus)) + geom_smooth(method="lm", se=FALSE) + labs(subtitle="Time vs Temperature",
y="Temperature (F)",
x="Date",
title="Scatterplot",
caption = "Source: BaldEagle")
Similarly, we want to
g <- ggplot(neststatus_subset, aes(x=Temp,y=Wind.Velocity.mph.))
g + geom_point(aes(col=NestStatus)) + geom_smooth(method="lm", se=FALSE) + labs(subtitle="Temperature (F) vs Wind Velocity (mph)",
y="Wind Velocity (mph)",
x="Temperature (F)",
title="Scatterplot",
caption = "Source: BaldEagle")
# standardise temperature to z-scores
eagle_raw_data$temp_z <- round((eagle_raw_data$Temp - mean(eagle_raw_data$Temp))/sd(eagle_raw_data$Temp), 2)
# above and below avg
eagle_raw_data$temp_type <- ifelse(eagle_raw_data$temp_z < 0, "below", "above")
eagle_raw_data <- eagle_raw_data[order(eagle_raw_data$temp_z),] # sorting by z-score
ggplot(eagle_raw_data, aes(x=NestStatus, y=temp_z, label=temp_z)) +
  geom_bar(stat='identity', aes(fill=temp_type), width = .5) +
  scale_fill_manual(name="Temperature", labels = c("Above Average", "Below Average"), values = c("above"="#00ba38", "below"="#f8766d")) +
  labs(subtitle="Normalised Temperature from 'BaldEagle'", title= "Diverging Bars") +
  coord_flip()
Indra - worked on Bald Eagle, GitHub.
Sun - worked on eBird data, GitHub.
Kalpana - worked on indicator species, GitHub and proofreading.