1 R Packages Required

Make sure you have the latest versions of R and RStudio installed before starting. The following R packages are required to complete the data cleaning and documentation in RStudio.

  • knitr - for rendering HTML reports
  • tidyverse - for data manipulation

Note: These packages do not come with the RStudio installation and must be installed explicitly. Use the Packages tab or type install.packages("package_name").

Next load the R packages:
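A minimal sketch of the loading step, assuming only the two packages listed above are needed. The check for missing packages is our own addition for illustration and is not part of the original report.

```r
# Packages named in the list above.
required <- c("knitr", "tidyverse")

# Check which of them are installed, without attaching anything yet.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  # Install these first, e.g. install.packages("knitr").
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  library(knitr)      # rendering HTML reports
  library(tidyverse)  # data manipulation
}
```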

2 Avian Monitoring - eBird

2.1 Data Preparation

Reshaped Avian Monitoring data

We reshape the data and create some new columns that are convenient for the following analysis.

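| Monitoring_Route | Date | eBird | eachCount | totalCount | H_S | variable | value | Weeks | WeekNoYear | Year | Month | MonthWithYear | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Purple Trail | 2016-08-31 | N | 17 | 13 | | | | 2016-W35 | W35 | 2016 | 8 | 2016-08 | summer |
| Green Trail | 2017-04-03 | N | 5 | 4 | Heard | CAGO | 2 | 2017-W14 | W14 | 2017 | 4 | 2017-04 | spring |
| Red Trail | 2018-07-11 | N | 8 | 5 | | | | 2018-W28 | W28 | 2018 | 7 | 2018-07 | summer |
| Green Trail | 2016-07-19 | N | 5 | 5 | | | | 2016-W29 | W29 | 2016 | 7 | 2016-07 | summer |
| Purple Trail | 2018-06-07 | N | 5 | 4 | | | | 2018-W23 | W23 | 2018 | 6 | 2018-06 | summer |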
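The date-derived columns in this table (Weeks, WeekNoYear, Year, Month, MonthWithYear, season) could be built along the following lines. This is a sketch in base R, not the code used for the report. It assumes ISO 8601 week numbers and meteorological seasons by calendar month; both choices reproduce the rows shown but are not confirmed by the source. The helper season_of_month is hypothetical.

```r
# Two dates taken from the table above; the other columns are omitted.
monitoring <- data.frame(
  Monitoring_Route = c("Purple Trail", "Green Trail"),
  Date = as.Date(c("2016-08-31", "2017-04-03"))
)

# Assumed rule: Dec-Feb winter, Mar-May spring, Jun-Aug summer, Sep-Nov autumn.
season_of_month <- function(month) {
  seasons <- c("winter", "winter", "spring", "spring", "spring", "summer",
               "summer", "summer", "autumn", "autumn", "autumn", "winter")
  seasons[month]
}

monitoring$WeekNoYear    <- format(monitoring$Date, "W%V")  # ISO 8601 week (assumed)
monitoring$Weeks         <- paste(format(monitoring$Date, "%Y"),
                                  monitoring$WeekNoYear, sep = "-")
monitoring$Year          <- as.integer(format(monitoring$Date, "%Y"))
monitoring$Month         <- as.integer(format(monitoring$Date, "%m"))
monitoring$MonthWithYear <- format(monitoring$Date, "%Y-%m")
monitoring$season        <- season_of_month(monitoring$Month)
```

For dates in the first or last days of a year, the ISO week can belong to the neighbouring year, so the Weeks label may need the ISO year ("%G") instead of the calendar year.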

Data from eBird

Because our access to the eBird dataset is limited, we cannot compare it with our data directly. Instead, we take this sample of data from the eBird website as a reference, which makes the following analysis easier.

Week_starting_on species Frequency Total_checklists_submitted Abundance Birds_Per_Party_Hour checklists_reporting_species High_Count checklists_reporting_species__1 Totals checklists_reporting_species__2 Average_Count checklists_reporting_species__3 variable
02-07 Common Yellowthroat 0.0000000 7679 0.0000000 0.000000 0 0 0 0 0 0.000000 0 Common Yellowthroat
01-07 Marsh Wren 0.0529101 5670 0.0005291 2.351564 3 1 3 3 3 1.000000 3 MAWR
08-14 Canada Goose 28.3185841 3503 7.3011704 43.862224 994 350 1029 26305 1000 26.305000 1000 CAGO
01-21 Great Blue Heron 8.4825117 7262 0.1817681 5.347221 633 24 665 1382 662 2.087613 662 GBHE
08-31 Marsh Wren 0.9322974 4505 0.0142064 1.220736 43 4 44 65 43 1.511628 43 MAWR

2.2 Descriptive Analysis

variable min median mean max
MAWR 1 4 5.000000 11
GBHE 1 3 8.000000 30
WOTH 1 1 2.000000 6
WODU 1 3 3.428571 5
CAGO 1 2 8.357143 29
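A summary of this shape (one row per species with min, median, mean and max of value) can be computed as sketched below. The counts are made up for illustration and are not the monitoring data; the column names variable and value follow the reshaped table.

```r
# Made-up counts for illustration only.
counts <- data.frame(
  variable = c("CAGO", "CAGO", "CAGO", "GBHE", "GBHE"),
  value    = c(2, 5, 29, 1, 3)
)

# Split the counts by species and compute the four statistics per group.
per_species <- lapply(split(counts$value, counts$variable), function(v) {
  data.frame(min = min(v), median = median(v), mean = mean(v), max = max(v))
})
stats <- do.call(rbind, per_species)

# Move the species codes from the row names into a regular column.
stats <- data.frame(variable = rownames(stats), stats, row.names = NULL)
stats
```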

2.2.1 Cross Table

With season

First, we compare these five species by season: autumn, spring, summer and winter.

variable autumn spring summer winter
CAGO 22 63 1 31
MAWR 0 1 19 0
GBHE 1 18 85 0
WODU 5 3 16 0
WOTH 0 0 14 0
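A cross table like this one can be sketched with xtabs(). We assume here that each cell is the sum of value for that species and season rather than a count of records; the source does not state this, although the CAGO row totals 117 in every cross table of this section. The records below are made up for illustration.

```r
# Made-up long-format records for illustration only.
sightings <- data.frame(
  variable = c("CAGO", "CAGO", "CAGO", "GBHE", "GBHE"),
  season   = c("spring", "spring", "winter", "summer", "summer"),
  value    = c(3, 4, 2, 1, 5)
)

# Sum `value` within each species-by-season cell; empty cells become 0.
by_season <- xtabs(value ~ variable + season, data = sightings)
by_season
```

Replacing season with Monitoring_Route, H_S or Year in the formula gives the other cross tables in the same way.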

With Monitoring Route

Secondly, we compare these five species by Monitoring_Route: Blue Trail, Green Trail, Purple Trail and Red Trail.

variable Blue Trail Green Trail Purple Trail Red Trail
CAGO 105 6 2 4
MAWR 20 0 0 0
GBHE 103 1 0 0
WODU 24 0 0 0
WOTH 0 1 1 12

With H_S

Thirdly, we compare these five species by H_S, which records whether a species was both heard and seen (H&S), only heard, or only seen.

variable H&S Heard Seen
CAGO 2 12 103
MAWR 0 14 6
GBHE 10 10 84
WODU 6 0 18
WOTH 1 10 3

With Year

Fourthly, we compare these five species by year, covering 2016, 2017 and 2018.

variable 2016 2017 2018
CAGO 24 72 21
MAWR 11 1 8
GBHE 30 33 41
WODU 13 9 2
WOTH 7 2 5

2.3 ANOVA Analysis

The descriptive analysis suggests that the counts of each species are not homogeneous across the variables. Each of the five species shows its own pattern on each variable, so we explore the relationships among the variables further.

The number of observations for MAWR, WOTH and WODU is too small to estimate differences between groups reliably. We therefore test for group differences with ANOVA on two species only (CAGO and GBHE).

2.3.1 Canada Goose (CAGO)

## Analysis of Variance Table
## 
## Response: value
##                  Df Sum Sq Mean Sq F value Pr(>F)
## H_S               2 565.79 282.893  2.0096 0.2488
## Year              1   7.49   7.494  0.0532 0.8288
## Monitoring_Route  3 301.97 100.657  0.7150 0.5924
## season            3  86.87  28.956  0.2057 0.8876
## Residuals         4 563.10 140.774

As the result shows, we cannot reject the null hypothesis at the 5% level: CAGO's observed counts do not differ significantly across any of the four variables.
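A table of this form is what anova() prints for a linear model fitted with lm(). The sketch below uses randomly generated counts, not the CAGO records, and assumes Year enters as a numeric variable (which matches its single degree of freedom in the output above).

```r
set.seed(1)  # reproducible made-up data, not the monitoring records

# Full factorial toy design over the four explanatory variables.
design <- expand.grid(
  H_S              = c("H&S", "Heard", "Seen"),
  Year             = c(2016, 2017, 2018),
  Monitoring_Route = c("Blue Trail", "Green Trail", "Purple Trail", "Red Trail"),
  season           = c("autumn", "spring", "summer", "winter"),
  stringsAsFactors = TRUE
)
design$value <- rpois(nrow(design), lambda = 5)

# Main-effects model; terms are tested in the order they appear.
fit <- lm(value ~ H_S + Year + Monitoring_Route + season, data = design)
anova_table <- anova(fit)
anova_table

# p-values, one per term (NA for the Residuals row).
p_values <- anova_table[["Pr(>F)"]]
```

Because anova() on a single lm fit uses sequential sums of squares, the result for a term can change with the order of the terms when the data are unbalanced.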

2.3.2 Great Blue Heron (GBHE)

## Analysis of Variance Table
## 
## Response: value
##                  Df Sum Sq Mean Sq F value  Pr(>F)  
## H_S               2 414.00 207.000  3.7619 0.08733 .
## season            2 356.36 178.182  3.2382 0.11122  
## Year              1  15.99  15.988  0.2906 0.60926  
## Monitoring_Route  1  61.50  61.499  1.1177 0.33112  
## Residuals         6 330.15  55.025                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For GBHE, again no variable has a p-value below 5%. However, the fit and the alluvial diagram suggest that the independent variables interact.

## Analysis of Variance Table
## 
## Response: value
##            Df Sum Sq Mean Sq F value  Pr(>F)  
## season      2 265.94 132.971  3.2609 0.11002  
## H_S         2 504.42 252.210  6.1850 0.03484 *
## season:H_S  2 162.97  81.485  1.9983 0.21622  
## Residuals   6 244.67  40.778                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

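With the interaction term included, H_S is significant at the 5% level (p = 0.035). This suggests that the number of birds observed differs by observation method (heard, seen, or both). The season:H_S interaction itself is not significant (p = 0.216), so this sample gives no clear evidence that the effect of observation method changes with season.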
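A model with an interaction term can be sketched as follows. The data are randomly generated for illustration and are not the GBHE records; the second comparison, which tests the interaction by comparing nested models, is our own addition.

```r
set.seed(2)  # reproducible made-up data, not the monitoring records

# Three replicates of every season-by-method combination.
design <- expand.grid(
  season    = c("spring", "summer", "winter"),
  H_S       = c("H&S", "Heard", "Seen"),
  replicate = 1:3,
  stringsAsFactors = TRUE
)
design$value <- rpois(nrow(design), lambda = 5)

main_fit        <- lm(value ~ season + H_S, data = design)
interaction_fit <- lm(value ~ season * H_S, data = design)  # adds season:H_S

# Sequential table: season, then H_S, then the interaction.
interaction_table <- anova(interaction_fit)
interaction_table

# F-test of whether the interaction improves on the main-effects model.
comparison <- anova(main_fit, interaction_fit)
comparison
```

In the toy design every combination is present, so the interaction has 4 degrees of freedom; in the report it has 2, which indicates that some season and H_S combinations did not occur for GBHE.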

3 Avian Monitoring - Indicator Species

## 'data.frame':    395 obs. of  21 variables:
##  $ Date : Date, format: "2016-07-10" "2016-07-10" ...
##  $ Month: int  7 7 7 7 7 7 7 7 7 7 ...
##  $ Day  : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ Year : int  2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
##  $ EABL : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WBNU : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ COYE : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ EAKI : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ MALL : int  NA NA NA NA NA NA NA NA NA 1 ...
##  $ CAGO : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ MODO : int  NA NA NA NA NA NA 1 NA 1 NA ...
##  $ MAWR : int  1 NA NA NA 1 NA NA NA NA NA ...
##  $ CATE : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ GREG : int  2 NA 1 1 NA NA NA NA NA NA ...
##  $ WODU : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ GBHE : int  1 1 NA NA NA 1 NA NA NA NA ...
##  $ RBWO : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WOTH : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ GRFL : int  NA NA NA NA NA NA NA 1 NA NA ...
##  $ AMRE : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ EATO : int  NA NA NA NA NA NA NA NA NA NA ...
##       Date                Month             Day             Year     
##  Min.   :2016-07-10   Min.   : 1.000   Min.   : 1.00   Min.   :2016  
##  1st Qu.:2016-07-29   1st Qu.: 6.000   1st Qu.: 9.00   1st Qu.:2016  
##  Median :2017-04-03   Median : 7.000   Median :17.00   Median :2017  
##  Mean   :2017-05-01   Mean   : 6.706   Mean   :16.37   Mean   :2017  
##  3rd Qu.:2018-04-08   3rd Qu.: 8.000   3rd Qu.:23.00   3rd Qu.:2018  
##  Max.   :2018-08-08   Max.   :12.000   Max.   :31.00   Max.   :2018  
##                                                                      
##       EABL          WBNU            COYE          EAKI     
##  Min.   :1.0   Min.   :1.000   Min.   :1     Min.   :1.00  
##  1st Qu.:1.0   1st Qu.:1.000   1st Qu.:1     1st Qu.:1.00  
##  Median :1.0   Median :1.000   Median :1     Median :1.00  
##  Mean   :1.8   Mean   :1.094   Mean   :1     Mean   :1.25  
##  3rd Qu.:2.0   3rd Qu.:1.000   3rd Qu.:1     3rd Qu.:1.25  
##  Max.   :5.0   Max.   :2.000   Max.   :1     Max.   :2.00  
##  NA's   :385   NA's   :363     NA's   :381   NA's   :391   
##       MALL            CAGO             MODO            MAWR      
##  Min.   :1.000   Min.   : 1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.: 1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :1.000   Median : 2.000   Median :1.000   Median :1.000  
##  Mean   :1.632   Mean   : 3.297   Mean   :1.273   Mean   :1.261  
##  3rd Qu.:1.500   3rd Qu.: 3.000   3rd Qu.:1.750   3rd Qu.:1.000  
##  Max.   :6.000   Max.   :20.000   Max.   :2.000   Max.   :3.000  
##  NA's   :376     NA's   :358      NA's   :373     NA's   :372    
##       CATE          GREG           WODU            GBHE      
##  Min.   :1     Min.   :1.00   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1     1st Qu.:1.00   1st Qu.:1.000   1st Qu.:1.000  
##  Median :1     Median :1.00   Median :1.000   Median :1.000  
##  Mean   :1     Mean   :1.25   Mean   :2.053   Mean   :1.265  
##  3rd Qu.:1     3rd Qu.:1.00   3rd Qu.:3.000   3rd Qu.:1.000  
##  Max.   :1     Max.   :5.00   Max.   :5.000   Max.   :3.000  
##  NA's   :392   NA's   :331    NA's   :376     NA's   :293    
##       RBWO            WOTH            GRFL            AMRE    
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :2    
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:2    
##  Median :1.000   Median :1.000   Median :1.000   Median :2    
##  Mean   :1.082   Mean   :1.077   Mean   :1.077   Mean   :2    
##  3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:2    
##  Max.   :3.000   Max.   :2.000   Max.   :2.000   Max.   :2    
##  NA's   :346     NA's   :382     NA's   :382     NA's   :394  
##       EATO      
##  Min.   :1.000  
##  1st Qu.:1.000  
##  Median :1.000  
##  Mean   :1.222  
##  3rd Qu.:1.000  
##  Max.   :2.000  
##  NA's   :386
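Output of this shape is what str() and summary() print for a data frame. The sketch below applies both to a small made-up frame in the same wide layout (one column per species code, NA where the species was not recorded) and then counts the records with a non-missing value per species. The object name indicator is hypothetical.

```r
# Made-up wide-format frame for illustration only.
indicator <- data.frame(
  Date = as.Date(c("2016-07-10", "2016-07-10", "2017-04-03")),
  MAWR = c(1L, NA, NA),
  GBHE = c(1L, 1L, NA)
)

str(indicator)      # column types and first values
summary(indicator)  # per-column quartiles and NA counts

# Number of records in which each species was actually counted.
species_cols <- setdiff(names(indicator), "Date")
records_with_count <- colSums(!is.na(indicator[species_cols]))
records_with_count
```

Read this way, the summary above shows how sparse the data are: EABL, for example, has 385 NA values out of 395 records, so it was counted in only 10 of them.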

4 Bald Eagle Nesting

Based on the research questions, we have identified the following variables as useful for deriving insights and conclusions from the data. The scientists and volunteers record two types of data in the same dataset: quantitative data and qualitative data.

4.1 Qualitative Data Variables:

The following data values are descriptive and non-numeric.

  • NestStatus
  • NestCondition
  • BirdsPresent&Plumages
  • BirdsonNest
  • ApproxEggs
  • ApproxChicks

Setting the working directory and reading the cleaned data:

The variable ‘eagle_raw_data’ stores the whole dataset read from the CSV file. All further data manipulation is done on ‘eagle_raw_data’.
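A minimal sketch of the reading step. The file name eagle_cleaned.csv and its two columns are hypothetical; a tiny stand-in file is written to a temporary folder so that the example runs anywhere.

```r
# Hypothetical stand-in for the cleaned CSV file.
data_dir <- tempdir()
csv_path <- file.path(data_dir, "eagle_cleaned.csv")
writeLines(c("NestStatus,Temp", "H,55", "I,48", "H,62"), csv_path)

# Read the file by its full path. Calling setwd(data_dir) first and then
# read.csv("eagle_cleaned.csv") would have the same effect.
eagle_raw_data <- read.csv(csv_path, stringsAsFactors = FALSE)
str(eagle_raw_data)
```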

4.1.1 Frequency Distribution of Qualitative Data:

NestStatus.freq stores the frequency of each NestStatus value. Let's view the frequency distribution:

NestStatus Freq
87
A 4
B 59
BR 33
Branching 82
Building 5
Building/Guarding 4
F 1
Fledged 2
Guarding 4
Guarding 1
H 665
Hatched 85
I 521
Incubating 22
Incubating/Hatched 12
Old Nest 2
Protecting 3
R 86
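A frequency table like the one above can be produced with table(). The sketch uses a made-up NestStatus column; the names eagle_raw_data and NestStatus.freq follow the text.

```r
# Made-up nest status values for illustration only.
eagle_raw_data <- data.frame(
  NestStatus = c("H", "H", "I", "B", "H", "I"),
  stringsAsFactors = FALSE
)

# Count how often each status occurs.
NestStatus.freq <- table(NestStatus = eagle_raw_data$NestStatus)

# As a two-column data frame (NestStatus, Freq) for printing.
freq_df <- as.data.frame(NestStatus.freq, stringsAsFactors = FALSE)
freq_df
```

If labels such as the two Guarding rows differ only in surrounding whitespace (an assumption; the source does not say), applying trimws() to the column before tabulating would merge them.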

Similarly, we can get the frequency distribution of the other qualitative variables. We are working on combining these frequencies with the quantitative data to produce plots.

4.1.2 Relative Frequency Distribution of Qualitative Data

The relative frequency distribution of a data variable is a summary of the frequency proportion in a collection of non-overlapping categories. [2]

The relationship of frequency and relative frequency is:

$$\text{Relative Frequency} = \frac{\text{Frequency}}{\text{Sample Size}}$$

The relative frequencies are rounded to 3 decimal places.

NestStatus Freq
0.052
A 0.002
B 0.035
BR 0.020
Branching 0.049
Building 0.003
Building/Guarding 0.002
F 0.001
Fledged 0.001
Guarding 0.002
Guarding 0.001
H 0.396
Hatched 0.051
I 0.310
Incubating 0.013
Incubating/Hatched 0.007
Old Nest 0.001
Protecting 0.002
R 0.051
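As a worked example from the tables in this section: the frequencies sum to 1678 records, so for NestStatus = H the relative frequency is 665 / 1678 ≈ 0.396. The same calculation for all categories can be sketched as follows, using made-up values; the name NestStatus.relfreq is our own.

```r
# Made-up nest status values for illustration only.
NestStatus <- c("H", "H", "I", "B", "H", "I", "H", "R")

NestStatus.freq <- table(NestStatus)

# Divide each frequency by the sample size and round to 3 decimal places.
NestStatus.relfreq <- round(NestStatus.freq / length(NestStatus), 3)
NestStatus.relfreq
```

prop.table(NestStatus.freq) gives the same proportions before rounding.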

4.1.3 Barplot for Frequency
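A bar plot of the frequency table can be drawn with base R's barplot(). The sketch below uses made-up values, and the title and axis labels are our own choices.

```r
# Made-up nest status values for illustration only.
NestStatus.freq <- table(NestStatus = c("H", "H", "I", "B", "H", "I"))

# One bar per status; las = 2 turns the labels so long names stay readable.
bar_midpoints <- barplot(
  NestStatus.freq,
  main = "Frequency of NestStatus",
  xlab = "NestStatus",
  ylab = "Frequency",
  las  = 2
)
```

barplot() returns the x positions of the bar midpoints, which is useful for adding text labels above the bars.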

4.1.5 Category Statistics

H stands for the hatching nest status. We want the mean temperature (F) for NestStatus = H.

Create logical vector for NestStatus = H:

Now, find the mean Temperature(F) of NestStatus = H:

## [1] 59.84

The mean temperature for NestStatus = H (hatching) is 59.84 F.

Similarly, summary() shows the minimum, quartiles, median, mean and maximum of the temperature when NestStatus = H:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   35.00   51.00   55.00   59.84   70.00   90.00
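The three steps above (logical vector, mean, summary) can be sketched as follows. The data are made up, and the column name Temp for the temperature in F is an assumption based on the list of quantitative variables.

```r
# Made-up records for illustration only.
eagle_raw_data <- data.frame(
  NestStatus = c("H", "I", "H", "B", "H"),
  Temp       = c(50, 40, 60, 45, 70)
)

# Logical vector: TRUE where the nest status is H.
is_hatching <- eagle_raw_data$NestStatus == "H"

# Mean temperature over the rows selected by the logical vector.
mean_temp_H <- mean(eagle_raw_data$Temp[is_hatching])
mean_temp_H

# Minimum, quartiles, median, mean and maximum for the same rows.
summary(eagle_raw_data$Temp[is_hatching])
```

If the temperature column contains missing values, mean() returns NA unless na.rm = TRUE is passed.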

4.2 Quantitative Data Variables:

The following data values are non-descriptive and numeric.

  • Temp
  • Wind-Velocity
  • Wind-Direction
  • Precipitation
  • CloudCover

5 Contributorship

Indra - Worked on Bald Eagle Nesting, GitHub.

Sun - Worked on eBird data, GitHub.

Kalpana - Worked on Indicator Species, GitHub and proofreading.