1 R Packages Required

Make sure you have the latest R and RStudio installed before starting this process. The following R packages are required to complete the data cleaning and documentation in RStudio.

knitr - for rendering HTML reports
tidyverse - for data manipulation

Note: The above packages do not come with the RStudio installation and need to be installed explicitly. Use the Packages tab or type install.packages("package_name").

Next, load the R packages:

# Install any missing packages before knitting the Rmd, for example:
#install.packages(c("rio","data.table","dplyr","anytime","knitr","kableExtra", "ggplot2","ISOweek","alluvial","lubridate"))
library("knitr")
library("tidyverse")
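If it is unclear which packages are already installed, a small check can list the missing ones before knitting. This is only a sketch: the helper name missing_packages is our own (hypothetical), and it assumes the two packages listed above are the ones that matter.

```r
# Packages named in this section.
required <- c("knitr", "tidyverse")

# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Run install.packages() for: ", paste(to_install, collapse = ", "))
}
```

The check only reports what is missing; installation is left as a deliberate manual step.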

2 Avian Monitoring - eBird

2.1 Data Preparation

Reshaped data of Avian Monitoring

We reshape the data and create some new columns that are convenient for the following analysis.

Monitoring_Route	Date	eBird	eachCount	totalCount	H_S	variable	value	Weeks	WeekNoYear	Year	Month	MonthWithYear	season
Purple Trail	2016-08-31	N	17	13				2016-W35	W35	2016	8	2016-08	summer
Green Trail	2017-04-03	N	5	4	Heard	CAGO	2	2017-W14	W14	2017	4	2017-04	spring
Red Trail	2018-07-11	N	8	5				2018-W28	W28	2018	7	2018-07	summer
Green Trail	2016-07-19	N	5	5				2016-W29	W29	2016	7	2016-07	summer
Purple Trail	2018-06-07	N	5	4				2018-W23	W23	2018	6	2018-06	summer
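The date-derived columns in the table above (Weeks, WeekNoYear, Year, Month, MonthWithYear, season) could be built roughly as follows. This is a sketch under assumptions, not the report's actual code: it uses base R format() instead of the ISOweek package named in the install comment, the function name add_date_columns is hypothetical, and the month-to-season mapping assumes meteorological seasons.

```r
# Hypothetical helper: add week, year, month and season columns from `Date`.
add_date_columns <- function(df) {
  month <- as.integer(format(df$Date, "%m"))
  # Assumed meteorological seasons: Dec-Feb winter, Mar-May spring, and so on.
  season_by_month <- c("winter", "winter", "spring", "spring", "spring",
                       "summer", "summer", "summer", "autumn", "autumn",
                       "autumn", "winter")
  df$Weeks <- format(df$Date, "%G-W%V")      # ISO 8601 year and week
  df$WeekNoYear <- format(df$Date, "W%V")    # ISO week only
  df$Year <- as.integer(format(df$Date, "%Y"))
  df$Month <- month
  df$MonthWithYear <- format(df$Date, "%Y-%m")
  df$season <- season_by_month[month]
  df
}

# Two dates taken from the table above.
example <- data.frame(
  Monitoring_Route = c("Purple Trail", "Green Trail"),
  Date = as.Date(c("2016-08-31", "2017-04-03"))
)
add_date_columns(example)
```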

Data from eBird

Because we have limited access to the eBird dataset, we cannot compare it with our data directly. Instead, we take this sample of data from the eBird website as a reference, which will make the following analysis easier.

Week_starting_on	species	Frequency	Total_checklists_submitted	Abundance	Birds_Per_Party_Hour	checklists_reporting_species	High_Count	checklists_reporting_species__1	Totals	checklists_reporting_species__2	Average_Count	checklists_reporting_species__3	variable
02-07	Common Yellowthroat	0.0000000	7679	0.0000000	0.000000	0	0	0	0	0	0.000000	0	Common Yellowthroat
01-07	Marsh Wren	0.0529101	5670	0.0005291	2.351564	3	1	3	3	3	1.000000	3	MAWR
08-14	Canada Goose	28.3185841	3503	7.3011704	43.862224	994	350	1029	26305	1000	26.305000	1000	CAGO
01-21	Great Blue Heron	8.4825117	7262	0.1817681	5.347221	633	24	665	1382	662	2.087613	662	GBHE
08-31	Marsh Wren	0.9322974	4505	0.0142064	1.220736	43	4	44	65	43	1.511628	43	MAWR

2.2 Descriptive Analysis

variable	min	median	mean	max
MAWR	1	4	5.000000	11
GBHE	1	3	8.000000	30
WOTH	1	1	2.000000	6
WODU	1	3	3.428571	5
CAGO	1	2	8.357143	29
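A per-species summary like the table above can be computed in base R along these lines. The data frame below is made up for illustration, and we assume the table summarises the value column grouped by variable; the report does not show its own code for this step.

```r
# Toy long-format data (made-up counts, not the monitoring data).
counts <- data.frame(
  variable = c("CAGO", "CAGO", "GBHE", "GBHE", "GBHE"),
  value = c(2, 4, 1, 3, 8)
)

# Four summary statistics for one species' counts.
summary_stats <- function(x) {
  c(min = min(x), median = median(x), mean = mean(x), max = max(x))
}

# One row per species, one column per statistic.
by_species <- do.call(rbind, tapply(counts$value, counts$variable, summary_stats))
by_species
```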

2.2.1 Cross Table

With season

First, we compare these five species by season: autumn, spring, summer and winter.

variable	autumn	spring	summer	winter
CAGO	22	63	1	31
MAWR	0	1	19	0
GBHE	1	18	85	0
WODU	5	3	16	0
WOTH	0	0	14	0
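A cross table of this shape can be produced with xtabs(), which sums a count column over two grouping factors. The cells above appear to hold summed counts rather than numbers of records; that is our inference from the totals, not something the report states. The toy data below is made up.

```r
# Toy long-format data (made-up counts).
toy <- data.frame(
  variable = c("CAGO", "CAGO", "GBHE", "GBHE", "CAGO"),
  season = c("spring", "spring", "summer", "summer", "winter"),
  value = c(2, 5, 1, 3, 4)
)

# Sum `value` for every species/season combination; empty cells become 0.
season_table <- xtabs(value ~ variable + season, data = toy)
season_table
```

Swapping season for another grouping column gives the corresponding table for that variable.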

With Monitoring Route

Secondly, we compare these five species by Monitoring_Route: Blue Trail, Green Trail, Purple Trail and Red Trail.

variable	Blue Trail	Green Trail	Purple Trail	Red Trail
CAGO	105	6	2	4
MAWR	20	0	0	0
GBHE	103	1	0	0
WODU	24	0	0	0
WOTH	0	1	1	12

With H_S

Thirdly, we compare these five species by H_S, which records whether the species was both heard and seen (H&S), only heard, or only seen.

variable	H&S	Heard	Seen
CAGO	2	12	103
MAWR	0	14	6
GBHE	10	10	84
WODU	6	0	18
WOTH	1	10	3

With Year

Fourthly, we compare these five species by year, focusing on 2016, 2017 and 2018.

variable	2016	2017	2018
CAGO	24	72	21
MAWR	11	1	8
GBHE	30	33	41
WODU	13	9	2
WOTH	7	2	5

2.3 ANOVA Analysis

The descriptive tables suggest that the counts for each species are not homogeneous across the variables. Each of the five species shows its own pattern on each variable, so we explore the relationships among the variables further.

The total number of observations for MAWR, WOTH and WODU is too small to estimate differences across the variables reliably. We therefore only attempt ANOVA tests of group differences for two species (CAGO and GBHE).

2.3.1 Canada Goose (CAGO)

#Randomized Block Design
fit <- aov(value ~ H_S + Year + Monitoring_Route + season, aviaAllu[variable == "CAGO",])
anova(fit)

## Analysis of Variance Table
## 
## Response: value
##                  Df Sum Sq Mean Sq F value Pr(>F)
## H_S               2 565.79 282.893  2.0096 0.2488
## Year              1   7.49   7.494  0.0532 0.8288
## Monitoring_Route  3 301.97 100.657  0.7150 0.5924
## season            3  86.87  28.956  0.2057 0.8876
## Residuals         4 563.10 140.774

As the result shows, we cannot reject the null hypothesis at the 5% level: the CAGO counts do not differ significantly across any of the four variables.
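The comparison against the 5% level can also be done programmatically by reading the Pr(>F) column of the ANOVA table. The sketch below uses R's built-in warpbreaks data instead of the avian data, so the model and its results are purely illustrative.

```r
# Built-in example data (not the avian data): breaks by wool type and tension.
fit_demo <- aov(breaks ~ wool + tension, data = warpbreaks)
tab <- anova(fit_demo)

# Named p-values for the model terms; the Residuals row has no p-value.
p_values <- setNames(tab[["Pr(>F)"]], rownames(tab))
p_values <- p_values[!is.na(p_values)]

# TRUE where the null hypothesis of equal group means is rejected at 5%.
significant <- p_values < 0.05
print(significant)
```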

2.3.2 Great Blue Heron (GBHE)

#Randomized Block Design
fit <- aov(value ~ H_S +  season + Year + Monitoring_Route, aviaAllu[variable == "GBHE",])
anova(fit)

## Analysis of Variance Table
## 
## Response: value
##                  Df Sum Sq Mean Sq F value  Pr(>F)  
## H_S               2 414.00 207.000  3.7619 0.08733 .
## season            2 356.36 178.182  3.2382 0.11122  
## Year              1  15.99  15.988  0.2906 0.60926  
## Monitoring_Route  1  61.50  61.499  1.1177 0.33112  
## Residuals         6 330.15  55.025                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Again, no term has a p-value below 5% for GBHE. However, the fit above and the alluvial diagram suggest that the independent variables may interact.

# Two Way Factorial Design 
fit <- aov(value ~  season * H_S, aviaAllu[variable == "GBHE",])
anova(fit)

## Analysis of Variance Table
## 
## Response: value
##            Df Sum Sq Mean Sq F value  Pr(>F)  
## season      2 265.94 132.971  3.2609 0.11002  
## H_S         2 504.42 252.210  6.1850 0.03484 *
## season:H_S  2 162.97  81.485  1.9983 0.21622  
## Residuals   6 244.67  40.778                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

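Once the season:H_S interaction is included, the H_S main effect is significant at the 5% level (p = 0.035). The interaction term itself is not significant (p = 0.216), so the result points to counts differing by observation type (heard, seen, or both) rather than to that difference changing with season. With only 6 residual degrees of freedom, this finding should be treated as tentative.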
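One way to check whether an interaction term is worth keeping is to compare the additive and the interaction model directly. The sketch below again uses the built-in warpbreaks data, not the avian data, so only the technique carries over.

```r
# Additive model and model with interaction, on built-in example data.
fit_additive <- aov(breaks ~ wool + tension, data = warpbreaks)
fit_interaction <- aov(breaks ~ wool * tension, data = warpbreaks)

# F test for the extra interaction term (row 2 compares model 2 to model 1).
comparison <- anova(fit_additive, fit_interaction)
interaction_p <- comparison[["Pr(>F)"]][2]
interaction_p
```

Because the interaction is the last term in the larger model, this p-value equals the one on the interaction row of that model's own ANOVA table.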

3 Avian Monitoring - Indicator Species

## 'data.frame':    395 obs. of  21 variables:
##  $ Date : Date, format: "2016-07-10" "2016-07-10" ...
##  $ Month: int  7 7 7 7 7 7 7 7 7 7 ...
##  $ Day  : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ Year : int  2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
##  $ EABL : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WBNU : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ COYE : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ EAKI : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ MALL : int  NA NA NA NA NA NA NA NA NA 1 ...
##  $ CAGO : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ MODO : int  NA NA NA NA NA NA 1 NA 1 NA ...
##  $ MAWR : int  1 NA NA NA 1 NA NA NA NA NA ...
##  $ CATE : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ GREG : int  2 NA 1 1 NA NA NA NA NA NA ...
##  $ WODU : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ GBHE : int  1 1 NA NA NA 1 NA NA NA NA ...
##  $ RBWO : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ WOTH : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ GRFL : int  NA NA NA NA NA NA NA 1 NA NA ...
##  $ AMRE : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ EATO : int  NA NA NA NA NA NA NA NA NA NA ...

##       Date                Month             Day             Year     
##  Min.   :2016-07-10   Min.   : 1.000   Min.   : 1.00   Min.   :2016  
##  1st Qu.:2016-07-29   1st Qu.: 6.000   1st Qu.: 9.00   1st Qu.:2016  
##  Median :2017-04-03   Median : 7.000   Median :17.00   Median :2017  
##  Mean   :2017-05-01   Mean   : 6.706   Mean   :16.37   Mean   :2017  
##  3rd Qu.:2018-04-08   3rd Qu.: 8.000   3rd Qu.:23.00   3rd Qu.:2018  
##  Max.   :2018-08-08   Max.   :12.000   Max.   :31.00   Max.   :2018  
##                                                                      
##       EABL          WBNU            COYE          EAKI     
##  Min.   :1.0   Min.   :1.000   Min.   :1     Min.   :1.00  
##  1st Qu.:1.0   1st Qu.:1.000   1st Qu.:1     1st Qu.:1.00  
##  Median :1.0   Median :1.000   Median :1     Median :1.00  
##  Mean   :1.8   Mean   :1.094   Mean   :1     Mean   :1.25  
##  3rd Qu.:2.0   3rd Qu.:1.000   3rd Qu.:1     3rd Qu.:1.25  
##  Max.   :5.0   Max.   :2.000   Max.   :1     Max.   :2.00  
##  NA's   :385   NA's   :363     NA's   :381   NA's   :391   
##       MALL            CAGO             MODO            MAWR      
##  Min.   :1.000   Min.   : 1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.: 1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :1.000   Median : 2.000   Median :1.000   Median :1.000  
##  Mean   :1.632   Mean   : 3.297   Mean   :1.273   Mean   :1.261  
##  3rd Qu.:1.500   3rd Qu.: 3.000   3rd Qu.:1.750   3rd Qu.:1.000  
##  Max.   :6.000   Max.   :20.000   Max.   :2.000   Max.   :3.000  
##  NA's   :376     NA's   :358      NA's   :373     NA's   :372    
##       CATE          GREG           WODU            GBHE      
##  Min.   :1     Min.   :1.00   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1     1st Qu.:1.00   1st Qu.:1.000   1st Qu.:1.000  
##  Median :1     Median :1.00   Median :1.000   Median :1.000  
##  Mean   :1     Mean   :1.25   Mean   :2.053   Mean   :1.265  
##  3rd Qu.:1     3rd Qu.:1.00   3rd Qu.:3.000   3rd Qu.:1.000  
##  Max.   :1     Max.   :5.00   Max.   :5.000   Max.   :3.000  
##  NA's   :392   NA's   :331    NA's   :376     NA's   :293    
##       RBWO            WOTH            GRFL            AMRE    
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :2    
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:2    
##  Median :1.000   Median :1.000   Median :1.000   Median :2    
##  Mean   :1.082   Mean   :1.077   Mean   :1.077   Mean   :2    
##  3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:2    
##  Max.   :3.000   Max.   :2.000   Max.   :2.000   Max.   :2    
##  NA's   :346     NA's   :382     NA's   :382     NA's   :394  
##       EATO      
##  Min.   :1.000  
##  1st Qu.:1.000  
##  Median :1.000  
##  Mean   :1.222  
##  3rd Qu.:1.000  
##  Max.   :2.000  
##  NA's   :386

boxplot(avian$EABL, main = 'Eastern Bluebird', ylab='No. observed')

CAGO.ts <- ts(avian$CAGO, start = 2016, frequency = 365)
plot(CAGO.ts, main = 'Canada Goose')
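ts() with frequency = 365 treats every row as one consecutive day, while the str() output above shows several rows sharing the same date. As an alternative sketch, the counts can be summed per actual observation date before plotting. The data frame below is a made-up stand-in for avian.

```r
# Toy stand-in for `avian`: repeated dates and NA where nothing was recorded.
avian_toy <- data.frame(
  Date = as.Date(c("2016-07-10", "2016-07-10", "2016-07-29",
                   "2017-04-03", "2017-04-03")),
  CAGO = c(NA, 2L, 3L, NA, NA)
)

# Sum counts per date; the formula interface drops rows where CAGO is NA.
daily <- aggregate(CAGO ~ Date, data = avian_toy, FUN = sum)
daily
```

Passing daily$Date and daily$CAGO to plot() with type = "h" then draws one vertical line per observation date on a true calendar axis.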

4 Bald Eagle Nesting

Based on the research questions, we have identified the following variables as useful for deriving insights and conclusions from the data. The scientists and volunteers record two types of data in the same dataset: quantitative data and qualitative data.

4.1 Qualitative Data Variables:

The following data values are descriptive and non-numeric.

NestStatus
NestCondition
BirdsPresent&Plumages
BirdsonNest
AprroxEggs
ApproxChicks

Setting the working directory and reading the cleaned data:

setwd("C:/Users/indra/Desktop/Week12/OldWomanCreek/Deliverables/RScript") # setting the working dir
eagle_raw_data <- read.csv("BaldEagle.csv") # reading the csv data

eagle_raw_data stores the whole dataset from the csv file. All further data manipulation is done on this variable.

4.1.1 Frequency Distribution of Qualitative Data:

NestStatus = eagle_raw_data$NestStatus # select NestStatus from raw data
NestStatus.freq = table(NestStatus) # Apply the table function

NestStatus.freq stores the frequency of each category. Let's view the frequency distribution:

kable(NestStatus.freq) # view the distributions

NestStatus	Freq
	87
A	4
B	59
BR	33
Branching	82
Building	5
Building/Guarding	4
F	1
Fledged	2
Guarding	4
Guarding	1
H	665
Hatched	85
I	521
Incubating	22
Incubating/Hatched	12
Old Nest	2
Protecting	3
R	86
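The table above contains one row with an empty label and two rows that both print as Guarding. One plausible cause, which is an assumption on our part, is trailing whitespace and blank entries in the csv. A light cleaning pass before calling table() might look like this, shown on made-up values:

```r
# Made-up status values with a trailing space and an empty entry.
nest_status_toy <- c("H", "I", "Guarding", "Guarding ", "", "H")

# Strip surrounding whitespace, then mark empty strings as missing.
cleaned <- trimws(nest_status_toy)
cleaned[cleaned == ""] <- NA

# Frequencies after cleaning; missing values are counted separately.
cleaned_freq <- table(cleaned, useNA = "ifany")
cleaned_freq
```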

Similarly, we can get the frequency distribution for the other qualitative variables. We are working on combining these frequencies with the quantitative data to produce plots.

4.1.2 Relative Frequency Distribution of Qualitative Data

The relative frequency distribution of a data variable is a summary of the frequency proportion in a collection of non-overlapping categories. [2]

The relationship of frequency and relative frequency is:

$\text{Relative Frequency} = \frac{\text{Frequency}}{\text{Sample Size}}$
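As a worked instance using the NestStatus frequency table: the Freq column sums to 1678 rows and H accounts for 665 of them, which agrees with the value shown for H in the relative frequency table.

$\frac{665}{1678} \approx 0.396$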

NestStatus.relfreq = NestStatus.freq / nrow(eagle_raw_data) # calculating the relative freq.

Round the relative frequencies to 3 digits for display:

old = options(digits = 3)
kable(NestStatus.relfreq)

NestStatus	Freq
	0.052
A	0.002
B	0.035
BR	0.020
Branching	0.049
Building	0.003
Building/Guarding	0.002
F	0.001
Fledged	0.001
Guarding	0.002
Guarding	0.001
H	0.396
Hatched	0.051
I	0.310
Incubating	0.013
Incubating/Hatched	0.007
Old Nest	0.001
Protecting	0.002
R	0.051

options(old)
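options(digits = ...) is a global setting that also affects other printed output until it is restored. As a local alternative, the proportions can be computed with prop.table() and rounded with round(). The status values below are made up; this is a sketch, not the report's code.

```r
# Made-up status values.
status_toy <- c("H", "H", "I", "H", "R", "I")

freq <- table(status_toy)

# prop.table() divides each count by the total, giving relative frequencies.
relfreq <- prop.table(freq)

# Round to 3 decimal places without touching global options.
relfreq_rounded <- round(relfreq, 3)
relfreq_rounded
```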

4.1.3 Barplot for Frequency

barplot(NestStatus.freq)

4.1.4 Pie Chart for Frequency

colors = c("red", "yellow", "green", "violet", "orange", "blue", "pink", "cyan")
pie(NestStatus.freq, col=colors)
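The color vector above has 8 entries, while the frequency table has 19 categories, so pie() recycles colors and several slices share one. A sketch that generates one distinct color per category, using made-up status values:

```r
# Made-up status values standing in for NestStatus.
status_freq_toy <- table(c("H", "I", "R", "B", "BR", "A", "F",
                           "H", "I", "Hatched", "Guarding"))

# One color per category, evenly spaced around the hue circle.
slice_colors <- rainbow(length(status_freq_toy))
names(slice_colors) <- names(status_freq_toy)
slice_colors
```

Passing slice_colors as the col argument of pie() or barplot() then gives every category its own color.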

4.1.5 Category Statistics

H stands for the hatching nest status. We want the mean temperature (°F) for NestStatus = H.

Create logical vector for NestStatus = H:

H_NestStatus = NestStatus == 'H' # TRUE for rows where NestStatus is H

H_eagle_raw_data = eagle_raw_data[H_NestStatus,] # keep only those rows

Now, find the mean Temperature(F) of NestStatus = H:

round(mean(H_eagle_raw_data$Temp), digits = 2)

## [1] 59.84

The mean temperature for NestStatus = H is 59.84 °F.

Similarly, summary() shows the minimum, quartiles, median, mean and maximum of the temperature when NestStatus = H:

summary(H_eagle_raw_data$Temp)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   35.00   51.00   55.00   59.84   70.00   90.00
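The same statistic can be computed for every nest status at once instead of filtering one category at a time. The sketch below uses tapply() on a made-up data frame; the column names NestStatus and Temp follow the report, and the values are invented.

```r
# Made-up stand-in for eagle_raw_data.
eagle_toy <- data.frame(
  NestStatus = c("H", "H", "I", "I", "R"),
  Temp = c(50, 70, 40, NA, 65)
)

# Mean temperature per nest status, ignoring missing temperatures.
mean_temp_by_status <- tapply(eagle_toy$Temp, eagle_toy$NestStatus,
                              mean, na.rm = TRUE)
round(mean_temp_by_status, 2)
```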

4.2 Quantitative Data Variables:

The following data values are non-descriptive and numeric.

Temp
Wind-Velocity
Wind-Direction
Precipatation
CloudCover

5 Contributorship

Indra - worked on Bald Eagle data, GitHub.

Sun - worked on eBird data, GitHub.

Kalpana - worked on Indicator Species, GitHub and proofreading.

6 References

Rmarkdown Authoring Basics

R Notebooks

Latex equation editor

Qualitative Stats in R

Create Awesome HTML Table with knitr::kable and kableExtra

Meteorological season

Creating Alluvial Diagrams

Factorial Treatment Structure

Two way - between subject analysis of variance

Bird Observations - eBird

ggplot2 Cheatsheet

InSuKa: Descriptive Statistics and R Script

Indra Chintakayala, Wancheng Sun, Kalapna Jha

October 30, 2018