(About this document: This is an R notebook. As such, it is very informal and contains lots of typos and poorly constructed fragment sentences. Its purpose is to generate conversation among staff. It’s not suitable for publishing in the Daily Camera.)
Reading in a file of bird sampling along line transects and looking through it to understand its structure. This file will be used to compare some different approaches to converting the raw observation data to account for the decay in detection probability with distance from the observer. In principle, this inflates the value of detections that are far from the observer.
options(stringsAsFactors = FALSE)
library(readxl)
## Warning: package 'readxl' was built under R version 3.6.2
d <- read_excel('S:/OSMP/ECO_SYSTEMS/WILDLIFE/BIRDS/Distance data analysis/qry.GMAPAllSpp.No.FO.wDistance.and.EnivroData.2017thru2019.xlsx')
names(d)
## [1] "Sample_name" "Property" "Year"
## [4] "BirdSpp" "Number" "Date"
## [7] "Distance" "Angle" "DetectionType"
## [10] "Comments" "CloudIndexBegin" "WindIndexBegin"
## [13] "NoiseIndexBegin" "Description"
The first 8 variables are the main ones of interest. Cloud, Wind, and Noise are covariates that could be used to change the detection probability. I’m not sure what DetectionType is.
Next, we’ll step through each one to get a sense of the study design.
Sample_name
table(d$Sample_name)
##
## AWBM-96 AWBM-97 BWBM-100 BWBM-101 DVBM-98 DVBM-99 EBBM-102 EBBM-103
## 28 39 46 28 39 29 62 23
## EBBM-66 EBBM-67 EBBM-68 EBBM-69 EBBM-70 EBBM-71 EBBM-72 EBBM-73
## 35 29 35 51 46 42 39 31
## EBBM-74 EBBM-75 ETBM-01 ETBM-02 GHBM-104 GHBM-105 GHBM-106 GHBM-107
## 32 42 33 40 34 29 43 46
## GHBM-32 GHBM-33 GHBM-34 GHBM-35 GHBM-36 GHBM-37 GHBM-38 GHBM-39
## 36 33 40 29 26 24 39 31
## IBMBM-50 IBMBM-51 JBM-01 JBM-02 JBM-03 JFBM-52 JFBM-53 JFBM-54
## 32 35 24 48 34 16 15 34
## JMBM-10 JMBM-11 JMBM-12 JMBM-14 JMBM-15 JMBM-16 JMBM-17 JMBM-18
## 50 42 38 35 43 40 26 25
## JMBM-19 JMBM-91 JMBM-92 JMBM-93 JMBM-94 JMBM-95 NBBM-20 NBBM-21
## 29 40 31 54 31 40 46 57
## NBBM-22 NBBM-23 NBBM-24 NBBM-25 NBBM-26 NBBM-27 NBBM-28 NBBM-30
## 44 34 28 27 32 46 35 34
## NBBM-31 NBBM-76 NBBM-77 NBBM-78 NBBM-79 NBBM-80 PGBM-01 PGBM-02
## 47 57 30 51 47 55 32 39
## PGBM-03 PGBM-04 PGBM-05 PGBM-06 PGBM-07 PGBM-08 PGBM-09 PGBM-108
## 40 44 35 45 43 48 44 46
## PGBM-109 PGBM-110 SGBM-40 SGBM-41 SGBM-42 SGBM-43 SGBM-44 SGBM-45
## 53 66 38 44 51 39 38 58
## SGBM-46 SGBM-47 SGBM-60 SGBM-61 SGBM-81 SGBM-82 SGBM-83 SGBM-84
## 31 38 34 42 47 47 40 31
## SGBM-85 SGBM-86 SGBM-87 SGBM-88 SGBM-89 SGBM-90 STBM-58 STBM-59
## 33 43 36 64 42 56 56 64
## TGWB-04 TGWB-15 TGWB-20 TGWB-21 TGWB-23 TGWB-24 TGWB-26 TGWB-27
## 32 34 36 48 35 36 38 30
## TGWB-28 TGWB-29 TGWB-30 TGWB-31 TGWB-32 TGWB-34 TGWB-35 WBBM-55
## 32 39 30 38 43 38 44 29
## WBBM-56 WBBM-57
## 61 55
length(table(d$Sample_name))
## [1] 122
122 different samples. The letter prefix of each name (4 characters for most, but note JBM and IBMBM) appears to concatenate two different things. Guessing a property code first, and then a final pair of letters that varies between BM and WB. Not sure what those are yet.
The number of records per sample varies from 15 to 66.
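One way to test the guess about the code structure is to split each name into its parts. This is a minimal sketch in base R using a few names copied from the table above. The helper `split_sample_name` and its column names are made up for illustration, and the rule that the last two letters of the prefix are a separate code is only the guess stated above, not something the file confirms.

```r
# Hypothetical helper: split "EBBM-66" into property code "EB",
# type code "BM", and transect number "66".
# Assumes the last two letters of the prefix are the BM/WB code.
split_sample_name <- function(x) {
  prefix <- sub("-.*$", "", x)   # text before the hyphen
  number <- sub("^.*-", "", x)   # text after the hyphen
  n <- nchar(prefix)
  data.frame(
    Sample_name   = x,
    property_code = substr(prefix, 1, n - 2),
    type_code     = substr(prefix, n - 1, n),
    transect_no   = number,
    stringsAsFactors = FALSE
  )
}

parts <- split_sample_name(c("EBBM-66", "IBMBM-50", "JBM-01", "TGWB-04"))
parts
table(parts$type_code)
```

Running the helper on `unique(d$Sample_name)` and cross-tabulating `type_code` against the matching `Property` values would show whether WB is confined to particular properties.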
Property
table(d$Property)
##
## Aweida II Boulder Warehouse Damyanonvich
## 67 74 68
## East Beech Ertl III Gunbarrel Hill
## 467 73 410
## IBM Jafay Jewell Mt.
## 67 65 524
## Joder North Boulder Southern Grasslands
## 106 670 852
## Steinbach Tallgrass West West Rudd
## 120 553 535
## Woods Brothers
## 145
16 different properties. Indeed, these seem to correspond to the first two characters of the Sample_name.
Some properties have way more observations than others, probably due to an unbalanced distribution of transects (i.e., more transects at West Rudd than IBM). We can check that assumption later.
Year
table(d$Year)
##
## 2017 2018 2019
## 1641 1553 1602
Three different years, relatively equal number of observations.
BirdSpp
sort(table(d$BirdSpp))
##
## AMBI BAEA BANS BRSP BRTH
## 1 1 1 1 1
## CANG COHA DICK GBHE GOEA
## 1 1 1 1 1
## GRCA GTGR HAWO INBU MOBL
## 1 1 1 1 1
## OCWA PEFA PYNU RECR SOSP
## 1 1 1 1 1
## SWHA UNK FLYCATCHER WEKE WESJ YRWA
## 1 1 1 1 1
## AMCR BUOW CASP LARB MERL
## 2 2 2 2 2
## NRWS WCSP CHSP CORA DEJU
## 2 2 3 3 3
## WAVI BHGR TRES BCCH WETA
## 3 4 4 5 5
## HOSP EUCD KILL COYE MALL
## 6 7 7 8 8
## ROPI VGSW EAKI ECDO CONI
## 8 9 10 10 11
## NOFL AMGO BLJA ROWR CLSW
## 11 12 12 13 15
## BARS BHCO GTTO SAPH WEWP
## 16 16 16 16 16
## AMKE BGGN UNK HOFI BUOR
## 17 17 18 21 23
## YWAR BLGR BTAH LEGO RTHA
## 23 24 29 29 30
## COGR BRBL YBCH EUST WEKI
## 32 39 41 49 49
## HOWR MODO LAZB AMRO RWBL
## 52 52 53 81 128
## BBMA HOLA LASP SPTO GRSP
## 147 151 169 215 408
## VESP WEME
## 767 1836
length(table(d$BirdSpp))
## [1] 87
87 different species codes (including UNK and UNK FLYCATCHER), with a lot of rare detections (i.e., birds with only one record). Really, almost all the data is WEME and VESP. We should keep that in mind as we go forward. According to the Bird Conservancy of the Rockies (or whatever they are called), you need at least 70 records per species for the distance analysis. In this dataset, that would leave us with just 9 species. If we subset our data to just keep these 9, I wonder how it will affect the overall design, in terms of number of transects and properties sampled?
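The subsetting question can be answered with a few lines of base R. The sketch below uses a small made-up data frame (`toy`) in place of `d`, and a hypothetical helper `keep_common_species`; the threshold counts rows (records), not the sum of `Number`, which is an assumption about what "70 records" means.

```r
# Toy stand-in for d; only the columns used here.
toy <- data.frame(
  BirdSpp     = c(rep("WEME", 5), rep("VESP", 3), "BAEA"),
  Sample_name = c("A-1", "A-1", "A-2", "B-1", "B-2", "A-1", "B-1", "B-1", "C-1"),
  Property    = c("A", "A", "A", "B", "B", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep species with at least min_records rows.
keep_common_species <- function(dat, min_records = 70) {
  counts <- table(dat$BirdSpp)
  common <- names(counts)[counts >= min_records]
  dat[dat$BirdSpp %in% common, , drop = FALSE]
}

sub_toy <- keep_common_species(toy, min_records = 3)  # 3 only suits the toy data

# How much of the design survives the filter?
c(transects_before  = length(unique(toy$Sample_name)),
  transects_after   = length(unique(sub_toy$Sample_name)),
  properties_before = length(unique(toy$Property)),
  properties_after  = length(unique(sub_toy$Property)))
```

With the real data, the call would be `keep_common_species(d)` with the default threshold of 70.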
Number
plot(table(d$Number))
Almost all the records have a Number value of one, as expected (single bird observed). One crazy record has a count of 85, a big flock of birds apparently. This will be important to think about as we get into modeling. I wonder what this looks like if we focus on our top 9 species only? We’ll see later.
Date
table(d$Date)
##
## 2017-05-22 2017-05-23 2017-05-24 2017-05-25 2017-05-26 2017-05-30
## 119 145 126 151 143 87
## 2017-05-31 2017-06-02 2017-06-19 2017-06-20 2017-06-21 2017-06-22
## 48 34 174 175 181 131
## 2017-06-26 2017-06-28 2018-05-28 2018-05-29 2018-05-30 2018-05-31
## 94 33 88 91 165 263
## 2018-06-01 2018-06-11 2018-06-12 2018-06-13 2018-06-14 2018-06-15
## 146 88 216 193 257 46
## 2019-01-01 2019-06-03 2019-06-04 2019-06-05 2019-06-06 2019-06-17
## 2 153 249 177 173 150
## 2019-06-18 2019-06-19 2019-06-20 2019-06-21
## 274 213 168 43
length(table(d$Date))
## [1] 34
34 different dates, with 14 in 2017, 10 in 2018, and 10 in 2019. Unbalanced among years. Also, note a big range in observations per date. One weird looking date of Jan 1, 2019; guessing that is an error? Also note that no observations were done in May of 2019.
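Suspect dates like the January one can be flagged rather than spotted by eye. A minimal sketch, using a few dates copied from the table above; the May to June survey window is an assumption read off that table, not a documented protocol.

```r
# A few dates copied from the table above.
dates <- as.Date(c("2017-05-22", "2018-06-15", "2019-01-01", "2019-06-21"))

# Assumption: valid surveys fall in May or June.
survey_months <- 5:6
month_no <- as.integer(format(dates, "%m"))
suspect  <- !(month_no %in% survey_months)

# Dates that fall outside the assumed window.
dates[suspect]
```

On the full data, `d[suspect, ]` (with `suspect` computed from `d$Date`) would show the two affected records and which transect they belong to.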
Distance
plot(table(d$Distance))
median(d$Distance, na.rm = TRUE)
## [1] 76
Median distance is 76 (meters?). Left skewed (non-normal) distribution, as I would have expected.
Note, there are some NA values in the dataset that need to be dealt with. Let’s see how many.
sum(is.na(d$Distance))
## [1] 175
That’s a lot! What’s the deal?
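A first step toward explaining the missing distances is to see where they cluster. The sketch below cross-tabulates missingness against year on a small made-up data frame (`toy`); the values are invented for illustration only.

```r
# Toy stand-in for d with some missing distances.
toy <- data.frame(
  Year     = c(2017, 2017, 2018, 2018, 2019, 2019),
  Distance = c(76, NA, 40, 120, NA, NA)
)

# Count missing and non-missing distances per year.
na_by_year <- table(Year = toy$Year, missing = is.na(toy$Distance))
na_by_year
```

Swapping `Year` for `DetectionType`, `BirdSpp`, or `Sample_name` in the real data might show whether the NAs belong to one kind of detection, one species, or one transect.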
Angle
plot(table(d$Angle))
It appears that the valid range of values for Angle is 0 to 90. So, we have about 5 values that are above 90. Remove/fix? Also, I’d like to know why only 90 degrees is the search area…
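If Distance is the straight-line distance from observer to bird and Angle is the sighting angle in degrees away from the transect line, then the usual line-transect conversion to perpendicular distance is Distance × sin(Angle). Both of those readings are assumptions about the field protocol that this file does not confirm. A minimal sketch, with a hypothetical helper `perp_distance` that also blanks out angles outside 0 to 90:

```r
# Hypothetical helper. Assumes `distance` is the radial observer-to-bird
# distance and `angle_deg` is measured from the transect line in degrees.
perp_distance <- function(distance, angle_deg) {
  out_of_range <- !is.na(angle_deg) & (angle_deg < 0 | angle_deg > 90)
  if (any(out_of_range)) {
    warning(sum(out_of_range), " angle(s) outside 0-90 set to NA")
  }
  angle_deg[out_of_range] <- NA
  # Perpendicular distance from the transect line.
  distance * sin(angle_deg * pi / 180)
}

perp <- perp_distance(c(100, 100, 50), c(90, 30, 0))
perp
```

Under this reading, a bird at 90 degrees is directly abeam, so its perpendicular distance equals the measured distance, and a bird at 0 degrees is on the line itself.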
DetectionType
table(d$DetectionType)
##
## C C,S C,S,V C,V C,V,FT C,V,S FT FT, V FT,0
## 176 5 5 221 2 5 78 1 1
## FT,C FT,C,V FT,O FT,V FT,V,C O O,FT S S,C
## 8 5 3 26 18 2 1 1263 78
## S,C,V S,V S,V,C S,V,FT S.C SCV V V, C V, FT
## 67 604 8 1 1 1 1143 2 1
## V, S, C V,C V,C,FT V,C,S V,FT V,O V,S V,S,C V.S
## 1 479 3 52 30 1 406 84 1
## VS
## 1
I don’t know what this variable is, but could speculate (I won’t tho). Whatever it is, it’s a bit crazy in the number of levels and variability in formatting. I’m ignoring this for now.
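If this variable is ever needed, much of the formatting noise could be removed mechanically. The sketch below assumes the values are comma-separated codes whose order does not matter, which is a guess; `normalize_detection` is a hypothetical helper. It does not handle run-together entries such as SCV or VS, or the likely typo FT,0, which would need a manual look.

```r
# Hypothetical helper: standardize delimiter, spacing, and code order.
normalize_detection <- function(x) {
  cleaned <- gsub("\\s+", "", x)                      # "V, C" -> "V,C"
  cleaned <- gsub(".", ",", cleaned, fixed = TRUE)    # "S.C"  -> "S,C"
  out <- vapply(
    strsplit(cleaned, ",", fixed = TRUE),
    function(codes) paste(sort(unique(codes)), collapse = ","),
    character(1)
  )
  out[is.na(x)] <- NA_character_                      # keep missing as missing
  out
}

# Values copied from the table above.
raw <- c("C,V", "V,C", "V, C", "S.C", "S,C", "FT,V,C", "C,V,FT")
norm <- normalize_detection(raw)
table(norm)
```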
Cloud, Wind, Noise
table(d$CloudIndexBegin)
##
## 0 1 2 3 4 5 6
## 921 1627 698 487 518 502 43
table(d$WindIndexBegin)
##
## 0 1 2 3 4
## 574 1949 1430 655 188
table(d$NoiseIndexBegin)
##
## 0 1 2 3 4 5
## 68 960 1803 1303 660 2
I’m guessing higher values mean more clouds, wind, and noise, and therefore poorer detection. I also notice that each variable has a different maximum value… Can field observers reliably distinguish 7 levels of cloudiness? I suppose it’s possible. Do the functions to calculate distance-corrected density take covariates that are binned like these? Do the covariates need to have the same number of levels? We shall see.
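Whatever the modeling functions end up wanting, very sparse levels (for example, Noise level 5 has only 2 records) may need pooling first. A minimal sketch of one option, using made-up values; pooling levels 4 and 5 is an assumption for illustration, not a recommendation from the protocol.

```r
# Made-up noise index values, including a missing one.
noise <- c(0, 1, 2, 2, 3, 4, 5, NA)

# Assumption: levels 4 and 5 can be pooled because 5 is so rare.
noise_pooled <- pmin(noise, 4)
noise_f <- factor(noise_pooled, levels = 0:4,
                  labels = c("0", "1", "2", "3", "4+"))

table(noise_f, useNA = "ifany")
```

Keeping the result as a factor leaves open whether the index is later treated as categorical or converted back to a number.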
Description
table(d$Description)
##
## Aweida Bird Monitoring 96
## 28
## Aweida Bird Monitoring 97
## 39
## Boulder Warehouse Bird Monitoring 100
## 46
## Boulder Warehouse Bird Monitoring 101
## 28
## Damyanovich Bird Monitoring 98
## 39
## Damyanovich Bird Monitoring 99
## 29
## East Beech Bird Monitoring 102
## 62
## East Beech Bird Monitoring 103
## 23
## East Beech Bird Monitoring 66
## 35
## East Beech Bird Monitoring 67
## 29
## East Beech Bird Monitoring 68
## 35
## East Beech Bird Monitoring 69
## 51
## East Beech Bird Monitoring 70
## 46
## East Beech Bird Monitoring 71
## 42
## East Beech Bird Monitoring 72
## 39
## East Beech Bird Monitoring 73
## 31
## East Beech Bird Monitoring 74
## 32
## East Beech Bird Monitoring 75
## 42
## Ertl Three Bird Monitoring 1
## 33
## Ertl Three Bird Monitoring 2
## 40
## Gunbarrel Hill Bird Monitoring 104
## 34
## Gunbarrel Hill Bird Monitoring 105
## 29
## Gunbarrel Hill Bird Monitoring 106
## 43
## Gunbarrel Hill Bird Monitoring 107
## 46
## Gunbarrel Hill Bird Monitoring 32
## 36
## Gunbarrel Hill Bird Monitoring 33
## 33
## Gunbarrel Hill Bird Monitoring 34
## 40
## Gunbarrel Hill Bird Monitoring 35
## 29
## Gunbarrel Hill Bird Monitoring 36
## 26
## Gunbarrel Hill Bird Monitoring 37
## 24
## Gunbarrel Hill Bird Monitoring 38
## 39
## Gunbarrel Hill Bird Monitoring 39
## 31
## IBM Bird Monitoring 50
## 32
## IBM Bird Monitoring 51
## 35
## Jafay Bird Monitoring 52
## 16
## Jafay Bird Monitoring 53
## 15
## Jafay Bird Monitoring 54
## 34
## Jewell Mountain Bird Monitoring 10
## 50
## Jewell Mountain Bird Monitoring 11
## 42
## Jewell Mountain Bird Monitoring 12
## 38
## Jewell Mountain Bird Monitoring 14
## 35
## Jewell Mountain Bird Monitoring 15
## 43
## Jewell Mountain Bird Monitoring 16
## 40
## Jewell Mountain Bird Monitoring 17
## 26
## Jewell Mountain Bird Monitoring 18
## 25
## Jewell Mountain Bird Monitoring 19
## 29
## Jewell Mountain Bird Monitoring 91
## 40
## Jewell Mountain Bird Monitoring 92
## 31
## Jewell Mountain Bird Monitoring 93
## 54
## Jewell Mountain Bird Monitoring 94
## 31
## Jewell Mountain Bird Monitoring 95
## 40
## Joder Bird Monitoring 1
## 24
## Joder Bird Monitoring 2
## 48
## Joder Bird Monitoring 3
## 34
## North Boulder Bird Monitoring 20
## 46
## North Boulder Bird Monitoring 21
## 57
## North Boulder Bird Monitoring 22
## 44
## North Boulder Bird Monitoring 23
## 34
## North Boulder Bird Monitoring 24
## 28
## North Boulder Bird Monitoring 25
## 27
## North Boulder Bird Monitoring 26
## 32
## North Boulder Bird Monitoring 27
## 46
## North Boulder Bird Monitoring 28
## 35
## North Boulder Bird Monitoring 30
## 34
## North Boulder Bird Monitoring 31
## 47
## North Boulder Bird Monitoring 76
## 57
## North Boulder Bird Monitoring 77
## 30
## North Boulder Bird Monitoring 78
## 51
## North Boulder Bird Monitoring 79
## 47
## North Boulder Bird Monitoring 80
## 55
## Paragliding (West Rudd) Bird Monitoring 1
## 32
## Paragliding (West Rudd) Bird Monitoring 108
## 46
## Paragliding (West Rudd) Bird Monitoring 109
## 53
## Paragliding (West Rudd) Bird Monitoring 110
## 66
## Paragliding (West Rudd) Bird Monitoring 2
## 39
## Paragliding (West Rudd) Bird Monitoring 3
## 40
## Paragliding (West Rudd) Bird Monitoring 4
## 44
## Paragliding (West Rudd) Bird Monitoring 5
## 35
## Paragliding (West Rudd) Bird Monitoring 6
## 45
## Paragliding (West Rudd) Bird Monitoring 7
## 43
## Paragliding (West Rudd) Bird Monitoring 8
## 48
## Paragliding (West Rudd) Bird Monitoring 9
## 44
## Southern Grasslands Bird Monitoring 40
## 38
## Southern Grasslands Bird Monitoring 41
## 44
## Southern Grasslands Bird Monitoring 42
## 51
## Southern Grasslands Bird Monitoring 43
## 39
## Southern Grasslands Bird Monitoring 44
## 38
## Southern Grasslands Bird Monitoring 45
## 58
## Southern Grasslands Bird Monitoring 46
## 31
## Southern Grasslands Bird Monitoring 47
## 38
## Southern Grasslands Bird Monitoring 60
## 34
## Southern Grasslands Bird Monitoring 61
## 42
## Southern Grasslands Bird Monitoring 81
## 47
## Southern Grasslands Bird Monitoring 82
## 47
## Southern Grasslands Bird Monitoring 83
## 40
## Southern Grasslands Bird Monitoring 84
## 31
## Southern Grasslands Bird Monitoring 85
## 33
## Southern Grasslands Bird Monitoring 86
## 43
## Southern Grasslands Bird Monitoring 87
## 36
## Southern Grasslands Bird Monitoring 88
## 64
## Southern Grasslands Bird Monitoring 89
## 42
## Southern Grasslands Bird Monitoring 90
## 56
## Steinbach Bird Monitoring 58
## 56
## Steinbach Bird Monitoring 59
## 64
## Tallgrass West Bird Monitoring 04
## 32
## Tallgrass West Bird Monitoring 15
## 34
## Tallgrass West Bird Monitoring 20
## 36
## Tallgrass West Bird Monitoring 21
## 48
## Tallgrass West Bird Monitoring 23
## 35
## Tallgrass West Bird Monitoring 24
## 36
## Tallgrass West Bird Monitoring 26
## 38
## Tallgrass West Bird Monitoring 27
## 30
## Tallgrass West Bird Monitoring 28
## 32
## Tallgrass West Bird Monitoring 29
## 39
## Tallgrass West Bird Monitoring 30
## 30
## Tallgrass West Bird Monitoring 31
## 38
## Tallgrass West Bird Monitoring 32
## 43
## Tallgrass West Bird Monitoring 34
## 38
## Tallgrass West Bird Monitoring 35
## 44
## Woods Brothers Bird Monitoring 55
## 29
## Woods Brothers Bird Monitoring 56
## 61
## Woods Brothers Bird Monitoring 57
## 55
Description seems to be a concatenation of several of the previous variables. It also shows us what some of the code values in Sample_name mean; e.g., PG = Paragliding.
So, here’s what we know so far:
One very large Number value (n=85)
Date error (Jan 1, 2019)
NA values in Distance.
The point of the distance analysis is to calculate density estimates and variance for those estimates for groups of interest. What are the groups we’d be wanting to compare here? I think there’s just two possibilities:
Year n = 3
Property n = 16
A third group might be the WB and BM values that are seen in the Sample_name, but I don’t know what those are.
Would we want to compare all 16 properties, or drop some of the less sampled ones? Let’s look at number of transects per property.
lapply(split(d, d$Property), function(x){
length(unique(x$Sample_name))
})
## $`Aweida II`
## [1] 2
##
## $`Boulder Warehouse`
## [1] 2
##
## $Damyanonvich
## [1] 2
##
## $`East Beech`
## [1] 12
##
## $`Ertl III`
## [1] 2
##
## $`Gunbarrel Hill`
## [1] 12
##
## $IBM
## [1] 2
##
## $Jafay
## [1] 3
##
## $`Jewell Mt.`
## [1] 14
##
## $Joder
## [1] 3
##
## $`North Boulder`
## [1] 16
##
## $`Southern Grasslands`
## [1] 20
##
## $Steinbach
## [1] 2
##
## $`Tallgrass West`
## [1] 15
##
## $`West Rudd`
## [1] 12
##
## $`Woods Brothers`
## [1] 3
Large range in number of samples per property, likely reflecting different property sizes? As a result, we may end up with high variance for some of these poorly sampled properties, making it difficult to accurately describe differences among them. Are any of these properties adjacent and could therefore be combined? Are there other groupings of interest, like grassland conservation target and land use history, that we should investigate?
For Year, let’s look to make sure that each Sample_name was monitored in each year.
table(d$Sample_name, d$Year)
##
## 2017 2018 2019
## AWBM-96 9 9 10
## AWBM-97 9 16 14
## BWBM-100 13 12 21
## BWBM-101 9 9 10
## DVBM-98 18 11 10
## DVBM-99 12 8 9
## EBBM-102 23 16 23
## EBBM-103 7 8 8
## EBBM-66 13 16 6
## EBBM-67 10 12 7
## EBBM-68 11 17 7
## EBBM-69 17 25 9
## EBBM-70 20 14 12
## EBBM-71 17 13 12
## EBBM-72 14 16 9
## EBBM-73 17 9 5
## EBBM-74 12 12 8
## EBBM-75 13 15 14
## ETBM-01 10 7 16
## ETBM-02 16 13 11
## GHBM-104 15 12 7
## GHBM-105 12 5 12
## GHBM-106 19 16 8
## GHBM-107 20 12 14
## GHBM-32 18 9 9
## GHBM-33 12 12 9
## GHBM-34 15 17 8
## GHBM-35 9 9 11
## GHBM-36 9 10 7
## GHBM-37 12 6 6
## GHBM-38 17 14 8
## GHBM-39 15 9 7
## IBMBM-50 8 10 14
## IBMBM-51 12 12 11
## JBM-01 8 6 10
## JBM-02 16 14 18
## JBM-03 14 8 12
## JFBM-52 9 7 0
## JFBM-53 6 9 0
## JFBM-54 9 14 11
## JMBM-10 20 15 15
## JMBM-11 12 14 16
## JMBM-12 15 6 17
## JMBM-14 11 10 14
## JMBM-15 14 15 14
## JMBM-16 14 11 15
## JMBM-17 9 4 13
## JMBM-18 9 6 10
## JMBM-19 11 9 9
## JMBM-91 13 12 15
## JMBM-92 8 6 17
## JMBM-93 17 20 17
## JMBM-94 9 10 12
## JMBM-95 9 11 20
## NBBM-20 20 11 15
## NBBM-21 20 19 18
## NBBM-22 13 17 14
## NBBM-23 14 15 5
## NBBM-24 11 10 7
## NBBM-25 16 3 8
## NBBM-26 16 6 10
## NBBM-27 18 19 9
## NBBM-28 14 9 12
## NBBM-30 14 12 8
## NBBM-31 18 15 14
## NBBM-76 24 16 17
## NBBM-77 9 10 11
## NBBM-78 22 6 23
## NBBM-79 16 14 17
## NBBM-80 18 12 25
## PGBM-01 9 13 10
## PGBM-02 12 19 8
## PGBM-03 12 18 10
## PGBM-04 11 19 14
## PGBM-05 11 15 9
## PGBM-06 11 23 11
## PGBM-07 12 17 14
## PGBM-08 12 17 19
## PGBM-09 13 12 19
## PGBM-108 11 18 17
## PGBM-109 15 20 18
## PGBM-110 17 22 27
## SGBM-40 13 11 14
## SGBM-41 14 14 16
## SGBM-42 17 16 18
## SGBM-43 12 8 19
## SGBM-44 10 13 15
## SGBM-45 17 18 23
## SGBM-46 10 8 13
## SGBM-47 12 14 12
## SGBM-60 15 7 12
## SGBM-61 10 20 12
## SGBM-81 20 13 14
## SGBM-82 13 17 17
## SGBM-83 14 15 11
## SGBM-84 16 10 5
## SGBM-85 9 10 14
## SGBM-86 16 10 17
## SGBM-87 7 12 17
## SGBM-88 19 17 28
## SGBM-89 11 12 19
## SGBM-90 20 16 20
## STBM-58 19 19 18
## STBM-59 18 18 28
## TGWB-04 13 10 9
## TGWB-15 14 7 13
## TGWB-20 11 11 14
## TGWB-21 16 15 17
## TGWB-23 8 15 12
## TGWB-24 12 9 15
## TGWB-26 18 13 7
## TGWB-27 11 11 8
## TGWB-28 9 12 11
## TGWB-29 12 14 13
## TGWB-30 13 7 10
## TGWB-31 13 15 10
## TGWB-32 15 17 11
## TGWB-34 11 15 12
## TGWB-35 19 13 12
## WBBM-55 3 11 15
## WBBM-56 16 17 28
## WBBM-57 10 18 27
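Mostly good, but not quite: JFBM-52 and JFBM-53 have zero records in 2019. Were those transects not surveyed that year, or were no birds recorded? Every other sample has records in all three years, so we should still have plenty of evidence (120 to 122 point estimates per year) to compare years.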
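A 122-row table is easy to misread by eye, so it may be safer to ask R for any sample that is missing a year. A minimal sketch on a made-up data frame (`toy`); the sample names are invented.

```r
# Toy stand-in for d: X-2 has no records in 2019.
toy <- data.frame(
  Sample_name = c("X-1", "X-1", "X-1", "X-2", "X-2"),
  Year        = c(2017, 2018, 2019, 2017, 2018),
  stringsAsFactors = FALSE
)

tab <- table(toy$Sample_name, toy$Year)

# Samples with no records in at least one year.
gaps <- rownames(tab)[rowSums(tab == 0) > 0]
gaps
```

The same three lines with `d` in place of `toy` would list every transect that lacks records in some year.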
Number values >> 1