Proportion of samples for which there were strikes

This document shows strike data for systematic observations only.

Data wrangling

I added a season variable to the data frame: any observation dated before June is “spring” and any observation dated after that is “fall.”

# Start from a character copy of Date, then overwrite each numeric range with its season label
bxb$season <- as.character(bxb$Date)
bxb$season[bxb$Date %in% 0:60000] <- "spring"
bxb$season[bxb$Date %in% 60001:123026] <- "fall"
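
The same cutoff can also be written as a single vectorised comparison. The sketch below is only an illustration: the `toy` data frame and its Date values are made up, chosen to fall on either side of the 60000 cutoff used above. Unlike the `%in%` version, it labels every value above 60000 as "fall", including any above 123026.

```r
# Hypothetical numeric Date values on either side of the cutoff
toy <- data.frame(Date = c(41519, 52320, 60119, 91521, 101422))

# Values up to 60000 become "spring", everything larger becomes "fall"
toy$season <- ifelse(toy$Date <= 60000, "spring", "fall")
toy
```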

I created a table that shows the number of strikes per building, year, and season. Only the first 6 rows are shown as an example.

library(dplyr)
# Count strikes in each Building x Year x season group
strike.table <- bxb %>% group_by(Building, Year, season) %>%
  summarise(no.strikes = sum(Systematic))
head(strike.table)
## # A tibble: 6 × 4
## # Groups:   Building, Year [6]
##   Building  Year season no.strikes
##   <fct>    <int> <chr>       <int>
## 1 ACT       2019 spring          2
## 2 ACT       2020 spring          1
## 3 ACT       2021 spring          1
## 4 ACT       2022 spring          3
## 5 ACT       2023 spring          2
## 6 CFWC      2019 spring          0

I then created a table that shows, for each building, year, and season, the proportion of samples with strikes (number of strikes divided by number of observations). Again, only the first 6 rows are shown as an example.

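# Count observations (sampling effort) in each Building x Year x season group
total.table <- bxb %>% group_by(Building, Year, season) %>%
  summarise(total.obs = n())
# strike.table and total.table share the same grouping, so their rows line up
total.table$proportion <- strike.table$no.strikes / total.table$total.obs
total.table$Year <- factor(total.table$Year)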
head(total.table)
## # A tibble: 6 × 5
## # Groups:   Building, Year [6]
##   Building Year  season total.obs proportion
##   <fct>    <fct> <chr>      <int>      <dbl>
## 1 ACT      2019  spring        29     0.0690
## 2 ACT      2020  spring        17     0.0588
## 3 ACT      2021  spring        29     0.0345
## 4 ACT      2022  spring        28     0.107 
## 5 ACT      2023  spring        26     0.0769
## 6 CFWC     2019  spring        29     0
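
The proportion column divides a column of strike.table by a column of total.table, so it relies on the two tables having their rows in the same order. A minimal sketch of an alternative is to compute both counts in one `summarise()` call, so no alignment is needed. The `toy` data frame is made up for illustration. Its column names are borrowed from bxb, and it assumes Systematic is 1 when a sample had a strike and 0 otherwise.

```r
library(dplyr)

# Hypothetical stand-in for bxb: one row per sample,
# Systematic assumed to be 1 if a strike was recorded, 0 otherwise
toy <- data.frame(
  Building   = c("A", "A", "A", "B", "B"),
  Year       = 2019L,
  season     = "spring",
  Systematic = c(1L, 0L, 0L, 0L, 1L)
)

# Strikes, sampling effort, and their ratio computed together per group
prop.toy <- toy %>%
  group_by(Building, Year, season) %>%
  summarise(
    no.strikes = sum(Systematic),
    total.obs  = n(),
    proportion = no.strikes / total.obs,
    .groups    = "drop"
  )
prop.toy
```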

Graphics

Boxplot: proportion of strikes per building.

This combines all years and seasons. Reeve and Sage have the highest proportions of samples with strikes.

total.table$Building = factor(total.table$Building)
#levels(total.table$Building)
boxplot(total.table$proportion ~ total.table$Building, xlab = "Building", ylab = "Proportion of samples")

A quick one-way ANOVA shows significant differences among buildings (F = 3.80 on 7 and 32 df, p = 0.004), but I haven’t yet sorted out which buildings differ from which (though one can guess).

summary(aov(total.table$proportion ~ total.table$Building))
##                      Df Sum Sq  Mean Sq F value  Pr(>F)   
## total.table$Building  7 0.2087 0.029822   3.802 0.00412 **
## Residuals            32 0.2510 0.007844                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
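
One common way to find where the differences lie is Tukey's honest significant difference test, available in base R as `TukeyHSD()`. The sketch below uses made-up proportions for three hypothetical buildings, not the real data, just to show the shape of the call and its output.

```r
# Hypothetical proportions for three buildings, five group-level values each
toy <- data.frame(
  Building   = factor(rep(c("A", "B", "C"), each = 5)),
  proportion = c(0.02, 0.05, 0.03, 0.04, 0.06,
                 0.03, 0.04, 0.05, 0.02, 0.06,
                 0.20, 0.25, 0.18, 0.30, 0.22)
)

# Fit the one-way ANOVA, then compare every pair of buildings
fit   <- aov(proportion ~ Building, data = toy)
tukey <- TukeyHSD(fit)

# One row per pair, with the difference in means, confidence limits, and adjusted p-value
tukey$Building
```

With the real table, the equivalent call would be `TukeyHSD(aov(proportion ~ Building, data = total.table))`, provided Building is a factor.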

Scatterplot:

window strikes relative to sampling effort

The colors represent years and the shapes represent seasons.

library(ggplot2)
ggplot(total.table, aes(x = Building, y = proportion, color = Year, shape = season)) +
  geom_jitter(width = 0.2, size = 3) +
  labs(x = "Building", y = "Proportion of samples with strikes", title = "Window strikes relative to sampling effort")

separated out by year

Again, the colors represent years and the shapes represent seasons.

ggplot(total.table, aes(x = Building, y = proportion, color = Year, shape = season)) +
  geom_jitter(size = 2.5) +
  labs(x = "Building", y = "Proportion of samples with strikes", title = "Window strikes as a proportion of sampling effort") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  facet_wrap(~Year)

Heatmaps:

The next three graphs are heat maps. They might not be helpful, but here they are. For each, the color represents the proportion of samples that had strikes, with darker colors indicating a greater proportion.

strikes per building per year

ggplot(total.table, aes(Year, Building, fill= proportion)) + 
  geom_tile()+
  scale_fill_gradient2(low="lavender", mid = "orchid", high="slateblue", guide="colorbar", midpoint = 0.25)

strikes per season per year

ggplot(total.table, aes(Year, season, fill= proportion)) + 
  geom_tile()+
  scale_fill_gradient2(low="lavender", mid = "orchid", high="slateblue", guide="colorbar", midpoint = 0.25)

strikes per season per building

ggplot(total.table, aes(Building, season, fill= proportion)) + 
  geom_tile()+
  scale_fill_gradient2(low="lavender", mid = "orchid", high="slateblue", guide="colorbar", midpoint = 0.25)
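
A caveat on the last two heat maps: total.table has one row per building, year, and season, so several rows share each Year x season or Building x season tile. `geom_tile()` draws them on top of one another and only the last one stays visible. One possible fix is to aggregate to one value per tile before plotting. The sketch below averages proportions across buildings. The `toy` data are made up, and averaging is only one option (pooling the raw counts would be another).

```r
library(ggplot2)

# Hypothetical stand-in for total.table: two buildings, two years, two seasons
toy <- data.frame(
  Building   = rep(c("A", "B"), each = 4),
  Year       = factor(rep(rep(c(2019, 2020), each = 2), times = 2)),
  season     = rep(c("spring", "fall"), times = 4),
  proportion = c(0.10, 0.00, 0.20, 0.05, 0.30, 0.10, 0.00, 0.15)
)

# Collapse to one mean proportion per Year x season tile
pooled <- aggregate(proportion ~ Year + season, data = toy, FUN = mean)

heat <- ggplot(pooled, aes(Year, season, fill = proportion)) +
  geom_tile() +
  scale_fill_gradient2(low = "lavender", mid = "orchid", high = "slateblue",
                       guide = "colorbar", midpoint = 0.25)
print(heat)
```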

Strikes as a function of family

This section shows strike data by species for systematic and opportunistic observations. There are still quite a few unknowns that are not included, but I’ll update the information as I fill them in. That said, you have plenty to present as it is.

Family x building

I created a table that shows the number of strikes per family and building.

FxBtable = table(spp$Family, spp$Building)
FxBtable
##                
##                 ACT Albee CCE&D CFWC Clow FDL Halsey Harrington Polk Prkgramp
##   Alcedinidae     0     0     0    0    0   1      0          0    0        0
##   Bombycillidae   0     0     0    0    0   1      0          0    0        0
##   Cardinalidae    0     0     0    0    0   0      0          0    0        0
##   Certhiidae      0     0     0    0    0   0      1          0    0        0
##   Columbidae      1     0     0    0    1   0      0          0    0        0
##   Emberizidae     0     0     0    0    0   1      0          0    0        0
##   Fringillidae    0     0     0    2    1   0      0          0    1        0
##   Mimidae         1     0     0    1    0   0      0          0    0        0
##   Paridae         0     0     0    0    0   0      0          0    0        0
##   Parulidae       2     0     0    1    1   0      0          0    0        1
##   Passerelidae    0     0     0    2    1   0      1          0    0        0
##   Passeridae      0     0     0    0    0   0      0          0    1        0
##   Phasianidae     0     1     0    0    0   0      0          0    0        0
##   Picidae         0     0     0    0    0   0      0          1    0        0
##   Regulidae       0     0     0    0    2   1      2          0    0        0
##   Sittidae        0     0     0    0    0   0      0          0    0        0
##   Sturnidae       0     0     0    0    0   0      0          0    0        0
##   Trochilidae     0     0     0    0    0   0      0          0    0        0
##   Turdidae        3     0     1    0    1   1      2          0    0        0
##   Vireonidae      0     0     0    0    0   0      1          0    0        0
##                
##                 Radford Reeve Sage SRWC UNKNOWN
##   Alcedinidae         0     0    0    0       0
##   Bombycillidae       0     0    0    0       0
##   Cardinalidae        0     1    0    0       0
##   Certhiidae          0     0    0    0       0
##   Columbidae          0     0    0    0       0
##   Emberizidae         0     1    3    0       0
##   Fringillidae        0     0    1    2       0
##   Mimidae             0     0    1    0       0
##   Paridae             0     0    1    0       0
##   Parulidae           0     7   11    0       0
##   Passerelidae        0     2    2    0       0
##   Passeridae          0     1    0    0       0
##   Phasianidae         0     0    0    0       0
##   Picidae             1     1    0    0       0
##   Regulidae           0     4    3    0       1
##   Sittidae            0     0    1    0       0
##   Sturnidae           0     2    0    1       0
##   Trochilidae         0     0    1    0       0
##   Turdidae            0     4    2    1       0
##   Vireonidae          0     1    0    0       0

I then created a vector of the counts from the table.

FxB.counts = as.integer(table(spp$Family, spp$Building))
#FxB.counts
#length(FxB.counts)

# FxB.counts runs through all families for the first building, then all families for the next building, and so on

I created a family vector that corresponded to the counts…

FxB.family = rep(c(levels(spp$Family)), times=nlevels(spp$Building))
FxB.family = factor(FxB.family)
#FxB.family
#length(FxB.family)

and a building vector that corresponded to the counts.

FxB.building = rep(c(levels(spp$Building)), each=nlevels(spp$Family))
FxB.building = factor(FxB.building)
#FxB.building
#length(FxB.building)

I combined the new counts, family, and building vectors into one data frame called FxB.

FxB = data.frame(FxB.family, FxB.counts, FxB.building)
#head(FxB)
ggplot(FxB, aes(FxB.family, FxB.building, fill= FxB.counts)) + 
  geom_tile()+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
  scale_fill_gradient2(low="lavender", mid = "orchid", high="slateblue", guide="colorbar", midpoint = (max(FxB.counts)/2))+
  labs(x="Families",y="Buildings",title="Number of strikes per building per family")
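
The count, family, and building vectors can also be built in one step, because base R's `as.data.frame()` converts a contingency table directly to long format, with families varying fastest just as in FxB.counts. This is a sketch of that shortcut on made-up records, not the code used for the figure above. The `toy` data frame and the name `FxB.long` are hypothetical.

```r
# Hypothetical stand-in for spp: one row per strike record
toy <- data.frame(
  Family   = factor(c("Parulidae", "Parulidae", "Turdidae", "Regulidae", "Turdidae")),
  Building = factor(c("Sage", "Reeve", "Sage", "Sage", "ACT"))
)

# Cross-tabulate, then unfold the table into one row per Family x Building pair
FxB.long <- as.data.frame(
  table(Family = toy$Family, Building = toy$Building),
  responseName = "counts"
)
FxB.long
```

The result has one row for every family and building combination, including zero counts, so it can be passed to `ggplot()` with `geom_tile()` in the same way as FxB.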

Family x year

Using the same methods as above, I created a data frame that allows me to map strikes per family per year. I don’t show the code here to save space. Please ask if you want to see it.

Family x season

Again, I don’t show the code here to save space. Please ask if you want to see it.