Community differences update feb2026

Evaluating community differences

Q2.2: Is the bird community different between yards and street segments?

We answered this statistically with a PERMANOVA and an associated PCoA.

Bullet points from our discussion on Wednesday regarding Q2.2

Add some kind of incidence per visit when calculating occurrence
Add a day of year variable if adding the per visit piece
Split this analysis into spring and summer

How I’ve updated the code:

I treated each visit as a different site, so instead of n(yards) = 21 and n(streets) = 21, it is now n(yards) = 147 and n(streets) = 147, since in total we visited each yard 7 times
Because this question is about the differences between yards and streets, and not the differences among yards or among streets, I didn’t think this would cause any bias
Moreover, I thought that making each individual survey a site would give weight to consistency of use and not just presence/absence
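
To make the visit-as-site idea concrete, here is a minimal sketch of how a visit-by-species presence/absence matrix could be built and ordinated. The data are invented, and the sketch uses base R only (`table()`, `dist(method = "binary")` for Jaccard dissimilarity, and `cmdscale()` for a classical PCoA). The column names mirror those in the code further down, but this is not necessarily how the PERMANOVA/PCoA for Q2.2 was actually run.

```r
# Toy bird records (invented values); one row per species record on a visit.
# SurveyID stands in for the Code + Date identifier built further down.
toy <- data.frame(
  SurveyID  = c("S01_2024-05-01", "S01_2024-06-01", "S01_2024-06-01",
                "Y01_2024-05-01", "Y01_2024-05-01", "Y01_2024-06-01"),
  Landtype  = c("street", "street", "street", "yard", "yard", "yard"),
  Bird.code = c("HOSP", "EUST", "HOSP", "SOSP", "NOCA", "HOSP"),
  stringsAsFactors = FALSE
)

# Visit x species presence/absence matrix: every visit is its own row
counts <- table(toy$SurveyID, toy$Bird.code)
pa <- matrix(as.integer(counts > 0), nrow = nrow(counts),
             dimnames = list(rownames(counts), colnames(counts)))

# Jaccard dissimilarity between visits (base R calls this "binary")
jac <- dist(pa, method = "binary")

# Classical PCoA on the dissimilarities, keeping two axes
pcoa <- cmdscale(jac, k = 2)

# Land use of each visit, in the same row order as the matrix
landuse <- toy$Landtype[match(rownames(pa), toy$SurveyID)]
```

One thing the sketch does not address is that repeated visits to the same yard or street are not independent of each other; whether and how to account for that in the permutation scheme is a separate decision.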

Specific questions I have:

I’m not sure whether this is (statistically/scientifically) legal
I’m not sure where to incorporate day-of-year
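
As a starting point for the day-of-year and spring/summer items, here is a minimal sketch of how both could be derived from the survey date. It assumes the date is stored as "YYYY-MM-DD" text, and the 1 June cutoff between spring and summer is purely hypothetical; the toy values are invented.

```r
# Toy visits (invented); assumes dates are stored as "YYYY-MM-DD" text
visits <- data.frame(
  SurveyID = c("Y01_2024-04-20", "Y01_2024-06-15",
               "S01_2025-05-02", "S01_2025-07-10"),
  Date     = c("2024-04-20", "2024-06-15", "2025-05-02", "2025-07-10"),
  stringsAsFactors = FALSE
)

visits$Date <- as.Date(visits$Date, format = "%Y-%m-%d")

# Day of year (1 = 1 January); "%j" accounts for leap years
visits$doy <- as.integer(format(visits$Date, "%j"))

# Hypothetical cutoff: 1 June of the same year separates spring from summer
season_cutoff <- as.Date(paste0(format(visits$Date, "%Y"), "-06-01"))
visits$season <- ifelse(visits$Date < season_cutoff, "spring", "summer")
```

Because `unite(..., remove = TRUE)` drops the `Date` column once `SurveyID` is built, these variables would need to be computed before that step (or with `remove = FALSE`). Where `doy` then enters the analysis, for example as a covariate alongside `Landtype`, is still an open modelling choice.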

I included the code in case you had any questions on how I went about it.

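# Packages used below
library(dplyr)   # bind_rows(), group_by(), summarise(), mutate(), %>%
library(tidyr)   # drop_na(), unite(), pivot_wider()
library(ggplot2) # plotting

# Loading in the data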
data2024 <- read.csv("2-Cleaned_data/ndg_cleaneddata_2024.csv") #all obs from 2024
data2025 <- read.csv("2-Cleaned_data/ndg_cleaneddata_2025.csv") #all obs from 2025

# Creating a global dataset with the data from 2025&2024
alldata <- bind_rows(data2024, data2025) %>% drop_na(Bird.code)


# Per-visit survey ID: combine site code and date so each visit is treated as a site
alldata_visits <- alldata %>% 
  unite("SurveyID", Code, Date, remove = TRUE)



# Number of visit-level sites (surveys) for each land use
street_sites2 <- 147
yard_sites2 <- 154


# Species occurrence calculations
species_occurrence <- alldata_visits %>%
  group_by(Landtype, Bird.code) %>% 
  summarise(
    # Number of surveys (visit-level sites) in which a species was observed, per land use
    n_occurrences = n_distinct(SurveyID)
  ) %>%
  group_by(Bird.code) %>%
  mutate(
    # Total # of sites a species was observed in
    total_sites = sum(n_occurrences) 
  ) %>%
  ungroup() %>%
  pivot_wider(
    names_from = Landtype,
    values_from = c(n_occurrences),
    values_fill = 0
  ) %>%
  mutate(
    # The proportion of sites where X species occurred (streets)
    occurenceprop_street = street / street_sites2,
    # The proportion of sites where X species occurred (yards)
    occurenceprop_yard = yard / yard_sites2, 
    # Difference in the proportion of occurrences (yard - street)
    occurenceprop_diff = occurenceprop_yard - occurenceprop_street, 
    # Calculating the ratio, adding 0.01 to avoid division by 0
    yard_street_ratio = occurenceprop_yard / (occurenceprop_street + 0.01)  
  ) %>%
  arrange(desc(abs(occurenceprop_diff)))

`summarise()` has grouped output by 'Landtype'. You can override using the
`.groups` argument.

head(species_occurrence)

# A tibble: 6 × 8
  Bird.code total_sites street  yard occurenceprop_street occurenceprop_yard
  <chr>           <int>  <int> <int>                <dbl>              <dbl>
1 SOSP               21      2    19               0.0136             0.123 
2 EUST               60     37    23               0.252              0.149 
3 NOCA               58     23    35               0.156              0.227 
4 AMRO               63     35    28               0.238              0.182 
5 BCCH               36     21    15               0.143              0.0974
6 HOSP              126     64    62               0.435              0.403 
# ℹ 2 more variables: occurenceprop_diff <dbl>, yard_street_ratio <dbl>
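
The text above gives 147 visits for each land use while `yard_sites2` is set to 154, so it may be worth deriving the denominators from the data instead of hard-coding them. Here is a minimal sketch with invented toy data and base R only. It assumes that a row with a missing `Bird.code` represents a visit with no detections, which may not match how the real data are coded.

```r
# Toy visit-level records (invented values); NA = assumed "no birds detected"
toy <- data.frame(
  SurveyID  = c("Y01_d1", "Y01_d1", "Y01_d2", "S01_d1", "S01_d2", "S01_d3"),
  Landtype  = c("yard", "yard", "yard", "street", "street", "street"),
  Bird.code = c("SOSP", "NOCA", "HOSP", "HOSP", NA, "EUST"),
  stringsAsFactors = FALSE
)

# Count distinct visits per land use BEFORE dropping rows with no bird code,
# so that visits with zero detections still count toward the denominator
n_visits <- sapply(split(toy$SurveyID, toy$Landtype),
                   function(x) length(unique(x)))

# Number of visits on which each species was detected, per land use
detections <- unique(toy[!is.na(toy$Bird.code),
                         c("SurveyID", "Landtype", "Bird.code")])
n_detected <- unclass(table(detections$Bird.code, detections$Landtype))

# Proportion of visits with a detection: divide each column by its visit count
occ_prop <- sweep(n_detected, 2, n_visits[colnames(n_detected)], "/")
```

If the counts were instead taken from `alldata` after `drop_na(Bird.code)`, any visit without a recorded bird would disappear from the denominator, which would inflate the proportions slightly.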

community_composition <- species_occurrence %>% 
  ggplot(aes(x = reorder(Bird.code, occurenceprop_diff), 
             y = occurenceprop_diff,
             fill = occurenceprop_diff > 0)) +
  geom_col() +
  coord_flip() +
  labs(x = "Species code", 
       y = "Difference in occurrence (Yard - Street)",
       title = "Difference in Species Occurrences") +
  scale_fill_manual(values = c("coral", "skyblue"),
                    labels = c("More in streets", "More in yards"),
                    name = "") +
  theme_minimal()
community_composition

Compare this with what I presented previously:


The range of occurrence differences contracts a decent amount; I’m not sure whether this is a good thing (i.e., more conservative results).

Thank you!!