Week_3_DataDive

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

wildlife <- read_delim("./Urban_Wildlife_Response.csv", delim = ",")

## Rows: 6385 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): DT_Initial, DT_Response, Borough, Property, Location, Species, Cal...
## dbl  (3): Response_Duration, Num_of_Animals, Hours_Monitoring
## lgl  (4): PEP_Response, Animal_Monitored, Police_Response, ESU_Response
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Call Source

This group-by counts incidents by who called in the animal response. Several kinds of callers can report an incident, such as employees, rescue organizations, or members of the public going about their day.

The least likely call source is Wildlife in Need of Rescue and Rehabilitation (WINORR), a non-profit organization. As a volunteer organization, it likely has one of the smallest numbers of people associated with it, along with a relatively small area of influence, so its members are less likely to run into urban wildlife. WBF is the Wild Bird Fund, another non-profit in the NYC area. Because the two organizations are similar, I’ve applied the “NP” (short for Non-Profit) tag to both.

wildlife |>
  group_by(Call_Source) |>
  summarise(Instances = n()) |>
  arrange(Instances)

## # A tibble: 8 × 2
##   Call_Source                           Instances
##   <chr>                                     <int>
## 1 "WINORR"                                      3
## 2 "WBF"                                        99
## 3 "Other"                                     169
## 4 "Observed by Ranger"                        442
## 5 "Conservancies/\"Friends of\" Groups"       638
## 6 "Central"                                  1398
## 7 "Public"                                   1622
## 8 "Employee"                                 2014

wildlife$NP <- ifelse(wildlife$Call_Source %in% c("WINORR","WBF"), "NP", "")

Hypothesis

Some groups have a lower probability of calling in an incident because fewer people are associated with them (for example, there are fewer rangers than there are members of the general public).
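As a rough illustration of the probabilities this hypothesis refers to, the sketch below turns raw counts into shares of all responses. The `toy_calls` data and the `Share` column are invented for the example and are not part of the original analysis. Testing the hypothesis properly would also need the size of each caller group, which this dataset does not appear to contain.

```r
library(dplyr)

# Toy stand-in for the real data; these rows are invented for illustration.
toy_calls <- tibble(
  Call_Source = c("Employee", "Employee", "Public", "WBF")
)

# Count responses per call source, then convert counts to shares of the total.
call_shares <- toy_calls |>
  count(Call_Source, name = "Instances") |>
  mutate(Share = Instances / sum(Instances)) |>
  arrange(Share)

call_shares
```

Swapping `toy_calls` for `wildlife` should give the empirical probability of each call source.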

Visualization

wildlife |>
  ggplot() +
  geom_bar(mapping = aes(y = Call_Source)) +
  theme_minimal() +
  scale_fill_brewer(palette = 'Dark2')

Incident by Ages

This group-by was purely exploratory - I was curious to see what values existed and wasn’t expecting to find anything particularly interesting. However, the results below are interesting, mostly because they are not what I would call “clean data”. In several instances the same information was entered with different formatting or ordering, which adds noise to the group-by.

To make this column more useful in the future, I’ve created a new copy of the dataset with the values in this column standardized, so that “Adult, Infant, Juvenile” and “Adult, Juvenile, Infant” are grouped together in future data dives.

wildlife |>
  group_by(Age) |>
  summarise(Instances = n()) |>
  arrange(Instances)

## # A tibble: 24 × 2
##    Age                                   Instances
##    <chr>                                     <int>
##  1 "Adult, Infant, Juvenile"                     1
##  2 "Adult, Juvenile, Infant"                     1
##  3 "Infant, Adult"                               1
##  4 "Juvenile, Infant"                            1
##  5 "Juvenile;#Infant"                            1
##  6 "[\"Infant\",\"Adult\",\"Juvenile\"]"         1
##  7 "[\"Juvenile\",\"Adult\"]"                    1
##  8 "[\"Juvenile\",\"Infant\"]"                   1
##  9 "Infant;#Adult"                               2
## 10 "Juvenile, Adult"                             2
## # ℹ 14 more rows
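The standardization described above was done on a separate copy of the dataset, so the exact method is not shown here. One possible way to do it in R is sketched below, assuming the only meaningful labels are Infant, Juvenile, and Adult. The helper `standardize_age()` and the `raw_age` vector are hypothetical names introduced for this example.

```r
library(stringr)

# Hypothetical helper: pull the known age labels out of a raw entry and
# return them in one fixed order, so formatting and ordering no longer matter.
standardize_age <- function(age) {
  labels <- c("Infant", "Juvenile", "Adult")  # assumed to be the full label set
  vapply(age, function(x) {
    if (is.na(x)) return(NA_character_)
    found <- labels[str_detect(x, labels)]
    if (length(found) == 0) return(x)  # leave unrecognized values untouched
    paste(found, collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

# A few of the formats seen in the group-by output, plus a missing value.
raw_age <- c("Adult, Infant, Juvenile", "Juvenile;#Infant",
             '["Infant","Adult","Juvenile"]', "Adult", NA)
standardize_age(raw_age)
```

Values that contain none of the three labels are passed through unchanged, so any unexpected entries would still show up in a later group-by.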

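Because infants are the least likely to be found by urban rangers, I have added the “Infant” tag to those records. Note that the code below tags only records whose Age is exactly “Infant”; mixed-age records such as “Infant, Adult” are left untagged.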

wildlife$Infant <- ifelse(wildlife$Age %in% "Infant", "Infant", "")

Hypothesis

I hypothesize that groups of animals of different ages are less likely to be found than either single animals or animals of the same age. While it would make sense for infant animals to be found with adults, I would expect adults and juveniles to be on their own or with animals of a similar age.

Visualization

The visualization below shows how many records contain an animal of each age group. Because some records include multiple animals, sometimes of differing ages, the bars sum to more than the total number of records. Given that animals are only infants for so long, it is not surprising that Infant is the smallest category. Juvenile and Adult follow a similar pattern: animals tend to be in their adult stage for much longer than either the infant or juvenile stage, so there are more adult animals than infant or juvenile animals at any given time.

Age_Instance <- data.frame(
  Age_Group = c("Infant", "Juvenile", "Adult"),
  # na.rm = TRUE keeps a missing Age value from turning a whole count into NA
  Count_Instance = c(sum(str_count(wildlife$Age, pattern = "Infant"), na.rm = TRUE),
                     sum(str_count(wildlife$Age, pattern = "Juvenile"), na.rm = TRUE),
                     sum(str_count(wildlife$Age, pattern = "Adult"), na.rm = TRUE))
)

Age_Instance |>
  ggplot(mapping = aes(x = Age_Group, y = Count_Instance)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  scale_fill_brewer(palette = 'Dark2')

Responses by Borough and Species Status

Are there certain boroughs with a higher percentage of certain types of species? For example, are there places where a response call for an exotic animal is more likely to occur?

The lowest probabilities in this dataset are for the N/A and Exotic species statuses. The Exotic result makes sense, as you’d expect to find fewer exotic animals roaming the streets of NYC than native or domestic animals. The N/A status requires more digging - my initial thought is that it is recorded for response calls where the animals aren’t found when the rangers arrive on scene.

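The highest probabilities are for native animals, at least in the boroughs visible in the percentage table further down: native species make up roughly 64% of Bronx responses and 71% of Brooklyn responses, with domestic animals second (roughly 29% and 18%). Domestic animals (such as pets) still account for a large share, which makes sense, as people are likely to call about a dog or cat roaming around or in distress.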

wildlife |>
  group_by(Borough, Species_Status) |>
  summarise(Instances = n()) |>
  arrange(Instances)

## `summarise()` has grouped output by 'Borough'. You can override using the
## `.groups` argument.

## # A tibble: 25 × 3
## # Groups:   Borough [5]
##    Borough       Species_Status Instances
##    <chr>         <chr>              <int>
##  1 Staten Island N/A                    6
##  2 Bronx         N/A                    8
##  3 Staten Island Exotic                11
##  4 Manhattan     N/A                   12
##  5 Queens        N/A                   13
##  6 Brooklyn      Exotic                16
##  7 Bronx         Exotic                22
##  8 Brooklyn      N/A                   22
##  9 Queens        Exotic                24
## 10 Manhattan     Exotic                27
## # ℹ 15 more rows

Due to the N/A Species Status being of particular concern, I’ve added a tag for that scenario.

wildlife$Unknown_Spec_Status <- ifelse(wildlife$Species_Status %in% "N/A", 1, 0)

Hypothesis

I hypothesize that responses with an N/A species status are more likely to be responses where the animals are not found when the wildlife ranger arrives on scene.
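A first check of this hypothesis could compare how often the animal condition is also N/A for responses with and without a known species status. This is only a sketch: it assumes that an Animal_Condition of “N/A” is a reasonable proxy for “animal not found”, which the dataset does not confirm, and the `toy` rows and the `Share_Condition_NA` column are invented for illustration.

```r
library(dplyr)

# Toy stand-in for the real data; these rows are invented for illustration.
toy <- tibble(
  Species_Status   = c("N/A", "N/A", "Native", "Native", "Domestic"),
  Animal_Condition = c("N/A", "N/A", "Healthy", "N/A", "Injured")
)

# For each group (species status unknown vs. known), compute the share of
# responses whose condition is also recorded as "N/A".
na_check <- toy |>
  mutate(Unknown_Spec_Status = Species_Status %in% "N/A") |>
  group_by(Unknown_Spec_Status) |>
  summarise(
    Responses = n(),
    Share_Condition_NA = mean(Animal_Condition %in% "N/A"),
    .groups = "drop"
  )

na_check
```

A much higher share in the unknown-status group would be consistent with the hypothesis, though it would not prove it.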

Visualization

d_b_ss <- wildlife |>
  group_by(Borough, Species_Status) |>
  summarise(Instances = n()) |>
  mutate(perc = Instances/sum(Instances))

## `summarise()` has grouped output by 'Borough'. You can override using the
## `.groups` argument.

d_b_ss

## # A tibble: 25 × 4
## # Groups:   Borough [5]
##    Borough  Species_Status Instances    perc
##    <chr>    <chr>              <int>   <dbl>
##  1 Bronx    Domestic             271 0.285  
##  2 Bronx    Exotic                22 0.0231 
##  3 Bronx    Invasive              43 0.0452 
##  4 Bronx    N/A                    8 0.00840
##  5 Bronx    Native               608 0.639  
##  6 Brooklyn Domestic             238 0.179  
##  7 Brooklyn Exotic                16 0.0120 
##  8 Brooklyn Invasive             113 0.0850 
##  9 Brooklyn N/A                   22 0.0166 
## 10 Brooklyn Native               940 0.707  
## # ℹ 15 more rows

d_b_ss |>
  ggplot() +
  geom_bar(mapping = aes(x = Borough, y = perc*100, fill = Species_Status), position = "fill", stat = "identity") +
  theme_minimal() +
  scale_fill_brewer(palette = 'Dark2')

Animal Condition by Borough

I was curious if there were any boroughs where animals were more likely to be found DOA or injured when the ranger arrived. Thankfully, it does seem that in general, healthy is the most common option, while DOA and Injured are less common. This information could be useful for determining what steps should be taken to reduce poor outcomes for the wildlife in a given borough. For example, Manhattan has 424 instances of “unhealthy” rescues. What is causing the animals of Manhattan to be “unhealthy”, and what steps can be taken to mitigate the risk for those animals?

For this data frame, there are 30 possible combinations - 5 boroughs (Brooklyn, Bronx, Manhattan, Queens, and Staten Island) and 6 condition statuses (Healthy, Unhealthy, Injured, DOA, N/A, and some instances of no value). The grouped result has 30 rows, which means every combination of borough and condition status is present. Given the small number of options for both variables, that’s not particularly surprising.

d_b_animal_cond <- wildlife |>
  group_by(Borough, Animal_Condition) |>
  summarise(Instances = n()) |>
  arrange(Instances)

## `summarise()` has grouped output by 'Borough'. You can override using the
## `.groups` argument.

d_b_animal_cond

## # A tibble: 30 × 3
## # Groups:   Borough [5]
##    Borough       Animal_Condition Instances
##    <chr>         <chr>                <int>
##  1 Staten Island <NA>                     4
##  2 Bronx         <NA>                     6
##  3 Manhattan     <NA>                     6
##  4 Brooklyn      <NA>                     8
##  5 Queens        <NA>                    12
##  6 Queens        N/A                     49
##  7 Bronx         N/A                     92
##  8 Staten Island N/A                    106
##  9 Brooklyn      DOA                    112
## 10 Bronx         DOA                    114
## # ℹ 20 more rows
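The completeness reasoning above (30 possible combinations, 30 rows observed) can also be checked in code rather than by counting. The sketch below uses a small invented `toy` data frame in which some combinations are deliberately missing; `observed` and `possible` are names introduced for the example.

```r
library(dplyr)

# Toy stand-in for the real data; these rows are invented for illustration.
toy <- tibble(
  Borough          = c("Bronx", "Bronx", "Queens", "Queens"),
  Animal_Condition = c("Healthy", "DOA", "Healthy", NA)
)

# Number of distinct (Borough, Animal_Condition) pairs that actually occur.
observed <- toy |>
  distinct(Borough, Animal_Condition) |>
  nrow()

# Number of pairs that could occur; n_distinct() counts NA as its own value.
possible <- n_distinct(toy$Borough) * n_distinct(toy$Animal_Condition)

observed == possible  # FALSE here, because two pairs never occur in the toy data
```

Run against `wildlife`, the same comparison should return TRUE if every combination is present.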

Visualization

Following up on my question in the introduction to this section (what could be done in Manhattan to decrease the number of “unhealthy” animals encountered?), I decided to dig into it some more. To narrow the question down, the graph below shows the number of instances by species status, and shows that the overwhelming majority of unhealthy animals encountered in Manhattan are considered native.

wildlife |>
  filter(Borough %in% "Manhattan") |>
  filter(Animal_Condition %in% "Unhealthy") |>
  ggplot() +
  geom_bar(mapping = aes(x = Species_Status), position = "dodge", stat = "count") +
  theme_minimal() +
  scale_fill_brewer(palette = 'Dark2')

This next graph looks even further by drilling down into just the native and unhealthy animals of Manhattan. Again, the overwhelming majority belong to one group - raccoons. This is doubly true when you realize there is a second group of raccoons, created because one spelling is capitalized and the other is not. If I were part of the NYC Wildlife Response team, I would take more detailed notes on what is going on when raccoons are encountered, to see whether initiatives could be encouraged to prevent them from becoming unwell, especially if part of the demand on the response team is due to the animals being unwell.

wildlife |>
  filter(Borough %in% "Manhattan") |>
  filter(Animal_Condition %in% "Unhealthy") |>
  filter(Species_Status %in% "Native") |>
  ggplot() +
  geom_bar(mapping = aes(y = Species)) +
  theme_minimal() +
  scale_fill_brewer(palette = 'Dark2')
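The duplicate raccoon group mentioned above could be merged by normalizing the case of the Species column before counting. This is a minimal sketch on invented data: `toy_species` and the `Species_Clean` column are hypothetical, and lower-casing is only one possible convention.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the real data; these rows are invented for illustration.
toy_species <- tibble(
  Species = c("Raccoon", "raccoon", "Raccoon", "Red-tailed Hawk")
)

# Lower-case every species name so spellings that differ only by
# capitalization fall into the same group, then count.
species_counts <- toy_species |>
  mutate(Species_Clean = str_to_lower(Species)) |>
  count(Species_Clean, name = "Instances", sort = TRUE)

species_counts
```

The same `mutate()` step could be placed ahead of the `ggplot()` call above, with `y = Species_Clean`, so the chart shows a single raccoon bar.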

2025-01-28
