Week_2_DataDive

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

wildlife <- read_delim("./Urban_Wildlife_Response.csv", delim = ",")

## Rows: 6385 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): DT_Initial, DT_Response, Borough, Property, Location, Species, Cal...
## dbl  (3): Response_Duration, Num_of_Animals, Hours_Monitoring
## lgl  (4): PEP_Response, Animal_Monitored, Police_Response, ESU_Response
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Numeric Summary of Species Status

The table below breaks down the species-status categories (domestic, native, exotic, or invasive) by the number of instances in the data and by the total number of animals recorded in the “Number of Animals” column (Num_of_Animals). I chose this view because I was curious about the breakdown of instances by animal type, and I thought the total-animals column might add insight. It does: I expected the totals to be well above the instance counts, but they are close to them, and for some categories smaller. How many entries in the “Number of Animals” column are null or zero, and what would cause them to be recorded that way? Are there instances where the recorded number of animals seems counter-intuitive?

wildlife |>
  group_by(Species_Status) |>
  summarise(Instances = n(),
            Total_Animals = sum(Num_of_Animals, na.rm = TRUE))

## # A tibble: 5 × 3
##   Species_Status Instances Total_Animals
##   <chr>              <int>         <dbl>
## 1 Domestic            1105          1298
## 2 Exotic               100            90
## 3 Invasive             346           472
## 4 N/A                   61            36
## 5 Native              4773            NA
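
To follow up on the question about null or zero entries, the counts can be tabulated per group. The sketch below is a minimal illustration rather than part of the original analysis: the toy tibble is made-up data that only mimics the column names, and count_missing_zero() is a hypothetical helper name. An NA total, as in the Native row, appears whenever a missing value reaches sum() without na.rm = TRUE taking effect.

```r
library(dplyr)

# Hypothetical helper: per-group counts of missing and zero animal counts.
count_missing_zero <- function(data) {
  data |>
    group_by(Species_Status) |>
    summarise(
      Instances     = n(),
      Missing       = sum(is.na(Num_of_Animals)),
      Zero          = sum(Num_of_Animals == 0, na.rm = TRUE),
      Total_Animals = sum(Num_of_Animals, na.rm = TRUE)
    )
}

# Made-up rows for illustration only; not taken from the real file.
toy <- tibble(
  Species_Status = c("Native", "Native", "Native", "Exotic", "Exotic"),
  Num_of_Animals = c(2, NA, 0, 0, 1)
)

toy_counts <- count_missing_zero(toy)
toy_counts
```

On the real data the same helper would be called as count_missing_zero(wildlife).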

Numeric Summary of Response Time

These summary statistics give a starting point for understanding response times. The minimum of 0 raises a question: are there really incidents that need no response time, or is 0 the default when an animal isn’t found? The maximum of 75 hours looks like a massive outlier next to the mean of 1.43 hours and a middle 50% that runs from 0.5 to 2 hours, so the full spread of response times would be very interesting to see.

wildlife |>
  summarise(Min_Response = min(Response_Duration),
            Max_Response = max(Response_Duration),
            Avg_Response = mean(Response_Duration),
            FirstQ_Response = quantile(Response_Duration, probs = 0.25),
            ThirdQ_Response = quantile(Response_Duration, probs = 0.75))

## # A tibble: 1 × 5
##   Min_Response Max_Response Avg_Response FirstQ_Response ThirdQ_Response
##          <dbl>        <dbl>        <dbl>           <dbl>           <dbl>
## 1            0           75         1.43             0.5               2
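
One way to put a number on "massive outlier" is Tukey's 1.5 × IQR rule. This is a convention assumed here for illustration, not something the original analysis applies. The sketch uses only the quartiles printed in the summary table.

```r
# Quartiles copied from the printed summary table.
q1 <- 0.5
q3 <- 2

iqr <- q3 - q1                 # interquartile range
upper_fence <- q3 + 1.5 * iqr  # Tukey's upper fence (assumed rule, not the author's)

upper_fence       # 4.25
75 > upper_fence  # TRUE: the recorded maximum lies far beyond the fence
```

On the real data, sum(wildlife$Response_Duration > upper_fence) would count how many responses fall beyond the fence.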

Novel Questions

What differences exist between the boroughs with regard to response times, animals encountered, and method of contact?
Which animals are most likely to require a response, by both species and age of animal?
What steps can be taken to improve responses? Are there certain animals or locations that are more likely to lead to negative outcomes (such as injured or unfound animals)?

Difference in Boroughs

Comparing the number of incidents and the aggregate response-time statistics by borough gives a first picture of how the boroughs differ. Manhattan stands out: it has the most incidents (1,761) and the shortest average response time (1.19, against 1.45 to 1.56 elsewhere), even though it also holds the 75-hour maximum.

wildlife |>
  group_by(Borough) |>
  summarise(Instances = n(),
            Min_Response = min(Response_Duration),
            Max_Response = max(Response_Duration),
            Avg_Response = mean(Response_Duration))

## # A tibble: 5 × 5
##   Borough       Instances Min_Response Max_Response Avg_Response
##   <chr>             <int>        <dbl>        <dbl>        <dbl>
## 1 Bronx               952          0             30         1.45
## 2 Brooklyn           1329          0             21         1.56
## 3 Manhattan          1761          0             75         1.19
## 4 Queens             1358          0.1           20         1.56
## 5 Staten Island       985          0.1           35         1.47
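
As a quick consistency check, the borough rows can be tied back to the overall figures. The sketch below uses base R only and retypes the counts and averages from the printed table, so its results are limited by the rounding of that output. The Share_Pct column is an addition for illustration.

```r
# Instance counts and average response times copied from the borough table.
borough_summary <- data.frame(
  Borough      = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  Instances    = c(952, 1329, 1761, 1358, 985),
  Avg_Response = c(1.45, 1.56, 1.19, 1.56, 1.47)
)

# Total incidents should equal the number of rows read in (6385).
total_instances <- sum(borough_summary$Instances)

# The instance-weighted mean of the borough averages approximates the overall mean.
overall_avg <- weighted.mean(borough_summary$Avg_Response, borough_summary$Instances)

# Each borough's share of all incidents, in percent.
borough_summary$Share_Pct <- round(100 * borough_summary$Instances / total_instances, 1)

total_instances
overall_avg
borough_summary
```

The weighted mean should land close to the overall average of 1.43 reported earlier, which suggests the two summaries describe the same set of rows.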

Visualization - Response Time by Borough

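This visualization shows that, compared with the full range of recorded values, the interquartile range (IQR) of response times is a narrow window: overall, the middle half of responses falls between 0.5 and 2, while the maximum is 75.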

wildlife |>
  ggplot() +
  geom_boxplot(mapping = aes(x = Borough, y = Response_Duration)) +
  scale_y_log10() +
  labs(title = "Response Time by Borough (log10 scale)") +
  theme_minimal()

## Warning in scale_y_log10(): log-10 transformation introduced infinite values.

## Warning: Removed 9 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
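
The two warnings are most likely caused by the zero-duration rows: log10(0) is -Inf, so a log-scaled axis cannot place those values and drops them. The sketch below shows one way to make that choice explicit by counting and filtering the zeros before plotting. The toy tibble is made-up data, and filtering is only one option, offered as an assumption about what the plot should show.

```r
library(dplyr)
library(ggplot2)

# Made-up durations for illustration; the real data has its own values.
toy <- tibble(
  Borough           = c("Bronx", "Bronx", "Queens", "Queens", "Queens"),
  Response_Duration = c(0, 0.5, 2, 0, 10)
)

log10(0)  # -Inf, which a log-scaled axis cannot draw

# Count how many rows a log scale would drop.
n_zero <- sum(toy$Response_Duration == 0)

# Drop the zero durations explicitly before building the plot.
positive <- toy |>
  filter(Response_Duration > 0)

p <- ggplot(positive) +
  geom_boxplot(aes(x = Borough, y = Response_Duration)) +
  scale_y_log10() +
  theme_minimal()
```

Reporting n_zero alongside the plot keeps the dropped rows visible to the reader instead of leaving them in a warning.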

Visualization - Incidents by Borough and Animal Type

This visualization breaks down incidents by species status and borough, a good first step toward asking more detailed questions about how the boroughs differ.

wildlife |>
  ggplot() +
  geom_bar(mapping = aes(x = Borough, fill = Species_Status)) +
  theme_minimal() +
  scale_fill_brewer(palette = 'Dark2')
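
Because the boroughs have different incident totals, shares within each borough can be easier to compare than raw counts. The sketch below is a minimal illustration on made-up rows. status_share() is a hypothetical helper name, and the toy values are not taken from the real file.

```r
library(dplyr)

# Hypothetical helper: share of each species status within each borough.
status_share <- function(data) {
  data |>
    count(Borough, Species_Status, name = "Instances") |>
    group_by(Borough) |>
    mutate(Share = Instances / sum(Instances)) |>
    ungroup()
}

# Made-up rows for illustration only.
toy <- tibble(
  Borough        = c("Bronx", "Bronx", "Bronx", "Bronx", "Queens", "Queens"),
  Species_Status = c("Native", "Native", "Native", "Domestic", "Native", "Exotic")
)

toy_share <- status_share(toy)
toy_share
```

On the real data this would be called as status_share(wildlife). In the plot itself, geom_bar(position = "fill") draws the same within-borough proportions.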

2025-01-28