library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
wildlife <- read_delim("./Urban_Wildlife_Response.csv", delim = ",")
## Rows: 6385 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): DT_Initial, DT_Response, Borough, Property, Location, Species, Cal...
## dbl (3): Response_Duration, Num_of_Animals, Hours_Monitoring
## lgl (4): PEP_Response, Animal_Monitored, Police_Response, ESU_Response
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The table below breaks down the different species statuses (domestic, native, exotic, invasive, or N/A) by the number of instances in the data set and by the total number of animals recorded in the `Num_of_Animals` column. I chose this view because I was curious about the breakdown of instances by species status, and I thought the added view of total animals might prove insightful. It was: the `Instances` count is close to, and for Exotic and N/A even larger than, the `Total_Animals` count, when I would have expected the total number of animals to be well above the number of instances. That raises a few questions. How many entries in `Num_of_Animals` are null or zero, and what could cause them to be recorded that way? Are there instances where the `Num_of_Animals` value seems counter-intuitive?
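wildlife |>
  group_by(Species_Status) |>
  summarise(Instances = n(),
            # na.rm = TRUE drops missing counts. With a misspelled argument (e.g. nm.rm),
            # sum() adds TRUE as an extra 1 per group and keeps NAs, which is what
            # produced the totals shown below (including the NA for Native)
            Total_Animals = sum(Num_of_Animals, na.rm = TRUE))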
## # A tibble: 5 × 3
## Species_Status Instances Total_Animals
## <chr> <int> <dbl>
## 1 Domestic 1105 1298
## 2 Exotic 100 90
## 3 Invasive 346 472
## 4 N/A 61 36
## 5 Native 4773 NA
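To start answering the questions above, here is a minimal sketch on a small hypothetical tibble (`toy` and its values are made up, not taken from the data set). It shows two things: a misspelled argument such as `nm.rm` is silently absorbed by `sum()`'s `...` and added as a value of 1, and missing and zero counts can be tallied per group with `is.na()` and a comparison to 0. Running the same `summarise()` on `wildlife` would show how many `Num_of_Animals` entries are null or zero.

```r
library(dplyr)

# Hypothetical toy data standing in for the wildlife table
toy <- tibble(
  Species_Status = c("Native", "Native", "Native", "Exotic", "Exotic"),
  Num_of_Animals = c(2, NA, 0, 1, 3)
)

# A misspelled argument is passed through sum()'s `...` and summed as TRUE = 1
typo_sum <- sum(c(1, 2, 3), nm.rm = TRUE)   # 7, not 6
typo_na  <- sum(c(1, NA, 3), nm.rm = TRUE)  # NA, because missing values are kept

# Tally missing and zero animal counts per species status
counts <- toy |>
  group_by(Species_Status) |>
  summarise(
    Instances = n(),
    Missing_Count = sum(is.na(Num_of_Animals)),
    Zero_Count = sum(Num_of_Animals == 0, na.rm = TRUE),
    Total_Animals = sum(Num_of_Animals, na.rm = TRUE)
  )

print(counts)
```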
The summary statistics for response time give a good starting point. The minimum response time of 0 raises questions: are there scenarios that genuinely require no response time, or is 0 the default value when an animal isn't found? The maximum of 75 hours looks like a massive outlier, especially against a mean of 1.4 hours and an interquartile range of only 0.5 to 2 hours, so the full spread of response times would be worth examining.
wildlife |>
  summarise(Min_Response = min(Response_Duration),
            Max_Response = max(Response_Duration),
            Avg_Response = mean(Response_Duration),
            FirstQ_Response = quantile(Response_Duration, probs = 0.25),
            ThirdQ_Response = quantile(Response_Duration, probs = 0.75))
## # A tibble: 1 × 5
## Min_Response Max_Response Avg_Response FirstQ_Response ThirdQ_Response
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 75 1.43 0.5 2
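One way to make "massive outlier" concrete is Tukey's fences, a standard rule (not something the data set itself defines) that flags values more than 1.5 × IQR beyond the quartiles. With the quartiles above, IQR = 2 - 0.5 = 1.5 hours, so the upper fence is 2 + 1.5 × 1.5 = 4.25 hours, and 75 hours sits far beyond it. The helper `tukey_flags()` below is a hypothetical sketch, demonstrated on made-up durations; the commented line shows how it could be applied to `wildlife`.

```r
# Hypothetical helper: flag values outside Tukey's fences,
# i.e. more than k * IQR below Q1 or above Q3
tukey_flags <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Made-up response durations (hours) for illustration only
durations <- c(0, 0.5, 1, 2, 75)
flags <- tukey_flags(durations)
print(flags)  # only the 75-hour value is flagged

# On the real data (not run here):
# wildlife |> count(Long_Response = tukey_flags(Response_Duration))
```

For these five values the quartiles happen to match the real summary (0.5 and 2), so the same 4.25-hour fence applies.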
Looking at the number of incidents and the aggregate response-time statistics by borough gives a first picture of how the boroughs differ. The outlier here is Manhattan, which has noticeably more incidents (1,761, versus 952 to 1,358 elsewhere) and a shorter average response time (1.19 hours, versus 1.45 to 1.56 hours), even though it also holds the 75-hour maximum.
wildlife |>
  group_by(Borough) |>
  summarise(Instances = n(),
            Min_Response = min(Response_Duration),
            Max_Response = max(Response_Duration),
            Avg_Response = mean(Response_Duration))
## # A tibble: 5 × 5
## Borough Instances Min_Response Max_Response Avg_Response
## <chr> <int> <dbl> <dbl> <dbl>
## 1 Bronx 952 0 30 1.45
## 2 Brooklyn 1329 0 21 1.56
## 3 Manhattan 1761 0 75 1.19
## 4 Queens 1358 0.1 20 1.56
## 5 Staten Island 985 0.1 35 1.47
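Because a mean is pulled upward by extreme values such as Manhattan's 75-hour maximum, a median and the share of calls beyond an outlier threshold would show whether Manhattan's shorter average holds up. The sketch below uses Tukey's upper fence (Q3 + 1.5 × IQR) computed from the overall quartiles in the earlier summary, 2 + 1.5 × (2 - 0.5) = 4.25 hours, as that threshold. The data are a tiny hypothetical tibble; swapping `toy` for `wildlife` would apply the same `summarise()` to the real table.

```r
library(dplyr)

# Hypothetical toy data: one extreme value in Manhattan
toy <- tibble(
  Borough = c("Bronx", "Bronx", "Bronx", "Manhattan", "Manhattan", "Manhattan"),
  Response_Duration = c(0.5, 1, 3, 0.5, 1, 75)
)

# Upper Tukey fence from the overall quartiles reported earlier (Q1 = 0.5, Q3 = 2)
upper_fence <- 2 + 1.5 * (2 - 0.5)  # 4.25 hours

by_borough <- toy |>
  group_by(Borough) |>
  summarise(
    Instances = n(),
    Avg_Response = mean(Response_Duration),
    Median_Response = median(Response_Duration),
    Share_Over_Fence = mean(Response_Duration > upper_fence)
  )

print(by_borough)
```

On the toy data both boroughs have a median of 1 hour, while the single 75-hour value lifts the Manhattan mean to 25.5 hours, which is exactly the distortion a median guards against.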
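This visualization shows that, relative to the full range of values collected, the IQR for response time is a narrow window. Because the y-axis is log-scaled, response times of 0 cannot be plotted (log10(0) is -Inf), which is why ggplot2 warns that 9 rows were removed.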
wildlife |>
  ggplot() +
  geom_boxplot(mapping = aes(x = Borough, y = Response_Duration)) +
  scale_y_log10() +
  labs(title = "Response Time by Borough (scaled by Log)") +
  theme_minimal()
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
## Warning: Removed 9 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
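The warnings above come from taking log10 of zero durations. One alternative, sketched here on hypothetical data, is the log1p transform, log(1 + y), which is finite at 0 and so keeps zero response times on the plot while still compressing the long right tail. This assumes ggplot2 3.5.0 or later, where the scale argument is named `transform` (older versions call it `trans`).

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data including zero durations
toy <- tibble(
  Borough = c("Bronx", "Bronx", "Bronx", "Queens", "Queens", "Queens"),
  Response_Duration = c(0, 0.5, 2, 0, 1, 20)
)

# Rows a plain log10 scale would drop, since log10(0) is -Inf
n_zero <- sum(toy$Response_Duration == 0)

p <- toy |>
  ggplot() +
  geom_boxplot(mapping = aes(x = Borough, y = Response_Duration)) +
  # log(1 + y) maps 0 to 0, so no rows are removed
  scale_y_continuous(transform = "log1p") +
  labs(title = "Response Time by Borough (log1p scale)") +
  theme_minimal()

print(p)
```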
This visualization breaks down incidents by species status and borough, which is a good first step toward asking more questions about how the boroughs differ.
wildlife |>
  ggplot() +
  geom_bar(mapping = aes(x = Borough, fill = Species_Status)) +
  theme_minimal() +
  scale_fill_brewer(palette = 'Dark2')
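Since Manhattan has more incidents overall, stacked counts make it hard to compare the species mix across boroughs. A minimal sketch, on hypothetical data, of normalising within each borough: `count()` followed by a grouped share, plus the equivalent plot via `geom_bar(position = "fill")`, which scales every bar to a total of 1.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data
toy <- tibble(
  Borough = c("Bronx", "Bronx", "Bronx", "Bronx", "Queens", "Queens"),
  Species_Status = c("Native", "Native", "Native", "Domestic", "Native", "Invasive")
)

# Share of each species status within its borough
mix <- toy |>
  count(Borough, Species_Status) |>
  group_by(Borough) |>
  mutate(Share = n / sum(n)) |>
  ungroup()

print(mix)

# Same idea as a plot: each bar is stacked to a total of 1
p <- toy |>
  ggplot() +
  geom_bar(mapping = aes(x = Borough, fill = Species_Status), position = "fill") +
  labs(y = "Share of incidents") +
  theme_minimal() +
  scale_fill_brewer(palette = "Dark2")
```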