Week 6 Datadive

Response Time versus Initial Call Time

The “Time_To_Respond” variable measures how long it takes, from the initial call, for an urban ranger to respond to the situation. I would expect the time to respond to vary depending on the time of day. To investigate this, I’ve grouped calls into six four-hour blocks based on the time of the initial call. Because the data are heavily right-skewed, I’ve plotted the log of the response time (converted to seconds) rather than the raw value. Without this change, the chart is practically unreadable and not helpful for exploring the relationship between the time of the initial call and how long it takes to respond.

The call time is fixed before any response happens, and the response time could plausibly be influenced by when the call is made. The time of the initial call is therefore the explanatory variable, and the response time is the response variable.

# Packages used throughout this analysis
library(dplyr)
library(ggplot2)
library(lubridate)
library(boot)

wildlife <- wildlife |>
  # Parse the initial call timestamp (month/day/year hour:minute)
  mutate(DT_Initial = as.POSIXct(DT_Initial, format = "%m/%d/%Y %H:%M")) |>
  # Bin the hour of the call into six four-hour blocks
  mutate(
    Initial_Range = case_when(
      hour(DT_Initial) >= 0 & hour(DT_Initial) < 4 ~ "Early Morning",
      hour(DT_Initial) >= 4 & hour(DT_Initial) < 8 ~ "Morning",
      hour(DT_Initial) >= 8 & hour(DT_Initial) < 12 ~ "Late Morning",
      hour(DT_Initial) >= 12 & hour(DT_Initial) < 16 ~ "Afternoon",
      hour(DT_Initial) >= 16 & hour(DT_Initial) < 20 ~ "Evening",
      hour(DT_Initial) >= 20 & hour(DT_Initial) < 24 ~ "Late Evening"
    )
  ) |>
  # Store the blocks as a factor so they sort chronologically
  mutate(Initial_Range = factor(Initial_Range, levels = c("Early Morning", "Morning", "Late Morning", "Afternoon", "Evening", "Late Evening")))
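
The same four-hour blocks can also be built with base R's `cut()`, which avoids repeating the boundary conditions. This is only a sketch on made-up hours (the `hours` vector is hypothetical and stands in for `hour(DT_Initial)`); `right = FALSE` makes each bin closed on the left, which matches the `>=` and `<` logic above.

```r
# Hypothetical hour-of-day values (0-23), standing in for hour(DT_Initial)
hours <- c(0, 3, 4, 11, 12, 19, 20, 23)

range_labels <- c("Early Morning", "Morning", "Late Morning",
                  "Afternoon", "Evening", "Late Evening")

# Breaks at 0, 4, ..., 24 with right = FALSE give [0,4), [4,8), ..., [20,24)
initial_range <- cut(hours,
                     breaks = seq(0, 24, by = 4),
                     labels = range_labels,
                     right = FALSE)

# The result is already a factor with levels in chronological order
data.frame(hours, initial_range)
```

Because `cut()` returns a factor whose levels follow the order of `labels`, the separate `factor()` step is not needed in this variant.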

wildlife |>
  filter(Time_To_Respond >= 0) |> 
  ggplot() +
  geom_boxplot(mapping = aes(x = Initial_Range, y = log(as.numeric(hms(Time_To_Respond))))) +
  labs(title = "Time to Respond Based On Initial Call Time", x = "Time of Initial Call", y = "Log of Seconds Until Response") +
  theme_minimal()

## Warning: Removed 1224 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
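
One likely source of the removed rows is response times of exactly zero: `log(0)` is `-Inf`, which ggplot2 drops as non-finite (values that fail to parse become `NA` and are dropped too). A minimal sketch on made-up durations, assuming `Time_To_Respond` is stored as `"HH:MM:SS"` text, shows the conversion to seconds and the effect of the log:

```r
library(lubridate)

# Hypothetical "HH:MM:SS" durations, including one zero-length response
time_to_respond <- c("00:00:00", "00:30:00", "01:00:00", "02:15:00")

# as.numeric() on a lubridate Period returns seconds
secs <- as.numeric(hms(time_to_respond))

log_secs <- log(secs)   # log(0) is -Inf
is.finite(log_secs)     # FALSE for the zero-length response

# Dividing by 3600 first would put the same values on an hours scale
log_hours <- log(secs / 3600)
```

The hours and seconds versions differ only by a constant shift on the log scale, so the shape of the box plot is the same either way.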

Conclusions

My first conclusion is that the time to respond is heavily right-skewed, with a long tail of very long response times. This shows both in the need for a log transform and in where the box plot outliers fall for most of the Initial Call Time groups.

The median response times for late morning through evening are almost identical, which is not particularly surprising given that those groups cover 8:00 AM to 8:00 PM. Early morning and late evening showing a much longer typical response time also makes sense, given that those groups cover 8:00 PM to 4:00 AM. I am somewhat surprised by the outliers on the low end for early morning. The scatter plot below suggests that there are indeed several unusually short response times in that group.

wildlife |>
  filter(Time_To_Respond >= 0) |> 
  filter(Initial_Range == "Early Morning") |>
  ggplot() +
  geom_point(mapping = aes(x = as.numeric(hms(DT_Initial)), 
                           y = log(as.numeric(hms(Time_To_Respond))))) +
  labs(title = "Time to Respond Based On Initial Call Time for Early Morning Calls", x = "Time of Initial Call", y = "Log of Seconds Until Response") +
  theme_minimal()

## Warning in .parse_hms(..., order = "HMS", quiet = quiet): Some strings failed
## to parse

## Warning: Removed 28 rows containing missing values or values outside the scale range
## (`geom_point()`).

Correlations

First, I’d like to double check that my groups are ranking as expected.

wildlife |>
  filter(Time_To_Respond >= 0) |> 
  mutate(Initial_Range_Rank = as.numeric(Initial_Range)) |>
  select(Initial_Range, Initial_Range_Rank) |>
  unique() |>
  arrange(Initial_Range)

## # A tibble: 6 × 2
##   Initial_Range Initial_Range_Rank
##   <fct>                      <dbl>
## 1 Early Morning                  1
## 2 Morning                        2
## 3 Late Morning                   3
## 4 Afternoon                      4
## 5 Evening                        5
## 6 Late Evening                   6

Then, we can calculate Spearman’s rho. The time groups are ordered from earliest to latest in the day (12 AM to 11 PM). This gives a correlation of -0.033, which is very weak.

wl_filt <- wildlife |>
  filter(Time_To_Respond >= 0) |> 
  mutate(Time_To_Respond = as.numeric(hms(Time_To_Respond))) |> 
  filter(!is.na(Time_To_Respond)) 
  

w1_rho <- cor(x = as.numeric(wl_filt$Initial_Range), 
             y = (wl_filt$Time_To_Respond), 
             method = "spearman")

w1_rho

## [1] -0.0333517
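
As a reminder of what this statistic measures: Spearman's rho is Pearson's correlation computed on ranks, so it is unaffected by monotone transforms such as the log used in the plots. A small sketch on made-up values (both vectors below are hypothetical, not the wildlife data):

```r
# Made-up example: group ranks 1-6 and a skewed response in seconds
group_rank <- c(1, 1, 2, 3, 3, 4, 5, 6, 6, 2)
response   <- c(9000, 120, 300, 45000, 600, 60, 1800, 240, 30, 7200)

rho_spearman <- cor(group_rank, response, method = "spearman")

# Same value from Pearson's correlation on the tie-averaged ranks
rho_from_ranks <- cor(rank(group_rank), rank(response))

# Taking the log of the response does not change the ranks, so rho is unchanged
rho_logged <- cor(group_rank, log(response), method = "spearman")

all.equal(rho_spearman, rho_from_ranks)
all.equal(rho_spearman, rho_logged)
```

This is why the correlations here are computed on the raw seconds rather than on the logged values.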

There are a few ways to order the time groups, however. In the next calculation, I have reordered them to start with the late evening group, so that the “day” begins approximately when people start to call it an evening (8 PM to 7 PM). I made this decision because it makes more sense for late evening and early morning to be ranked next to each other than to sit at opposite ends of the list. This returns a slightly stronger, though still weak, correlation of -0.08.

wl_filt2 <- wildlife |>
  filter(Time_To_Respond >= 0) |> 
  mutate(Time_To_Respond = as.numeric(hms(Time_To_Respond))) |> 
  filter(!is.na(Time_To_Respond)) |>
  mutate(Initial_Range = factor(Initial_Range, levels = c("Late Evening", "Early Morning", "Morning", "Late Morning", "Afternoon", "Evening")))


w1_2_rho <- cor(x = as.numeric(wl_filt2$Initial_Range), 
             y = (wl_filt2$Time_To_Respond), 
             method = "spearman")

w1_2_rho

## [1] -0.0820776
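
The reason the result changes is that `as.numeric()` on a factor returns the position of each value in the level order, so reordering the levels changes the ranks fed into the correlation. A minimal sketch with hypothetical labels:

```r
# Hypothetical time-of-day labels
x <- c("Late Evening", "Early Morning", "Afternoon", "Evening")

day_order <- c("Early Morning", "Morning", "Late Morning",
               "Afternoon", "Evening", "Late Evening")
# Same labels, but the "day" starts at 8 PM
evening_first <- c("Late Evening", day_order[1:5])

codes_day     <- as.numeric(factor(x, levels = day_order))      # 6 1 4 5
codes_evening <- as.numeric(factor(x, levels = evening_first))  # 1 2 5 6
```

Under the second ordering, late evening and early morning receive adjacent codes (1 and 2) instead of sitting five ranks apart.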

To check that the two outlying groups (early morning and late evening) are not skewing the correlation too much, given that when a day “starts” is somewhat subjective, I also ran the correlation with those two groups removed. The resulting correlation of -0.039 is much closer to the original calculation, which leads me to believe the true correlation is closer to the original -0.033.

wl_filt3 <- wildlife |>
  filter(Time_To_Respond >= 0) |> 
  mutate(Time_To_Respond = as.numeric(hms(Time_To_Respond))) |> 
  filter(!is.na(Time_To_Respond)) |>
  filter(!Initial_Range %in% c("Late Evening", "Early Morning")) |>
  mutate(Initial_Range = factor(Initial_Range, levels = c("Late Evening", "Early Morning", "Morning", "Late Morning", "Afternoon", "Evening")))


w1_3_rho <- cor(x = as.numeric(wl_filt3$Initial_Range), 
             y = (wl_filt3$Time_To_Respond), 
             method = "spearman")

w1_3_rho

## [1] -0.03910552

Regardless of the exact value, all three calculations point in the same direction and are of similar magnitude. They show a very slight negative correlation, nothing strong enough to base solid predictions on.

Confidence Interval

boot_ci <- function (v, func = median, conf = 0.95, n_iter = 1000) {
  # the `boot` library needs the function in this format
  boot_func <- \(x, i) func(x[i], na.rm=TRUE)
  
  b <- boot(v, boot_func, R = n_iter)
  
  boot.ci(b, conf = conf, type = "perc")
}

boot_ci(wl_filt$Time_To_Respond, mean, 0.95)

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = b, conf = conf, type = "perc")
## 
## Intervals : 
## Level     Percentile     
## 95%   (10490, 14409 )  
## Calculations and Intervals on Original Scale

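Below is a conversion of the percentile range from seconds to hours. The bounds converted below (10588 and 14482) differ slightly from the interval printed above (10490 and 14409), most likely because the bootstrap was re-run without a fixed seed; either pair rounds to the same result. The 95% confidence interval suggests that the average response time to a call is between 2.9 and 4.0 hours. I would consider this confidence interval to be neither particularly wide nor particularly narrow. Given that some responses can take several days (or longer), a range of just over an hour for the “average response time” seems reasonable.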

10588 * (1/60) * (1/60)

## [1] 2.941111

14482 * (1/60) * (1/60)

## [1] 4.022778
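
Bootstrap intervals change a little from run to run unless the random seed is fixed. The sketch below, on made-up data, fixes the seed and converts the bounds to hours directly from the `boot.ci()` result rather than by hand. The helper `mean_ci_hours()` and the simulated `secs` vector are hypothetical and only illustrate the idea.

```r
library(boot)

# Made-up response times in seconds (not the wildlife data)
set.seed(1)
secs <- rexp(200, rate = 1 / 12000)

mean_ci_hours <- function(v, seed, n_iter = 1000) {
  set.seed(seed)  # fixing the seed makes the resamples repeatable
  b  <- boot(v, \(x, i) mean(x[i]), R = n_iter)
  ci <- boot.ci(b, conf = 0.95, type = "perc")
  # Columns 4 and 5 of $percent hold the lower and upper bounds
  ci$percent[4:5] / 3600
}

run1 <- mean_ci_hours(secs, seed = 42)
run2 <- mean_ci_hours(secs, seed = 42)
identical(run1, run2)  # same seed, same interval
```

Reading the bounds from the object keeps the printed interval and the converted values in step with each other.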

Number of Animals vs Response Duration

It would make sense that a larger number of animals would require a longer response. To investigate this, I’ve created a new variable, “Size_of_Group”, to see whether there is any correlation between the number of animals and how long a response takes.

wildlife <- wildlife |>
  mutate(
    Size_of_Group = case_when(
      Num_of_Animals >= 0 & Num_of_Animals < 1 ~ "None",
      Num_of_Animals >= 1 & Num_of_Animals < 2 ~ "Single",
      Num_of_Animals >= 2 & Num_of_Animals < 3 ~ "Pair",
      Num_of_Animals >= 3 & Num_of_Animals < 6 ~ "Small Group",
      Num_of_Animals >= 6 & Num_of_Animals < 10 ~ "Medium Group",
      Num_of_Animals >= 10 ~ "Large Group"
    )
  ) |>
  mutate(Size_of_Group = factor(Size_of_Group, levels = c("None", "Single", "Pair", "Small Group", "Medium Group", "Large Group")))

wildlife |>
  ggplot() +
  geom_boxplot(mapping = aes(x = Size_of_Group, y = log(Response_Duration))) +
  labs(title="Length of Response by Number of Animals", x = "Number of Animals", y = "Log of Response Duration") +  
  theme_minimal()

## Warning: Removed 9 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

Conclusions

The box plot of response duration versus animal group size shows relatively consistent medians, and similar IQRs, across the different groups. This suggests that the variance across groups and the variance within groups are both on the smaller side, which should lead to less variability in population parameter estimates.

There are more outliers in the smaller groups (namely the “None” and “Single” groups). This is likely due to the higher volume of single-animal or no-animal responses. It could also be much harder or much easier to find a single animal depending on what that animal is (a specific goose versus a single alligator, for example).
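
The volume explanation can be illustrated with a small simulation: for the same skewed distribution, a larger sample puts more points beyond the box plot whiskers simply because there are more observations. The log-normal samples below are made up and are not meant to model the wildlife data.

```r
set.seed(123)

# Two samples from the same right-skewed (log-normal) distribution
small_group <- rlnorm(100)
large_group <- rlnorm(10000)

# boxplot.stats()$out lists the points beyond the 1.5 * IQR whiskers
n_out_small <- length(boxplot.stats(small_group)$out)
n_out_large <- length(boxplot.stats(large_group)$out)

c(small = n_out_small, large = n_out_large)
```

Comparing the share of outliers per group, rather than the raw count, would be one way to separate the volume effect from a real difference between groups.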

Correlations

wl_filt4 <- wildlife |>
  filter(!is.na(Size_of_Group)) |> 
  filter(!is.na(Response_Duration)) 

w4_rho <- cor(x = as.numeric(wl_filt4$Size_of_Group), 
             y = (wl_filt4$Response_Duration), 
             method = "spearman")

w4_rho

## [1] 0.2205569

Spearman’s rho gives a correlation of 0.22, a positive correlation between the number of animals responded to and the amount of time a response takes. This makes sense, because the more animals there are to care for, the longer the response will take. While 0.22 is not a strong correlation, it is not negligible either, which leads me to say there is a modest positive association between the number of animals and the duration of a response.
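
Whether a rho of this size is statistically distinguishable from zero can be checked with `cor.test()` from base R. The sketch below uses simulated values (`size_rank` and `duration` are hypothetical), so the numbers it produces say nothing about the wildlife data.

```r
# Made-up ordinal group codes and durations in hours
set.seed(7)
size_rank <- sample(1:6, 300, replace = TRUE)
duration  <- rexp(300, rate = 1 / (1 + 0.2 * size_rank))

# exact = FALSE uses the large-sample approximation, which tolerates ties
res <- cor.test(size_rank, duration, method = "spearman", exact = FALSE)

res$estimate  # Spearman's rho
res$p.value   # p-value for the null hypothesis that rho is 0
```

With grouped data like this, ties are unavoidable, which is why the approximate test is requested explicitly.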

Confidence Intervals

Below is the 95% confidence interval for the mean of the variable “Response_Duration”. Based on the calculation, the average response duration is likely to be between 1.39 hours (about 83.5 minutes) and 1.47 hours (88.2 minutes). That is less than a 5 minute range, which suggests that the true average response duration lies somewhere in that small window.

boot_ci(wl_filt4$Response_Duration, mean, 0.95)

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = b, conf = conf, type = "perc")
## 
## Intervals : 
## Level     Percentile     
## 95%   ( 1.391,  1.470 )  
## Calculations and Intervals on Original Scale

2025-04-08
