Trek Data Analysis: Wildlife Detection Patterns in Interior vs. Exterior Habitats

1. Introduction

This report analyzes hornbill detection data from 81 trekking surveys in a protected area, focusing on habitat preferences (interior vs. exterior) across ranges and regions. Metrics include detection rates and Jacob’s Electivity Index, with chi-square tests for significance. Analysis uses R on data from data_sheet.xlsx.

2. Dataset Columns

The dataset includes the following columns:

Column name Description
Region Specific survey location (e.g., Panbari Camp, a unique site within a range)
Date Date of the trek (format: YYYY-MM-DD)
Start time Trek start time (time of day, e.g., HH:MM:SS)
End time Trek end time (time of day, e.g., HH:MM:SS)
Distance covered (km) Length of the trek route in kilometers
Detection Binary indicator of hornbill sighting (“Yes” or “No”)
Detection Id(s) Unique identifier(s) for detected hornbills (if applicable; may be blank)
Interior/Exterior Habitat type (“Interior” for core forest zones; “Exterior” for edge/buffer areas)
Time bucket Categorized time period of the trek
Range Broader administrative area (in this scenario there are two ranges: Bonda Range and Khanapara Range)
Trek/Point Type of survey method (“Trek” for line transects; “Point” for stationary counts)

3. Preliminary Data Analysis

Basic exploration provides an overview of data structure, distributions, and trends before habitat comparisons.

3.1. Basic summary of distance covered in the treks

summary(trek_data[, c("Distance covered (km)", "Detection")])
 Distance covered (km)  Detection        
 Min.   :0.0000        Length:81         
 1st Qu.:0.0000        Class :character  
 Median :0.4500        Mode  :character  
 Mean   :0.9195                          
 3rd Qu.:1.4400                          
 Max.   :4.2400                          

3.2. Plot of detections by date

trek_data$Date <- as.Date(trek_data$Date)
ggplot(trek_data, aes(x = Date, fill = Detection)) +
  geom_bar() +
  labs(title = "Hornbill Detections by Date", x = "Date", y = "Number of Treks") +
  theme_minimal()

3.3. Histogram of trek durations

trek_data$Duration_min <- as.numeric(difftime(as.POSIXct(trek_data$`End time`), 
                                              as.POSIXct(trek_data$`Start time`), 
                                              units = "mins"))
ggplot(trek_data, aes(x = Duration_min)) +
  geom_histogram(bins = 20, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Trek Durations (Minutes)", x = "Duration (min)", y = "Frequency") +
  theme_minimal()

3.4. Overall detection rate

detection_rate <- mean(trek_data$Detection == "Yes", na.rm = TRUE)
print(paste("Overall Detection Rate:", round(detection_rate, 3)))
[1] "Overall Detection Rate: 0.593"

These steps reveal 81 rows (treks), variable distances (mean ~0.9 km), and an overall detection rate of ~0.593. The date plot shows temporal patterns, while the duration histogram indicates most treks last 30-90 minutes, aiding effort standardization.

4. Enhanced preliminary data analysis

4.1 Interior vs Exterior detection rates

int_ext_summary <- trek_data %>%
  group_by(`Interior/Exterior`) %>%
  summarise(
    Total_Detections = sum(Detection_flag, na.rm = TRUE),
    N_Treks = n(),
    Detection_Rate = sum(Detection_flag, na.rm = TRUE) / n(),
    .groups = "drop"
  ) %>%
  mutate(Detection_Rate = percent(Detection_Rate))  

kable(int_ext_summary, caption = "Detection Rates: Interior vs Exterior", digits = 3)
Detection Rates: Interior vs Exterior
Interior/Exterior Total_Detections N_Treks Detection_Rate
Exterior 46 59 78%
Interior 2 22 9%
ggplot(int_ext_summary, aes(x = `Interior/Exterior`, y = Detection_Rate, fill = `Interior/Exterior`)) +
  geom_bar(stat = "identity", alpha = 0.7) +
  geom_text(aes(label = paste0(Detection_Rate)), vjust = -0.3) +
  labs(title = "Hornbill Detection Rates by Habitat Type", y = "Detection rate", x = "Habitat") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2")

4.2 Range and Range per Interior/Exterior detection rate

range_summary_rate <- trek_data %>%
  group_by(Range, `Interior/Exterior`) %>%
  summarise(
    Total_Detections = sum(Detection_flag, na.rm = TRUE),
    N_Treks = n(),
    Detection_Rate = sum(Detection_flag, na.rm = TRUE) / n(),
    .groups = "drop"
  ) %>%
  mutate(Detection_Rate = percent(Detection_Rate, accuracy = 0.1)) %>%
  bind_rows(
    # Overall Range summary (aggregated)
    trek_data %>%
      group_by(Range) %>%
      summarise(
        `Interior/Exterior` = "Overall",
        Total_Detections = sum(Detection_flag, na.rm = TRUE),
        N_Treks = n(),
        Detection_Rate = sum(Detection_flag, na.rm = TRUE) / n(),
        .groups = "drop"
      ) %>%
      mutate(Detection_Rate = percent(Detection_Rate, accuracy = 0.1))
  ) %>%
  arrange(Range, `Interior/Exterior`)

kable(range_summary_rate, caption = "Detection Rates: By Range and Habitat (Trek-Based)", digits = 3)
Detection Rates: By Range and Habitat (Trek-Based)
Range Interior/Exterior Total_Detections N_Treks Detection_Rate
Bonda Range Exterior 40 41 97.6%
Bonda Range Interior 2 11 18.2%
Bonda Range Overall 42 52 80.8%
Khanapara Range Exterior 6 18 33.3%
Khanapara Range Interior 0 11 0.0%
Khanapara Range Overall 6 29 20.7%
ggplot(range_summary_rate %>% filter(`Interior/Exterior` != "Overall"), 
       aes(x = Range, y = Detection_Rate, fill = `Interior/Exterior`)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.7) +
  labs(title = "Hornbill Detection Rates by Range and Habitat (Trek-Based)", y = "Detection Rate", x = "Range") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set1") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

4.3 Region and Region per Interior/Exterior detection rate

region_summary_rate <- trek_data %>%
  group_by(Region, `Interior/Exterior`) %>%
  summarise(
    Total_Detections = sum(Detection_flag, na.rm = TRUE),
    N_Treks = n(),
    Detection_Rate = sum(Detection_flag, na.rm = TRUE) / n(),
    .groups = "drop"
  ) %>%
  mutate(Detection_Rate = percent(Detection_Rate, accuracy = 0.1)) %>%
  bind_rows(
    trek_data %>%
      group_by(Region) %>%
      summarise(
        `Interior/Exterior` = "Overall",
        Total_Detections = sum(Detection_flag, na.rm = TRUE),
        N_Treks = n(),
        Detection_Rate = sum(Detection_flag, na.rm = TRUE) / n(),
        .groups = "drop"
      ) %>%
      mutate(Detection_Rate = percent(Detection_Rate, accuracy = 0.1))
  ) %>%
  arrange(Region, `Interior/Exterior`)

kable(region_summary_rate, caption = "Detection Rates: By Region and Habitat (Trek-Based)", digits = 3)
Detection Rates: By Region and Habitat (Trek-Based)
Region Interior/Exterior Total_Detections N_Treks Detection_Rate
9 mile view point Interior 0 2 0.0%
9 mile view point Overall 0 2 0.0%
Amchong Tea Estate Exterior 0 2 0.0%
Amchong Tea Estate Overall 0 2 0.0%
Birkuchi Camp Interior 1 1 100.0%
Birkuchi Camp Overall 1 1 100.0%
Borbilla Bill Exterior 1 1 100.0%
Borbilla Bill Interior 0 5 0.0%
Borbilla Bill Overall 1 6 16.7%
Brahmaputra Jungle Resort Exterior 0 2 0.0%
Brahmaputra Jungle Resort Overall 0 2 0.0%
Camp Digaru Exterior 8 8 100.0%
Camp Digaru Overall 8 8 100.0%
Chandrapur Exterior 6 6 100.0%
Chandrapur Overall 6 6 100.0%
Dhangiri Gaon Exterior 0 3 0.0%
Dhangiri Gaon Overall 0 3 0.0%
Don Bosco University Campus Exterior 4 5 80.0%
Don Bosco University Campus Overall 4 5 80.0%
Ethnic Village Resort Exterior 1 1 100.0%
Ethnic Village Resort Interior 0 1 0.0%
Ethnic Village Resort Overall 1 2 50.0%
Kasutali Pathar Exterior 0 2 0.0%
Kasutali Pathar Overall 0 2 0.0%
Khamrenga Exterior 4 5 80.0%
Khamrenga Overall 4 5 80.0%
Moriyanak View Point Interior 0 4 0.0%
Moriyanak View Point Overall 0 4 0.0%
Nazirakhat Archeological Park Exterior 0 2 0.0%
Nazirakhat Archeological Park Overall 0 2 0.0%
PRP Village Exterior 2 2 100.0%
PRP Village Interior 0 1 0.0%
PRP Village Overall 2 3 66.7%
Panbari Camp Exterior 18 18 100.0%
Panbari Camp Interior 1 3 33.3%
Panbari Camp Overall 19 21 90.5%
Pothali Pani Camp Interior 0 3 0.0%
Pothali Pani Camp Overall 0 3 0.0%
Punjabari Forest Camp Interior 0 1 0.0%
Punjabari Forest Camp Overall 0 1 0.0%
Tepesia Exterior 2 2 100.0%
Tepesia Interior 0 1 0.0%
Tepesia Overall 2 3 66.7%
top_regions_data <- region_summary_rate %>%
  filter(`Interior/Exterior` != "Overall") %>%
  group_by(Region) %>%
  summarise(
    Overall_Rate = mean(as.numeric(sub("%", "", Detection_Rate)) / 100, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(Overall_Rate)) %>%
  slice_head(n = 10)

top_regions_plot <- region_summary_rate %>%
  filter(`Interior/Exterior` != "Overall", Region %in% top_regions_data$Region) %>%
  mutate(Rate_Numeric = as.numeric(sub("%", "", Detection_Rate)) / 100) %>%
  arrange(Region, desc(Rate_Numeric))  

ggplot(top_regions_plot, aes(x = Rate_Numeric, y = reorder(Region, Rate_Numeric), fill = `Interior/Exterior`)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.8, width = 0.7) +
  geom_text(aes(label = Detection_Rate), hjust = -0.1, size = 3) +
  labs(title = "Top 10 Regions: Detection Rates by Habitat (Trek-Based)",
       x = "Detection Rate", y = "Region",
       fill = "Habitat") +
  scale_x_continuous(labels = percent) +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10),
        plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Set1")

5. Core analysis (Chi square and Jacob’s tests)

The core analysis uses chi-square tests to assess if hornbill detections are associated with habitat type (interior vs. exterior), beyond chance or effort. Contingency tables compare observed detections against trek counts (N), testing independence. Jacob’s Electivity Index quantifies preference/avoidance relative to expected (proportional to time spent). Tests are run at three scales: overall habitat, range-specific, and region-specific.

Why Divide by Time? Trek durations vary (e.g., some are 30 minutes, others 2 hours), so raw detection counts alone can bias results: longer treks naturally offer more chances to spot hornbills. Dividing detections by total time (hours) yields a detection rate per hour, normalizing for survey effort. This makes comparisons fair across habitats, ranges, or regions, revealing true patterns in hornbill activity rather than just sampling intensity. Without this, shorter interior treks might falsely appear “safer” for hornbills due to less exposure time.

Across the dataset, this yields 74.13 total hours of effort and 48 detections, for an overall rate of ~0.65 per hour.

5.1 Overall Habitat Test

Purpose: This test evaluates whether hornbill detections differ significantly between interior (core, undisturbed forest) and exterior (edge/buffer zones, often with more human activity) habitats. By comparing detections to trek numbers, it checks if habitat influences sighting probability, controlling for sampling effort. This helps identify if hornbills avoid or prefer edges, informing conservation (e.g., buffer zone management).

Null Hypothesis (H0): Detection frequency is independent of habitat type—hornbills are equally likely to be detected in interior and exterior areas, proportional to treks conducted (no preference or avoidance).

Alternative Hypothesis (H1): Detection frequency depends on habitat type—hornbills show preference for or avoidance of one habitat, leading to uneven detections.

Chi-Square Test Code and Output: The code groups data by habitat, summarizes detections and treks, computes expected counts (proportional to total time), and applies Jacob’s Index. The chi-square uses a 2x2 contingency table (rows: habitat; columns: detections vs. treks).

habitat_summary <- trek_data %>%
  group_by(`Interior/Exterior`) %>%
  summarise(
    Total_Detections = sum(Detection_flag, na.rm = TRUE),
    Total_Time = sum(Duration_hours, na.rm = TRUE),
    Detection_per_Hour = round(Total_Detections / Total_Time, 3),
    N = n()
  )

total_detections <- sum(habitat_summary$Total_Detections)
total_time <- sum(habitat_summary$Total_Time)

habitat_summary <- habitat_summary %>%
  mutate(
    Expected_Counts = (Total_Time / total_time) * total_detections,
    Jacobs_Index = round((Total_Detections - Expected_Counts) /
                         (Total_Detections + Expected_Counts), 2)
  )


kable(habitat_summary, caption = "Interior vs Exterior")
Interior vs Exterior
Interior/Exterior Total_Detections Total_Time Detection_per_Hour N Expected_Counts Jacobs_Index
Exterior 46 50.70000 0.907 59 32.82734 0.17
Interior 2 23.43333 0.085 22 15.17266 -0.77
contingency_table <- as.matrix(habitat_summary[, c("Total_Detections", "N")])
rownames(contingency_table) <- habitat_summary$`Interior/Exterior`
chi_result_1 <- chisq.test(contingency_table)
chi_result_1

    Pearson's Chi-squared test with Yates' continuity correction

data:  contingency_table
X-squared = 9.0593, df = 1, p-value = 0.002614

Chi-Square Result: χ² = 9.06, df = 1, p = 0.003 (significant; reject H0, evidence of association).

With p < 0.05, we reject H0—there’s a significant association (exteriors have disproportionately more detections). Yates’ correction adjusts for small expected values (<5 in interior).

Jacob’s Index Explanation: This index measures habitat selection. It ranges from -1 (complete avoidance) to +1 (complete preference), with 0 neutral. Here, exterior’s +0.17 suggests mild preference (hornbills use edges slightly more than availability), while interior’s -0.77 indicates strong avoidance (possibly due to deeper forest inaccessibility or lower food resources). This aligns with the low interior rate (0.085/hour vs. 0.907 exterior), even after time normalization.

5.2 Range Specific test

Purpose: Ranges (Bonda, Khanapara) represent administrative divisions with potentially varying ecology or disturbance. This test checks if habitat biases persist within each, revealing local patterns (e.g., Bonda’s denser forest might amplify avoidance).

Null Hypothesis (H0): Within each range, detections are independent of habitat (proportional to treks).

Alternative Hypothesis (H1): Within each range, detections associate with habitat (local preference/avoidance).

Chi-Square Test Code and Output: Similar to overall, but grouped by range first, with per-range expectations and separate chi-squares.

range_summary <- trek_data %>%
  group_by(Range, `Interior/Exterior`) %>%
  summarise(
    Total_Detections = sum(Detection_flag, na.rm = TRUE),
    Total_Time = sum(Duration_hours, na.rm = TRUE),
    Detection_per_Hour = round(Total_Detections / Total_Time, 3),
    N = n(),
    .groups = "drop"
  )

range_summary <- range_summary %>%
  group_by(Range) %>%
  mutate(
    Total_Detections_Range = sum(Total_Detections),
    Total_Time_Range = sum(Total_Time),
    Expected_Counts = (Total_Time / Total_Time_Range) * Total_Detections_Range,
    Jacobs_Index = round((Total_Detections - Expected_Counts) /
                         (Total_Detections + Expected_Counts), 2)
  )

kable(range_summary, caption = "Range-based Interior vs Exterior")
Range-based Interior vs Exterior
Range Interior/Exterior Total_Detections Total_Time Detection_per_Hour N Total_Detections_Range Total_Time_Range Expected_Counts Jacobs_Index
Bonda Range Exterior 40 26.45000 1.512 41 42 35.05000 31.694722 0.12
Bonda Range Interior 2 8.60000 0.233 11 42 35.05000 10.305278 -0.67
Khanapara Range Exterior 6 24.25000 0.247 18 6 39.08333 3.722815 0.23
Khanapara Range Interior 0 14.83333 0.000 11 6 39.08333 2.277185 -1.00
chi_results_range <- range_summary %>%
  group_by(Range) %>%
  group_map(~ chisq.test(as.matrix(.x[, c("Total_Detections", "N")])))

chi_results_range
[[1]]

    Pearson's Chi-squared test with Yates' continuity correction

data:  as.matrix(.x[, c("Total_Detections", "N")])
X-squared = 3.9534, df = 1, p-value = 0.04678


[[2]]

    Pearson's Chi-squared test with Yates' continuity correction

data:  as.matrix(.x[, c("Total_Detections", "N")])
X-squared = 1.7922, df = 1, p-value = 0.1807

Chi-Square Results:

  • Bonda: χ² = 3.95, df = 1, p = 0.047 (significant; reject H0).

  • Khanapara: χ² = 1.79, df = 1, p = 0.181 (not significant; fail to reject H0).

Bonda shows clear habitat bias (high exterior rate: 1.512/hour), while Khanapara’s low detections reduce power.

Jacob’s Index Explanation: Computed per range (using range-specific time). Bonda: Exterior +0.12 (mild preference), interior -0.67 (avoidance). Khanapara: Exterior +0.23 (mild preference), interior -1.00 (complete avoidance, no detections despite effort). Time division ensures indices reflect availability, not just raw counts e.g., Khanapara interiors had 14.83 hours but zero sightings.