Trek Data Analysis: Wildlife Detection Patterns in Interior vs. Exterior Habitats

1. Introduction

This report analyzes hornbill detection data from 81 trekking surveys in a protected area, focusing on habitat preferences (interior vs. exterior) across ranges and regions. Metrics include detection rates and Jacob’s Electivity Index, with chi-square tests for significance. Analysis uses R on data from data_sheet.xlsx.

2. Dataset Columns

The dataset includes the following columns:

Column name	Description
Region	Specific survey location (e.g., Panbari Camp, a unique site within a range)
Date	Date of the trek (format: YYYY-MM-DD)
Start time	Trek start time (time of day, e.g., HH:MM:SS)
End time	Trek end time (time of day, e.g., HH:MM:SS)
Distance covered (km)	Length of the trek route in kilometers
Detection	Binary indicator of hornbill sighting (“Yes” or “No”)
Detection Id(s)	Unique identifier(s) for detected hornbills (if applicable; may be blank)
Interior/Exterior	Habitat type (“Interior” for core forest zones; “Exterior” for edge/buffer areas)
Time bucket	Categorized time period of the trek
Range	Broader administrative area (in this scenario there are two ranges: Bonda Range and Khanapara Range)
Trek/Point	Type of survey method (“Trek” for line transects; “Point” for stationary counts)

3. Preliminary Data Analysis

Basic exploration provides an overview of data structure, distributions, and trends before habitat comparisons.

3.1. Basic summary of distance covered in the treks

summary(trek_data[, c("Distance covered (km)", "Detection")])

 Distance covered (km)  Detection        
 Min.   :0.0000        Length:81         
 1st Qu.:0.0000        Class :character  
 Median :0.4500        Mode  :character  
 Mean   :0.9195                          
 3rd Qu.:1.4400                          
 Max.   :4.2400

3.2. Plot of detections by date

trek_data$Date <- as.Date(trek_data$Date)
ggplot(trek_data, aes(x = Date, fill = Detection)) +
  geom_bar() +
  labs(title = "Hornbill Detections by Date", x = "Date", y = "Number of Treks") +
  theme_minimal()

3.3. Histogram of trek durations

trek_data$Duration_min <- as.numeric(difftime(as.POSIXct(trek_data$`End time`), 
                                              as.POSIXct(trek_data$`Start time`), 
                                              units = "mins"))
ggplot(trek_data, aes(x = Duration_min)) +
  geom_histogram(bins = 20, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Trek Durations (Minutes)", x = "Duration (min)", y = "Frequency") +
  theme_minimal()

3.4. Overall detection rate

detection_rate <- mean(trek_data$Detection == "Yes", na.rm = TRUE)
print(paste("Overall Detection Rate:", round(detection_rate, 3)))

[1] "Overall Detection Rate: 0.593"

These steps reveal 81 rows (treks), variable distances (mean ~0.9 km), and an overall detection rate of ~0.593. The date plot shows temporal patterns, while the duration histogram indicates most treks last 30-90 minutes, aiding effort standardization.

4. Enhanced preliminary data analysis

4.1 Interior vs Exterior detection rates

int_ext_summary <- trek_data %>%
  group_by(`Interior/Exterior`) %>%
  summarise(
    Total_Detections = sum(Detection_flag, na.rm = TRUE),
    N_Treks = n(),
    Detection_Rate = sum(Detection_flag, na.rm = TRUE) / n(),
    .groups = "drop"
  ) %>%
  mutate(Detection_Rate = percent(Detection_Rate))  

kable(int_ext_summary, caption = "Detection Rates: Interior vs Exterior", digits = 3)

Detection Rates: Interior vs Exterior
Interior/Exterior	Total_Detections	N_Treks	Detection_Rate
Exterior	46	59	78%
Interior	2	22	9%

ggplot(int_ext_summary, aes(x = `Interior/Exterior`, y = Detection_Rate, fill = `Interior/Exterior`)) +
  geom_bar(stat = "identity", alpha = 0.7) +
  geom_text(aes(label = paste0(Detection_Rate)), vjust = -0.3) +
  labs(title = "Hornbill Detection Rates by Habitat Type", y = "Detection rate", x = "Habitat") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2")

4.2 Range and Range per Interior/Exterior detection rate

range_summary_rate <- trek_data %>%
  group_by(Range, `Interior/Exterior`) %>%
  summarise(
    Total_Detections = sum(Detection_flag, na.rm = TRUE),
    N_Treks = n(),
    Detection_Rate = sum(Detection_flag, na.rm = TRUE) / n(),
    .groups = "drop"
  ) %>%
  mutate(Detection_Rate = percent(Detection_Rate, accuracy = 0.1)) %>%
  bind_rows(
    # Overall Range summary (aggregated)
    trek_data %>%
      group_by(Range) %>%
      summarise(
        `Interior/Exterior` = "Overall",
        Total_Detections = sum(Detection_flag, na.rm = TRUE),
        N_Treks = n(),
        Detection_Rate = sum(Detection_flag, na.rm = TRUE) / n(),
        .groups = "drop"
      ) %>%
      mutate(Detection_Rate = percent(Detection_Rate, accuracy = 0.1))
  ) %>%
  arrange(Range, `Interior/Exterior`)

kable(range_summary_rate, caption = "Detection Rates: By Range and Habitat (Trek-Based)", digits = 3)

Detection Rates: By Range and Habitat (Trek-Based)
Range	Interior/Exterior	Total_Detections	N_Treks	Detection_Rate
Bonda Range	Exterior	40	41	97.6%
Bonda Range	Interior	2	11	18.2%
Bonda Range	Overall	42	52	80.8%
Khanapara Range	Exterior	6	18	33.3%
Khanapara Range	Interior	0	11	0.0%
Khanapara Range	Overall	6	29	20.7%

ggplot(range_summary_rate %>% filter(`Interior/Exterior` != "Overall"), 
       aes(x = Range, y = Detection_Rate, fill = `Interior/Exterior`)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.7) +
  labs(title = "Hornbill Detection Rates by Range and Habitat (Trek-Based)", y = "Detection Rate", x = "Range") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set1") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

4.3 Region and Region per Interior/Exterior detection rate

region_summary_rate <- trek_data %>%
  group_by(Region, `Interior/Exterior`) %>%
  summarise(
    Total_Detections = sum(Detection_flag, na.rm = TRUE),
    N_Treks = n(),
    Detection_Rate = sum(Detection_flag, na.rm = TRUE) / n(),
    .groups = "drop"
  ) %>%
  mutate(Detection_Rate = percent(Detection_Rate, accuracy = 0.1)) %>%
  bind_rows(
    trek_data %>%
      group_by(Region) %>%
      summarise(
        `Interior/Exterior` = "Overall",
        Total_Detections = sum(Detection_flag, na.rm = TRUE),
        N_Treks = n(),
        Detection_Rate = sum(Detection_flag, na.rm = TRUE) / n(),
        .groups = "drop"
      ) %>%
      mutate(Detection_Rate = percent(Detection_Rate, accuracy = 0.1))
  ) %>%
  arrange(Region, `Interior/Exterior`)

kable(region_summary_rate, caption = "Detection Rates: By Region and Habitat (Trek-Based)", digits = 3)

Detection Rates: By Region and Habitat (Trek-Based)
Region	Interior/Exterior	Total_Detections	N_Treks	Detection_Rate
9 mile view point	Interior	0	2	0.0%
9 mile view point	Overall	0	2	0.0%
Amchong Tea Estate	Exterior	0	2	0.0%
Amchong Tea Estate	Overall	0	2	0.0%
Birkuchi Camp	Interior	1	1	100.0%
Birkuchi Camp	Overall	1	1	100.0%
Borbilla Bill	Exterior	1	1	100.0%
Borbilla Bill	Interior	0	5	0.0%
Borbilla Bill	Overall	1	6	16.7%
Brahmaputra Jungle Resort	Exterior	0	2	0.0%
Brahmaputra Jungle Resort	Overall	0	2	0.0%
Camp Digaru	Exterior	8	8	100.0%
Camp Digaru	Overall	8	8	100.0%
Chandrapur	Exterior	6	6	100.0%
Chandrapur	Overall	6	6	100.0%
Dhangiri Gaon	Exterior	0	3	0.0%
Dhangiri Gaon	Overall	0	3	0.0%
Don Bosco University Campus	Exterior	4	5	80.0%
Don Bosco University Campus	Overall	4	5	80.0%
Ethnic Village Resort	Exterior	1	1	100.0%
Ethnic Village Resort	Interior	0	1	0.0%
Ethnic Village Resort	Overall	1	2	50.0%
Kasutali Pathar	Exterior	0	2	0.0%
Kasutali Pathar	Overall	0	2	0.0%
Khamrenga	Exterior	4	5	80.0%
Khamrenga	Overall	4	5	80.0%
Moriyanak View Point	Interior	0	4	0.0%
Moriyanak View Point	Overall	0	4	0.0%
Nazirakhat Archeological Park	Exterior	0	2	0.0%
Nazirakhat Archeological Park	Overall	0	2	0.0%
PRP Village	Exterior	2	2	100.0%
PRP Village	Interior	0	1	0.0%
PRP Village	Overall	2	3	66.7%
Panbari Camp	Exterior	18	18	100.0%
Panbari Camp	Interior	1	3	33.3%
Panbari Camp	Overall	19	21	90.5%
Pothali Pani Camp	Interior	0	3	0.0%
Pothali Pani Camp	Overall	0	3	0.0%
Punjabari Forest Camp	Interior	0	1	0.0%
Punjabari Forest Camp	Overall	0	1	0.0%
Tepesia	Exterior	2	2	100.0%
Tepesia	Interior	0	1	0.0%
Tepesia	Overall	2	3	66.7%

top_regions_data <- region_summary_rate %>%
  filter(`Interior/Exterior` != "Overall") %>%
  group_by(Region) %>%
  summarise(
    Overall_Rate = mean(as.numeric(sub("%", "", Detection_Rate)) / 100, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(Overall_Rate)) %>%
  slice_head(n = 10)

top_regions_plot <- region_summary_rate %>%
  filter(`Interior/Exterior` != "Overall", Region %in% top_regions_data$Region) %>%
  mutate(Rate_Numeric = as.numeric(sub("%", "", Detection_Rate)) / 100) %>%
  arrange(Region, desc(Rate_Numeric))  

ggplot(top_regions_plot, aes(x = Rate_Numeric, y = reorder(Region, Rate_Numeric), fill = `Interior/Exterior`)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.8, width = 0.7) +
  geom_text(aes(label = Detection_Rate), hjust = -0.1, size = 3) +
  labs(title = "Top 10 Regions: Detection Rates by Habitat (Trek-Based)",
       x = "Detection Rate", y = "Region",
       fill = "Habitat") +
  scale_x_continuous(labels = percent) +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10),
        plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Set1")

5. Core analysis (Chi square and Jacob’s tests)

The core analysis uses chi-square tests to assess if hornbill detections are associated with habitat type (interior vs. exterior), beyond chance or effort. Contingency tables compare observed detections against trek counts (N), testing independence. Jacob’s Electivity Index quantifies preference/avoidance relative to expected (proportional to time spent). Tests are run at three scales: overall habitat, range-specific, and region-specific.

Why Divide by Time? Trek durations vary (e.g., some are 30 minutes, others 2 hours), so raw detection counts alone can bias results: longer treks naturally offer more chances to spot hornbills. Dividing detections by total time (hours) yields a detection rate per hour, normalizing for survey effort. This makes comparisons fair across habitats, ranges, or regions, revealing true patterns in hornbill activity rather than just sampling intensity. Without this, shorter interior treks might falsely appear “safer” for hornbills due to less exposure time.

Across the dataset, this yields 74.13 total hours of effort and 48 detections, for an overall rate of ~0.65 per hour.

5.1 Overall Habitat Test

Purpose: This test evaluates whether hornbill detections differ significantly between interior (core, undisturbed forest) and exterior (edge/buffer zones, often with more human activity) habitats. By comparing detections to trek numbers, it checks if habitat influences sighting probability, controlling for sampling effort. This helps identify if hornbills avoid or prefer edges, informing conservation (e.g., buffer zone management).

Null Hypothesis (H0): Detection frequency is independent of habitat type—hornbills are equally likely to be detected in interior and exterior areas, proportional to treks conducted (no preference or avoidance).

Alternative Hypothesis (H1): Detection frequency depends on habitat type—hornbills show preference for or avoidance of one habitat, leading to uneven detections.

Chi-Square Test Code and Output: The code groups data by habitat, summarizes detections and treks, computes expected counts (proportional to total time), and applies Jacob’s Index. The chi-square uses a 2x2 contingency table (rows: habitat; columns: detections vs. treks).

habitat_summary <- trek_data %>%
  group_by(`Interior/Exterior`) %>%
  summarise(
    Total_Detections = sum(Detection_flag, na.rm = TRUE),
    Total_Time = sum(Duration_hours, na.rm = TRUE),
    Detection_per_Hour = round(Total_Detections / Total_Time, 3),
    N = n()
  )

total_detections <- sum(habitat_summary$Total_Detections)
total_time <- sum(habitat_summary$Total_Time)

habitat_summary <- habitat_summary %>%
  mutate(
    Expected_Counts = (Total_Time / total_time) * total_detections,
    Jacobs_Index = round((Total_Detections - Expected_Counts) /
                         (Total_Detections + Expected_Counts), 2)
  )


kable(habitat_summary, caption = "Interior vs Exterior")

Interior vs Exterior
Interior/Exterior	Total_Detections	Total_Time	Detection_per_Hour	N	Expected_Counts	Jacobs_Index
Exterior	46	50.70000	0.907	59	32.82734	0.17
Interior	2	23.43333	0.085	22	15.17266	-0.77

contingency_table <- as.matrix(habitat_summary[, c("Total_Detections", "N")])
rownames(contingency_table) <- habitat_summary$`Interior/Exterior`
chi_result_1 <- chisq.test(contingency_table)
chi_result_1


    Pearson's Chi-squared test with Yates' continuity correction

data:  contingency_table
X-squared = 9.0593, df = 1, p-value = 0.002614

Chi-Square Result: χ² = 9.06, df = 1, p = 0.003 (significant; reject H0, evidence of association).

With p < 0.05, we reject H0—there’s a significant association (exteriors have disproportionately more detections). Yates’ correction adjusts for small expected values (<5 in interior).

Jacob’s Index Explanation: This index measures habitat selection. It ranges from -1 (complete avoidance) to +1 (complete preference), with 0 neutral. Here, exterior’s +0.17 suggests mild preference (hornbills use edges slightly more than availability), while interior’s -0.77 indicates strong avoidance (possibly due to deeper forest inaccessibility or lower food resources). This aligns with the low interior rate (0.085/hour vs. 0.907 exterior), even after time normalization.

5.2 Range Specific test

Purpose: Ranges (Bonda, Khanapara) represent administrative divisions with potentially varying ecology or disturbance. This test checks if habitat biases persist within each, revealing local patterns (e.g., Bonda’s denser forest might amplify avoidance).

Null Hypothesis (H0): Within each range, detections are independent of habitat (proportional to treks).

Alternative Hypothesis (H1): Within each range, detections associate with habitat (local preference/avoidance).

Chi-Square Test Code and Output: Similar to overall, but grouped by range first, with per-range expectations and separate chi-squares.

range_summary <- trek_data %>%
  group_by(Range, `Interior/Exterior`) %>%
  summarise(
    Total_Detections = sum(Detection_flag, na.rm = TRUE),
    Total_Time = sum(Duration_hours, na.rm = TRUE),
    Detection_per_Hour = round(Total_Detections / Total_Time, 3),
    N = n(),
    .groups = "drop"
  )

range_summary <- range_summary %>%
  group_by(Range) %>%
  mutate(
    Total_Detections_Range = sum(Total_Detections),
    Total_Time_Range = sum(Total_Time),
    Expected_Counts = (Total_Time / Total_Time_Range) * Total_Detections_Range,
    Jacobs_Index = round((Total_Detections - Expected_Counts) /
                         (Total_Detections + Expected_Counts), 2)
  )

kable(range_summary, caption = "Range-based Interior vs Exterior")

Range-based Interior vs Exterior
Range	Interior/Exterior	Total_Detections	Total_Time	Detection_per_Hour	N	Total_Detections_Range	Total_Time_Range	Expected_Counts	Jacobs_Index
Bonda Range	Exterior	40	26.45000	1.512	41	42	35.05000	31.694722	0.12
Bonda Range	Interior	2	8.60000	0.233	11	42	35.05000	10.305278	-0.67
Khanapara Range	Exterior	6	24.25000	0.247	18	6	39.08333	3.722815	0.23
Khanapara Range	Interior	0	14.83333	0.000	11	6	39.08333	2.277185	-1.00

chi_results_range <- range_summary %>%
  group_by(Range) %>%
  group_map(~ chisq.test(as.matrix(.x[, c("Total_Detections", "N")])))

chi_results_range

[[1]]

    Pearson's Chi-squared test with Yates' continuity correction

data:  as.matrix(.x[, c("Total_Detections", "N")])
X-squared = 3.9534, df = 1, p-value = 0.04678


[[2]]

    Pearson's Chi-squared test with Yates' continuity correction

data:  as.matrix(.x[, c("Total_Detections", "N")])
X-squared = 1.7922, df = 1, p-value = 0.1807

Chi-Square Results:

Bonda: χ² = 3.95, df = 1, p = 0.047 (significant; reject H0).
Khanapara: χ² = 1.79, df = 1, p = 0.181 (not significant; fail to reject H0).

Bonda shows clear habitat bias (high exterior rate: 1.512/hour), while Khanapara’s low detections reduce power.

Jacob’s Index Explanation: Computed per range (using range-specific time). Bonda: Exterior +0.12 (mild preference), interior -0.67 (avoidance). Khanapara: Exterior +0.23 (mild preference), interior -1.00 (complete avoidance, no detections despite effort). Time division ensures indices reflect availability, not just raw counts e.g., Khanapara interiors had 14.83 hours but zero sightings.