Trek Data Analysis: Wildlife Detection Patterns in Interior vs. Exterior Habitats
1. Introduction
This report analyzes hornbill detection data from 81 trekking surveys in a protected area, focusing on habitat preferences (interior vs. exterior) across ranges and regions. Metrics include detection rates and Jacobs’ Electivity Index, with chi-square tests for significance. The analysis is done in R on data from data_sheet.xlsx.
2. Dataset Columns
The dataset includes the following columns:
| Column name | Description |
| --- | --- |
| Region | Specific survey location (e.g., Panbari Camp, a unique site within a range) |
| Date | Date of the trek (format: YYYY-MM-DD) |
| Start time | Trek start time (time of day, e.g., HH:MM:SS) |
| End time | Trek end time (time of day, e.g., HH:MM:SS) |
| Distance covered (km) | Length of the trek route in kilometers |
| Detection | Binary indicator of hornbill sighting (“Yes” or “No”) |
| Detection Id(s) | Unique identifier(s) for detected hornbills (if applicable; may be blank) |
| Interior/Exterior | Habitat type (“Interior” for core forest zones; “Exterior” for edge/buffer areas) |
| Time bucket | Categorized time period of the trek |
| Range | Broader administrative area (in this scenario there are two ranges: Bonda Range and Khanapara Range) |
| Trek/Point | Type of survey method (“Trek” for line transects; “Point” for stationary counts) |
3. Preliminary Data Analysis
Basic exploration provides an overview of data structure, distributions, and trends before habitat comparisons.
3.1. Basic summary of distance covered in the treks
 Distance covered (km)   Detection
 Min.   :0.0000          Length:81
 1st Qu.:0.0000          Class :character
 Median :0.4500          Mode  :character
 Mean   :0.9195
 3rd Qu.:1.4400
 Max.   :4.2400
3.2. Plot of detections by date
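library(ggplot2)

# Parse the Date column, then count treks per day, split by detection outcome
trek_data$Date <- as.Date(trek_data$Date)

ggplot(trek_data, aes(x = Date, fill = Detection)) +
  geom_bar() +
  labs(title = "Hornbill Detections by Date", x = "Date", y = "Number of Treks") +
  theme_minimal()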
3.3. Histogram of trek durations
library(ggplot2)

# Trek duration in minutes, from the start and end timestamps
trek_data$Duration_min <- as.numeric(difftime(
  as.POSIXct(trek_data$`End time`),
  as.POSIXct(trek_data$`Start time`),
  units = "mins"
))

ggplot(trek_data, aes(x = Duration_min)) +
  geom_histogram(bins = 20, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Trek Durations (Minutes)", x = "Duration (min)", y = "Frequency") +
  theme_minimal()
These steps reveal 81 rows (treks), variable distances (mean ~0.9 km), and an overall detection rate of ~0.593 (48 of 81 treks). The date plot shows temporal patterns, while the duration histogram indicates that most treks last 30-90 minutes, which helps when standardizing survey effort.
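As a quick check on the overall detection rate, the short sketch below rebuilds a stand-in for the Detection column from the reported totals (48 treks with a sighting out of 81) and computes the proportion of "Yes" values. The vector `detection` is illustrative only; in the real analysis it would be `trek_data$Detection`.

```r
# Stand-in for trek_data$Detection, rebuilt from the reported totals:
# 48 treks with a sighting and 33 without (81 treks in all).
detection <- c(rep("Yes", 48), rep("No", 81 - 48))

# Tabulate outcomes, then take the share of treks with a detection
detection_counts <- table(detection)
overall_rate <- mean(detection == "Yes")

print(detection_counts)
print(round(overall_rate, 3))  # 0.593
```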
Detection Rates: By Range and Habitat (Trek-Based)
| Range | Interior/Exterior | Total_Detections | N_Treks | Detection_Rate |
| --- | --- | --- | --- | --- |
| Bonda Range | Exterior | 40 | 41 | 97.6% |
| Bonda Range | Interior | 2 | 11 | 18.2% |
| Bonda Range | Overall | 42 | 52 | 80.8% |
| Khanapara Range | Exterior | 6 | 18 | 33.3% |
| Khanapara Range | Interior | 0 | 11 | 0.0% |
| Khanapara Range | Overall | 6 | 29 | 20.7% |
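The rates in this table follow directly from the counts. The sketch below recomputes them in base R, including the pooled "Overall" rows. The names `range_counts`, `Habitat`, and `range_summary` are illustrative and are not taken from the original script.

```r
# Counts copied from the table above (per range and habitat)
range_counts <- data.frame(
  Range      = c("Bonda Range", "Bonda Range", "Khanapara Range", "Khanapara Range"),
  Habitat    = c("Exterior", "Interior", "Exterior", "Interior"),
  Detections = c(40, 2, 6, 0),
  N_Treks    = c(41, 11, 18, 11)
)

# Per-habitat rate: treks with a sighting divided by treks conducted
range_counts$Detection_Rate <- range_counts$Detections / range_counts$N_Treks

# "Overall" rows: pool both habitats within each range
overall <- aggregate(cbind(Detections, N_Treks) ~ Range, data = range_counts, FUN = sum)
overall$Habitat <- "Overall"
overall$Detection_Rate <- overall$Detections / overall$N_Treks

# Stack per-habitat and pooled rows, then express rates as percentages
range_summary <- rbind(range_counts, overall[, names(range_counts)])
range_summary$Detection_Rate <- round(100 * range_summary$Detection_Rate, 1)
print(range_summary)
```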
library(dplyr)
library(ggplot2)

# Plot per-habitat rates only; the pooled "Overall" rows are dropped
ggplot(
  range_summary_rate %>% filter(`Interior/Exterior` != "Overall"),
  aes(x = Range, y = Detection_Rate, fill = `Interior/Exterior`)
) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.7) +
  labs(title = "Hornbill Detection Rates by Range and Habitat (Trek-Based)", y = "Detection Rate", x = "Range") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set1") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
4.3 Region and Region per Interior/Exterior detection rate
The core analysis uses chi-square tests to assess whether hornbill detections are associated with habitat type (interior vs. exterior) beyond what chance or survey effort would explain. Contingency tables compare observed detections against trek counts (N) to test independence. Jacobs’ Electivity Index quantifies preference or avoidance relative to the expected number of detections, which is taken as proportional to the time spent in each habitat. Tests are run at three scales: overall habitat, range-specific, and region-specific.
Why Divide by Time? Trek durations vary (e.g., some are 30 minutes, others 2 hours), so raw detection counts alone can bias results: longer treks naturally offer more chances to spot hornbills. Dividing detections by total time (hours) yields a detection rate per hour, normalizing for survey effort. This makes comparisons fair across habitats, ranges, or regions, so that differences reflect hornbill activity rather than sampling intensity. Without this, a habitat surveyed with shorter treks (for example, interior routes) could appear to hold fewer hornbills simply because observers spent less time there.
Across the dataset, this yields 74.13 total hours of effort and 48 detections, for an overall rate of ~0.65 per hour.
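The effort normalization amounts to a single division. The sketch below wraps it in a small helper, `rate_per_hour()` (a hypothetical name, not from the original script), applies it to the reported totals, and then shows with two made-up treks how the same raw count yields different rates.

```r
# Detections per hour of survey effort
rate_per_hour <- function(detections, hours) {
  stopifnot(all(hours > 0))
  detections / hours
}

# Totals reported in the text: 48 detections over 74.13 hours
overall_per_hour <- rate_per_hour(detections = 48, hours = 74.13)
print(round(overall_per_hour, 2))  # 0.65

# Illustrative (made-up) treks: one detection each, lasting 0.5 h and 2 h.
# Equal raw counts, but the short trek has four times the hourly rate.
example_rates <- rate_per_hour(detections = c(1, 1), hours = c(0.5, 2))
print(example_rates)  # 2.0 0.5
```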
5.1 Overall Habitat Test
Purpose: This test evaluates whether hornbill detections differ significantly between interior (core, undisturbed forest) and exterior (edge/buffer zones, often with more human activity) habitats. By comparing detections to trek numbers, it checks if habitat influences sighting probability, controlling for sampling effort. This helps identify if hornbills avoid or prefer edges, informing conservation (e.g., buffer zone management).
Null Hypothesis (H0): Detection frequency is independent of habitat type—hornbills are equally likely to be detected in interior and exterior areas, proportional to treks conducted (no preference or avoidance).
Alternative Hypothesis (H1): Detection frequency depends on habitat type—hornbills show preference for or avoidance of one habitat, leading to uneven detections.
Chi-Square Test Code and Output: The code groups data by habitat, summarizes detections and treks, computes expected counts (proportional to total time), and applies Jacobs’ Index. The chi-square test uses a 2x2 contingency table (rows: habitat; columns: detections and treks).
Pearson's Chi-squared test with Yates' continuity correction
data: contingency_table
X-squared = 9.0593, df = 1, p-value = 0.002614
Chi-Square Result: χ² = 9.06, df = 1, p = 0.003.
With p < 0.05, we reject H0: detections are significantly associated with habitat, with exterior treks yielding disproportionately more detections. Yates’ continuity correction, which R's chisq.test() applies by default to 2x2 tables, makes the test slightly more conservative; this is relevant here because the interior row contains very few detections.
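The test can be re-run from the counts alone. The sketch below assumes the habitat totals are the sums over both ranges in the trek-based table (46 detections over 59 exterior treks, 2 detections over 22 interior treks) and uses the detections-and-treks layout described above. With those inputs, `chisq.test()` should return the reported statistic. The second table is an alternative layout shown for comparison only, not the report's method: because every detection is also a trek, the two columns above are not mutually exclusive, and a more conventional table crosses habitat with detected vs. not detected.

```r
# Habitat totals summed over both ranges from the trek-based table:
#   Exterior: 40 + 6 = 46 detections over 41 + 18 = 59 treks
#   Interior:  2 + 0 =  2 detections over 11 + 11 = 22 treks
contingency_table <- matrix(
  c(46, 59,
     2, 22),
  nrow = 2, byrow = TRUE,
  dimnames = list(Habitat = c("Exterior", "Interior"),
                  Count   = c("Detections", "Treks"))
)

# chisq.test() applies Yates' continuity correction to 2x2 tables by default
test_result <- chisq.test(contingency_table)
print(test_result)  # report gives X-squared = 9.0593, df = 1, p-value = 0.002614

# Alternative layout (an assumption for comparison, not the report's method):
# treks with a detection vs. treks without one
detected_table <- matrix(
  c(46, 59 - 46,
     2, 22 - 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(Habitat = c("Exterior", "Interior"),
                  Outcome = c("Detected", "Not detected"))
)
alt_result <- chisq.test(detected_table)
print(alt_result)
```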
Jacobs’ Index Explanation: This index measures habitat selection. It ranges from -1 (complete avoidance) to +1 (complete preference), with 0 indicating use in proportion to availability. Here, the exterior value of +0.17 suggests mild preference (hornbills were detected at edges slightly more often than survey effort alone would predict), while the interior value of -0.77 indicates strong avoidance (possibly because deeper forest is less accessible or holds fewer food resources). This agrees with the low interior rate (0.085 detections/hour vs. 0.907 in the exterior), which persists after time normalization.
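The index formula is not shown in the report, but the quoted values are consistent with (O − E) / (O + E), where O is the observed number of detections in a habitat and E is the number expected if detections were spread in proportion to survey hours. This is an inference from the numbers, not a confirmed description of the original script. The classic Jacobs formulation, D = (r − p) / (r + p − 2rp), has a different denominator and would give different values. Per-habitat hours are not reported either, so the sketch below back-calculates them from the quoted hourly rates; treat them as approximate.

```r
# Electivity in the (observed - expected) / (observed + expected) form.
# Expected detections are allocated in proportion to survey hours.
# ASSUMPTION: this form is inferred from the reported index values.
electivity_index <- function(detections, hours) {
  expected <- sum(detections) * hours / sum(hours)
  (detections - expected) / (detections + expected)
}

# Detections summed from the trek-based table (46 exterior, 2 interior)
detections <- c(Exterior = 46, Interior = 2)

# ASSUMPTION: hours back-calculated from the quoted rates
# (0.907 and 0.085 detections per hour); not taken from the dataset.
hours <- detections / c(0.907, 0.085)

index <- round(electivity_index(detections, hours), 2)
print(index)  # Exterior 0.17, Interior -0.77
```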
5.2 Range-Specific Test
Purpose: Ranges (Bonda, Khanapara) represent administrative divisions with potentially varying ecology or disturbance. This test checks if habitat biases persist within each, revealing local patterns (e.g., Bonda’s denser forest might amplify avoidance).
Null Hypothesis (H0): Within each range, detections are independent of habitat (proportional to treks).
Alternative Hypothesis (H1): Within each range, detections associate with habitat (local preference/avoidance).
Chi-Square Test Code and Output: Similar to overall, but grouped by range first, with per-range expectations and separate chi-squares.
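The per-range code and output are not reproduced in this report. As a rough sketch of what "grouped by range first" could look like, the block below splits the trek-based counts by range and runs one chi-square test per range, using the same detections-and-treks layout as the overall test. The names `range_counts` and `range_tests` are illustrative, and the resulting statistics are not values from the original analysis.

```r
# Counts copied from the trek-based table (per range and habitat)
range_counts <- data.frame(
  Range      = c("Bonda Range", "Bonda Range", "Khanapara Range", "Khanapara Range"),
  Habitat    = c("Exterior", "Interior", "Exterior", "Interior"),
  Detections = c(40, 2, 6, 0),
  N_Treks    = c(41, 11, 18, 11)
)

# One test per range, same table layout as the overall test
range_tests <- lapply(split(range_counts, range_counts$Range), function(d) {
  tab <- as.matrix(d[, c("Detections", "N_Treks")])
  rownames(tab) <- d$Habitat
  # Small expected counts make chisq.test() warn; the warning is silenced
  # here and the smallest expected count is returned for inspection instead.
  res <- suppressWarnings(chisq.test(tab))
  data.frame(
    Range        = d$Range[1],
    X_squared    = unname(res$statistic),
    df           = unname(res$parameter),
    p_value      = res$p.value,
    min_expected = min(res$expected)
  )
})
range_tests <- do.call(rbind, range_tests)
print(range_tests)
```

Where `min_expected` falls below 5, as it does for Khanapara with zero interior detections, the chi-square approximation is unreliable and an exact test such as `fisher.test()` may be a safer choice.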
Jacobs’ Index Explanation: Computed per range (using range-specific time). Bonda: exterior +0.12 (mild preference), interior -0.67 (avoidance). Khanapara: exterior +0.23 (mild preference), interior -1.00 (complete avoidance, with no detections despite survey effort). Dividing by time ensures the indices reflect availability rather than raw counts; for example, Khanapara interiors received 14.83 hours of effort but produced zero sightings.