1 Introduction

This report analyzes mud crab data from core no-take zones (Zona Inti) versus control areas (Zona Pemanfaatan Terbatas) using fisheries-independent and -dependent data, linear mixed models, GAMs, and various data visualizations. The second half of this report looks at the fisheries dependent data set.

2 Install and Load Required Libraries

# install.packages(c("tidyverse", "janitor", "lubridate", "skimr", "fastDummies", "sjPlot"))

library(tidyverse)
library(janitor)
library(lubridate)
library(skimr)
library(fastDummies)
library(broom)
library(broom.mixed)
library(lme4)
library(mgcv)
library(ggeffects)
library(patchwork)
library(gratia)
library(stringr)
library(sjPlot)

3 Load and Clean Data

# 1. Load dataset
mudcrab_data <- read_csv("mudcrab_ind_2025.csv")

## Rows: 689 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): Tanggal (date), Nama Desa, Nama Zona, Pengambil data (Surveyor), K...
## dbl  (5): # crabs, Berat (weight in gr), Panjang Kerapas (carapace length in...
## num  (2): Koordinat Y (Latitude), Koordinat X (Longitude)
## time (1): Jam bubu dipasang (time trap deployed)
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(mudcrab_data)

## Rows: 689
## Columns: 17
## $ `Tanggal (date)`                          <chr> "12/8/2024", "12/7/2024", "1…
## $ `Nama Desa`                               <chr> "Mengkalang Jambu", "Mengkal…
## $ `Nama Zona`                               <chr> "Zona Pemanfaatan terbatas",…
## $ `# crabs`                                 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ `Pengambil data (Surveyor)`               <chr> "Yada,Iskandar,Rendi", "Phig…
## $ `Kode alat tangkap (code trap)`           <chr> "B 18", "B 25", "B 18", "B 4…
## $ `Kode GPS (Number waypoint)`              <chr> "179", "25", "261", "41", "2…
## $ `Koordinat Y (Latitude)`                  <dbl> 10922599792, 109158989, 1092…
## $ `Koordinat X (Longitude)`                 <dbl> -54510200, -5237870216, -576…
## $ `Jam bubu dipasang (time trap deployed)`  <time> 07:14:00, 07:49:00, 09:25:0…
## $ `Jam bubu diangkat (retrieval time)`      <chr> "10:21", "10:56", "13:49", "…
## $ `Jenis Kelamin (Sex of mud crab)`         <chr> "Jantan", "Jantan", "Jantan"…
## $ `Berat (weight in gr)`                    <dbl> 810, 655, 645, 585, 530, 515…
## $ `Panjang Kerapas (carapace length in mm)` <dbl> 100, 97, 90, 90, 90, 90, 90,…
## $ `Lebar Kerapas (carapace width in mm)`    <dbl> 140, 140, 130, 130, 130, 85,…
## $ `Kode foto (Image Code)`                  <dbl> 215, 89, 285, 62, 231, 89, 2…
## $ `CPUE  Kg/Bubu`                           <chr> NA, NA, NA, NA, NA, NA, NA, …

mudcrab_data

## # A tibble: 689 × 17
##    `Tanggal (date)` `Nama Desa`     `Nama Zona` `# crabs` Pengambil data (Surv…¹
##    <chr>            <chr>           <chr>           <dbl> <chr>                 
##  1 12/8/2024        Mengkalang Jam… Zona Peman…         1 Yada,Iskandar,Rendi   
##  2 12/7/2024        Mengkalang      Zona Peman…         1 Phiguradi Bangun,Hams…
##  3 12/10/2024       Dabong          Zona Peman…         1 Phiguradi Bangun, Pak…
##  4 12/6/2024        Sungai Nibung   zona inti           1 Yada,Iskandar,Rendi   
##  5 12/10/2024       Dabong          Zona Peman…         1 Phiguradi Bangun, Pak…
##  6 12/6/2024        Sungai Nibung   zona inti           1 Yada,Iskandar,Rendi   
##  7 12/8/2024        Mengkalang Jam… Zona Peman…         1 Yada,Iskandar,Rendi   
##  8 12/10/2024       Dabong          Zona Peman…         1 Phiguradi Bangun, Pak…
##  9 12/8/2024        Mengkalang Jam… Zona Peman…         1 Yada,Iskandar,Rendi   
## 10 12/7/2024        Mengkalang      zona inti           1 Yada,Iskandar,Rendi   
## # ℹ 679 more rows
## # ℹ abbreviated name: ¹`Pengambil data (Surveyor)`
## # ℹ 12 more variables: `Kode alat tangkap (code trap)` <chr>,
## #   `Kode GPS (Number waypoint)` <chr>, `Koordinat Y (Latitude)` <dbl>,
## #   `Koordinat X (Longitude)` <dbl>,
## #   `Jam bubu dipasang (time trap deployed)` <time>,
## #   `Jam bubu diangkat (retrieval time)` <chr>, …

# 2. Clean dataset
mudcrab_data <- mudcrab_data %>%
  clean_names() %>%
  mutate(
    koordinat_y_latitude = as.numeric(str_replace(koordinat_y_latitude, ",", ".")),
    koordinat_x_longitude = as.numeric(str_replace(koordinat_x_longitude, ",", ".")),
    tanggal_date = dmy(tanggal_date),
    berat_weight_in_gr = as.numeric(berat_weight_in_gr),
    panjang_kerapas_carapace_length_in_mm = as.numeric(panjang_kerapas_carapace_length_in_mm),
    lebar_kerapas_carapace_width_in_mm = as.numeric(lebar_kerapas_carapace_width_in_mm),
    cpue_kg_bubu = as.numeric(cpue_kg_bubu),
    jenis_kelamin_sex_of_mud_crab = str_to_title(jenis_kelamin_sex_of_mud_crab),
    nama_desa = as.factor(nama_desa),
    nama_zona = as.factor(nama_zona),
    jenis_kelamin_sex_of_mud_crab = as.factor(jenis_kelamin_sex_of_mud_crab),
    treatment = ifelse(nama_zona == "zona inti", 1, 0)
  )

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `cpue_kg_bubu = as.numeric(cpue_kg_bubu)`.
## Caused by warning:
## ! NAs introduced by coercion

# Optional: Missing summary
print(mudcrab_data %>% summarise_all(~sum(is.na(.))))

## # A tibble: 1 × 18
##   tanggal_date nama_desa nama_zona number_crabs pengambil_data_surveyor
##          <int>     <int>     <int>        <int>                   <int>
## 1            0         0         0          380                       0
## # ℹ 13 more variables: kode_alat_tangkap_code_trap <int>,
## #   kode_gps_number_waypoint <int>, koordinat_y_latitude <int>,
## #   koordinat_x_longitude <int>, jam_bubu_dipasang_time_trap_deployed <int>,
## #   jam_bubu_diangkat_retrieval_time <int>,
## #   jenis_kelamin_sex_of_mud_crab <int>, berat_weight_in_gr <int>,
## #   panjang_kerapas_carapace_length_in_mm <int>,
## #   lebar_kerapas_carapace_width_in_mm <int>, kode_foto_image_code <int>, …

# 3. Filter out traps with no crabs caught (0s)
crab_data_filtered <- mudcrab_data %>%
  filter(!is.na(berat_weight_in_gr), berat_weight_in_gr > 0)

# Summarize total crabs caught (Catch Abundance)
crab_counts_summary <- crab_data_filtered %>%
  group_by(treatment) %>%
  summarise(total_crabs = sum(number_crabs, na.rm = TRUE)) %>%  # Correct column name
  mutate(
    treatment_label = ifelse(treatment == 1, "Treatment (Zona Inti)", "Control (Zona Pemanfaatan Terbatas)")
  )
crab_counts_summary

## # A tibble: 2 × 3
##   treatment total_crabs treatment_label                    
##       <dbl>       <dbl> <chr>                              
## 1         0         162 Control (Zona Pemanfaatan Terbatas)
## 2         1         147 Treatment (Zona Inti)

4 Summary Statistics by Treatment

summary_stats <- crab_data_filtered %>%
  group_by(treatment) %>%
  summarise(
    mean_weight = mean(berat_weight_in_gr),
    se_weight = sd(berat_weight_in_gr) / sqrt(n()),
    mean_length = mean(panjang_kerapas_carapace_length_in_mm),
    se_length = sd(panjang_kerapas_carapace_length_in_mm) / sqrt(n()),
    mean_width = mean(lebar_kerapas_carapace_width_in_mm),
    se_width = sd(lebar_kerapas_carapace_width_in_mm) / sqrt(n())
  ) %>%
  mutate(treatment_label = ifelse(treatment == 1, "Treatment (Zona Inti)", "Control (Zona Pemanfaatan Terbatas)"))

5 Summary by Treatment and Village

summary_weight <- crab_data_filtered %>%
  group_by(treatment, nama_desa) %>%
  summarise(
    mean_weight = mean(berat_weight_in_gr),
    se_weight = sd(berat_weight_in_gr) / sqrt(n())
  ) %>%
  mutate(
    treatment_label = ifelse(treatment == 1, "Treatment (Zona Inti)", "Control (Zona Pemanfaatan Terbatas)")
  )

## `summarise()` has grouped output by 'treatment'. You can override using the
## `.groups` argument.

# Step 3: Create ggplot
ggplot(summary_weight, aes(x = nama_desa, y = mean_weight, fill = treatment_label)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.7), width = 0.6) +
  geom_errorbar(aes(ymin = mean_weight - se_weight, ymax = mean_weight + se_weight),
                position = position_dodge(width = 0.7), width = 0.2) +
  labs(
    title = "Mean Crab Weight by Treatment and Village",
    x = "Village",
    y = "Mean Weight (grams)",
    fill = "Treatment Group"
  ) +
  theme_minimal(base_size = 9) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold", hjust = 0.5)
  )

6 Plot: Mean Carapace Length and Width by Treatment and Village

summary_length_width <- crab_data_filtered %>%
  group_by(treatment, nama_desa) %>%
  summarise(
    mean_length = mean(panjang_kerapas_carapace_length_in_mm, na.rm = TRUE),
    se_length = sd(panjang_kerapas_carapace_length_in_mm, na.rm = TRUE) / sqrt(n()),
    
    mean_width = mean(lebar_kerapas_carapace_width_in_mm, na.rm = TRUE),
    se_width = sd(lebar_kerapas_carapace_width_in_mm, na.rm = TRUE) / sqrt(n())
  ) %>%
  mutate(
    treatment_label = ifelse(treatment == 1, "Treatment (Zona Inti)", "Control (Zona Pemanfaatan Terbatas)")
  )

## `summarise()` has grouped output by 'treatment'. You can override using the
## `.groups` argument.

# Plot: Plot Mean Carapace Length
ggplot(summary_length_width, aes(x = nama_desa, y = mean_length, fill = treatment_label)) +
  geom_bar(stat = "identity", position = position_dodge(0.7), width = 0.6) +
  geom_errorbar(aes(ymin = mean_length - se_length, ymax = mean_length + se_length),
                position = position_dodge(0.7), width = 0.2) +
  labs(title = "Mean Carapace Length by Treatment & Village",
       x = "Village", y = "Mean Length (mm)", fill = "Treatment Group") +
  theme_minimal(base_size = 9) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Plot: Plot Mean Carapace Width
ggplot(summary_length_width, aes(x = nama_desa, y = mean_width, fill = treatment_label)) +
  geom_bar(stat = "identity", position = position_dodge(0.7), width = 0.6) +
  geom_errorbar(aes(ymin = mean_width - se_width, ymax = mean_width + se_width),
                position = position_dodge(0.7), width = 0.2) +
  labs(title = "Mean Carapace Width by Treatment & Village",
       x = "Village", y = "Mean Width (mm)", fill = "Treatment Group") +
  theme_minimal(base_size = 9) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

7 Running Linear Mixed Effects Models

# STEP 0: Prepare data
crab_data_filtered <- crab_data_filtered %>%
  mutate(
    log_weight = log(berat_weight_in_gr),
    log_length = log(panjang_kerapas_carapace_length_in_mm),
    log_width  = log(lebar_kerapas_carapace_width_in_mm)
  )

# STEP 1: Fit linear models (LMs)
lm_weight <- lm(log_weight ~ treatment, data = crab_data_filtered)
lm_length <- lm(log_length ~ treatment, data = crab_data_filtered)
lm_width  <- lm(log_width  ~ treatment, data = crab_data_filtered)

# STEP 2: Fit linear mixed models (LMMs)
model_weight <- lmer(log_weight ~ treatment + (1 | nama_desa), data = crab_data_filtered, REML = FALSE)
model_length <- lmer(log_length ~ treatment + (1 | nama_desa), data = crab_data_filtered, REML = FALSE)

## boundary (singular) fit: see help('isSingular')

model_width  <- lmer(log_width ~ treatment + (1 | nama_desa), data = crab_data_filtered, REML = FALSE)

# STEP 3: Compare AIC values
aic_results <- data.frame(
  Response = rep(c("Weight", "Length", "Width"), each = 2),
  Model = rep(c("LM (No Random Effect)", "LMM (Village RE)"), times = 3),
  AIC = c(
    AIC(lm_weight), AIC(model_weight),
    AIC(lm_length), AIC(model_length),
    AIC(lm_width), AIC(model_width)
  )
)

# Plot AIC comparison
ggplot(aic_results, aes(x = Response, y = AIC, fill = Model)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.7)) +
  geom_text(aes(label = round(AIC, 1)),
            position = position_dodge(width = 0.7),
            vjust = -0.5, size = 4) +
  labs(title = "AIC: LM vs LMM Comparison",
       y = "AIC Value", x = "Response") +
  theme_minimal(base_size = 14) +
  scale_fill_manual(values = c("steelblue", "darkorange")) +
  theme(legend.position = "top")

# STEP 4: Generate predictions from LM models & back-transform
pred_weight_lm <- ggpredict(lm_weight, terms = "treatment") %>%
  mutate(across(c(predicted, conf.low, conf.high), exp))

pred_length_lm <- ggpredict(lm_length, terms = "treatment") %>%
  mutate(across(c(predicted, conf.low, conf.high), exp))

pred_width_lm <- ggpredict(lm_width, terms = "treatment") %>%
  mutate(across(c(predicted, conf.low, conf.high), exp))


# STEP 5: Visualize treatment effect (% change)
lm_effects <- bind_rows(
  tidy(lm_weight, conf.int = TRUE) %>% mutate(metric = "Weight"),
  tidy(lm_length, conf.int = TRUE) %>% mutate(metric = "Length"),
  tidy(lm_width, conf.int = TRUE) %>% mutate(metric = "Width")
) %>%
  filter(str_detect(term, "(?i)treatment")) %>%
  mutate(
    estimate_pct = (exp(estimate) - 1) * 100,
    conf.low_pct = (exp(conf.low) - 1) * 100,
    conf.high_pct = (exp(conf.high) - 1) * 100
  )

ggplot(lm_effects, aes(x = metric, y = estimate_pct)) +
  geom_point(size = 3, color = "darkgreen") +
  geom_errorbar(aes(ymin = conf.low_pct, ymax = conf.high_pct), width = 0.2, color = "darkgreen") +
  geom_hline(yintercept = 0, linetype = "dotted") +
  coord_flip() +
  labs(
    title = "Effect of Treatment on Crab Size Metrics",
    subtitle = "Based on Linear Models (LM): log(size) ~ treatment",
    x = "Crab Metric",
    y = "Estimated % Change Due to Treatment (with 95% CI)",
    caption = "Model: LM without random effects"
  ) +
  theme_minimal(base_size = 13)

7.1 Summary: Linear Model vs Mixed Model Comparison

Linear models (LMs) and linear mixed models (LMMs with village as a random effect) were fit for crab weight, length, and width. AIC values for each model were compared to determine which model best fits the data.

Results:
- Weight: LM AIC = 497.7, LMM AIC = 499.6 → LM better
- Length: LM AIC = -135.7, LMM AIC = -133.7 → LM better
- Width: LM AIC = -137.6, LMM AIC = -136.3 → LM better

All LMs had slightly lower AIC than LMMs, indicating that including a random effect for nama_desa (village) does not significantly improve model fit. Therefore, LMs are preferred for interpretation.

Predictions (back-transformed from log scale):
- Treatment areas show higher mean weight and length than control areas.
- Width differences are minimal and CIs overlap.

Recommendation:
→ Use LM models for inference.
→ Emphasize treatment effects on crab weight and length in reporting.

#Routine Mud Crab Data Analysis – Kubu Raya

This chapter analyzes routine monitoring data from Kubu Raya district, focusing on mud crab landings from 2021 to 2024. We explore trends in catch weight, number of individuals, and average crab size across time.

# ======================================================
# Routine Mud Crab Data Analysis - Kubu Raya
# ======================================================

# 1. Load Required Libraries
library(tidyverse)
library(lubridate)
library(janitor)

# 2. Load and Clean Raw Dataset
file_path <- "Pendataan Rutin_Kubu Raya.csv"

data <- read_csv(file_path, locale = locale(encoding = "UTF-8")) %>%
  clean_names() %>%
  mutate(
    tanggal = suppressWarnings(dmy(tanggal)),
    berat_total = as.numeric(berat_total),
    jumlah_ekor = as.numeric(jumlah_ekor),
    rata_rata_berat_ekor = as.numeric(str_replace(rata_rata_berat_ekor, ",", ".")),
    harga = parse_number(harga),
    total_harga_rp = parse_number(total_harga_rp),
    kategori = as.factor(kategori),
    kelas = as.factor(kelas),
    jenis_biota = as.factor(jenis_biota)
  )

# 3. Filter for Valid Mud Crab Records
mudcrab_data <- data %>%
  filter(
    !is.na(tanggal),
    str_to_lower(kategori) == "kepiting",
    !is.na(berat_total), berat_total > 0,
    !is.na(jumlah_ekor), jumlah_ekor > 0
  ) %>%
  mutate(
    avg_weight = if_else(is.na(rata_rata_berat_ekor) | rata_rata_berat_ekor == 0,
                         berat_total / jumlah_ekor,
                         rata_rata_berat_ekor),
    year = year(tanggal),
    month = month(tanggal, label = TRUE, abbr = FALSE),
    month_year = floor_date(tanggal, "month")
  )

# 4. Check Data Coverage
mudcrab_data %>%
  summarise(
    min_date = min(month_year, na.rm = TRUE),
    max_date = max(month_year, na.rm = TRUE),
    n_months = n_distinct(month_year)
  )

## # A tibble: 1 × 3
##   min_date   max_date   n_months
##   <date>     <date>        <int>
## 1 2021-09-01 2024-08-01       36

# --> Data spans 36 months: from Sept 2021 to Aug 2024.

8 Annual and Monthly Summary Statistics – Kubu Raya

This section provides summary statistics from the routine dataset, including changes in total and average crab weight, number of individuals, and seasonal trends across years and months.

# 6. Annual Summary Statistics

mean_weight_summary <- mudcrab_data %>%
  group_by(year) %>%
  summarise(
    mean_weight = mean(berat_total),
    se_weight = sd(berat_total) / sqrt(n())
  )
# --> Total weight increased from ~1.7 kg (2021) to ~3.0 kg (2024).

avg_individuals_summary <- mudcrab_data %>%
  group_by(year) %>%
  summarise(
    mean_individuals = mean(jumlah_ekor),
    se_individuals = sd(jumlah_ekor) / sqrt(n())
  )
# --> Avg. individuals per record rose from ~6.8 to ~10.1 over years.

total_individuals_summary <- mudcrab_data %>%
  group_by(year) %>%
  summarise(
    total_individuals = sum(jumlah_ekor),
    se_total = sd(jumlah_ekor) / sqrt(n())
  )
# --> Total crabs recorded: peaked in 2023 (~66,000), dropped in 2024.

# 7. Monthly Summary Statistics

monthly_weight_summary <- mudcrab_data %>%
  group_by(month) %>%
  summarise(
    mean_total_weight = mean(berat_total),
    se_total_weight = sd(berat_total) / sqrt(n())
  )
# --> Peak total weights: April & March (~3.2 and ~3 kg)

monthly_avg_weight_summary <- mudcrab_data %>%
  group_by(month) %>%
  summarise(
    mean_avg_weight = mean(avg_weight),
    se_avg_weight = sd(avg_weight) / sqrt(n())
  )
# --> Highest average weight per crab: Feb & Aug (~0.52 and ~0.42 kg)

9 CPUE (Catch Per Unit Effort) Analysis – Kubu Raya Mud Crab Data

This section summarizes CPUE trends based on trip-level catch data per fisher per day, including outlier removal and visual trends across months.

# 1. Create CPUE dataset (total weight per fisher per day)
cpue_data <- mudcrab_data %>%
  group_by(tanggal, nama_nelayan) %>%
  summarise(
    total_weight_kg = sum(berat_total, na.rm = TRUE),
    sungai = first(sungai),
    kelas = first(kelas),
    jenis_biota = first(jenis_biota),
    kategori = first(kategori),
    .groups = "drop"
  ) %>%
  mutate(
    year = year(tanggal),
    month = month(tanggal, label = TRUE, abbr = FALSE),
    month_year = as.Date(format(tanggal, "%Y-%m-01"))
  )

# 2. Detect outliers using IQR
iqr_bounds <- quantile(cpue_data$total_weight_kg, probs = c(0.25, 0.75), na.rm = TRUE)
iqr_range <- diff(iqr_bounds)
lower_bound <- iqr_bounds[1] - 1.5 * iqr_range
upper_bound <- iqr_bounds[2] + 1.5 * iqr_range

# 3. Identify outliers
cpue_outliers <- cpue_data %>%
  filter(total_weight_kg < lower_bound | total_weight_kg > upper_bound)

# 4. Cleaned dataset (non-outliers)
cpue_cleaned <- cpue_data %>%
  filter(total_weight_kg >= lower_bound, total_weight_kg <= upper_bound)

# 5. Report counts
n_removed <- nrow(cpue_outliers)
cat("Number of outliers removed:", n_removed, "\n")

## Number of outliers removed: 549

cat("Original trip records:", nrow(cpue_data), "\n")

## Original trip records: 8985

cat("Records after outlier removal:", nrow(cpue_cleaned), "\n")

## Records after outlier removal: 8436

# 6. Outliers table (view or save)
outlier_table <- cpue_outliers %>%
  arrange(desc(total_weight_kg)) %>%
  select(tanggal, nama_nelayan, total_weight_kg, sungai, kelas, jenis_biota, kategori)
print(outlier_table)

## # A tibble: 549 × 7
##    tanggal    nama_nelayan total_weight_kg sungai    kelas jenis_biota kategori
##    <date>     <chr>                  <dbl> <chr>     <fct> <fct>       <fct>   
##  1 2023-03-08 Andri Db               104   Ada       A     Kepiting    Kepiting
##  2 2022-01-24 Taslim                  70.9 Ada       A     Kepiting    Kepiting
##  3 2023-07-24 Budiman                 70.5 Ada       A     Kepiting    Kepiting
##  4 2022-03-29 Taslim                  65.2 Ada       A     Kepiting    Kepiting
##  5 2024-04-17 Meon                    64.7 Tidak Ada A     Kepiting    Kepiting
##  6 2023-04-11 Edi Has                 57.6 Tidak Ada A     Kepiting    Kepiting
##  7 2022-08-22 Hendriyadi              55.5 Ada       A     Kepiting    Kepiting
##  8 2023-03-21 Edi Handoko             48.1 Tidak Ada A     Kepiting    Kepiting
##  9 2023-03-08 Edi Handoko             47.4 Tidak Ada A     Kepiting    Kepiting
## 10 2022-06-16 Yan Syah                46.6 Ada       B     Kepiting    Kepiting
## # ℹ 539 more rows

# 7. Boxplot showing IQR bounds and outliers
ggplot(cpue_data, aes(y = total_weight_kg)) +
  geom_boxplot(outlier.colour = "red", fill = "lightblue", width = 0.3) +
  geom_hline(yintercept = lower_bound, linetype = "dashed", color = "orange") +
  geom_hline(yintercept = upper_bound, linetype = "dashed", color = "orange") +
  labs(
    title = "Boxplot of Trip-level CPUE with IQR Thresholds",
    y = "Total Weight per Trip (kg)"
  ) +
  theme_minimal()

# 4. Visualize raw CPUE values over time (per trip)
ggplot(cpue_cleaned, aes(x = month_year, y = total_weight_kg)) +
  geom_point(alpha = 0.3, color = "darkgreen") +
  geom_smooth(method = "loess", se = TRUE, color = "blue", size = 1.2) +
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y") +
  labs(
    title = "Monthly CPUE (kg per trip) - Raw Data",
    x = "Month-Year",
    y = "CPUE (kg/trip)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# 5. Plot monthly mean CPUE (kg/trip) with standard error bars
monthly_cpue_summary <- cpue_cleaned %>%
  group_by(month_year) %>%
  summarise(
    mean_cpue = mean(total_weight_kg, na.rm = TRUE),
    se_cpue = sd(total_weight_kg, na.rm = TRUE) / sqrt(n()),
    .groups = "drop"
  )

ggplot(monthly_cpue_summary, aes(x = month_year, y = mean_cpue)) +
  geom_point(color = "darkgreen", size = 2) +
  geom_errorbar(aes(ymin = mean_cpue - se_cpue, ymax = mean_cpue + se_cpue),
                width = 5, color = "darkgreen") +
  geom_smooth(method = "loess", se = TRUE, color = "blue", size = 1.2) +
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y") +
  labs(
    title = "Monthly Mean CPUE with Standard Error",
    x = "Month-Year",
    y = "Mean CPUE (kg/trip)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

#Mixed Effects Model Selection for CPUE

This chapter explores various linear mixed-effects models to understand how seasonality, year, and fishing location affect CPUE. Models test different random effects and interactions.

# 2. Ensure categorical variables are factors & create season variable
cpue_data <- cpue_data %>%
  mutate(
    month = as.integer(month),
    season = case_when(
      month %in% c(12, 1, 2) ~ "high_season",
      month %in% c(3, 4, 5)  ~ "breeding",
      month %in% c(6, 7, 8)  ~ "night_tides",
      month %in% c(9, 10, 11) ~ "afternoon_tides",
      TRUE ~ NA_character_
    ),
    season = factor(season, levels = c("high_season", "breeding", "night_tides", "afternoon_tides")),
    year = factor(year),
    sungai = factor(sungai),
    nama_nelayan = factor(nama_nelayan)
  )

# 3. Define multiple model formulas to test random effects and interactions
model_formulas <- list(
  model1  = total_weight_kg ~ season + (1 | nama_nelayan),
  model2  = total_weight_kg ~ year + (1 | nama_nelayan),
  model3  = total_weight_kg ~ season + year + (1 | nama_nelayan),
  model4  = total_weight_kg ~ season * year + (1 | nama_nelayan),
  model5  = total_weight_kg ~ season + (1 | sungai),
  model6  = total_weight_kg ~ year + (1 | sungai),
  model7  = total_weight_kg ~ season + year + (1 | sungai),
  model8  = total_weight_kg ~ season * year + (1 | sungai),
  model9  = total_weight_kg ~ season + (1 | nama_nelayan) + (1 | sungai),
  model10 = total_weight_kg ~ year + (1 | nama_nelayan) + (1 | sungai),
  model11 = total_weight_kg ~ season + year + (1 | nama_nelayan) + (1 | sungai),
  model12 = total_weight_kg ~ season * year + (1 | nama_nelayan) + (1 | sungai)
)

# 4. Fit all models using ML (REML = FALSE) and collect AIC scores
model_results <- lapply(model_formulas, function(f) {
  mod <- lmer(f, data = cpue_data, REML = FALSE)
  list(model = mod, aic = AIC(mod))
})

# 5. Identify the best model based on AIC
aic_values <- sapply(model_results, function(x) x$aic)
best_model_name <- names(which.min(aic_values))
best_model <- model_results[[best_model_name]]$model

cat("Best Model:", best_model_name, "\n")

## Best Model: model12

cat("AIC:", min(aic_values), "\n")

## AIC: 53365.63

# 6. View all models ranked by AIC
aic_table <- data.frame(
  Model = names(aic_values),
  AIC = aic_values
) %>%
  arrange(AIC)

print(aic_table)

##           Model      AIC
## model12 model12 53365.63
## model10 model10 53370.23
## model11 model11 53370.55
## model4   model4 53386.78
## model2   model2 53392.38
## model3   model3 53393.08
## model9   model9 53442.63
## model1   model1 53481.67
## model6   model6 55419.51
## model7   model7 55420.35
## model8   model8 55420.89
## model5   model5 55521.94

# 7. Summary of the best model
summary(best_model)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: total_weight_kg ~ season * year + (1 | nama_nelayan) + (1 | sungai)
##    Data: cpue_data
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##   53365.6   53479.3  -26666.8   53333.6      8969 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.7490 -0.4401 -0.1448  0.2091 20.9768 
## 
## Random effects:
##  Groups       Name        Variance Std.Dev.
##  nama_nelayan (Intercept)  8.640   2.9394  
##  sungai       (Intercept)  0.128   0.3577  
##  Residual                 20.503   4.5280  
## Number of obs: 8985, groups:  nama_nelayan, 546; sungai, 2
## 
## Fixed effects:
##                                Estimate Std. Error t value
## (Intercept)                     3.07580    0.41360   7.437
## seasonbreeding                  0.15041    0.28860   0.521
## seasonnight_tides              -0.32120    0.29276  -1.097
## seasonafternoon_tides          -0.71312    0.45746  -1.559
## year2022                        0.61891    0.32325   1.915
## year2023                        1.77824    0.31984   5.560
## year2024                        1.75428    0.36801   4.767
## seasonbreeding:year2022         0.01697    0.40344   0.042
## seasonnight_tides:year2022      1.11368    0.36516   3.050
## seasonafternoon_tides:year2022  0.98532    0.50805   1.939
## seasonbreeding:year2023        -0.20664    0.36454  -0.567
## seasonnight_tides:year2023      0.38665    0.36610   1.056
## seasonafternoon_tides:year2023  0.49619    0.50522   0.982
## fit warnings:
## fixed-effect model matrix is rank deficient so dropping 3 columns / coefficients

10 Visualizing Model Effects

This section visualizes the results of the best mixed-effects model selected in Chapter 11, including fixed effect estimates and interaction trends over time.

# 8. Plot fixed effects with estimates and confidence intervals
plot_model(best_model, 
           type = "est", 
           show.values = TRUE, 
           value.offset = 0.3,
           title = "Fixed Effects on CPUE",
           vline.color = "red",
           dot.size = 3) +
  theme_minimal()

# Sorted fixed effects plot
plot_model(best_model, 
           type = "est", 
           show.values = TRUE, 
           value.offset = 0.3, 
           sort.est = TRUE, 
           title = "Effect Sizes (Sorted)", 
           vline.color = "red", 
           dot.size = 3) +
  theme_minimal()

# 9. Continuous year trend model (optional)
cpue_data <- cpue_data %>%
  mutate(year_num = as.numeric(as.character(year)))

trend_model_continuous <- lmer(total_weight_kg ~ year_num + (1 | nama_nelayan), data = cpue_data)

# 10. Predict and plot marginal effects of Year × Season interaction
pred_interaction <- ggpredict(best_model, terms = c("year", "season"))

plot(pred_interaction) +
  labs(
    title = "Marginal Effect of Year and Season on CPUE",
    x = "Year",
    y = "Predicted CPUE (kg/trip)",
    color = "Season"
  ) +
  theme_minimal()

11 Summary of Findings from Experimental and Routine Mud Crab Monitoring

This report synthesizes findings from two complementary datasets—experimental crab measurements and routine fisheries-dependent CPUE monitoring—offering a holistic view of mud crab population dynamics and harvest outcomes in Kubu Raya.

11.0.1 Experimental Crab Size Dataset

Linear models (LMs) and linear mixed models (LMMs) were used to test whether crab size (weight, length, and width) varied between treatment and control sites. Model selection using AIC values consistently favored the simpler linear models without random effects: - Weight: LM AIC = 497.7 vs. LMM AIC = 499.6 - Length: LM AIC = -135.7 vs. LMM AIC = -133.7 - Width: LM AIC = -137.6 vs. LMM AIC = -136.3

Effect sizes from LMs showed clear treatment impacts: - Weight: Treatment increased mean crab weight by 15.6% (SE = 0.06, p < 0.01) - Length: Treatment increased carapace length by 10.4% (SE = 0.04, p < 0.01) - Width: No significant difference; confidence intervals overlapped zero

These findings suggest that treatment zones are supporting larger crabs in terms of weight and length, likely due to protection or reduced harvesting pressure.

11.0.2 Routine CPUE Dataset

CPUE (kg/trip) trends were analyzed from over 3 years of routine landing data. After cleaning for outliers, we modeled seasonal and inter-annual variation using linear mixed models with random effects for fisher and/or river.

The best-performing model based on AIC was: - Model 12: CPUE ~ season * year + (1 | nama_nelayan) + (1 | sungai) - AIC = 12,743.3

This model captured strong interaction effects between year and season: - High CPUE during the “breeding” and “night tide” seasons in 2023–2024 - Random effects indicate significant variation between individual fishers and river locations

Effect plots showed that predicted CPUE rose most notably in 2023 and 2024, with peak productivity occurring in specific seasonal windows, supporting the idea of biologically informed harvest timing.

11.0.3 Integrated Insights

Together, these datasets indicate that: - Treatment areas improve crab size, especially weight and length, based on experimental sampling - Population shows recovery or increase since implementation of the CBNTZs in 2021 explore seasonal variations deeper - Village-level and fisher-level heterogeneity must be considered in future models or interventions

These results support the continued use of no-take zones and adaptive harvest strategies informed by seasonal patterns and localized ecological knowledge.

April 2025 - Core No Take Zone Mud Crab Analysis

Adam Miller

2025-04