Kinabalu Birdwing Butterfly. Credit: Murphy Ng
Kinabalu Birdwing Butterfly. Credit: Murphy Ng

Resilience and abundance of a rare butterfly in Borneo

Studies on a rare butterfly were conducted in 4 areas of Borneo: East and West Kalimantan (Indonesia), Sabah and Sarawak (Malaysia) (figure 1)

Figure 1: location of rare butterfly surveys. Top, locator box with red box showing location of Borneo. Bottom: location of 4 survey areas on the island of Borneo. West and East Kalimantan in Indonesia and Sabah and Sarawak in Malaysia.

Surveys were conducted weekly in each area for 104 weeks (2 years). With the adult butterfly having a life expectancy of one week, each week was considered a new generation. Data has been compiled into two data sets (abundance and resilience).

Two hypothesis are to be tested:

  1. The abundance of this rare butterfly species is different between monitoring sites.
  2. The resilience of this rare butterfly species is different between monitoring sites.

Resilience is measured by comparing the per capita growth rate from one generation to the next and abundance. A population is considered resilient if the population growth rate is low at high abundance (owinig to factors such as density dependence and inter-specific competition) and high at low abundance.

Setup

Firstly, set up the data for analysis.

Load datasets for resilience and abundance

abundance <- read.csv ("/Users/beckytooby/Library/CloudStorage/OneDrive-SwanseaUniversity/3. Year Three/BIO334_Data Analysis/BIO334_CW2/j.c.bull_15112024_butterfly_counts.csv")

resilience <- read.csv ("/Users/beckytooby/Library/CloudStorage/OneDrive-SwanseaUniversity/3. Year Three/BIO334_Data Analysis/BIO334_CW2/j.c.bull_15112024_butterfly_dynamics.csv")

Load required libraries (and install if required)

# List of required packages
required_packages <- c("ggplot2", "dplyr", "tidyverse", "RColorBrewer", "dunn.test", "report")

# Check and install missing packages
install_if_missing <- function(packages) {
  for (pkg in packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)}
  }
}

# Install packages that are missing
install_if_missing(required_packages)

# Load required libraries
lapply(required_packages, library, character.only = TRUE)

Set colours for all graphs to ensure consistency:

# Generate colours using the RColorBrewer Set 2 palette
site_colours <- brewer.pal(4, "Set2")  
names(site_colours) <- c("East.Kalimantan", "Sabah", "Sarawak", "West.Kalimantan")

Abundance

This dataset will be used to determine:

Hypothesis 1: The abundance of this rare butterfly species is different between monitoring sites

Key Questions:

1) What is the expected model formula?

  • that ABUNDANCE is a function of SITE: abundance~site

  • the null hypothesis is that there is no difference in abundance between sites.

2) What is the expected relationship between response and explanatory variables?

  • variable is categorical so we are testing for difference between sites, not a relationship.

3) What distribution best describes the data?

  • the type (e.g. count) of the response variable data.

    • our response variable is COUNTS, so non-negative integers

Firstly, inspect the data:

str(abundance)   # inspect data
## 'data.frame':    104 obs. of  5 variables:
##  $ Week           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sabah          : int  11 9 10 10 10 12 16 13 14 5 ...
##  $ Sarawak        : int  110 99 103 117 132 117 137 114 100 98 ...
##  $ West.Kalimantan: int  12 16 17 13 13 4 14 20 16 16 ...
##  $ East.Kalimantan: int  8 3 11 11 9 12 14 10 13 10 ...
names(abundance) # check headers
## [1] "Week"            "Sabah"           "Sarawak"         "West.Kalimantan"
## [5] "East.Kalimantan"
summary(abundance) # summarise data
##       Week            Sabah          Sarawak       West.Kalimantan
##  Min.   :  1.00   Min.   : 4.00   Min.   : 10.00   Min.   : 4.0   
##  1st Qu.: 26.75   1st Qu.: 8.00   1st Qu.: 22.00   1st Qu.:12.0   
##  Median : 52.50   Median :10.00   Median : 33.00   Median :14.0   
##  Mean   : 52.50   Mean   :10.19   Mean   : 52.82   Mean   :14.6   
##  3rd Qu.: 78.25   3rd Qu.:12.00   3rd Qu.: 96.00   3rd Qu.:17.0   
##  Max.   :104.00   Max.   :17.00   Max.   :137.00   Max.   :27.0   
##  East.Kalimantan 
##  Min.   : 3.000  
##  1st Qu.: 8.000  
##  Median :10.000  
##  Mean   : 9.596  
##  3rd Qu.:12.000  
##  Max.   :18.000

There are 4 sites, and 104 weeks of observations at each location. Each observation is an individual generation and so is independent.

For easier analysis, transform the data to long format with only 3 columns: Week, Abundance and Site

# Reshape data to long format
ab_long <- abundance %>%
  pivot_longer(cols = c(Sabah, Sarawak, West.Kalimantan, East.Kalimantan), 
               names_to = "Site", 
               values_to = "Abundance")

# Check conversion completed successfuly

str(ab_long)  
## tibble [416 × 3] (S3: tbl_df/tbl/data.frame)
##  $ Week     : int [1:416] 1 1 1 1 2 2 2 2 3 3 ...
##  $ Site     : chr [1:416] "Sabah" "Sarawak" "West.Kalimantan" "East.Kalimantan" ...
##  $ Abundance: int [1:416] 11 110 12 8 9 99 16 3 10 103 ...
names(ab_long) 
## [1] "Week"      "Site"      "Abundance"
summary(ab_long) 
##       Week            Site             Abundance    
##  Min.   :  1.00   Length:416         Min.   :  3.0  
##  1st Qu.: 26.75   Class :character   1st Qu.: 10.0  
##  Median : 52.50   Mode  :character   Median : 13.0  
##  Mean   : 52.50                      Mean   : 21.8  
##  3rd Qu.: 78.25                      3rd Qu.: 18.0  
##  Max.   :104.00                      Max.   :137.0
attach(ab_long)

Just to understand the dynamics of what is happening at each site, lets have a look at the abundance over time.

# Plot Abundance overtime with different colours for different sites: 

ggplot(ab_long, aes(x = Week, y = Abundance, colour = Site)) +
  geom_line() +
  labs(
    title = "", 
    x = "Week", 
    y = "Abundance"
  ) +
  scale_colour_manual(values = site_colours) +  
  theme_minimal() +  
  theme(
    panel.grid.major = element_blank(),   
    panel.grid.minor = element_blank(),   
    axis.line = element_line(colour = "lightgrey"),  
    axis.ticks = element_line(colour = "lightgrey"),  
    legend.position = "right",
    panel.background = element_rect(fill = "white", colour = "white")
  )

Create histogram to understand distributions:

# Generate ggplot histograms for each site
ggplot(ab_long, aes(x = Abundance, fill = Site)) +
  geom_histogram(binwidth = 1, colour = "black", size = 0.1, alpha = 0.8) + 
  scale_fill_manual(values = site_colours) + 
  facet_wrap(~ Site, ncol = 2, scales = "fixed") + 
  labs(
    title = "",
    x = "Abundance",
    y = "Count") +
  theme_minimal()+
  theme(
    panel.grid = element_blank(), 
    panel.background = element_blank(), 
    panel.border = element_blank(), 
    axis.line = element_line(colour = "black"), 
    axis.ticks = element_line(colour = "black"), 
    strip.background = element_blank(), 
    strip.text = element_text(face = "bold"), 
    panel.spacing = unit(1, "lines"),
    legend.position = "none"
  )

Sarawak is not normally distributed while the others appear to be normal.

Create box plot to visualise data another way:

# Visualise results in a boxplot 
ggplot(ab_long, aes(x = Site, y = Abundance, fill = Site)) +
  geom_boxplot() +
  labs(x = "Sites", y = "Abundance") +
  theme_bw() + 
  scale_fill_manual(values = site_colours)+
  theme(
    panel.grid.major = element_blank(),   
    panel.grid.minor = element_blank(),   
    axis.line = element_line(colour = "black"),  
    axis.ticks = element_line(colour = "black"),  
    legend.position = "right"
  )

Looks like a large difference between Sarawak and other sites. Lets check the relationship:

Another way to test if data is normally distributed using Shapiro Wilk test (where p values of less than 0.05 shows non parametric date

# Apply Shapiro-Wilk test for normality to each site
by_site_normality <- ab_long %>%
  group_by(Site) %>%
  summarise(p_value = shapiro.test(Abundance)$p.value)

print(by_site_normality)
## # A tibble: 4 × 2
##   Site                  p_value
##   <chr>                   <dbl>
## 1 East.Kalimantan 0.0776       
## 2 Sabah           0.0827       
## 3 Sarawak         0.00000000462
## 4 West.Kalimantan 0.143

As Sarawak is not normally distributed (p<0.05), a non parametric Kruskal Wallis test is used to test significance:

# Kruskal-Wallis test
kruskal.test(Abundance ~ Site, data = ab_long)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Abundance by Site
## Kruskal-Wallis chi-squared = 257.32, df = 3, p-value < 2.2e-16

The p-value is < 0.05, so the null hypothesis (there is no difference in abundance between sites) can be rejected. This means there is a significant difference in butterfly abundance between at least two of the monitoring sites. To determine which sites, a post hoc test is used (Dunn Test):

# Dunn test 
dunn_test_result <- dunn.test(Abundance, Site)
##   Kruskal-Wallis rank sum test
## 
## data: Abundance and Site
## Kruskal-Wallis chi-squared = 257.3239, df = 3, p-value = 0
## 
## 
##                         Comparison of Abundance by Site                        
##                                 (No adjustment)                                
## Col Mean-|
## Row Mean |   East.Kal      Sabah    Sarawak
## ---------+---------------------------------
##    Sabah |  -0.714080
##          |     0.2376
##          |
##  Sarawak |  -14.02971  -13.31563
##          |    0.0000*    0.0000*
##          |
## West.Kal |  -7.159291  -6.445210   6.870423
##          |    0.0000*    0.0000*    0.0000*
## 
## alpha = 0.05
## Reject Ho if p <= alpha/2
# Display results
dunn_test_result
## $chi2
## [1] 257.3239
## 
## $Z
## [1]  -0.7140803 -14.0297145 -13.3156342  -7.1592910  -6.4452107   6.8704235
## 
## $P
## [1] 2.375887e-01 5.128178e-45 9.388559e-41 4.054769e-13 5.771998e-11
## [6] 3.200578e-12
## 
## $P.adjusted
## [1] 2.375887e-01 5.128178e-45 9.388559e-41 4.054769e-13 5.771998e-11
## [6] 3.200578e-12
## 
## $comparisons
## [1] "East.Kalimantan - Sabah"           "East.Kalimantan - Sarawak"        
## [3] "Sabah - Sarawak"                   "East.Kalimantan - West.Kalimantan"
## [5] "Sabah - West.Kalimantan"           "Sarawak - West.Kalimantan"
Comparison Z p-value Significant (at p<0.05)
East Kalimantan vs Sabah -0.71 0.2376 No
East Kalimantan vs Sarawak -14.03 5.13 x 10-45 Yes
East Kalimantan vs West Kalimantan -7.16 4.05 x 10-13 Yes
Sabah vs Sarawak -13.32 9.39 x 10-41 Yes
Sabah vs West Kalimantan -6.45 5.77 x 10-11 Yes
Sarawak vs West Kalimantan 6.87 3.20 x 10-12 Yes

The abundance between East Kalimantan and Sabah was not significantly different (p = 0.2376), while significant differences were found between the other pairs of sites (East Kalimantan vs Sarawak, East Kalimantan vs West Kalimantan, Sabah vs Sarawak, Sabah vs West Kalimantan, and Sarawak vs West Kalimantan).


Resilience

Resilience is determined by the relationship between abundance and population growth rate (PGR).

This dataset will be used to determine:

Hypothesis 2: The resilience of this rare butterfly species is different between monitoring sites

Key Questions:

1) What is the expected model formula?

  • PGR is a function of ABUNDANCE at different sites
  • so formula would be PGR ~ Abundance * Site
  • the null hypothesis is that there is no difference in resilience at different sites

2) What is the expected relationship between response and explanatory variables?

  • looks like might be a negative linear relationship based on scatter plots below

3) What distribution best describes the data?

  • Response variable is a continuous variable with a normal distribution
  • Explanatory variables are count (non negative intergers) (Abundance) and categorical (Site)

Firstly, inspect the data:

str(resilience)   # inspect data
## 'data.frame':    103 obs. of  9 variables:
##  $ Week            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sabah           : int  11 9 10 10 10 12 16 13 14 5 ...
##  $ PGR.Sabah       : num  -0.201 0.105 0 0 0.182 0.288 -0.208 0.074 -1.03 0.182 ...
##  $ Sarawak         : int  110 99 103 117 132 117 137 114 100 98 ...
##  $ PGR.Sarawak     : num  -0.105 0.04 0.127 0.121 -0.121 0.158 -0.184 -0.131 -0.02 -0.052 ...
##  $ West.Kalimantan : int  12 16 17 13 13 4 14 20 16 16 ...
##  $ PGR.W.Kalimantan: num  0.288 0.061 -0.268 0 -1.179 ...
##  $ East.Kalimantan : int  8 3 11 11 9 12 14 10 13 10 ...
##  $ PGR.E.Kalimantan: num  -0.981 1.299 0 -0.201 0.288 ...
names(resilience) # check headers
## [1] "Week"             "Sabah"            "PGR.Sabah"        "Sarawak"         
## [5] "PGR.Sarawak"      "West.Kalimantan"  "PGR.W.Kalimantan" "East.Kalimantan" 
## [9] "PGR.E.Kalimantan"
summary(resilience) # summary of data
##       Week           Sabah         PGR.Sabah             Sarawak      
##  Min.   :  1.0   Min.   : 4.00   Min.   :-1.1790000   Min.   : 10.00  
##  1st Qu.: 26.5   1st Qu.: 8.00   1st Qu.:-0.2230000   1st Qu.: 22.00  
##  Median : 52.0   Median :10.00   Median : 0.0000000   Median : 33.00  
##  Mean   : 52.0   Mean   :10.19   Mean   :-0.0009709   Mean   : 53.16  
##  3rd Qu.: 77.5   3rd Qu.:12.00   3rd Qu.: 0.2370000   3rd Qu.: 96.00  
##  Max.   :103.0   Max.   :17.00   Max.   : 1.3220000   Max.   :137.00  
##   PGR.Sarawak       West.Kalimantan PGR.W.Kalimantan     East.Kalimantan 
##  Min.   :-0.48000   Min.   : 4.00   Min.   :-1.1790000   Min.   : 3.000  
##  1st Qu.:-0.12600   1st Qu.:12.00   1st Qu.:-0.2230000   1st Qu.: 8.000  
##  Median :-0.03000   Median :14.00   Median : 0.0000000   Median :10.000  
##  Mean   :-0.01757   Mean   :14.62   Mean   : 0.0000291   Mean   : 9.602  
##  3rd Qu.: 0.11450   3rd Qu.:17.00   3rd Qu.: 0.2700000   3rd Qu.:12.000  
##  Max.   : 0.45200   Max.   :27.00   Max.   : 1.2530000   Max.   :18.000  
##  PGR.E.Kalimantan   
##  Min.   :-1.386000  
##  1st Qu.:-0.336000  
##  Median : 0.000000  
##  Mean   : 0.001097  
##  3rd Qu.: 0.318000  
##  Max.   : 1.386000

There are 4 sites, and 103 weeks of observations at each location. Each observation is an individual generation.

Reshape data for easier graphing and analysis:

# Reshape Abundance values
res_long <- resilience %>%
  select(Week, Sabah, Sarawak, West.Kalimantan, East.Kalimantan) %>%
  pivot_longer(
    cols = -Week, 
    names_to = "Site", 
    values_to = "Abundance")

# Reshape PGR values
PGR_long <- resilience %>%
  select(Week, PGR.Sabah, PGR.Sarawak, PGR.W.Kalimantan, PGR.E.Kalimantan) %>%
  pivot_longer(cols = -Week, 
    names_to = "Site", 
    values_to = "PGR" ) %>%
  mutate(
    Site = recode(Site,
      PGR.Sabah = "Sabah",
      PGR.Sarawak = "Sarawak",
      PGR.W.Kalimantan = "West.Kalimantan",
      PGR.E.Kalimantan = "East.Kalimantan"))

# Combine abundance and PGR data
res_long <- res_long %>%
  left_join(PGR_long, by = c("Week", "Site"))

# Check dataset 
str(res_long)
## tibble [412 × 4] (S3: tbl_df/tbl/data.frame)
##  $ Week     : int [1:412] 1 1 1 1 2 2 2 2 3 3 ...
##  $ Site     : chr [1:412] "Sabah" "Sarawak" "West.Kalimantan" "East.Kalimantan" ...
##  $ Abundance: int [1:412] 11 110 12 8 9 99 16 3 10 103 ...
##  $ PGR      : num [1:412] -0.201 -0.105 0.288 -0.981 0.105 ...
names(res_long)
## [1] "Week"      "Site"      "Abundance" "PGR"
summary(res_long)
##       Week         Site             Abundance           PGR           
##  Min.   :  1   Length:412         Min.   :  3.00   Min.   :-1.386000  
##  1st Qu.: 26   Class :character   1st Qu.: 10.00   1st Qu.:-0.216250  
##  Median : 52   Mode  :character   Median : 13.00   Median : 0.000000  
##  Mean   : 52                      Mean   : 21.89   Mean   :-0.004354  
##  3rd Qu.: 78                      3rd Qu.: 18.00   3rd Qu.: 0.223000  
##  Max.   :103                      Max.   :137.00   Max.   : 1.386000

Create histogram to understand distributions:

# Generate ggplot histograms for each site
ggplot(res_long, aes(x = PGR, fill = Site)) +
  geom_histogram(binwidth = 0.2, colour = "black", size = 0.1, alpha = 0.8) + 
  scale_fill_manual(values = site_colours) + 
  facet_wrap(~ Site, ncol = 2, scales = "fixed") + 
  labs(
    title = "",
    x = "Population Growth Rate",
    y = "Count") +
  theme_minimal()+
  theme(
    panel.grid = element_blank(), 
    panel.background = element_blank(), 
    panel.border = element_blank(), 
    axis.line = element_line(colour = "black"), 
    axis.ticks = element_line(colour = "black"), 
    strip.background = element_blank(), 
    strip.text = element_text(face = "bold"), 
    panel.spacing = unit(1, "lines"),
    legend.position = "none"
  )

Data looks normally distributed (Gaussian) . Look at scatter plot of data:

# Create scatter plots for each site
ggplot(res_long, aes(x = Abundance, y = PGR, colour = Site)) +
  geom_point(alpha = 0.6) +                          
  facet_wrap(~Site, scales = "free") +              
  labs(
    title = "",
    x = "Abundance", 
    y = "Population Growth Rate (PGR)"
  ) +
  scale_colour_manual(values = site_colours) +     
  theme_minimal() +                                
  theme(
    panel.grid = element_blank(),                  
    panel.background = element_blank(),            
    panel.border = element_blank(),                
    strip.text = element_text(face = "bold"),      
    legend.position = "none",                      
    strip.background = element_blank(),            
    axis.line = element_line(colour = "lightgrey"),
    axis.ticks = element_line(colour = "lightgrey")
  )

These all look linear relationships so lets see what they look like together, with linear regression lines:

# Scatter plot with GLM lines
ggplot(res_long, aes(x = Abundance, y = PGR, colour = Site, group = Site)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) + 
  labs(
    title = "",
    x = "Abundance",
    y = "Population Growth Rate (PGR)"
  ) +
  scale_colour_manual (values = site_colours) + 
  theme_classic() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_line(colour = "lightgrey"),
    axis.ticks = element_line(colour = "lightgrey"),
    legend.title = element_text(size = 11))
## `geom_smooth()` using formula = 'y ~ x'

Lets start with a linear model and check the diagnostics and summary information

attach(res_long)

# Linear model with interaction 
m1 <- lm(PGR ~ Abundance * Site)

# inspect diagnostics
par( mfrow = c(2, 2) )
plot(m1)

# check summary
summary(m1)
## 
## Call:
## lm(formula = PGR ~ Abundance * Site)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.29582 -0.14643  0.01079  0.18670  0.69559 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    1.003646   0.095422  10.518  < 2e-16 ***
## Abundance                     -0.104411   0.009468 -11.028  < 2e-16 ***
## SiteSabah                      0.124612   0.146492   0.851  0.39547    
## SiteSarawak                   -1.004223   0.107562  -9.336  < 2e-16 ***
## SiteWest.Kalimantan            0.049562   0.149478   0.332  0.74039    
## Abundance:SiteSabah           -0.006361   0.014157  -0.449  0.65345    
## Abundance:SiteSarawak          0.104091   0.009498  10.959  < 2e-16 ***
## Abundance:SiteWest.Kalimantan  0.032381   0.012150   2.665  0.00801 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2944 on 404 degrees of freedom
## Multiple R-squared:  0.4438, Adjusted R-squared:  0.4341 
## F-statistic: 46.04 on 7 and 404 DF,  p-value: < 2.2e-16

Looks good but just to check, we can check with the additive model:

# Linear model with addition 
m2 <- lm(PGR ~ Abundance + Site)
par(mfrow = c(2,2))
plot(m2)

par (mfrow = c(1,1))

The second model (m2) doesn’t look right on the diagnostic plots. We can compare with model 1 to check

anova(m1,m2)
## Analysis of Variance Table
## 
## Model 1: PGR ~ Abundance * Site
## Model 2: PGR ~ Abundance + Site
##   Res.Df    RSS Df Sum of Sq     F    Pr(>F)    
## 1    404 35.006                                 
## 2    407 62.144 -3   -27.138 104.4 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The interaction model (m1) is the better model based on diagnostic plots, plus the p-value (<0.05) shows that m1 improves the model significantly over m2.

There is a site specific relationship between abundance and PGR, and we can look at the summary aov:

summary (aov(m1))
##                 Df Sum Sq Mean Sq F value Pr(>F)    
## Abundance        1   0.55   0.553   6.379 0.0119 *  
## Site             3   0.24   0.079   0.911 0.4354    
## Abundance:Site   3  27.14   9.046 104.398 <2e-16 ***
## Residuals      404  35.01   0.087                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Abundance has a significant effect on PGR so when there is a higher abundance, there tends to be a lower rate of population growth.

Site is not significant on its own with regards to PGR, there is no clear overall difference in PGR between the four sites.

However, the interaction between the two (abundance AND site) is highly significant. So the effect of butterfly abundance on population growth (the resilience of the population) does vary across the four regions. The null hypothesis that there is no difference between sites can be rejected.

Looking at the coeficients from the m1 summary for each site:

summary_m1 <- summary(m1)

# Coefficients from the model summary
coefs <- summary_m1$coefficients

# Main effect for Abundance
abundance_coef <- coefs["Abundance", "Estimate"]
abundance_se <- coefs["Abundance", "Std. Error"]
abundance_t <- coefs["Abundance", "t value"]
abundance_p <- coefs["Abundance", "Pr(>|t|)"]

# Interaction coefficients for each site
sab <- coefs["Abundance:SiteSabah", "Estimate"]
sar <- coefs["Abundance:SiteSarawak", "Estimate"]
wkal <- coefs["Abundance:SiteWest.Kalimantan", "Estimate"]

# Interaction t-values and p-values
sab_t <- coefs["Abundance:SiteSabah", "t value"]
sab_p <- coefs["Abundance:SiteSabah", "Pr(>|t|)"]
sar_t <- coefs["Abundance:SiteSarawak", "t value"]
sar_p <- coefs["Abundance:SiteSarawak", "Pr(>|t|)"]
wkal_t <- coefs["Abundance:SiteWest.Kalimantan", "t value"]
wkal_p <- coefs["Abundance:SiteWest.Kalimantan", "Pr(>|t|)"]

# Standard errors for the interaction terms
sab_se <- coefs["Abundance:SiteSabah", "Std. Error"]
sar_se <- coefs["Abundance:SiteSarawak", "Std. Error"]
wkal_se <- coefs["Abundance:SiteWest.Kalimantan", "Std. Error"]

# Adjusted gradients for each site (main effect + interaction term)
gradients <- data.frame(
  Site = c("East Kalimantan", "Sabah", "Sarawak", "West Kalimantan"),
  Gradient = c(
    abundance_coef,  # East Kalimantan (reference site)
    abundance_coef + sab,  # Sabah
    abundance_coef + sar,  # Sarawak
    abundance_coef + wkal),   # West Kalimantan
   SE = c(
    abundance_se,  
    sab_se,  
    sar_se,  
    wkal_se), 
  t_value = c(
    abundance_t,  
    sab_t,  
    sar_t,  
    wkal_t ), 
  p_value = c(
    abundance_p,  
    sab_p,  
    sar_p,  
    wkal_p ))

# display results
print(gradients)
##              Site      Gradient          SE     t_value      p_value
## 1 East Kalimantan -0.1044110357 0.009467594 -11.0282544 6.694885e-25
## 2           Sabah -0.1107719655 0.014157074  -0.4493111 6.534482e-01
## 3         Sarawak -0.0003197324 0.009497881  10.9594234 1.204349e-24
## 4 West Kalimantan -0.0720301743 0.012150064   2.6650774 8.005564e-03
Site Gradient SE t-value p-value Significant?
East Kalimantan -0.1044 0.009 -11.03 6.69 x 10-25 Yes
Sabah -0.1108 0.014 -0.45 0.65 No
Sarawak -0.0003 0.009 10.96 1.20 x 10-24 Yes
West Kalimantan -0.0720 0.12 2.67 8.01 x 10-3 Yes

Carrying capacity

To work out the carrying capacity, we need to work out the abundance where the linear regression line crosses the 0.0 population growth rate line (y axis). As shown below:

# Fit the linear model for each Site
m1_by_site <- res_long %>%
  group_by(Site) %>%
  do(model = lm(PGR ~ Abundance, data = .))

par(mfrow = c(2, 2), mai = c(0.5, 0.5, 0.5, 0.5))

# Loop through each site to plot
for (site_name in unique(res_long$Site)) {
  site_data <- res_long %>% filter(Site == site_name) 
  site_model <- lm(PGR ~ Abundance, data = site_data) 
  intercept <- coef(site_model)[1] # Extract model coefficients
  slope <- coef(site_model)[2]
  x_intercept <- -intercept / slope  # Calculate the x-intercept (K)
  
  # Plot the scatter plot with regression line 
  plot(
    site_data$Abundance, site_data$PGR,
    col = site_colours[site_name],
    pch = 16, 
    xlab = "Abundance", 
    ylab = "PGR",
    main = site_name, 
    xlim = range(site_data$Abundance, na.rm = TRUE),
    ylim = range(site_data$PGR, na.rm = TRUE) )
  abline(site_model, col = site_colours[site_name], lwd = 2)
  abline(h = 0, lty = 2, col = "grey") # Add dashed horizontal line at PGR = 0
  legend(
    "topright", legend = paste("K =", round(x_intercept, 0)),
    bty = "n", text.col = site_colours[site_name]
  )
}