On this webpage we will look on how sand mining in Stradbroke Island has affected native ants, and we will (statitiscally) explore if rehabilation of mine sties has had any effect on the ants’ abundance and diversity
Figure 1. Ants vs Sand Minining, Who will win? Only time and stats will tell if Sand Mine Rehabilitation is Effective for increasing native ant diversity and abundance
Stradbroke Island, also known as Minjerribah, is located near the Moreton Bay, just 30 km away southeast of Brisbane CBD. Sand Mining on Stradbroke began in the 1950’s. Concerns about environmental impacts began a decade later, with cases of dredge mining. This type of mining levels high sand dunes and stripping native vegetation. Due to the high economic importance of minerals found in Stradbroke’s sands, like Silica, Rutile, Zircon, Ilimenite, sand mining kept going.
Silbelco Australia operated 3 mine sites, including the Enterprise Mine, which operated close to Ramsar-Listed wetlands but had a small buffer zone. After several environmental inicidents and legal disputes, Tte Enterpise Mine closed in 2019. Rehabilitation of the area is still underway.
Figure 2. Location of Stradbroke Island and Beachside Scenery of Stradbroke
To measure the efficacy of mine rehabilitation efforts, students analyze data collected from the sites.
Students of the Masters of Conservation Biology program traveled to Stradbroke Island to survey ant fauna and their response towards mine rehabilitation.
Each site is classified according to the year in which the rehabilitation happened, There are four distinct possibilities for the year of rehabilitation. At each of these possibilities, three representative sites are randomly chosen by the students, and paired to a nearby control site that has never been sandmined. That makes six sites at each “year of rehabilitation”.
Ants are collected using pitfall traps (See Figure 3). Each study site contains 10 traps, separated by a meter from each other. Ants are collected after one night of trapping.
Figure 3. Basic architecture of insect pitfalltrap (c) Bedfordshire Natural History Society
Fisrt lets look on how each species is represented in control and rehabilitation sites.
library("viridis")
## Loading required package: viridisLite
par(mar=c(11,5,2,2))
Xlocations <- barplot(Abundance_at_Control_Sites, beside=T, names.arg=c(names(Species_Names)), ylab="Abundance", xlab="", main="Graph1: Abundance of Ants - Control Sites", ylim=c(0,1500), las=2, cex.names=0.8, col="magma" (5))
text("<-Nylanderia obscura", y=300, x=33)
text("<-Aphaenogaster longiceps", y=1400, x=12)
Xlocations <- barplot(Abundance_at_Mine_Sites, beside=T, names.arg=c(names(Species_Names)), ylab="Abundance", xlab="", main="Graph 2: Abundance of Ants - Mine Sites", ylim=c(0,1500), las=2, cex.names=0.8, col="magma"(5))
text("<-Nylanderia obscura", y=1400, x=33)
text("<-Aphaenogaster longiceps", y=300, x=12)
In both of these graphs we can see that Aphaenogaster longiceps and Nylanderia obscura are amongst the most common abundant species across all sites. Aphaenogaster longiceps is more common in control sites while Nylanderia obscura is more common in the Mine Sites. The “overabundance” of Aphaenogaster longiceps could be explained by ant interspecific competition (Majer, 1985). This hints that either this two species are naturally the most common ones, but it could aslo suggest that the effects of rehabilitation are not signficant to restoring ant diversity.
For our two most common species we can see that the majority of observations lean towards zero, hinting that despite their supposed abundance, the records for species is quite small.
Lets take a closer look on how Species diversity compares to abundance
Ants<- read.csv("Ants.txt")
attach(Ants)
## The following objects are masked from Ants_2015_2019:
##
## Rehab_Year, Rep, Rep_Year, Samples, Site, Treatment
library("viridis")
par(mar=c(5.1, 6.1, 4.1, 4.1))
plot(Abundance,Diversity,main="Abundance vs Diversity",font=3,cex=1,cex.lab=1,pch=19,col=magma(50))
In this graph we can observe that Ant Diversity is clusterred towards lower Abundance. While this could be attributed to interspecific competition (Majer, 1985), we would need to transform the data to have a bettter understanding of the different effects (Site, Rehabilitation Year etc) are doing to Ant Diversity.
Here we will use scatter plots to see if the assumptions of our F-test have been violated.
Ants<- read.csv("Ants.txt")
attach(Ants) # make 'Ants' the default data set
library("viridis")
fit1 <- lm(Abundance ~ Diversity) # fit the regression line of best fit through the data
plot(fit1$fitted.values,fit1$residuals, col=magma(5))
qqnorm(fit1$residuals, col=magma(500))
Here we can observe that the dots on “qqnomr” plot do not form a straight line. This is telling us that are residuals are not gaussian, indicating that there is no statistical association between diversity and abundance.
Here we wil transform our Abundance data with “Log-Tansformation”. Since cannot calculate “Log 0” We add the value of +1 incase any of our exisitng values are zero . Let’s have a look at our new graph…
fit2 <- (log(Abundance+1) ~ Diversity)
plot(fit2,bty="n",main="Log Transformed Abundance vs Diversity",font=3,cex=1,cex.lab=1,pch=19,col=magma(50))
It seems that diversity and Abundance are positively associated with each other, but the graph does not gives us concrete evidence of such (maybe a small P value will confirm our theory)
Now lets do some diagnostics…
fit2 <- lm(log(Abundance+1) ~ Diversity) # try log-transforming 'Abundance'
plot(fit2$fitted.values,fit2$residuals, col=magma(500))
qqnorm(fit2$residuals, col=magma(500))
Again our qqplot does not form a straight line, indicating that our residuals are not gaussian. Our log-transformed graph at the beginning of this section is indicating us a possible association between Species and Diversity, we will now use an ANOVA to verify this. However, the scatter of our QQPLOT shows us that we need to be careful and we can only take P-Values that are not borderline (i.e. close to 0.05)
Ants<- read.csv("Ants.txt")
attach(Ants) # make 'Ants' the default data set
anova(fit2)
## Analysis of Variance Table
##
## Response: log(Abundance + 1)
## Df Sum Sq Mean Sq F value Pr(>F)
## Diversity 1 96.06 96.057 83.975 < 2.2e-16 ***
## Residuals 789 902.52 1.144
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fit2)
##
## Call:
## lm(formula = log(Abundance + 1) ~ Diversity)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0270 -1.0270 -0.3338 0.5825 5.3497
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.02698 0.04413 23.274 <2e-16 ***
## Diversity 1.01184 0.11042 9.164 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.07 on 789 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.09619, Adjusted R-squared: 0.09505
## F-statistic: 83.97 on 1 and 789 DF, p-value: < 2.2e-16
If we look at our Pvalue and our regression coefficient, we can observe that Ant Abundance and Diversity have a strong positive association. Even tho we used a log-transformation on Abundance, we can assume sand mining has the same effect because the “Abdunadce” and “logAbundance+1” have a one to one correspondance.
If we plot the results from our last Anova, we have a an overlap between the abundance of ants in control and rehabilitated site…
library("plotrix")
## Warning: package 'plotrix' was built under R version 4.1.1
Ants<- read.csv("Ants.txt")
attach(Ants) # make 'Ants' the default data set
groups <- split(log(Abundance+1),Treatment) # split data into "Treatment" groups
n <- sapply(groups,length) # calculate the sample size of each group
MSE <- deviance(fit2)/fit2$df.residual # calculate the residual error variance
se <- sqrt(MSE/n) # convert it to a standard error
t <- qt(0.975,df=fit2$df.residual) # use the t-distribution to define a 95% CI
size = t*se # set the size of each error bar
means <- sapply(groups ,mean, na.rm=TRUE) # calculate a mean at each level of Treatment
xloc = seq(1:length(groups)) # set locations on X-axis for the two treatments
plotCI( xloc, means, uiw=size, gap=0.1, xlim=c(min(xloc)-0.5,max(xloc)+0.5),
ylab=list("log(Abundance+1)",cex=1.75), xlab=list("Treatment",cex=1.75), xaxt="n")
# plot means and error bars
axis(side=1, at=xloc, labels=names(groups), cex.axis=1.5) # label the x-axis
This is telling us that Ant Abundance is not changing across sites, perphas hinting that the effect of Rehabilitation Efforts is not helping restore and Abundance. Since we already know that Ant Abundance is positively associated with Diversity, we can assume that changes in Ant abundance can reflect changes in Ant Diversity.
In the ANOVA table below we an observe that there is a highly significant interaction between the treatment and year of rehabilitation (P= 2.870e-05) .
Ants<- read.csv("Ants.txt")
attach(Ants) # make 'Ants' the default data set
full_model <- lm(log(Abundance+1) ~ Treatment * Rep_Year * Rehab_Year )
anova(full_model)
## Analysis of Variance Table
##
## Response: log(Abundance + 1)
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment 1 1.46 1.460 1.3883 0.239054
## Rep_Year 2 27.23 13.614 12.9456 2.955e-06 ***
## Rehab_Year 3 16.63 5.545 5.2724 0.001327 **
## Treatment:Rep_Year 2 22.30 11.150 10.6024 2.870e-05 ***
## Treatment:Rehab_Year 3 95.60 31.867 30.3026 < 2.2e-16 ***
## Rep_Year:Rehab_Year 6 14.64 2.441 2.3209 0.031521 *
## Treatment:Rep_Year:Rehab_Year 6 14.12 2.353 2.2375 0.037897 *
## Residuals 767 806.60 1.052
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since Rehabilitation_Year was had a strong effect on abundance, lets see how that relationship looks in a graph
Ants<- read.csv("Ants.txt")
attach(Ants) # make 'Ants' the default data set
library(plotrix)
full_model <- lm(log(Abundance+1) ~ Treatment * Rep_Year * Rehab_Year )
groups <- split(log(Abundance+1) , paste(Rehab_Year, Treatment)) # split data into groups
n <- sapply(groups,length) # calculate a sample size within each group
means <- sapply(groups ,mean,na.rm=TRUE) # calculate a mean within each group
MSE <- deviance(full_model)/df.residual(full_model) # best estimate of the residual variance
se <- sqrt(MSE/n) # calculate the standard error of each mean
t <- qt(0.975,df=full_model$df.residual) # define 95% CI using t-distribution
width = t*se # set the width of each error bar
xloc <- seq(1:length(groups)) # X-axis locations
plotCI(xloc, means, uiw=width, xaxt="n",xlim=c(0.5,max(xloc)+0.5),ylab=list("mean log(Abundance+1)",cex=1.25),,cex=1.5,
xlab=list("Rehabilitation Year",cex=1.75),col=c("blue","purple"))
axis(side=1, at=c(1.5,3.5,5.5,7.5), labels=unique(Rehab_Year), cex.axis=1.5) # add X-axis labels
legend("topleft",legend=unique(Treatment),pch=1,col=c("blue","purple"),cex=1.5 )
for (i in seq(1,length(groups),2)){
segments(xloc[i],means[i],xloc[i+1],means[i+1],lty="dashed",lwd=1.2) }
This is telling us that Ant Abundance is strongly affected by Treatment and Rehabilitation Year. This means that the longer the period has been rehabilitated, the more Ants we can observe, This is also telling us that Rehabilitation Alone has no effect and the best "Rehabilitation: is to give the Ants time to recolonize.
If we want to sure that this statistical relationship is true, we need to consider Sites into the analysis. We havent compared if variance between sites has caused changes in our P-Values. Perhaps by analyzing our data using mixed models, we can state that “Sites” are a “random explanatory variable”.
Let’s get started (trust me is that the last bit of hard code, congrats on making it this far!)
Ants<- read.csv("Ants.txt")
attach(Ants) # make 'Ants' the default data set
library(lmerTest)
## Warning: package 'lmerTest' was built under R version 4.1.1
## Warning: package 'lme4' was built under R version 4.1.1
mixed_model <- lmer(log(Abundance+1) ~ Treatment * Rehab_Year * Rep_Year + (1|Site) , data=Ants )
anova(mixed_model)
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## Treatment 0.2015 0.2015 1 15.98 0.2150 0.64917
## Rehab_Year 2.4200 0.8067 3 15.98 0.8603 0.48172
## Rep_Year 26.9714 13.4857 2 750.99 14.3830 7.417e-07
## Treatment:Rehab_Year 13.9610 4.6537 3 15.98 4.9633 0.01271
## Treatment:Rep_Year 22.6802 11.3401 2 750.99 12.0946 6.764e-06
## Rehab_Year:Rep_Year 14.7683 2.4614 6 750.99 2.6252 0.01589
## Treatment:Rehab_Year:Rep_Year 14.1676 2.3613 6 750.99 2.5184 0.02026
##
## Treatment
## Rehab_Year
## Rep_Year ***
## Treatment:Rehab_Year *
## Treatment:Rep_Year ***
## Rehab_Year:Rep_Year *
## Treatment:Rehab_Year:Rep_Year *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now lets look at the residuals and their coefficients
summary(mixed_model)$coefficients
## Estimate Std. Error df
## (Intercept) 1.55197342 0.2896300 26.6036
## TreatmentMine -1.15127520 0.4095986 26.6036
## Rehab_YearY_1980 -0.23097124 0.4095986 26.6036
## Rehab_YearY_1996 -0.18529235 0.4095986 26.6036
## Rehab_YearY_1997 -0.52041120 0.4095986 26.6036
## Rep_YearR2016 -0.51581115 0.2383803 750.9804
## Rep_YearR2017 0.21663969 0.2383803 750.9804
## TreatmentMine:Rehab_YearY_1980 0.72763685 0.5792599 26.6036
## TreatmentMine:Rehab_YearY_1996 0.82506659 0.5792599 26.6036
## TreatmentMine:Rehab_YearY_1997 1.24534357 0.5792599 26.6036
## TreatmentMine:Rep_YearR2016 0.85668872 0.3371206 750.9804
## TreatmentMine:Rep_YearR2017 0.12571638 0.3371206 750.9804
## Rehab_YearY_1980:Rep_YearR2016 0.32293538 0.3371206 750.9804
## Rehab_YearY_1996:Rep_YearR2016 0.03271149 0.3371206 750.9804
## Rehab_YearY_1997:Rep_YearR2016 0.15914137 0.3371206 750.9804
## Rehab_YearY_1980:Rep_YearR2017 0.54980275 0.3371206 750.9804
## Rehab_YearY_1996:Rep_YearR2017 -0.30896232 0.3371206 750.9804
## Rehab_YearY_1997:Rep_YearR2017 -0.07270681 0.3371206 750.9804
## TreatmentMine:Rehab_YearY_1980:Rep_YearR2016 -0.84575391 0.4767606 750.9804
## TreatmentMine:Rehab_YearY_1996:Rep_YearR2016 0.08538906 0.4777077 750.9988
## TreatmentMine:Rehab_YearY_1997:Rep_YearR2016 0.60521730 0.4767606 750.9804
## TreatmentMine:Rehab_YearY_1980:Rep_YearR2017 -0.57509931 0.4767606 750.9804
## TreatmentMine:Rehab_YearY_1996:Rep_YearR2017 0.16244479 0.4767606 750.9804
## TreatmentMine:Rehab_YearY_1997:Rep_YearR2017 1.05669878 0.4767606 750.9804
## t value Pr(>|t|)
## (Intercept) 5.35846967 1.218355e-05
## TreatmentMine -2.81073986 9.158600e-03
## Rehab_YearY_1980 -0.56389650 5.775478e-01
## Rehab_YearY_1996 -0.45237541 6.546659e-01
## Rehab_YearY_1997 -1.27053941 2.148917e-01
## Rep_YearR2016 -2.16381629 3.079309e-02
## Rep_YearR2017 0.90879867 3.637480e-01
## TreatmentMine:Rehab_YearY_1980 1.25614910 2.199815e-01
## TreatmentMine:Rehab_YearY_1996 1.42434601 1.659716e-01
## TreatmentMine:Rehab_YearY_1997 2.14988726 4.082106e-02
## TreatmentMine:Rep_YearR2016 2.54119337 1.124755e-02
## TreatmentMine:Rep_YearR2017 0.37291213 7.093191e-01
## Rehab_YearY_1980:Rep_YearR2016 0.95792231 3.384102e-01
## Rehab_YearY_1996:Rep_YearR2016 0.09703199 9.227269e-01
## Rehab_YearY_1997:Rep_YearR2016 0.47206062 6.370207e-01
## Rehab_YearY_1980:Rep_YearR2017 1.63087836 1.033353e-01
## Rehab_YearY_1996:Rep_YearR2017 -0.91647406 3.597125e-01
## Rehab_YearY_1997:Rep_YearR2017 -0.21567002 8.293035e-01
## TreatmentMine:Rehab_YearY_1980:Rep_YearR2016 -1.77395942 7.647500e-02
## TreatmentMine:Rehab_YearY_1996:Rep_YearR2016 0.17874752 8.581842e-01
## TreatmentMine:Rehab_YearY_1997:Rep_YearR2016 1.26943655 2.046787e-01
## TreatmentMine:Rehab_YearY_1980:Rep_YearR2017 -1.20626441 2.280954e-01
## TreatmentMine:Rehab_YearY_1996:Rep_YearR2017 0.34072615 7.334050e-01
## TreatmentMine:Rehab_YearY_1997:Rep_YearR2017 2.21641394 2.696211e-02
summary(mixed_model)$sigma^2
## [1] 0.9376152
By declaring “Site” as a “random explanatory variable” , we have increased the statistical significance of our variables. Our residual variance has been reduced, and our new model considers that some of the “noise” in the data is due to the “random explanatory variables”. In this case, Site has been treated as an independent level of replication, so our new code has removed “Site” variation from residual variance.
Compared to previous ANOVA Tables, we can see that the interaction between Treatment:Rehabilitation_Year has become more significant (P: 6.764e-06), indicating that even with adjusting “Site” as a r“random explanatory variable”, the interaction between Treatment:Rehabilitation_Year is highly signficant
First of all :CONGRATUALTIONS! You were brave enough to follow me in this journey of Ants (but mostly Stats) Now that we’ve made this far…What can we conclude? Firstly that we need a good comprehension of stats. The Ants of Stradie need us to do the hard stats. We can condense the data and prove that the rehabilitation of mine sites hasnt helped our eight-legged friends.
We learnt that we need to be careful into considering different levels of replication. Including “Sites” as random explanatory variable helped us to show a stronger relationship between “Treatment:Rehabilitation_Year” and “LogAntAbundance+1” . Perhaps the rehabilitate has not been implemented equally across all sites, or maybe different ants species respond differently to ecological disturbances (we did see that Aphaenogaster longiceps and Nylanderia obscura were the most common ants ). This indicates that ant ecology and/or behaviour influences diversity and abundance on sites (Majer, 1985)
Where off to next? Maybe we could look at the response from different species across all the sites (perhaps use your most eager student…or the one you hate the most). Maybe if we have data of the Response of each species to rehabilitation efforts (and treatment, and treamtent:rehabilitation_year, etc) we can conclude how effective ant restoration has been in Stradie.
Thank you for time, now go have break, take a walk, have a cuppa and help spread the better use of stats (and Rmarkdown!)
()
MAJER, J.D. (1985), Recolonization by ants of rehabilitated mineral sand mines on North Stradbroke Island, Queensland, with particular reference to seed removal. Australian Journal of Ecology, 10: 31-48. https://doi.org/10.1111/j.1442-9993.1985.tb00861.x