AR1 for catch data

Joyce Ong 29 October 2019

Introduction

This is an R Markdown document containing the results for the AR1 analyses of FAO catch data within 59 LMEs for the period 1955-2014 (60 year timeseries), consisting of 3789 species, with at least 55 years of data for each species timeseries.

library(ggplot2)
library(dplyr, warn.conflicts = F, quietly = T)
## Warning: package 'dplyr' was built under R version 3.5.3
library(reshape2)

Dataset used

FAO catch data was taken from Sea Around Us Project, which organized the catch data into 65 LMEs globally. This was taken from Chris Free’s Github. This dataset contained only marine finfish and invertebrates. Data were filtered according to the following conditions:

  • Time period 1955-2014 (60 years)

  • Each species time series contained <1/4 NAs

  • Unidentified and miscellaneous groups were removed

Latitudes for each LME were taken from centroids.

Function ar() was used with order=1 to generate coefficients for each species timeseries

lme.ar.lat2<-read.csv("lme_species_lat.csv")
lme.ar.lat2<-lme.ar.lat2[,-1]
lme.ar.lat2$lat_fct<-as.factor(lme.ar.lat2$lat_dd2)
str(lme.ar.lat2)
## 'data.frame':    3789 obs. of  7 variables:
##  $ species: Factor w/ 927 levels "Ablennes hians",..: 3 4 9 41 46 55 69 74 76 84 ...
##  $ arcoef : num  0.627 0.7 0.787 0.777 0.784 ...
##  $ lme2   : Factor w/ 59 levels "Agulhas Current",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ lat_dd : num  -22.1 -22.1 -22.1 -22.1 -22.1 ...
##  $ lat_fct: Factor w/ 59 levels "-46.33","-41.03",..: 9 9 9 9 9 9 9 9 9 9 ...
##  $ lat_dd2: num  -22.1 -22.1 -22.1 -22.1 -22.1 ...
##  $ lat2   : num  486 486 486 486 486 ...
lme.mean.ar<-lme.ar.lat2 %>% group_by(lat_dd2) %>%
  summarize(mean=mean(arcoef), med=median(arcoef), sd=sd(arcoef)) %>% as.data.frame()
str(lme.mean.ar)#mean, median and sd of ar1 coeff within each LME grouping
## 'data.frame':    59 obs. of  4 variables:
##  $ lat_dd2: num  -46.3 -41 -40.8 -35.2 -32.5 ...
##  $ mean   : num  0.734 0.825 0.869 0.819 0.638 ...
##  $ med    : num  0.786 0.885 0.881 0.884 0.67 ...
##  $ sd     : num  0.1569 0.1519 0.0715 0.1627 0.1881 ...
lme.mean.ar$lat_fct<-as.factor(lme.mean.ar$lat_dd2)

Data exploration

Number of species within each LME.

de1<-lme.ar.lat2 %>% group_by(lme2) %>% summarise(nsp=n_distinct(species))
head(de1)
## # A tibble: 6 x 2
##   lme2               nsp
##   <fct>            <int>
## 1 Agulhas Current    115
## 2 Aleutian Islands    12
## 3 Arabian Sea        190
## 4 Baltic Sea          25
## 5 Barents Sea         24
## 6 Bay of Bengal      157

Graph of number of species within each LME

lmekeysub<-read.csv("lmekey.csv")
lme.nsp<-de1 %>% left_join(lmekeysub, by=c("lme2" = "name"))
## Warning: Column `lme2`/`name` joining factors with different levels,
## coercing to character vector
nsp1<-ggplot(lme.nsp, aes(lat_dd, nsp))
nsp1 + geom_point() + theme_bw()

Boxplot of spread of ar1 coefficient across latitudes from -50 (left) to 80 (right).

Note, X-axis is messy because of all the factor labels for latitude.

p3<-ggplot(lme.ar.lat2, aes(lat_fct, arcoef))
p3 + geom_boxplot() + theme_bw()

Another visualization of spread of ar1 coefficients (arcoef), this time with latitudes on the y axis

p4<-ggplot(lme.ar.lat2, aes(arcoef, lat_dd2))
p4 + geom_point() + theme_bw()

Scatterplots of mean, median and sd of ar1 coeff within each LME grouping

p5a<-ggplot(lme.mean.ar, aes(lat_dd2, mean))
p5a + geom_point() + theme_bw()

p5b<-ggplot(lme.mean.ar, aes(lat_dd2, med))
p5b + geom_point() + theme_bw()

p5c<-ggplot(lme.mean.ar, aes(lat_dd2, sd))
p5c + geom_point() + theme_bw()
## Warning: Removed 1 rows containing missing values (geom_point).

Note that for sd plot, Canadian High artic only had 1 species time series so there is no value for sd. Also, pattern for sd is probably due to the number of species, especially since at the higher latitudes, there are much lower number of species.

Linear regression models

Using all 3789 data points

m1<-lm(arcoef ~ lat_dd, data=lme.ar.lat2)
summary(m1)#low Adj Rsq, p=0.028, coef=-0.0002055
## 
## Call:
## lm(formula = arcoef ~ lat_dd, data = lme.ar.lat2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.60346 -0.07375  0.05063  0.12489  0.20090 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.827e-01  2.891e-03 270.713   <2e-16 ***
## lat_dd      -2.055e-04  9.353e-05  -2.197   0.0281 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1621 on 3787 degrees of freedom
## Multiple R-squared:  0.001273,   Adjusted R-squared:  0.001009 
## F-statistic: 4.826 on 1 and 3787 DF,  p-value: 0.0281
par(mfrow=c(2,2), oma=c(0,0,1,0))
plot(m1)

Note that I tried using quadratic equations and even a gam function, but the linear model worked best.

Categorizing LMEs into broader factors of tropical versus higher latitudes

Note, I first tried categorizing into 4 categories of tropical, subtropical, temperate and polar, as well as 7 categories of N & S tropics, N & S subtropics, N and S temperate and N polar (no data exists for S polar), however there was not much difference between these regions, see plots below. Categories were taken from this website.

lme.ar.lat2$lat_abs<-abs(lme.ar.lat2$lat_dd2)
summary(lme.ar.lat2$lat_abs)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.42   11.94   24.67   26.21   37.11   80.54
lme.ar.lat2$climate<-ifelse(lme.ar.lat2$lat_abs<23.5, "tropics",
                            ifelse(lme.ar.lat2$lat_abs<40, "subtropics",
                                   ifelse(lme.ar.lat2$lat_abs<60, "temperate", "polar")))
lme.ar.lat2$clim_fct<-as.factor(lme.ar.lat2$climate)
lme.ar.lat2$clim_fct<-factor(lme.ar.lat2$clim_fct, levels=c("tropics", "subtropics", "temperate", "polar"))
de2<-lme.ar.lat2 %>% group_by(clim_fct) %>% summarise(nsp=n_distinct(species))%>% as.data.frame()
de2#number of species in each climate zone
##     clim_fct nsp
## 1    tropics 604
## 2 subtropics 541
## 3  temperate 353
## 4      polar  54
p7<-ggplot(lme.ar.lat2, aes(clim_fct, arcoef))
p7 + geom_boxplot() + theme_bw()#boxplot of 4 categories

Boxplots for 7 climate categories, separated by hemispheres.

lme.ar.lat2$climNS<-ifelse(lme.ar.lat2$lat_dd2<23.5 & lme.ar.lat2$lat_dd2>0, "N.trop",
                            ifelse(lme.ar.lat2$lat_dd2>(-23.5) & lme.ar.lat2$lat_dd2<0, "S.trop",
                                   ifelse(lme.ar.lat2$lat_dd2>23.5 & lme.ar.lat2$lat_dd2<40, "N.subtrop",
                                          ifelse(lme.ar.lat2$lat_dd2<(-23.5) & lme.ar.lat2$lat_dd2>(-40), "S.subtrop",
                                                 ifelse(lme.ar.lat2$lat_dd2>40 & lme.ar.lat2$lat_dd2<60, "N.temp",
                                                        ifelse(lme.ar.lat2$lat_dd2<(-40) & lme.ar.lat2$lat_dd2>(-60), "S.temp", "N.polar"))))))
lme.ar.lat2$climNS_fct<-as.factor(lme.ar.lat2$climNS)
lme.ar.lat2$climNS_fct<-factor(lme.ar.lat2$climNS_fct, 
                               levels=c("S.temp", "S.subtrop", "S.trop",
                                        "N.trop", "N.subtrop", "N.temp", "N.polar"))
p9<-ggplot(lme.ar.lat2, aes(climNS_fct, arcoef))
p9 + geom_boxplot() + theme_bw()#boxplot of 7 categories by N and S hemispheres

t-tests of broadest categories of tropical versus not tropical

lme.ar.lat2$clim2<-ifelse(lme.ar.lat2$clim_fct=="tropics", "trop", "non-trop")
t.test(arcoef ~ clim2, data=lme.ar.lat2)#unequal variances
## 
##  Welch Two Sample t-test
## 
## data:  arcoef by clim2
## t = -2.5775, df = 3742.8, p-value = 0.009991
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.023936362 -0.003253744
## sample estimates:
## mean in group non-trop     mean in group trop 
##              0.7735909              0.7871859

This dataset consists of 3789 species timeseries globally, with 1803 sp timeseries in the tropics (-23.5 to 23.5 latitude), and 1986 species timeseries not in the tropics. Of these, there were 604 unique species in tropics, and 645 unique species at higher latitudes. Biologically, the difference in ar1 coeff is not that different, with tropics mean = 0.79 and non-tropics mean = 0.77.

p10<-ggplot(lme.ar.lat2, aes(clim2, arcoef))
p10 + geom_violin(draw_quantiles=0.5, scale="area") + theme_bw()