Summary of data exploration exercise for bottom trawling research

Foreword

The main purpose of this document is to help me manage the work: to map out the many datasets available, where I am with each of them, what their pros and cons are, and how they connect with each other.

This document will be regularly updated and published online so that you can keep track of the progress of the work, if you feel it is necessary. Again, this is mainly for me to organize my thoughts. However, I will also use it as a template for our discussions.

The analysis is done in both R and ArcGIS Pro. Much of the work is automated using code, and I am keeping the code chunks in this document for future reference. However, the ArcGIS workflow is point and click, so only the map products are displayed here.

1. Fishing effort data - Global Fishing Watch

Background of GFW. Global Fishing Watch (GFW) provides data on apparent fishing effort based on transmissions broadcast using the automatic identification system (AIS). Raw AIS data is first “cleaned” by GFW algorithms and then run through convolutional neural networks (CNNs, a type of machine learning model) to classify fishing vessels and predict when they are fishing.

I requested a data download from GFW and received the data in csv format.

There are 2 data formats:

Fishing effort by fleet (flag state and gear type) at 100th degree resolution
Fishing effort by MMSI at 10th degree resolution

I use the first format as it can be broken down into gear types, and we are keen on trawlers.

For the fleet format, there are 3 versions (1.0, 1.5 and 2.0), and I use v2.0 as it is the latest and most up-to-date version.

1.1. Summary of Fishing effort data

When downloaded, the data is organized by year. For each year, there is a separate csv file for each day. In total, there are about 365 x 9 = 3285 csv files, and each of them contains data like the following:
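As a quick check on the file count, the number of calendar days from 2012 to 2020 can be computed directly. This is a minimal sketch that assumes one csv file exists for every calendar day; the actual download may have gaps.

```r
# Count calendar days per year for 2012-2020.
# Assumption: one daily csv file per calendar day, with no missing days.
years <- 2012:2020

# "%j" is the day of the year, so 31 December gives the year length.
days_per_year <- as.integer(format(as.Date(paste0(years, "-12-31")), "%j"))
names(days_per_year) <- years

total_files <- sum(days_per_year)
print(days_per_year)
print(total_files)
```

Because 2012, 2016 and 2020 are leap years, the exact count under this assumption is 3288, slightly above the 365 x 9 approximation.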

Below is a summary of gear types in the sample dataset for 1 Jan 2020. There are 16 different gear types.

And a summary of flags. Note that there are quite a number of NA’s

Each file has between 10,000 and 500,000 rows of data. Below are the dimensions for 1 Jan 2020 (over 404k rows and 8 columns):

## [1] 404738      8
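The gear type, flag and dimension summaries above can be produced with a few base R calls. The sketch below uses a small made-up data frame in place of a real daily file; the column names are the ones used in the loop in section 1.2, and all values are invented for illustration.

```r
# Toy stand-in for one daily file. Column names follow the loop in 1.2;
# the values (and the short list of gear types and flags) are made up.
daily <- data.frame(
  cell_ll_lat   = c(10.01, 10.01, 10.02, -5.50, -5.50),
  cell_ll_lon   = c(105.30, 105.30, 105.31, 40.00, 40.01),
  flag          = c("VNM", "CHN", NA, "ESP", NA),
  geartype      = c("trawlers", "trawlers", "drifting_longlines",
                    "trawlers", "purse_seines"),
  fishing_hours = c(1.5, 0.0, 2.0, 3.25, 0.5),
  stringsAsFactors = FALSE
)

gear_counts    <- table(daily$geartype)               # rows per gear type
flag_counts    <- table(daily$flag, useNA = "ifany")  # rows per flag, NA shown separately
n_missing_flag <- sum(is.na(daily$flag))              # number of rows without a flag
dims           <- dim(daily)                          # rows, columns

print(gear_counts)
print(flag_counts)
print(dims)
```

With a real file, `daily` would instead come from `read.csv()` on one of the daily csv files.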

1.2. Data wrangling

I created a loop to go through each folder (each folder corresponds to one year of data). The loop does the following:

In each folder, R reads in all 365 (or 366) csv files and merges them into one gigantic file (millions of rows).
This file is too large to handle, so I filter it to keep fishing effort for trawlers only.
At this point, we have fishing effort (in hours) by trawlers for each grid cell and each day. For each grid cell (identified by the latitude and longitude of its lower-left corner, cell_ll_lat and cell_ll_lon; cell size is 0.01 degree), I aggregate the effort to get the cumulative hours over the whole year for each country (based on the “flag” variable).
I then add a column that contains the year.

This loop ran for about 1 hour, so don’t repeat it unless really necessary. Another loop, which left “flag” out of the grouping, generated a dataset grouped by location only.

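library(plyr)   # for ldply(); loaded before dplyr so dplyr verbs are not masked
library(dplyr)
for (i in 2012:2020) {
  # Move into the folder holding the daily csv files for year i
  setwd(paste0("W:/Project Seahorse/Active/02_TEAM_DATA/LeNghiem/bottom trawling maps/Global Fishing Watch/Fishing effort/fleet-daily-csvs-100-v2-", i))
  # Read every daily file and stack them into one data frame
  data <- ldply(list.files(pattern = "\\.csv$"), read.csv, header = TRUE)  # 226,800,974 rows
  # Keep trawlers only, then sum fishing hours per grid cell and flag over the year
  trawl <- data %>%
    filter(geartype == "trawlers") %>%
    group_by(cell_ll_lat, cell_ll_lon, flag) %>%
    dplyr::summarise(fishing_hours_year = sum(fishing_hours), .groups = "drop") %>%
    filter(fishing_hours_year > 0) %>%
    mutate(year = i)
  # paste0() keeps spaces out of the file name (paste() would insert them)
  write.csv(trawl, paste0("W:/Project Seahorse/Active/02_TEAM_DATA/LeNghiem/bottom trawling maps/Global Fishing Watch/Fishing effort/trawl data/trawl_with_flag", i, ".csv"))
}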
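The difference between the two groupings (with and without “flag”) can be seen on a small made-up example. This is only a sketch in base R so that it runs without extra packages; it is not the loop that produced the real files, and the toy values are invented.

```r
# Toy data: column names follow the loop above, values are made up.
toy <- data.frame(
  cell_ll_lat   = c(10.01, 10.01, 10.01, 10.02, 10.02),
  cell_ll_lon   = c(105.30, 105.30, 105.30, 105.31, 105.31),
  flag          = c("VNM", "VNM", "CHN", "VNM", "VNM"),
  geartype      = c("trawlers", "trawlers", "trawlers", "trawlers", "purse_seines"),
  fishing_hours = c(1.5, 2.0, 0.5, 0, 4),
  stringsAsFactors = FALSE
)

# Step 1: keep trawlers only.
trawl <- toy[toy$geartype == "trawlers", ]

# Step 2a: sum hours per grid cell and flag.
# Note: the formula interface of aggregate() drops rows with NA in any
# variable, so real data with missing flags would need extra handling.
by_flag <- aggregate(fishing_hours ~ cell_ll_lat + cell_ll_lon + flag,
                     data = trawl, FUN = sum)

# Step 2b: sum hours per grid cell only (flag left out of the grouping).
by_cell <- aggregate(fishing_hours ~ cell_ll_lat + cell_ll_lon,
                     data = trawl, FUN = sum)

# Step 3: rename the total and drop cells with zero effort.
names(by_flag)[names(by_flag) == "fishing_hours"] <- "fishing_hours_year"
names(by_cell)[names(by_cell) == "fishing_hours"] <- "fishing_hours_year"
by_flag <- by_flag[by_flag$fishing_hours_year > 0, ]
by_cell <- by_cell[by_cell$fishing_hours_year > 0, ]

print(by_flag)
print(by_cell)
```

Grouping by location only collapses the rows for different flags in the same cell into a single total, which is why that dataset is smaller.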

After that, I merged all the data (9 years together). The final file has about 36.5 million rows and is about 1.56 GB in size.

The following took another 15 minutes to run, so don’t run it unless you have to.

setwd("W:/Project Seahorse/Active/02_TEAM_DATA/LeNghiem/bottom trawling maps/Global Fishing Watch/Fishing effort/trawl data/")
# Read only the yearly csv files; the pattern skips the trawl12_20 output folder
trawl12_20 <- ldply(list.files(pattern = "\\.csv$"), read.csv, header = TRUE)
write.csv(trawl12_20, "W:/Project Seahorse/Active/02_TEAM_DATA/LeNghiem/bottom trawling maps/Global Fishing Watch/Fishing effort/trawl data/trawl12_20/trawl12_20.csv")  # 36,492,250 rows

The next block does the same for the data with flag. It took another 15 minutes to run, so don’t run it unless you have to. The resulting file is too large to be opened in Excel.

setwd("W:/Project Seahorse/Active/02_TEAM_DATA/LeNghiem/bottom trawling maps/Global Fishing Watch/Fishing effort/trawl with flag/")
# Read only the yearly csv files; the pattern skips the trawl12_20 output folder
trawl12_20 <- ldply(list.files(pattern = "\\.csv$"), read.csv, header = TRUE)
write.csv(trawl12_20, "W:/Project Seahorse/Active/02_TEAM_DATA/LeNghiem/bottom trawling maps/Global Fishing Watch/Fishing effort/trawl with flag/trawl12_20/trawl12_20.csv")  # 36,492,250 rows
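The merge step can be tried out on small files first. The sketch below is a base R version that writes two made-up yearly files to a temporary folder and stacks them; the folder and file names are hypothetical, chosen only for the demonstration.

```r
# Hypothetical temporary folder standing in for the yearly-output directory.
in_dir <- file.path(tempdir(), "trawl_yearly_demo")
dir.create(file.path(in_dir, "merged"), recursive = TRUE, showWarnings = FALSE)

# Two made-up yearly files with the same columns as the yearly output.
write.csv(data.frame(cell_ll_lat = 10.01, cell_ll_lon = 105.30,
                     fishing_hours_year = 3.5, year = 2012),
          file.path(in_dir, "trawl_2012.csv"), row.names = FALSE)
write.csv(data.frame(cell_ll_lat = c(10.01, 10.02), cell_ll_lon = c(105.30, 105.31),
                     fishing_hours_year = c(1.0, 2.5), year = 2013),
          file.path(in_dir, "trawl_2013.csv"), row.names = FALSE)

# List csv files only, so the "merged" subfolder is never passed to read.csv().
files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and stack the rows into one data frame.
merged <- do.call(rbind, lapply(files, read.csv))

# row.names = FALSE avoids writing an extra index column to the output.
write.csv(merged, file.path(in_dir, "merged", "trawl_all.csv"), row.names = FALSE)
print(merged)
```

Using `full.names = TRUE` removes the need for `setwd()`, and writing the merged file to a subfolder keeps it out of the input list if the step is rerun.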

1.3. Snapshot of the type of data we have

Below are the sample trawl intensity maps for 2012 and 2020.

2. Catch data

2.1. FAO catch data

FAO provides its data via the FishStatJ software. There are 5 databases associated with this software:

Food balance sheets of fish and fishery products
Global Fish Processed Products Statistics
Global Fishery and Aquaculture Production Statistics
Global Fish Trade Statistics
Border Rejections

In this section, I explore the third database, Global Fishery and Aquaculture Production Statistics, which reports capture production statistics in tonnes live weight (TLW) by country, species item and FAO Major Fishing Area (see map below). There are 3415 species items by common name in this dataset, such as Crimson snapper, Silver hake and Albacore. Catches are reported according to the 28 FAO fishing areas. Data is reported per year for 1950-2021.

A snapshot of the data is below:

And here is a map of the FAO fishing areas:

Potential problems I identify:

FAO appears to have only landings data, not discard data.
The data is not spatially explicit, but grouped into FAO areas.

2.2. Sea Around Us catch data

I am in the middle of reading Zeller et al. 2016, which describes their database. They appear to have both landings and discard data. They claim to provide spatial data at a resolution of 0.5 x 0.5 degree, but I have not yet found a way to download it.
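If catch data at 0.5 degree resolution becomes available, the 0.01 degree fishing effort cells would need to be aggregated to the coarser grid before the two can be compared. The sketch below shows one way to do that on made-up values. It assumes the 0.5 degree grid is aligned to whole and half degrees, which has not been checked against the Sea Around Us data, and the names `lat05` and `lon05` are hypothetical.

```r
# Toy effort data at 0.01 degree; values are made up for illustration.
effort <- data.frame(
  cell_ll_lat   = c(10.01, 10.49, 10.50, -5.50, -5.01),
  cell_ll_lon   = c(105.30, 105.49, 105.30, 40.00, 40.49),
  fishing_hours = c(1, 2, 4, 8, 16)
)

# Snap each fine cell to the lower-left corner of the 0.5 degree cell
# that contains it (assumed grid alignment: multiples of 0.5 degree).
res <- 0.5
effort$lat05 <- floor(effort$cell_ll_lat / res) * res
effort$lon05 <- floor(effort$cell_ll_lon / res) * res

# Sum fishing hours within each coarse cell.
coarse <- aggregate(fishing_hours ~ lat05 + lon05, data = effort, FUN = sum)
print(coarse)
```

Using `floor()` rather than `round()` keeps negative coordinates in the correct cell, for example -5.01 falls in the cell whose lower-left latitude is -5.5.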

3. Biodiversity/distribution data

3.1. The Global Biodiversity Information Facility

Gina introduced this database to me and I spent a day exploring it.