The network analysis presented below attempts to show the importance (connectivity) of the relationships between nodes (communities at sea) over time. The script uses the igraph package for the analysis, together with the tidygraph and ggraph packages, which provide a workflow similar to the tidyverse and `ggplot2`. The code starts by preparing a dummy database with random values; however, the functions and structure are ready to receive real data.
The code starts by loading the necessary packages. Note that ggraph requires ggplot2, which is loaded here as part of the tidyverse.
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(kableExtra) # this package is for rendering nice tables
To test the performance of this script, a random database was generated with the variables indicated by Becca. The ports included in this test are COCA sites, and all gear types are included. For the years, a sampling period covering 2000-2019 was simulated. For the interactions, every combination of origin and landing port was considered (20 years × 6 gear types × 5 × 5 ports = 3,000 rows); at the end, a random sample of 250 rows was retained to simulate “the absence of relations”. Finally, the variables “num_trips” and “total_fisherdays” were filled with random integers drawn uniformly (from 1-50 and 1-40, respectively).
dummy.portori <- c('seabrook','point.pleasant','point.judith','chicoteague','cape.may')
dummy.portlnd <- dummy.portori
dummy.gear <- c('65minus','65plus','dredge','gillnet','midwater','pots.traps')
dummy.year <- c(2000:2019)
set.seed(5)
dummy.data <- expand_grid(dummy.year, dummy.gear, dummy.portori, dummy.portlnd) %>%
  rename(year = dummy.year, gear = dummy.gear, port_origin = dummy.portori, portlnd1 = dummy.portlnd) %>%
  add_column(num_trips = sample(1:50, nrow(.), replace = TRUE),
             total_fisherdays = sample(1:40, nrow(.), replace = TRUE)) %>%
  # keep a random subset of 250 rows to simulate missing relations
  .[sample(nrow(.), 250, replace = FALSE), ]
head(dummy.data) %>%
kbl() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| year | gear | port_origin | portlnd1 | num_trips | total_fisherdays |
|---|---|---|---|---|---|
| 2014 | pots.traps | point.judith | cape.may | 6 | 18 |
| 2018 | 65plus | cape.may | seabrook | 44 | 34 |
| 2001 | midwater | seabrook | seabrook | 41 | 18 |
| 2001 | 65minus | seabrook | seabrook | 41 | 22 |
| 2006 | dredge | point.judith | chicoteague | 6 | 28 |
| 2016 | gillnet | cape.may | point.judith | 6 | 26 |
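As a rough check on how sparse the simulated data are, the sketch below recomputes the size of the full combinatorial grid and the share of rows retained. It uses base R's `expand.grid` with integer placeholders instead of the real port and gear labels, and the object names (`n.full`, `n.kept`, `share.kept`) are illustrative only.

```r
# Full grid: 20 years x 6 gear types x 5 origin ports x 5 landing ports
n.full <- nrow(expand.grid(year = 2000:2019,
                           gear = 1:6,
                           port_origin = 1:5,
                           portlnd1 = 1:5))

# Number of rows kept by the random sample in the simulation
n.kept <- 250

# Share of all possible combinations that remain in the dummy data
share.kept <- n.kept / n.full
```

Only about 8% of the possible combinations remain, so each gear and port pair is left with very few yearly observations.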
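Once the dataset was simulated, a summary was computed of the average change (stored under the `apc_` prefix, for Average Percentage Change) between successive observations of each variable, for every gear type and origin-landing port pair. As written, the code takes the mean of the first differences returned by `diff()`, that is, an average absolute change rather than a percentage. Because `diff()` works in row order, real data should be sorted by year within each group beforehand. Combinations with a single observation have no differences to average and therefore return NaN.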
apc.data <- dummy.data %>% group_by(gear, port_origin, portlnd1) %>%
  summarise(apc_num_trips = mean(c(NA, diff(num_trips)), na.rm = TRUE),
            apc_total_fisherdays = mean(c(NA, diff(total_fisherdays)), na.rm = TRUE)) %>%
  ungroup()
head(apc.data) %>%
kbl() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| gear | port_origin | portlnd1 | apc_num_trips | apc_total_fisherdays |
|---|---|---|---|---|
| 65minus | cape.may | cape.may | -6.50 | 0.50 |
| 65minus | cape.may | chicoteague | -0.25 | 3.75 |
| 65minus | cape.may | point.judith | NaN | NaN |
| 65minus | cape.may | point.pleasant | 1.00 | 25.00 |
| 65minus | cape.may | seabrook | NaN | NaN |
| 65minus | chicoteague | cape.may | NaN | NaN |
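If a true percentage change is wanted, one possible variant is sketched below. It is not the calculation used in this script: the helper `pct_change` and the toy values are hypothetical, and the rows are deliberately given out of order to show the effect of sorting by year before differencing.

```r
library(dplyr)

# Toy data (hypothetical values): one gear/port pair, years out of order
toy <- data.frame(
  year        = c(2002, 2000, 2003, 2001),
  gear        = "dredge",
  port_origin = "cape.may",
  portlnd1    = "seabrook",
  num_trips   = c(30, 10, 45, 20)
)

# Hypothetical helper: year-over-year change as a percentage of the
# previous value (returns an empty vector for a single observation)
pct_change <- function(x) {
  100 * diff(x) / head(x, -1)
}

toy.apc <- toy %>%
  arrange(year) %>%                                  # diff() works in row order
  group_by(gear, port_origin, portlnd1) %>%
  summarise(apc_num_trips = mean(pct_change(num_trips)), .groups = "drop")
```

With the toy values sorted as 10, 20, 30, 45, the yearly changes are 100%, 50% and 50%, so the average is about 66.7%.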
igraph requires two data frames. The first contains the nodes (vertices), which are the actors: in this case, the combinations of port and gear type. The second contains the edges (links), which are defined by the port_origin and portlnd1 variables.
It is important to note that, since the network is based on the nodes defined by the communities at sea, the code concatenates ‘port’ + ‘gear’ so that each edge endpoint matches the ‘name’ of a node. After this step, ‘port’ and ‘gear’ are attributes of the nodes and not of the edges, so ‘gear’ is dropped from the links data frame (I know it’s a bit cryptic but we can discuss it together).
#Vectors to obtain every port and gear type
v.ports <- unique(c(apc.data$port_origin,apc.data$portlnd1))
v.gear <- unique(apc.data$gear)
# Node data base with name = community at sea, port and gear type
nodes <- expand_grid(port = v.ports, gear = v.gear) %>%
  mutate(name = paste(port, gear, sep = "-")) %>%
  select(name, port, gear)
head(nodes) %>%
kbl() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| name | port | gear |
|---|---|---|
| cape.may-65minus | cape.may | 65minus |
| cape.may-65plus | cape.may | 65plus |
| cape.may-dredge | cape.may | dredge |
| cape.may-gillnet | cape.may | gillnet |
| cape.may-midwater | cape.may | midwater |
| cape.may-pots.traps | cape.may | pots.traps |
# edge/links dataframe
edges <- apc.data %>% select(port_origin:apc_total_fisherdays, gear) %>%
  mutate(port_origin = paste(port_origin, gear, sep = "-"),
         portlnd1 = paste(portlnd1, gear, sep = "-")) %>%
  select(-gear)
head(edges) %>%
kbl() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| port_origin | portlnd1 | apc_num_trips | apc_total_fisherdays |
|---|---|---|---|
| cape.may-65minus | cape.may-65minus | -6.50 | 0.50 |
| cape.may-65minus | chicoteague-65minus | -0.25 | 3.75 |
| cape.may-65minus | point.judith-65minus | NaN | NaN |
| cape.may-65minus | point.pleasant-65minus | 1.00 | 25.00 |
| cape.may-65minus | seabrook-65minus | NaN | NaN |
| chicoteague-65minus | cape.may-65minus | NaN | NaN |
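Before building the graph, it can be worth confirming that every edge endpoint has a matching node name, since the two tables are linked only through the concatenated ‘port-gear’ string. The sketch below is in base R on miniature, hypothetical versions of the two tables (`nodes.toy`, `edges.toy`); the same check should carry over to `nodes` and `edges`.

```r
# Hypothetical miniature versions of the nodes and edges tables
nodes.toy <- data.frame(
  name = c("cape.may-dredge", "seabrook-dredge", "cape.may-gillnet"),
  port = c("cape.may", "seabrook", "cape.may"),
  gear = c("dredge", "dredge", "gillnet")
)
edges.toy <- data.frame(
  port_origin = c("cape.may-dredge", "seabrook-dredge"),
  portlnd1    = c("seabrook-dredge", "cape.may-dredge")
)

# Every name used as an origin or landing endpoint
endpoints <- unique(c(edges.toy$port_origin, edges.toy$portlnd1))

# Endpoints with no matching node; an empty result means the tables agree
missing.nodes <- setdiff(endpoints, nodes.toy$name)
```

A non-empty `missing.nodes` would point to a spelling or separator mismatch between the two tables.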
The data prepared in the previous section are passed to the `tbl_graph` function, which takes the nodes, the edges, and a flag indicating whether the relationships are directed. In our case each vessel has a port of origin and a landing port, so the relationships are directed. Additionally, degree centrality was calculated as a measure of the relative importance of each node given its connectivity.
As a second step, the `ggraph` function generates a graphical representation of the network. Each gear type forms its own network (one facet per gear), in which the nodes are the different ports and the edges run from the origin port to the landing port. The edge color is based on the average change in ‘total_fisherdays’, while the average change in ‘num_trips’ sets the edge width. (This is fully modifiable.)
# Building the network-matrix
net <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE) %>%
  mutate(degree = centrality_degree(normalized = TRUE)) # Degree centrality
# Plotting the network
ggraph(net, layout = 'star') +
  geom_edge_link(aes(width = apc_num_trips, color = apc_total_fisherdays), alpha = 0.8) +
  scale_edge_width(range = c(0.2, 2)) +
  scale_edge_color_viridis() +
  geom_node_point(aes(size = degree)) +
  geom_node_text(aes(label = port), repel = TRUE) +
  labs(edge_width = "num_trips") +
  facet_nodes(~gear) +
  coord_fixed()
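Edges whose change values are NaN (pairs with a single observation) have no usable width or color. One option, sketched here on a hypothetical three-node graph (`net.toy`) and not part of the original workflow, is to drop those edges with tidygraph before plotting.

```r
library(dplyr)
library(tidygraph)

# Hypothetical toy network with one edge lacking a change value
nodes.toy <- data.frame(name = c("a", "b", "c"))
edges.toy <- data.frame(
  from = c("a", "b", "c"),
  to = c("b", "c", "a"),
  apc_num_trips = c(1.5, NaN, -2)
)
net.toy <- tbl_graph(nodes = nodes.toy, edges = edges.toy, directed = TRUE)

# Switch the active table to the edges and keep only rows with a value;
# is.na() is TRUE for both NA and NaN
net.clean <- net.toy %>%
  activate(edges) %>%
  filter(!is.na(apc_num_trips))
```

Whether to drop such edges or keep them with a neutral style depends on whether a single observed trip should still count as a relationship.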
Besides the graphical representation, the code concludes by preparing a score table for each node. The indices computed are eigenvector centrality, hub and authority scores, closeness, and betweenness; the table reports degree centrality, closeness, betweenness, and eigenvector centrality.
V(net)$eig <- eigen_centrality(net)$vector # Eigenvector centrality (evcent() is deprecated)
V(net)$hubs <- hub_score(net)$vector # "Hub" centrality (hub.score() is deprecated)
V(net)$authorities <- authority_score(net)$vector # "Authority" centrality (authority.score() is deprecated)
V(net)$closeness <- closeness(net) # Closeness centrality
V(net)$betweenness <- betweenness(net) # Vertex betweenness centrality
network.stat <- data.frame(centrality = V(net)$degree,
                           closeness = V(net)$closeness,
                           betweenness = V(net)$betweenness,
                           eigenvector = V(net)$eig)
head(network.stat) %>%
kbl() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| | centrality | closeness | betweenness | eigenvector |
|---|---|---|---|---|
| cape.may-65minus | 5 | 0.0013263 | 2.8333333 | 0.0000000 |
| cape.may-65plus | 3 | 0.0013245 | 2.5000000 | 0.0000000 |
| cape.may-dredge | 3 | 0.0013245 | 0.3333333 | 0.0000000 |
| cape.may-gillnet | 4 | 0.0013245 | 0.0000000 | 0.8027136 |
| cape.may-midwater | 5 | 0.0013263 | 2.5000000 | 0.0000000 |
| cape.may-pots.traps | 3 | 0.0013228 | 1.5000000 | 0.0000000 |
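To make the direction of the degree count explicit, and to keep the node names as a regular column rather than as row names, a score table could be assembled along the following lines. This is a minimal sketch on a hypothetical three-node chain (a -> b -> c), using igraph directly; the column names are illustrative.

```r
library(igraph)

# Hypothetical directed chain: a -> b -> c
g <- graph_from_data_frame(
  d = data.frame(from = c("a", "b"), to = c("b", "c")),
  directed = TRUE
)

# One row per node, with outgoing and incoming degree reported separately
scores <- data.frame(
  name        = V(g)$name,
  out_degree  = as.numeric(degree(g, mode = "out")),
  in_degree   = as.numeric(degree(g, mode = "in")),
  betweenness = as.numeric(betweenness(g, directed = TRUE)),
  row.names   = NULL
)
```

In this chain only `b` lies on a shortest path between two other nodes (a to c), so it is the only node with non-zero betweenness.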