library(igraph)
library(ggplot2)
library(ggnetwork)
dummy_graph <- random.graph.game(100, 0.05)
### ADD SOME FEATURES
V(dummy_graph)$node_feat1 <- degree(dummy_graph, normalized = T) + runif(50)
V(dummy_graph)$node_feat2 <- as.character(cut(betweenness(dummy_graph, normalized = T), 3))
E(dummy_graph)$edge_feat1 <- degree(line.graph(dummy_graph), normalized = T) + runif(ecount(dummy_graph))
E(dummy_graph)$edge_feat2 <- as.character(cut(betweenness(line.graph(dummy_graph), normalized = T), 2))
plot <- ggplot(ggnetwork(dummy_graph), aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(aes(color = edge_feat2)) +
  geom_nodes(aes(size = node_feat1, color = node_feat2)) +
  theme_void() +
  guides(size = 'none', color = 'none') +
  scale_color_manual(values = viridis::viridis(15, direction = -1, option = "E")[c(3, 6, 9, 12, 15)]) +
  labs(caption = "Dummy Graph") +
  theme(plot.caption = element_text(hjust = 0.5, size = 12))
plot
Intro to spinner
“The Moirai come to every man: they bring him what is his own.” - Pindar, Olympian Odes
“In the more general subject of Geometric Deep Learning, certain existing neural network architectures can be interpreted as GNNs operating on suitably defined graphs.” - Wikipedia
“The world is not made of atoms, it is made of stories.” - Muriel Rukeyser
“Message passing layers are permutation-equivariant layers mapping a graph into an updated representation of the same graph.” - Wikipedia
“The shortest distance between two points is often unbearable.” - Charles Bukowski
The thread of life (and graphs)
Spinner is an implementation of Graph Nets based on torch. Graph Nets1 are a family of neural network architectures designed for processing graphs and other structured data (for a gentle introduction, take a look here). They consist of a set of message-passing operations, which propagate information between nodes and edges in the graph, and a set of update functions, which compute new node and edge features based on the received messages.
The process flow of spinner:
Data Preparation: spinner starts with two tasks, graph sampling and feature extraction. In case of huge graph structures, it is possible to sample a subgraph by setting the sampling value and tuning the graph density threshold, which shifts the sampling process from edges to nodes. Then, spinner extracts the features from the igraph format. If the graph does not have any features, the algorithm calculates new features using null values, adjacency embedding, or Laplacian embedding. Null values represent the absence of any relevant information (zeroed initialization); adjacency embedding captures the relationships between nodes based on the decomposition of the graph’s adjacency matrix, while Laplacian embedding decomposes the Laplacian matrix to capture both local and global properties2. In case of features with missing values, random imputation is performed using the empirical distribution.
Graph Net Layers: spinner is then executed with the selected number of layers. Each layer consists of message passing and a graph-independent forward network. In message passing, each node in the graph receives messages from its neighboring nodes, and the messages are used to update the node’s feature representation (the update_order argument allows for different orderings); the combination of updates is based on a linear transformation (a conceptual sketch of this step follows the list). The graph-independent forward network then takes the updated feature representations and applies a DNN transformation. This process is repeated for the selected number of layers, allowing the algorithm to refine the features and build more complex representations of the graph.
Output Transformation: once the Graph Net layers are completed, an optional skip shortcut can be applied. The skip shortcut connects the inputs directly to the output layer, bypassing intermediate layers and improving the algorithm’s efficiency3. Finally, the output is transformed by a linear transformation for regression tasks, mapping the output to a continuous range, or by a softmax/sigmoid activation for label features, mapping the output to a probability distribution. This step produces the final output of the Graph Net algorithm, representing the predicted values or probabilities for the given graph features.
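To make the message-passing idea more tangible, here is a minimal conceptual sketch in plain R of a single edge-to-node step followed by a linear fusion. This is not spinner’s internal code: the function name, the matrix shapes and the random fusion weights are illustrative assumptions.

# Conceptual sketch only (not spinner's internals): one edge-to-node
# message-passing step followed by a linear fusion of old and pooled features.
# node_feat: n x d_n matrix, edge_feat: m x d_e matrix, edges: m x 2 index matrix.
message_pass_edges_to_nodes <- function(node_feat, edge_feat, edges) {
  pooled <- matrix(0, nrow = nrow(node_feat), ncol = ncol(edge_feat))
  for (i in seq_len(nrow(edges))) {
    # each endpoint pools (here: sums) the features of its incident edges
    pooled[edges[i, 1], ] <- pooled[edges[i, 1], ] + edge_feat[i, ]
    pooled[edges[i, 2], ] <- pooled[edges[i, 2], ] + edge_feat[i, ]
  }
  # fuse previous node features and pooled messages with a linear map
  W <- matrix(rnorm((ncol(node_feat) + ncol(edge_feat)) * ncol(node_feat)),
              nrow = ncol(node_feat) + ncol(edge_feat))
  cbind(node_feat, pooled) %*% W
}

# e.g. one step on the dummy graph defined above, using the numeric features only
updated_nodes <- message_pass_edges_to_nodes(cbind(V(dummy_graph)$node_feat1),
                                             cbind(E(dummy_graph)$edge_feat1),
                                             as_edgelist(dummy_graph, names = FALSE))

Analogous pooling steps handle the node-to-edge, edge-to-context and remaining directions in the order set by update_order, and the graph-independent forward network then applies a DNN to each row of the updated matrices.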
Some examples with a synthetic graph
Let’s start with a small dummy graph with 100 nodes, combining some randomness with structured information: two numeric features based on the normalized degree centrality and two character features obtained by cutting the betweenness statistic, split equally between nodes and edges, with no explicit feature for the context (the global graph).
The minimal set of required parameters is: a graph in igraph format, the prediction target (node or edge), and the labels for node, edge and context features, if available (when there are no specific features, new features are calculated using the embedding method, with a default embedding size of 5 that can be modified for node, edge and context through the related arguments). In example1, we illustrate a repeated cross-validation with two folds and three repetitions, using all the node and edge features (and initializing the context to a vector of 5 zeros with the default options), with the prediction target set on edge.
library(spinner)
example1 <- spinner(dummy_graph, target = "edge",
                    node_labels = c("node_feat1", "node_feat2"),
                    edge_labels = c("edge_feat1", "edge_feat2"),
                    holdout = 0.6, reps = 3, folds = 2, n_layers = 1)
epoch: 10 Train loss: 0.7137018 Val loss: 0.6144015
epoch: 20 Train loss: 0.5944655 Val loss: 0.6190196
epoch: 30 Train loss: 0.6620579 Val loss: 0.5998684
epoch: 40 Train loss: 0.6194906 Val loss: 0.6096552
epoch: 50 Train loss: 0.6580924 Val loss: 0.6060377
early stop at epoch: 55 Train loss: 0.6815769 Val loss: 0.6249834
epoch: 10 Train loss: 0.4619544 Val loss: 0.6726884
epoch: 20 Train loss: 0.4726847 Val loss: 0.6579408
epoch: 30 Train loss: 0.4742717 Val loss: 0.6465809
early stop at epoch: 33 Train loss: 0.466705 Val loss: 0.6466084
epoch: 10 Train loss: 0.57569 Val loss: 0.6114293
epoch: 20 Train loss: 0.5789454 Val loss: 0.6105499
epoch: 30 Train loss: 0.5311384 Val loss: 0.5529644
early stop at epoch: 31 Train loss: 0.5587688 Val loss: 0.6154771
epoch: 10 Train loss: 0.6228946 Val loss: 0.6434678
epoch: 20 Train loss: 0.5117735 Val loss: 0.5998991
epoch: 30 Train loss: 0.5744619 Val loss: 0.6196617
early stop at epoch: 36 Train loss: 0.5466098 Val loss: 0.627533
epoch: 10 Train loss: 0.3084492 Val loss: 0.333145
epoch: 20 Train loss: 0.2644525 Val loss: 0.3095223
epoch: 30 Train loss: 0.3161576 Val loss: 0.3185408
early stop at epoch: 31 Train loss: 0.3645605 Val loss: 0.3318309
epoch: 10 Train loss: 0.5694327 Val loss: 0.5735918
epoch: 20 Train loss: 0.6654618 Val loss: 0.6981404
epoch: 30 Train loss: 0.5631 Val loss: 0.7350405
early stop at epoch: 30 Train loss: 0.5631 Val loss: 0.7350405
epoch: 10 Train loss: 0.7765082 Val loss: 0.7766961
epoch: 20 Train loss: 0.7349857 Val loss: 0.7830988
epoch: 30 Train loss: 0.7914677 Val loss: 0.754347
early stop at epoch: 34 Train loss: 0.8173763 Val loss: 0.7861751
time: 119.1 sec elapsed
The output includes the graph (original or sampled), the model description and summary, a prediction function for new graph data, the cross-validation and summary errors, the history plot of the loss function (for final training and testing) and the time log.
names(example1)
[1] "graph" "model_description" "model_summary"
[4] "pred_fun" "cv_errors" "summary_errors"
[7] "history" "time_log"
Model description and model summary give general and detailed information about the model structure and the number of parameters. The default option for n_layers is 3 Graph Net layers, with the edge-node-context update order for the message passing (“enc”).
example1$model_description
[1] "model with 1 GraphNet layers, 1 classification tasks and 1 regression tasks (1029 parameters)"
example1$model_summary
$GraphNetLayer1
An `nn_module` containing 1,027 parameters.
-- Modules ---------------------------------------------------------------------
* context_to_edge: <nn_pooling_from_context_to_edges_layer> #18 parameters
* context_to_node: <nn_pooling_from_context_to_nodes_layer> #24 parameters
* edge_to_context: <nn_pooling_from_edges_to_context_layer> #20 parameters
* edge_to_node: <nn_pooling_from_edges_to_nodes_layer> #16 parameters
* node_to_context: <nn_pooling_from_nodes_to_context_layer> #25 parameters
* node_to_edge: <nn_pooling_from_nodes_to_edges_layer> #15 parameters
* node_fusion: <nn_linear> #3 parameters
* edge_fusion: <nn_linear> #3 parameters
* context_fusion: <nn_linear> #3 parameters
* independent_layer: <nn_graph_independent_forward_layer> #900 parameters
$classif1
An `nn_module` containing 0 parameters.
$regr1
An `nn_module` containing 2 parameters.
-- Parameters ------------------------------------------------------------------
* weight: Float [1:1, 1:1]
* bias: Float [1:1]
The errors and the loss function are based on an entropy-weighted normalized loss, where the original loss functions are mse for scalar features and binary cross-entropy for label features. The normalization is based on 1 - 1/exp(loss) for each predicted feature. The entropy weights are calculated using the Laplace method4.
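To fix ideas before looking at the cross-validation errors, here is one plausible reading of that weighting scheme, sketched in plain R; the helper names, the binning of numeric features and the dummy loss values are illustrative assumptions, not spinner’s source code.

# Illustrative sketch of an entropy-weighted normalized loss (assumed details)
normalized_loss <- function(loss) 1 - 1 / exp(loss)   # maps [0, Inf) into [0, 1)

# Laplace (add-one) smoothed entropy of a feature, discretizing numerics
laplace_entropy <- function(x, bins = 10) {
  if (is.numeric(x)) x <- cut(x, bins)
  counts <- table(x)
  p <- (counts + 1) / (sum(counts) + length(counts))  # add-one smoothing
  -sum(p * log(p))
}

# combine the per-feature losses of the edge target into a single objective
entropy_w <- c(laplace_entropy(E(dummy_graph)$edge_feat1),
               laplace_entropy(E(dummy_graph)$edge_feat2))
entropy_w <- entropy_w / sum(entropy_w)
feature_losses <- c(0.42, 0.65)   # hypothetical mse and binary cross-entropy values
sum(entropy_w * normalized_loss(feature_losses))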
example1$cv_errors
reps folds train validation
1 1 1 0.6815769 0.6249834
2 1 2 0.4667050 0.6466084
3 2 1 0.5587688 0.6154771
4 2 2 0.5466098 0.6275330
5 3 1 0.3645605 0.3318309
6 3 2 0.5631000 0.7350405
example1$summary_errors
train validation test
0.5302202 0.5969122 0.7861751
example1$history
Now, let’s try to predict on new graph data: to do so, we add three new nodes and three new edges to our dummy graph, feed the new graph into pred_fun and get the predicted values for edge_feat1 and edge_feat2. Easy.
new_dummy_graph <- add.vertices(dummy_graph, 3)
new_dummy_graph <- add.edges(new_dummy_graph, c(51, 10, 52, 30, 53, 40))
example1$pred_fun(new_dummy_graph)
$edge_feat1
from to edge_feat1
[1,] 10 51 -0.7320095
[2,] 30 52 -0.7320095
[3,] 40 53 -1.1389960
$edge_feat2
from to edge_feat2_..2.67e.05.0.0134. edge_feat2_.0.0134.0.0268.
[1,] 10 51 0.7406420 0.2593580
[2,] 30 52 0.7003825 0.2996176
[3,] 40 53 0.3513239 0.6486760
Now, let’s try another predictive model, this time with a different architecture: we set the number of Graph Net layers equal to the graph diameter (with a skip shortcut), while the forward net is based on three layers with specific sizes and dropout rates. We also increase the L2-regularization with weight_decay and opt for a different update order (“nce”: the messages pass first on nodes, then on context and finally on edges).
example2 <- spinner(dummy_graph, target = "edge",
                    node_labels = c("node_feat1", "node_feat2"),
                    edge_labels = c("edge_feat1", "edge_feat2"),
                    reps = 3, folds = 2, n_layers = diameter(dummy_graph),
                    dnn_form = c(32, 64, 32), dnn_drop = c(0.3, 0.5, 0.3),
                    skip_shortcut = T, weight_decay = 0.05,
                    update_order = "nce", holdout = 0.3)
epoch: 10 Train loss: 0.4771449 Val loss: 0.5281988
epoch: 20 Train loss: 0.5203946 Val loss: 0.5565221
epoch: 30 Train loss: 0.5175064 Val loss: 0.4081402
early stop at epoch: 32 Train loss: 0.4814069 Val loss: 0.5704495
epoch: 10 Train loss: 0.3075961 Val loss: 0.3461457
epoch: 20 Train loss: 0.3171342 Val loss: 0.3589231
epoch: 30 Train loss: 0.3137291 Val loss: 0.346644
early stop at epoch: 33 Train loss: 0.3260146 Val loss: 0.3676803
epoch: 10 Train loss: 0.2746742 Val loss: 0.272223
epoch: 20 Train loss: 0.2873681 Val loss: 0.2948366
epoch: 30 Train loss: 0.297682 Val loss: 0.2497663
epoch: 40 Train loss: 0.261989 Val loss: 0.2963362
early stop at epoch: 40 Train loss: 0.261989 Val loss: 0.2963362
epoch: 10 Train loss: 0.6362951 Val loss: 0.653718
epoch: 20 Train loss: 0.6636615 Val loss: 0.6275757
epoch: 30 Train loss: 0.6286563 Val loss: 0.6419924
epoch: 40 Train loss: 0.6809555 Val loss: 0.6062608
early stop at epoch: 42 Train loss: 0.6995168 Val loss: 0.6282438
epoch: 10 Train loss: 0.5628222 Val loss: 0.5956068
epoch: 20 Train loss: 0.5892258 Val loss: 0.5917251
epoch: 30 Train loss: 0.5953156 Val loss: 0.6059892
early stop at epoch: 39 Train loss: 0.5678455 Val loss: 0.596338
epoch: 10 Train loss: 0.5379111 Val loss: 0.558876
epoch: 20 Train loss: 0.5317364 Val loss: 0.5530674
epoch: 30 Train loss: 0.588866 Val loss: 0.5431762
early stop at epoch: 31 Train loss: 0.5304825 Val loss: 0.5873731
epoch: 10 Train loss: 0.1922428 Val loss: 0.19973
epoch: 20 Train loss: 0.195193 Val loss: 0.1873679
epoch: 30 Train loss: 0.1985004 Val loss: 0.217934
early stop at epoch: 39 Train loss: 0.2190993 Val loss: 0.1921318
time: 1381.67 sec elapsed
Despite adding quite “some” parameters (!), the second model doesn’t show a clear improvement in terms of loss. You have to work on fine-tuning the hyper-parameters to find the best architecture for your specific problem (a minimal tuning sketch follows the output below).
example2$model_description
[1] "model with 8 GraphNet layers, 1 classification tasks and 1 regression tasks (108826 parameters)"
example2$history
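As a starting point for that tuning, here is a minimal sketch of a grid search over a couple of hyper-parameters, reusing only arguments already shown in this post; the grid itself and the choice of the summary validation error as selection criterion are assumptions (and running it takes a while).

# Minimal tuning sketch (assumed grid and criterion), reusing arguments shown above
grid <- expand.grid(n_layers = c(1, 3, 5), update_order = c("enc", "nce"),
                    stringsAsFactors = FALSE)

val_errors <- sapply(seq_len(nrow(grid)), function(i) {
  fit <- spinner(dummy_graph, target = "edge",
                 node_labels = c("node_feat1", "node_feat2"),
                 edge_labels = c("edge_feat1", "edge_feat2"),
                 reps = 3, folds = 2, holdout = 0.3,
                 n_layers = grid$n_layers[i],
                 update_order = grid$update_order[i])
  fit$summary_errors["validation"]
})

grid[which.min(val_errors), ]   # configuration with the lowest validation error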
Concluding thought: why did the Graph Neural Network refuse to go to the party? Because it didn’t know how to … network! (yeah, it’s a joke).
Enzoi!
Footnotes
A brief (and incomplete) list with some general references on the topic:
Battaglia, P. W., et al. “Relational inductive biases, deep learning, and graph networks.” arXiv preprint arXiv:1806.01261 (2018);
Xu, K., et al. “How Powerful are Graph Neural Networks?” International Conference on Learning Representations, 2019;
Hamilton, W. L., et al. “Inductive representation learning on large graphs.” Advances in Neural Information Processing Systems, 2017, pp. 1024-1034;
Zhou, J., et al. “Graph Neural Networks: A Review of Methods and Applications.” arXiv preprint arXiv:1812.08434 (2018).
For a reference on entropy, take a look at this package here.↩︎