Intro to spinner

Author

Giancarlo Vercellino

Published

March 7, 2023

“The Moirai come to every man: they bring him what is his own.” - Pindar, Olympian Odes

“In the more general subject of Geometric Deep Learning, certain existing neural network architectures can be interpreted as GNNs operating on suitably defined graphs.” - Wikipedia

“The world is not made of atoms, it is made of stories.” - Muriel Rukeyser

“Message passing layers are permutation-equivariant layers mapping a graph into an updated representation of the same graph.” - Wikipedia

“The shortest distance between two points is often unbearable.” - Charles Bukowski

The thread of life (and graphs)

Spinner is an implementation of Graph Nets based on torch. Graph Nets are a family of neural network architectures designed for processing graphs and other structured data (for a gentle introduction, take a look here). They consist of a set of message-passing operations, which propagate information between nodes and edges in the graph, and a set of update functions, which compute new node and edge features based on the received messages.
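To make the message-passing idea concrete, here is a toy, framework-free sketch of a single step on node features using plain matrix algebra; the graph, the feature matrix and the weights are invented for illustration, and this is not spinner's internal code.

library(igraph)

# toy graph and random starting node features (purely illustrative)
set.seed(42)
toy_graph <- sample_gnp(10, 0.3)
A <- as.matrix(as_adjacency_matrix(toy_graph))   # adjacency matrix
X <- matrix(rnorm(10 * 4), nrow = 10)            # 4 features per node

# message passing: each node aggregates (here, sums) its neighbors' features
messages <- A %*% X

# update function: a linear map of [own features, aggregated messages]
# followed by a non-linearity, yielding the updated node representation
W <- matrix(rnorm(8 * 4), nrow = 8)
X_updated <- tanh(cbind(X, messages) %*% W)
dim(X_updated)                                   # 10 nodes x 4 updated features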

The process flow of spinner:

  • Data Preparation: spinner starts with two tasks: graph sampling and feature extraction. In the case of huge graph structures, it is possible to sample a subgraph by setting the sampling value and tuning the graph density threshold, which shifts the sampling process from edges to nodes. Then, spinner extracts the features from the igraph format. If the graph does not have any features, the algorithm calculates new features using null values, adjacency embedding, or Laplacian embedding. Null values represent the absence of any relevant information (zeroed initialization); adjacency embedding captures the relationships between nodes based on the decomposition of the graph’s adjacency matrix, while Laplacian embedding decomposes the Laplacian matrix to capture both local and global properties (a short illustrative sketch of these embeddings follows the process-flow figure below). In the case of features with missing values, random imputation is performed using the empirical distribution.

  • Graph Net Layers: spinner is then executed with the selected number of layers. Each layer consists of message passing and a graph-independent forward network. In message passing, each node in the graph receives messages from its neighboring nodes, and the messages are used to update the node’s feature representation (the update_order argument allows for different options). The combination of updates is based on a linear transformation. The graph-independent forward network takes the updated feature representations and applies a DNN transformation. This process is repeated for the selected number of layers, allowing the algorithm to refine the features and build more complex representations of the graph.

  • Output Transformation: once the Graph Net Layers are completed, an optional skip shortcut can be applied. The skip shortcut enables the algorithm to skip certain layers and connect the inputs directly to the output layer, improving the algorithm’s efficiency. Finally, the output is transformed by a linear transformation for regression tasks, mapping the output to a continuous range, or by a softmax/sigmoid activation for label features, mapping the output to a probability distribution. This step produces the final output of the Graph Net algorithm, representing the predicted values or probabilities for the given graph features.

Figure 1: The process flow of spinner (C: context, V: vertex, E: edge).
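As a rough illustration of the feature-initialization options mentioned above, the snippet below builds zeroed, adjacency-based and Laplacian-based node features with a plain eigen decomposition; spinner's internal embedding routine may differ in the details, so treat this as a sketch of the idea rather than the actual implementation.

library(igraph)

# featureless graph for which initial node features must be invented
g <- sample_gnp(100, 0.05)
embedding_size <- 5

A <- as.matrix(as_adjacency_matrix(g))   # adjacency matrix
L <- as.matrix(laplacian_matrix(g))      # Laplacian matrix (D - A)

# null values: zeroed initialization, no information carried
null_feat <- matrix(0, nrow = vcount(g), ncol = embedding_size)

# adjacency embedding: leading eigenvectors of the adjacency matrix
adj_feat <- eigen(A, symmetric = TRUE)$vectors[, 1:embedding_size]

# Laplacian embedding: eigenvectors tied to the smallest Laplacian eigenvalues
# (eigen() sorts eigenvalues in decreasing order, so they sit at the tail)
lap_eig <- eigen(L, symmetric = TRUE)
lap_feat <- lap_eig$vectors[, (vcount(g) - embedding_size + 1):vcount(g)]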

Some examples with a synthetic graph

Let’s start with a small dummy graph with 100 nodes and some degree of randomness and structured information: two numeric features based on normalized degree centrality and two character features obtained by cutting the betweenness statistic, equally distributed among nodes and edges, with no explicit feature value for the context (the global graph).

library(igraph)
library(ggplot2)
library(ggnetwork)

dummy_graph <- random.graph.game(100, 0.05)

###ADD SOME FEATURES
V(dummy_graph)$node_feat1 <- degree(dummy_graph, normalized = T) + runif(vcount(dummy_graph))
V(dummy_graph)$node_feat2 <- as.character(cut(betweenness(dummy_graph, normalized = T), 3))

E(dummy_graph)$edge_feat1 <- degree(line.graph(dummy_graph), normalized = T) + runif(ecount(dummy_graph))
E(dummy_graph)$edge_feat2 <- as.character(cut(betweenness(line.graph(dummy_graph), normalized = T), 2))
  
  
plot <- ggplot(ggnetwork(dummy_graph), aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(aes(color = edge_feat2)) +
  geom_nodes(aes(size = node_feat1, color = node_feat2)) +
  theme_void() +
  guides(size = 'none', color = 'none') +
  scale_color_manual(values = viridis::viridis(15, direction = -1, option = "E")[c(3, 6, 9, 12, 15)]) +
  labs(caption = "Dummy Graph") +
  theme(plot.caption = element_text(hjust = 0.5, size = 12))

plot

The minimal set of required parameters is: a graph in igraph format, the prediction target (node or edge), and the labels for node, edge and context features, if available (when there are no specific features, new features will be calculated using the embedding method, with a default embedding size of 5 that can be modified for node, edge and context through the related arguments). In example1, we illustrate a repeated cross-validation with two folds and three repetitions, using all the node and edge features (and initializing the context to a vector of 5 zeros with the default options), with the prediction target set on edge.

library(spinner)

example1 <- spinner(dummy_graph, target = "edge",
                    node_labels = c("node_feat1", "node_feat2"),
                    edge_labels = c("edge_feat1", "edge_feat2"),
                    holdout = 0.6, reps = 3, folds = 2, n_layers = 1)
epoch:  10    Train loss:  0.7137018    Val loss:  0.6144015 
epoch:  20    Train loss:  0.5944655    Val loss:  0.6190196 
epoch:  30    Train loss:  0.6620579    Val loss:  0.5998684 
epoch:  40    Train loss:  0.6194906    Val loss:  0.6096552 
epoch:  50    Train loss:  0.6580924    Val loss:  0.6060377 
early stop at epoch:  55    Train loss:  0.6815769    Val loss:  0.6249834 
epoch:  10    Train loss:  0.4619544    Val loss:  0.6726884 
epoch:  20    Train loss:  0.4726847    Val loss:  0.6579408 
epoch:  30    Train loss:  0.4742717    Val loss:  0.6465809 
early stop at epoch:  33    Train loss:  0.466705    Val loss:  0.6466084 
epoch:  10    Train loss:  0.57569    Val loss:  0.6114293 
epoch:  20    Train loss:  0.5789454    Val loss:  0.6105499 
epoch:  30    Train loss:  0.5311384    Val loss:  0.5529644 
early stop at epoch:  31    Train loss:  0.5587688    Val loss:  0.6154771 
epoch:  10    Train loss:  0.6228946    Val loss:  0.6434678 
epoch:  20    Train loss:  0.5117735    Val loss:  0.5998991 
epoch:  30    Train loss:  0.5744619    Val loss:  0.6196617 
early stop at epoch:  36    Train loss:  0.5466098    Val loss:  0.627533 
epoch:  10    Train loss:  0.3084492    Val loss:  0.333145 
epoch:  20    Train loss:  0.2644525    Val loss:  0.3095223 
epoch:  30    Train loss:  0.3161576    Val loss:  0.3185408 
early stop at epoch:  31    Train loss:  0.3645605    Val loss:  0.3318309 
epoch:  10    Train loss:  0.5694327    Val loss:  0.5735918 
epoch:  20    Train loss:  0.6654618    Val loss:  0.6981404 
epoch:  30    Train loss:  0.5631    Val loss:  0.7350405 
early stop at epoch:  30    Train loss:  0.5631    Val loss:  0.7350405 
epoch:  10    Train loss:  0.7765082    Val loss:  0.7766961 
epoch:  20    Train loss:  0.7349857    Val loss:  0.7830988 
epoch:  30    Train loss:  0.7914677    Val loss:  0.754347 
early stop at epoch:  34    Train loss:  0.8173763    Val loss:  0.7861751 
time: 119.1 sec elapsed

The output includes the graph (original or sampled), the model description and summary, a prediction function for new graph data, the cross-validation and summary errors, the history plot of the loss function (for final training and testing) and the time log.

names(example1)
[1] "graph"             "model_description" "model_summary"    
[4] "pred_fun"          "cv_errors"         "summary_errors"   
[7] "history"           "time_log"         

The model description and model summary give, respectively, general and detailed information about the model structure and the number of parameters. The default option for n_layers is 3 Graph Net layers, with the edge-node-context updating order for the message passing (“enc”).

example1$model_description
[1] "model with 1 GraphNet layers, 1 classification tasks and 1 regression tasks (1029 parameters)"
example1$model_summary
$GraphNetLayer1
An `nn_module` containing 1,027 parameters.

-- Modules ---------------------------------------------------------------------
* context_to_edge: <nn_pooling_from_context_to_edges_layer> #18 parameters
* context_to_node: <nn_pooling_from_context_to_nodes_layer> #24 parameters
* edge_to_context: <nn_pooling_from_edges_to_context_layer> #20 parameters
* edge_to_node: <nn_pooling_from_edges_to_nodes_layer> #16 parameters
* node_to_context: <nn_pooling_from_nodes_to_context_layer> #25 parameters
* node_to_edge: <nn_pooling_from_nodes_to_edges_layer> #15 parameters
* node_fusion: <nn_linear> #3 parameters
* edge_fusion: <nn_linear> #3 parameters
* context_fusion: <nn_linear> #3 parameters
* independent_layer: <nn_graph_independent_forward_layer> #900 parameters

$classif1
An `nn_module` containing 0 parameters.

$regr1
An `nn_module` containing 2 parameters.

-- Parameters ------------------------------------------------------------------
* weight: Float [1:1, 1:1]
* bias: Float [1:1]

The errors and the loss function are based on an entropy-weighted normalized loss, where the original loss functions are MSE for scalar features and binary cross-entropy for label features. The normalization is based on 1 - 1/exp(loss) for each predicted feature. The entropy weights are calculated using the Laplace method.
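As a quick numerical illustration of the normalization (hypothetical per-feature losses and placeholder weights, not spinner's internal code):

# hypothetical raw losses for one scalar and one label feature
raw_losses <- c(mse_feat1 = 0.80, bce_feat2 = 0.55)

# normalization described above: 1 - 1/exp(loss) squashes [0, Inf) into [0, 1)
normalized <- 1 - 1 / exp(raw_losses)
normalized
#  mse_feat1  bce_feat2 
#  0.5506710  0.4230502 

# placeholder entropy weights combining the normalized losses into one value
weights <- c(0.5, 0.5)
sum(weights * normalized)
# [1] 0.4868606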

example1$cv_errors
  reps folds     train validation
1    1     1 0.6815769  0.6249834
2    1     2 0.4667050  0.6466084
3    2     1 0.5587688  0.6154771
4    2     2 0.5466098  0.6275330
5    3     1 0.3645605  0.3318309
6    3     2 0.5631000  0.7350405
example1$summary_errors
     train validation       test 
 0.5302202  0.5969122  0.7861751 
example1$history

Now, let’s try to predict on new graph data: to do so, we add three new nodes and three new edges to our dummy graph, feed the new graph into pred_fun and get the predicted values for edge_feat1 and edge_feat2. Easy.

new_dummy_graph <- add.vertices(dummy_graph, 3)
new_dummy_graph <- add.edges(new_dummy_graph, c(51, 10, 52, 30, 53, 40))

example1$pred_fun(new_dummy_graph)
$edge_feat1
     from to edge_feat1
[1,]   10 51 -0.7320095
[2,]   30 52 -0.7320095
[3,]   40 53 -1.1389960

$edge_feat2
     from to edge_feat2_..2.67e.05.0.0134. edge_feat2_.0.0134.0.0268.
[1,]   10 51                     0.7406420                  0.2593580
[2,]   30 52                     0.7003825                  0.2996176
[3,]   40 53                     0.3513239                  0.6486760
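If you prefer hard class assignments over probabilities for edge_feat2, you can simply pick the most probable column; a small sketch, assuming the prediction is returned as the numeric matrix printed above:

# pick the most likely edge_feat2 class per new edge (columns 1-2 are from/to)
pred <- example1$pred_fun(new_dummy_graph)
prob_cols <- pred$edge_feat2[, -(1:2), drop = FALSE]
colnames(prob_cols)[max.col(prob_cols)]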

Now, let’s try another predictive model, this time with a different architecture: we use a number of Graph Net layers equal to the graph diameter (and a skip shortcut), while the forward net is based on three layers with specific activation functions and dropouts. We also increase the L2-regularization with weight_decay and opt for a different update order (“nce”: first the messages are passed to the nodes, then to the context and finally to the edges).

example2 <- spinner(dummy_graph, target = "edge",
                    node_labels = c("node_feat1", "node_feat2"),
                    edge_labels = c("edge_feat1", "edge_feat2"),
                    reps = 3, folds = 2, n_layers = diameter(dummy_graph),
                    dnn_form = c(32, 64, 32), dnn_drop = c(0.3, 0.5, 0.3),
                    skip_shortcut = TRUE, weight_decay = 0.05,
                    update_order = "nce", holdout = 0.3)
epoch:  10    Train loss:  0.4771449    Val loss:  0.5281988 
epoch:  20    Train loss:  0.5203946    Val loss:  0.5565221 
epoch:  30    Train loss:  0.5175064    Val loss:  0.4081402 
early stop at epoch:  32    Train loss:  0.4814069    Val loss:  0.5704495 
epoch:  10    Train loss:  0.3075961    Val loss:  0.3461457 
epoch:  20    Train loss:  0.3171342    Val loss:  0.3589231 
epoch:  30    Train loss:  0.3137291    Val loss:  0.346644 
early stop at epoch:  33    Train loss:  0.3260146    Val loss:  0.3676803 
epoch:  10    Train loss:  0.2746742    Val loss:  0.272223 
epoch:  20    Train loss:  0.2873681    Val loss:  0.2948366 
epoch:  30    Train loss:  0.297682    Val loss:  0.2497663 
epoch:  40    Train loss:  0.261989    Val loss:  0.2963362 
early stop at epoch:  40    Train loss:  0.261989    Val loss:  0.2963362 
epoch:  10    Train loss:  0.6362951    Val loss:  0.653718 
epoch:  20    Train loss:  0.6636615    Val loss:  0.6275757 
epoch:  30    Train loss:  0.6286563    Val loss:  0.6419924 
epoch:  40    Train loss:  0.6809555    Val loss:  0.6062608 
early stop at epoch:  42    Train loss:  0.6995168    Val loss:  0.6282438 
epoch:  10    Train loss:  0.5628222    Val loss:  0.5956068 
epoch:  20    Train loss:  0.5892258    Val loss:  0.5917251 
epoch:  30    Train loss:  0.5953156    Val loss:  0.6059892 
early stop at epoch:  39    Train loss:  0.5678455    Val loss:  0.596338 
epoch:  10    Train loss:  0.5379111    Val loss:  0.558876 
epoch:  20    Train loss:  0.5317364    Val loss:  0.5530674 
epoch:  30    Train loss:  0.588866    Val loss:  0.5431762 
early stop at epoch:  31    Train loss:  0.5304825    Val loss:  0.5873731 
epoch:  10    Train loss:  0.1922428    Val loss:  0.19973 
epoch:  20    Train loss:  0.195193    Val loss:  0.1873679 
epoch:  30    Train loss:  0.1985004    Val loss:  0.217934 
early stop at epoch:  39    Train loss:  0.2190993    Val loss:  0.1921318 
time: 1381.67 sec elapsed

Even though we are adding “some” parameters (!), the second model doesn’t show a clear improvement in terms of loss. You have to fine-tune the hyper-parameters to find the best architecture for your specific problem (a rough grid-search sketch follows the output below).

example2$model_description
[1] "model with 8 GraphNet layers, 1 classification tasks and 1 regression tasks (108826 parameters)"
example2$history
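As a rough illustration of such tuning, here is a minimal sketch of a manual grid search over two of the hyper-parameters seen above, comparing the validation error reported in summary_errors (illustrative only, and potentially slow given the timings of the previous runs):

# minimal manual grid over two hyper-parameters (expect long run times)
grid <- expand.grid(n_layers = c(1, 3), update_order = c("enc", "nce"),
                    stringsAsFactors = FALSE)

results <- lapply(seq_len(nrow(grid)), function(i) {
  fit <- spinner(dummy_graph, target = "edge",
                 node_labels = c("node_feat1", "node_feat2"),
                 edge_labels = c("edge_feat1", "edge_feat2"),
                 reps = 3, folds = 2, holdout = 0.3,
                 n_layers = grid$n_layers[i],
                 update_order = grid$update_order[i])
  data.frame(grid[i, ], validation = fit$summary_errors["validation"])
})

do.call(rbind, results)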

Concluding thought: why did the Graph Neural Network refuse to go to the party? Because it didn’t know how to … network! (yeah, it’s a joke).

Enzoi!

Footnotes

  1. A brief (and incomplete) list with some general references on the topic:

    • Battaglia, P. W., et al. “Relational inductive biases, deep learning, and graph networks.” arXiv preprint arXiv:1806.01261 (2018);

    • Xu, K., et al. “How Powerful are Graph Neural Networks?” International Conference on Learning Representations, 2019;

    • Hamilton, W. L., et al. “Inductive representation learning on large graphs.” Advances in Neural Information Processing Systems, 2017, pp. 1024-1034;

    • Zhou, J., et al. “Graph Neural Networks: A Review of Methods and Applications.” arXiv preprint arXiv:1812.08434 (2018).

  2. For the Laplacian matrix, you can start here.

  3. For skip shortcuts and residual nets, take a look here.

  4. For reference on entropy, take a look at this package here.