Intro to janus

Author

Giancarlo Vercellino

Published

December 16, 2022

“In ancient Roman religion and myth, Janus is the god of beginnings, gates, transitions, time, duality, doorways, passages, frames, and endings.” (Wikipedia)

“In the mathematical field of graph theory, a bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets U and V, that is, every edge connects a vertex in U to one in V. Vertex sets U and V are usually called the parts of the graph. Equivalently, a bipartite graph is a graph that does not contain any odd-length cycles.” (Wikipedia)

“A friend is Janus-faced: he looks to the past and the future. He is the child of all my foregoing hours, the prophet of those to come, and the harbinger of a greater friend.” (Ralph Waldo Emerson)

“A good face, they say, is a letter of recommendation. O Nature, Nature, why art thou so dishonest, as ever to send men with these false recommendations into the World!” (Henry Fielding)

Your recommendations are as good as your friends

janus is a recommending system based on Embedding Neural Networks: unlike other recommending systems based on matrix factorization, here we use neural networks with embedding layers, so you can project actors and items into a multidimensional space of any size and directly optimize the embeddings, avoiding the critical performance issues typically related to huge matrix factorizations, thanks to the batch size parameter of the back-propagation process.
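
To make the idea concrete, here is a minimal keras sketch of such a two-embedding network (purely illustrative: the embedding sizes, head, loss and optimizer here are assumptions for this post, not janus internals):

library(keras)

# One id input for the rating actor, one for the rated item
actor_input <- layer_input(shape = 1, name = "actor")
item_input  <- layer_input(shape = 1, name = "item")

# Project each id into a dense embedding space (sizes are illustrative)
actor_vector <- actor_input %>%
  layer_embedding(input_dim = 10 + 1, output_dim = 8) %>%
  layer_flatten()
item_vector <- item_input %>%
  layer_embedding(input_dim = 30 + 1, output_dim = 8) %>%
  layer_flatten()

# The concatenated embeddings feed a dense head predicting one of 5 rating classes
rating_output <- layer_concatenate(list(actor_vector, item_vector)) %>%
  layer_dense(units = 5, activation = "softmax")

model <- keras_model(list(actor_input, item_input), rating_output)
model %>% compile(optimizer = "adam",
                  loss = "sparse_categorical_crossentropy") # labels assumed 0-based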

The architecture of janus is quite simple, as depicted below: the only required information is three features (the rating actors, the rated items and the rating values); the optimization is then guided through a Coarse-to-Fine (CTF) pipeline that progressively focuses on the hyper-parameters offering the best performance. You can easily manage the process by setting the n_steps, the n_samp sampled models per step and the n_top models selected at each step to reduce the search range.
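
For example, a lighter search could be sketched as follows (a hypothetical call using the three CTF controls above; the dataset and labels are the ones introduced in the examples later in this post):

janus(dummy, rating_label = "rating", rater_label = "actor",
      rated_label = "item", task = "classif",
      n_steps = 2, n_samp = 5, n_top = 2) # 2 steps, 5 sampled models per step, top 2 kept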

The back-end is based on TensorFlow, so some plots and model representations are pretty standard.

Figure 1: The process flow of janus

How-To: some examples

Let’s start with a dummy dataset containing three features: a rating actor (numeric, factor or character), a rated item (numeric, factor or character), and a rating value (numeric, factor or character). In our example we have 10 actors randomly rating 30 items with a value between 1 and 5.

library(janus)

# 100 random ratings: 10 actors and 30 items, with scores from 1 to 5
dummy <- data.frame(actor = sample(10, 100, replace = TRUE),
                    item = sample(30, 100, replace = TRUE),
                    rating = sample(5, 100, replace = TRUE))
knitr::kable(head(dummy, 10), align = "ccc", caption = "Our dummy set: 10 raters, 30 items and rating scale 1 to 5")
Our dummy set: 10 raters, 30 items and rating scale 1 to 5

 actor   item   rating
     2     13        4
     6     19        2
     6      6        3
     9      8        3
    10     25        3
     5     10        1
     1     25        1
     6     21        3
     9      7        5
     4      6        1

The minimal set of required parameters is the data frame, the feature names and the task (classif if we are dealing with a classification scale, regr if numeric). The default behavior is a CTF optimization sampling 10 random models during each phase with the default parameters.

example1 <- janus(dummy, rating_label = "rating", rater_label = "actor", rated_label = "item", task = "classif")
step  1 
time: 8.66 sec elapsed
time: 6.31 sec elapsed
time: 3.15 sec elapsed
time: 3.09 sec elapsed
time: 3.59 sec elapsed
time: 3.67 sec elapsed
time: 3.53 sec elapsed
time: 3.81 sec elapsed
time: 5.13 sec elapsed
time: 4.39 sec elapsed
step  2 
time: 6 sec elapsed
time: 7.4 sec elapsed
time: 6.5 sec elapsed
time: 4.58 sec elapsed
time: 4.32 sec elapsed
time: 6.92 sec elapsed
time: 3.92 sec elapsed
time: 4.48 sec elapsed
time: 5.84 sec elapsed
time: 3.49 sec elapsed
step  3 
time: 5.95 sec elapsed
time: 7.34 sec elapsed
time: 7.07 sec elapsed
time: 3.83 sec elapsed
time: 4.23 sec elapsed
time: 6.88 sec elapsed
time: 4.24 sec elapsed
time: 3.94 sec elapsed
time: 4.24 sec elapsed
time: 3.78 sec elapsed
coarse to fine: 160.31 sec elapsed

The output includes the full pipeline of explored models with the opt_metric used for selection (defaulting to bac for classif, and mae for regr).
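
If you want the selection to be driven by a different measure, opt_metric can be overridden; a hedged sketch (we assume here that "rmse" is among the accepted values for a regression task):

janus(dummy, rating_label = "rating", rater_label = "actor",
      rated_label = "item", task = "regr", opt_metric = "rmse")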

knitr::kable(example1$pipeline, align = "ccc", caption = "A standard pipeline with three steps")
A standard pipeline with three steps

 step  activation  optimizer  rater_embedding_size  rated_embedding_size  layers  nodes  regularization_L1  regularization_L2    dropout        bac
    1  bent        rmsprop                      17                    11       4    259          51.050270          15.577399  0.5515696  0.5000000
    1  sigmoid     nadam                        32                    22       5    310          12.546022          80.096788  0.2249020  0.5000000
    1  swish       sgd                          11                    17       4    120          29.005472          45.105960  0.3626408  0.5069444
    1  softsign    adadelta                     29                    23       2    331          54.314546          52.538766  0.6641948  0.4894841
    1  linear      adadelta                     24                    17       4    100          29.564567           7.806064  0.8090294  0.4357143
    1  softsign    rmsprop                       9                    22       3    101          16.811318          84.613277  0.7910808  0.5000000
    1  selu        sgd                          17                    19       4    486          52.576193          49.558358  0.7333141  0.4388889
    1  elu         adagrad                      17                    26       3    288           8.818195          51.088856  0.4785732  0.5000000
    1  swish       nadam                        14                    29       3    439          64.278919          36.457298  0.1239607  0.5000000
    1  bent        adadelta                     20                    15       3    398          41.063072           1.816721  0.9392480  0.4375000
    2  swish       nadam                        21                    17       4    265          29.265172          41.489607  0.3332468  0.5000000
    2  bent        nadam                        26                    17       5    210          13.599736          58.898948  0.4093867  0.5000000
    2  sigmoid     nadam                        23                    18       5    290          26.150314          51.181220  0.3878724  0.5000000
    2  bent        sgd                          23                    21       4    122          38.652078          66.955559  0.3735114  0.4815476
    2  swish       sgd                          19                    12       5    223          20.985587          70.864051  0.3374270  0.5569444
    2  swish       nadam                        30                    19       5    148          29.339857          38.117864  0.4878818  0.5000000
    2  sigmoid     sgd                          24                    17       5    303          33.281197          27.264244  0.4054307  0.5000000
    2  swish       sgd                          20                    18       5    306          30.200249          78.693696  0.4049575  0.5454365
    2  bent        rmsprop                      28                    21       4    294          42.595863          64.153116  0.4564578  0.5000000
    2  sigmoid     sgd                          17                    21       4    305          38.730847          21.089628  0.4132289  0.5000000
    3  swish       nadam                        20                    15       4    286          24.986738          56.431474  0.3570308  0.5000000
    3  swish       nadam                        20                    15       5    262          21.237757          66.470296  0.3737452  0.5000000
    3  swish       nadam                        20                    16       5    297          24.241304          62.019990  0.3690224  0.5000000
    3  swish       sgd                          20                    18       4    224          27.233169          71.116013  0.3658698  0.4353175
    3  swish       sgd                          19                    12       5    268          23.005305          73.369783  0.3579485  0.5055556
    3  swish       nadam                        20                    17       5    235          25.004611          54.487211  0.3909766  0.5000000
    3  swish       sgd                          20                    16       5    303          25.947835          48.228642  0.3728768  0.6505952
    3  swish       sgd                          19                    16       5    304          25.210516          77.884625  0.3727729  0.7222222
    3  swish       sgd                          20                    17       4    299          28.176978          69.500029  0.3840784  0.6638889
    3  swish       sgd                          19                    17       4    304          27.252020          44.668147  0.3745887  0.5000000

The best model over 30 samples in 3 steps is based on the following parameters:

example1$best_model$configuration
   step activation optimizer rater_embedding_size rated_embedding_size layers
28    3      swish       sgd                   19                   16      5
   nodes regularization_L1 regularization_L2   dropout       bac
28   304          25.21052          77.88462 0.3727729 0.7222222

You can also find other information in the classic TensorFlow style:

example1$best_model$model
Model: "model_27"
_____________________________________________________________________
Layer (type)          Output Shape   Param #  Connected to           
=====================================================================
input_55 (InputLayer) [(None, 1)]    0                               
_____________________________________________________________________
input_56 (InputLayer) [(None, 1)]    0                               
_____________________________________________________________________
embedding_54 (Embeddi (None, 1, 19)  209      input_55[0][0]         
_____________________________________________________________________
embedding_55 (Embeddi (None, 1, 16)  464      input_56[0][0]         
_____________________________________________________________________
flatten_54 (Flatten)  (None, 19)     0        embedding_54[0][0]     
_____________________________________________________________________
flatten_55 (Flatten)  (None, 16)     0        embedding_55[0][0]     
_____________________________________________________________________
concatenate_27 (Conca (None, 35)     0        flatten_54[0][0]       
                                              flatten_55[0][0]       
_____________________________________________________________________
dense_141 (Dense)     (None, 304)    10944    concatenate_27[0][0]   
_____________________________________________________________________
activation_111 (Activ (None, 304)    0        dense_141[0][0]        
_____________________________________________________________________
dropout_114 (Dropout) (None, 304)    0        activation_111[0][0]   
_____________________________________________________________________
batch_normalization_1 (None, 304)    1216     dropout_114[0][0]      
_____________________________________________________________________
dense_142 (Dense)     (None, 304)    92720    batch_normalization_114
_____________________________________________________________________
activation_112 (Activ (None, 304)    0        dense_142[0][0]        
_____________________________________________________________________
dropout_115 (Dropout) (None, 304)    0        activation_112[0][0]   
_____________________________________________________________________
batch_normalization_1 (None, 304)    1216     dropout_115[0][0]      
_____________________________________________________________________
dense_143 (Dense)     (None, 304)    92720    batch_normalization_115
_____________________________________________________________________
activation_113 (Activ (None, 304)    0        dense_143[0][0]        
_____________________________________________________________________
dropout_116 (Dropout) (None, 304)    0        activation_113[0][0]   
_____________________________________________________________________
batch_normalization_1 (None, 304)    1216     dropout_116[0][0]      
_____________________________________________________________________
dense_144 (Dense)     (None, 304)    92720    batch_normalization_116
_____________________________________________________________________
activation_114 (Activ (None, 304)    0        dense_144[0][0]        
_____________________________________________________________________
dropout_117 (Dropout) (None, 304)    0        activation_114[0][0]   
_____________________________________________________________________
batch_normalization_1 (None, 304)    1216     dropout_117[0][0]      
_____________________________________________________________________
dense_145 (Dense)     (None, 304)    92720    batch_normalization_117
_____________________________________________________________________
activation_115 (Activ (None, 304)    0        dense_145[0][0]        
_____________________________________________________________________
dropout_118 (Dropout) (None, 304)    0        activation_115[0][0]   
_____________________________________________________________________
batch_normalization_1 (None, 304)    1216     dropout_118[0][0]      
_____________________________________________________________________
dense_146 (Dense)     (None, 5)      1525     batch_normalization_118
=====================================================================
Total params: 390,102
Trainable params: 387,062
Non-trainable params: 3,040
_____________________________________________________________________

So, our best model used thousands and thousands of parameters to give us the following recommending function. Which items can we recommend to rating actor number 5? Look at the Rated column.

example1$best_model$recommend(5, top_n = 10)
Top rated items for rater 5

 Rater  Rated  Predicted Rating  Probability
     5     20                 5    0.2052065
     5     10                 5    0.2050587
     5     28                 5    0.2039816
     5      5                 5    0.2038402
     5     16                 5    0.2028807
     5     21                 5    0.2027042
     5      1                 5    0.2018995
     5      4                 4    0.2101161
     5     29                 4    0.2085713
     5     17                 4    0.2084644

The recommend function predicts the top_n items with the related rating and probability. The output may change if we decide to treat the rating scale as numeric (and not ordinal). Let’s see another example: this time we also change the number of steps (from 3 to 4) and the selector size for the CTF process (from the top 3 models in each phase to the top 4).

example2 <- janus(dummy, rating_label = "rating", rater_label = "actor", rated_label = "item", task = "regr", n_steps = 4, n_top = 4)
step  1 
time: 4.33 sec elapsed
time: 4.29 sec elapsed
time: 3.83 sec elapsed
time: 4.84 sec elapsed
time: 4 sec elapsed
time: 3.58 sec elapsed
time: 5.98 sec elapsed
time: 4.18 sec elapsed
time: 6.05 sec elapsed
time: 3.52 sec elapsed
step  2 
time: 5.95 sec elapsed
time: 4.4 sec elapsed
time: 4.33 sec elapsed
time: 4.71 sec elapsed
time: 4.83 sec elapsed
time: 4.27 sec elapsed
time: 5.12 sec elapsed
time: 6.11 sec elapsed
time: 4.11 sec elapsed
time: 6.06 sec elapsed
step  3 
time: 5.5 sec elapsed
time: 5.91 sec elapsed
time: 6.03 sec elapsed
time: 5.93 sec elapsed
time: 5.86 sec elapsed
time: 5.28 sec elapsed
time: 5 sec elapsed
time: 5.14 sec elapsed
time: 5.09 sec elapsed
time: 5.16 sec elapsed
step  4 
time: 4.77 sec elapsed
time: 5.16 sec elapsed
time: 6.86 sec elapsed
time: 6.2 sec elapsed
time: 5.77 sec elapsed
time: 6.21 sec elapsed
time: 5.25 sec elapsed
time: 4.97 sec elapsed
time: 5.92 sec elapsed
time: 4.98 sec elapsed
coarse to fine: 206.34 sec elapsed

This time the pipeline includes 40 random models over 4 steps. The best selected model has the following configuration:

example2$best_model$configuration
   step activation optimizer rater_embedding_size rated_embedding_size layers
12    2        elu      adam                   28                   23      3
   nodes regularization_L1 regularization_L2   dropout   mae
12   430           66.6712          53.36247 0.2254691 1.536

And this time the outcome of the recommending function is numeric:

example2$best_model$recommend("5", top_n = 5)
Top 5 predicted rating for rater 5 in case of numeric scale

 Rater  Rated  Predicted Rating
     5     29          1.836775
     5     11          1.836770
     5      7          1.836770
     5      3          1.836769
     5      8          1.836769

If you have some domain intuition about the correct values of some hyper-parameters, you can override the default search options anytime you want:

example3 <- janus(dummy, rating_label = "rating", rater_label = "actor", rated_label = "item", task = "classif", activations = "mish", rater_embedding_size = 5, rated_embedding_size = 5, layers = 1)
step  1 
time: 3.58 sec elapsed
time: 2.5 sec elapsed
time: 3.34 sec elapsed
time: 3.59 sec elapsed
time: 3.5 sec elapsed
time: 2.55 sec elapsed
time: 2.71 sec elapsed
time: 2.45 sec elapsed
time: 3.53 sec elapsed
time: 3.31 sec elapsed
step  2 
time: 2.36 sec elapsed
time: 2.43 sec elapsed
time: 2.61 sec elapsed
time: 5.8 sec elapsed
time: 2.6 sec elapsed
time: 2.58 sec elapsed
time: 2.65 sec elapsed
time: 2.75 sec elapsed
time: 2.59 sec elapsed
time: 2.64 sec elapsed
step  3 
time: 2.53 sec elapsed
time: 2.74 sec elapsed
time: 2.39 sec elapsed
time: 2.55 sec elapsed
time: 2.6 sec elapsed
time: 2.61 sec elapsed
time: 2.36 sec elapsed
time: 2.61 sec elapsed
time: 2.56 sec elapsed
time: 2.54 sec elapsed
coarse to fine: 86.17 sec elapsed

As you can see, in this case the search and the final configuration are based on the options of your choosing:

knitr::kable(head(example3$pipeline, 5), align = "ccc", caption = "This time we set some hyperparameters: activation, embedding and number of layers")
This time we set some hyperparameters: activation, embedding and number of layers

 step  activation  optimizer  rater_embedding_size  rated_embedding_size  layers  nodes  regularization_L1  regularization_L2    dropout        bac
    1  mish        rmsprop                       5                     5       1     76           84.11005           37.44519  0.7297728  0.5000000
    1  mish        sgd                           5                     5       1    454           96.52063           59.34469  0.6272155  0.5486111
    1  mish        adamax                        5                     5       1    421           61.47267           82.99442  0.7850897  0.5000000
    1  mish        rmsprop                       5                     5       1    398           31.06477           71.53775  0.2507421  0.5000000
    1  mish        rmsprop                       5                     5       1    198           21.17491           94.39060  0.8008507  0.5000000

For any best_model, you also get some standard error metrics and plots, such as the following:

example3$best_model$plot

The standard error metrics are collected within test_metrics and refer to the holdout testing frame (while the training and validation errors come from a repeated cross-validation scheme controlled by folds and reps; see the sketch below).

example3$best_model$test_metrics
       bac        avs        avp        avf       kend       ndcg 
 0.6152778  0.4000000  0.2222222  0.1333333 -0.4006168  0.6143525 
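
A hedged sketch of raising the cross-validation effort through those two parameters (folds and reps as named above; the values here are arbitrary):

example4 <- janus(dummy, rating_label = "rating", rater_label = "actor",
                  rated_label = "item", task = "classif",
                  folds = 5, reps = 3) # 5-fold cross-validation repeated 3 times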

And don’t forget: your recommendations will always be as good as your friends.

Enzoi.