Intro to janus

Author

Giancarlo Vercellino

Published

December 16, 2022

“In ancient Roman religion and myth, Janus is the god of beginnings, gates, transitions, time, duality, doorways, passages, frames, and endings.” (Wikipedia)

“In the mathematical field of graph theory, a bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets U and V, that is, every edge connects a vertex in U to one in V. Vertex sets U and V are usually called the parts of the graph. Equivalently, a bipartite graph is a graph that does not contain any odd-length cycles.” (Wikipedia)

“A friend is Janus-faced: he looks to the past and the future. He is the child of all my foregoing hours, the prophet of those to come, and the harbinger of a greater friend.” (Ralph Waldo Emerson)

“A good face, they say, is a letter of recommendation. O Nature, Nature, why art thou so dishonest, as ever to send men with these false recommendations into the World!” (Henry Fielding)

Your recommendations are as good as your friends

janus is a recommending system based on Embedding Neural Networks: unlike other recommending systems based on matrix factorization, here we use neural networks with embedding layers, so you can project actors and items into a multidimensional space of any size and directly optimize the embeddings, avoiding the critical performance issues typically related to huge matrix factorizations, thanks to the batch size parameter of the back-propagation process.
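
To make the idea concrete, here is a minimal keras sketch of such a two-embedding network (purely illustrative: the embedding sizes, head, loss and optimizer here are assumptions for this post, not janus internals):

library(keras)

# One id input for the rating actor, one for the rated item
actor_input <- layer_input(shape = 1, name = "actor")
item_input  <- layer_input(shape = 1, name = "item")

# Project each id into a dense embedding space (sizes are illustrative)
actor_vector <- actor_input %>%
  layer_embedding(input_dim = 10 + 1, output_dim = 8) %>%
  layer_flatten()
item_vector <- item_input %>%
  layer_embedding(input_dim = 30 + 1, output_dim = 8) %>%
  layer_flatten()

# The concatenated embeddings feed a dense head predicting one of 5 rating classes
rating_output <- layer_concatenate(list(actor_vector, item_vector)) %>%
  layer_dense(units = 5, activation = "softmax")

model <- keras_model(list(actor_input, item_input), rating_output)
model %>% compile(optimizer = "adam",
                  loss = "sparse_categorical_crossentropy") # labels assumed 0-based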

The architecture of janus is quite simple, as depicted below: the only required information is three features (the rating actors, the rated items and the rating values); the optimization is then guided through a Coarse-to-Fine (CTF) pipeline that progressively focuses on the hyper-parameters offering the best performance. You can easily manage the process by setting the n_steps, the n_samp sampled models per step and the n_top models selected at each step to reduce the search range.
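
For example, a lighter search could be sketched as follows (a hypothetical call using the three CTF controls above; the dataset and labels are the ones introduced in the examples later in this post):

janus(dummy, rating_label = "rating", rater_label = "actor",
      rated_label = "item", task = "classif",
      n_steps = 2, n_samp = 5, n_top = 2) # 2 steps, 5 sampled models per step, top 2 kept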

The back-end is based on TensorFlow, so some plots and model representations are pretty standard.

Figure 1: The process flow of janus

How-To: some examples

Let’s start with a dummy dataset containing three features: a rating actor (numeric, factor or character), a rated item (numeric, factor or character), and a rating value (numeric, factor or character). In our example we have 10 actors randomly rating 30 items with a value between 1 and 5.

library(janus)

# 100 random ratings: 10 actors and 30 items, with scores from 1 to 5
dummy <- data.frame(actor = sample(10, 100, replace = TRUE),
                    item = sample(30, 100, replace = TRUE),
                    rating = sample(5, 100, replace = TRUE))
knitr::kable(head(dummy, 10), align = "ccc", caption = "Our dummy set: 10 raters, 30 items and rating scale 1 to 5")
Our dummy set: 10 raters, 30 items and rating scale 1 to 5

 actor   item   rating
     2     13        4
     6     19        2
     6      6        3
     9      8        3
    10     25        3
     5     10        1
     1     25        1
     6     21        3
     9      7        5
     4      6        1

The minimal set of required parameters is the data frame, the feature names and the task (classif if we are dealing with a classification scale, regr if numeric). The default behavior is a CTF optimization sampling 10 random models during each phase with the default parameters.

example1 <- janus(dummy, rating_label = "rating", rater_label = "actor", rated_label = "item", task = "classif")
step  1 
time: 8.66 sec elapsed
time: 6.31 sec elapsed
time: 3.15 sec elapsed
time: 3.09 sec elapsed
time: 3.59 sec elapsed
time: 3.67 sec elapsed
time: 3.53 sec elapsed
time: 3.81 sec elapsed
time: 5.13 sec elapsed
time: 4.39 sec elapsed
step  2 
time: 6 sec elapsed
time: 7.4 sec elapsed
time: 6.5 sec elapsed
time: 4.58 sec elapsed
time: 4.32 sec elapsed
time: 6.92 sec elapsed
time: 3.92 sec elapsed
time: 4.48 sec elapsed
time: 5.84 sec elapsed
time: 3.49 sec elapsed
step  3 
time: 5.95 sec elapsed
time: 7.34 sec elapsed
time: 7.07 sec elapsed
time: 3.83 sec elapsed
time: 4.23 sec elapsed
time: 6.88 sec elapsed
time: 4.24 sec elapsed
time: 3.94 sec elapsed
time: 4.24 sec elapsed
time: 3.78 sec elapsed
coarse to fine: 160.31 sec elapsed

The output includes the full pipeline of explored models with the opt_metric used for selection (defaulting to bac for classif, and mae for regr).
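
If you want the selection to be driven by a different measure, opt_metric can be overridden; a hedged sketch (we assume here that "rmse" is among the accepted values for a regression task):

janus(dummy, rating_label = "rating", rater_label = "actor",
      rated_label = "item", task = "regr", opt_metric = "rmse")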

knitr::kable(example1$pipeline, align = "ccc", caption = "A standard pipeline with three steps")
A standard pipeline with three steps

 step  activation  optimizer  rater_embedding_size  rated_embedding_size  layers  nodes  regularization_L1  regularization_L2    dropout        bac
    1  bent        rmsprop                      17                    11       4    259          51.050270          15.577399  0.5515696  0.5000000
    1  sigmoid     nadam                        32                    22       5    310          12.546022          80.096788  0.2249020  0.5000000
    1  swish       sgd                          11                    17       4    120          29.005472          45.105960  0.3626408  0.5069444
    1  softsign    adadelta                     29                    23       2    331          54.314546          52.538766  0.6641948  0.4894841
    1  linear      adadelta                     24                    17       4    100          29.564567           7.806064  0.8090294  0.4357143
    1  softsign    rmsprop                       9                    22       3    101          16.811318          84.613277  0.7910808  0.5000000
    1  selu        sgd                          17                    19       4    486          52.576193          49.558358  0.7333141  0.4388889
    1  elu         adagrad                      17                    26       3    288           8.818195          51.088856  0.4785732  0.5000000
    1  swish       nadam                        14                    29       3    439          64.278919          36.457298  0.1239607  0.5000000
    1  bent        adadelta                     20                    15       3    398          41.063072           1.816721  0.9392480  0.4375000
    2  swish       nadam                        21                    17       4    265          29.265172          41.489607  0.3332468  0.5000000
    2  bent        nadam                        26                    17       5    210          13.599736          58.898948  0.4093867  0.5000000
    2  sigmoid     nadam                        23                    18       5    290          26.150314          51.181220  0.3878724  0.5000000
    2  bent        sgd                          23                    21       4    122          38.652078          66.955559  0.3735114  0.4815476
    2  swish       sgd                          19                    12       5    223          20.985587          70.864051  0.3374270  0.5569444
    2  swish       nadam                        30                    19       5    148          29.339857          38.117864  0.4878818  0.5000000
    2  sigmoid     sgd                          24                    17       5    303          33.281197          27.264244  0.4054307  0.5000000
    2  swish       sgd                          20                    18       5    306          30.200249          78.693696  0.4049575  0.5454365
    2  bent        rmsprop                      28                    21       4    294          42.595863          64.153116  0.4564578  0.5000000
    2  sigmoid     sgd                          17                    21       4    305          38.730847          21.089628  0.4132289  0.5000000
    3  swish       nadam                        20                    15       4    286          24.986738          56.431474  0.3570308  0.5000000
    3  swish       nadam                        20                    15       5    262          21.237757          66.470296  0.3737452  0.5000000
    3  swish       nadam                        20                    16       5    297          24.241304          62.019990  0.3690224  0.5000000
    3  swish       sgd                          20                    18       4    224          27.233169          71.116013  0.3658698  0.4353175
    3  swish       sgd                          19                    12       5    268          23.005305          73.369783  0.3579485  0.5055556
    3  swish       nadam                        20                    17       5    235          25.004611          54.487211  0.3909766  0.5000000
    3  swish       sgd                          20                    16       5    303          25.947835          48.228642  0.3728768  0.6505952
    3  swish       sgd                          19                    16       5    304          25.210516          77.884625  0.3727729  0.7222222
    3  swish       sgd                          20                    17       4    299          28.176978          69.500029  0.3840784  0.6638889
    3  swish       sgd                          19                    17       4    304          27.252020          44.668147  0.3745887  0.5000000

The best model over 30 samples in 3 steps is based on the following parameters:

example1$best_model$configuration
   step activation optimizer rater_embedding_size rated_embedding_size layers
28    3      swish       sgd                   19                   16      5
   nodes regularization_L1 regularization_L2   dropout       bac
28   304          25.21052          77.88462 0.3727729 0.7222222

You can also find other information in the classic TensorFlow style:

example1$best_model$model
Model: "model_27"
_____________________________________________________________________
Layer (type)          Output Shape   Param #  Connected to           
=====================================================================
input_55 (InputLayer) [(None, 1)]    0                               
_____________________________________________________________________
input_56 (InputLayer) [(None, 1)]    0                               
_____________________________________________________________________
embedding_54 (Embeddi (None, 1, 19)  209      input_55[0][0]         
_____________________________________________________________________
embedding_55 (Embeddi (None, 1, 16)  464      input_56[0][0]         
_____________________________________________________________________
flatten_54 (Flatten)  (None, 19)     0        embedding_54[0][0]     
_____________________________________________________________________
flatten_55 (Flatten)  (None, 16)     0        embedding_55[0][0]     
_____________________________________________________________________
concatenate_27 (Conca (None, 35)     0        flatten_54[0][0]       
                                              flatten_55[0][0]       
_____________________________________________________________________
dense_141 (Dense)     (None, 304)    10944    concatenate_27[0][0]   
_____________________________________________________________________
activation_111 (Activ (None, 304)    0        dense_141[0][0]        
_____________________________________________________________________
dropout_114 (Dropout) (None, 304)    0        activation_111[0][0]   
_____________________________________________________________________
batch_normalization_1 (None, 304)    1216     dropout_114[0][0]      
_____________________________________________________________________
dense_142 (Dense)     (None, 304)    92720    batch_normalization_114
_____________________________________________________________________
activation_112 (Activ (None, 304)    0        dense_142[0][0]        
_____________________________________________________________________
dropout_115 (Dropout) (None, 304)    0        activation_112[0][0]   
_____________________________________________________________________
batch_normalization_1 (None, 304)    1216     dropout_115[0][0]      
_____________________________________________________________________
dense_143 (Dense)     (None, 304)    92720    batch_normalization_115
_____________________________________________________________________
activation_113 (Activ (None, 304)    0        dense_143[0][0]        
_____________________________________________________________________
dropout_116 (Dropout) (None, 304)    0        activation_113[0][0]   
_____________________________________________________________________
batch_normalization_1 (None, 304)    1216     dropout_116[0][0]      
_____________________________________________________________________
dense_144 (Dense)     (None, 304)    92720    batch_normalization_116
_____________________________________________________________________
activation_114 (Activ (None, 304)    0        dense_144[0][0]        
_____________________________________________________________________
dropout_117 (Dropout) (None, 304)    0        activation_114[0][0]   
_____________________________________________________________________
batch_normalization_1 (None, 304)    1216     dropout_117[0][0]      
_____________________________________________________________________
dense_145 (Dense)     (None, 304)    92720    batch_normalization_117
_____________________________________________________________________
activation_115 (Activ (None, 304)    0        dense_145[0][0]        
_____________________________________________________________________
dropout_118 (Dropout) (None, 304)    0        activation_115[0][0]   
_____________________________________________________________________
batch_normalization_1 (None, 304)    1216     dropout_118[0][0]      
_____________________________________________________________________
dense_146 (Dense)     (None, 5)      1525     batch_normalization_118
=====================================================================
Total params: 390,102
Trainable params: 387,062
Non-trainable params: 3,040
_____________________________________________________________________

So, our best model used thousands and thousands of parameters to give us the following recommending function. Which items can we recommend to rating actor number 5? Look at the Rated column.

example1$best_model$recommend(5, top_n = 10)
Top rated items for rater 5

 Rater  Rated  Predicted Rating  Probability
     5     20                 5    0.2052065
     5     10                 5    0.2050587
     5     28                 5    0.2039816
     5      5                 5    0.2038402
     5     16                 5    0.2028807
     5     21                 5    0.2027042
     5      1                 5    0.2018995
     5      4                 4    0.2101161
     5     29                 4    0.2085713
     5     17                 4    0.2084644

The recommend function predicts the top_n items with the related rating and probability. The output may change if we decide to treat the rating scale as numeric (and not ordinal). Let’s see another example: this time we also change the number of steps (from 3 to 4) and the selector size for the CTF process (from the top 3 models in each phase to the top 4).

example2 <- janus(dummy, rating_label = "rating", rater_label = "actor", rated_label = "item", task = "regr", n_steps = 4, n_top = 4)
step  1 
time: 4.33 sec elapsed
time: 4.29 sec elapsed
time: 3.83 sec elapsed
time: 4.84 sec elapsed
time: 4 sec elapsed
time: 3.58 sec elapsed
time: 5.98 sec elapsed
time: 4.18 sec elapsed
time: 6.05 sec elapsed
time: 3.52 sec elapsed
step  2 
time: 5.95 sec elapsed
time: 4.4 sec elapsed
time: 4.33 sec elapsed
time: 4.71 sec elapsed
time: 4.83 sec elapsed
time: 4.27 sec elapsed
time: 5.12 sec elapsed
time: 6.11 sec elapsed
time: 4.11 sec elapsed
time: 6.06 sec elapsed
step  3 
time: 5.5 sec elapsed
time: 5.91 sec elapsed
time: 6.03 sec elapsed
time: 5.93 sec elapsed
time: 5.86 sec elapsed
time: 5.28 sec elapsed
time: 5 sec elapsed
time: 5.14 sec elapsed
time: 5.09 sec elapsed
time: 5.16 sec elapsed
step  4 
time: 4.77 sec elapsed
time: 5.16 sec elapsed
time: 6.86 sec elapsed
time: 6.2 sec elapsed
time: 5.77 sec elapsed
time: 6.21 sec elapsed
time: 5.25 sec elapsed
time: 4.97 sec elapsed
time: 5.92 sec elapsed
time: 4.98 sec elapsed
coarse to fine: 206.34 sec elapsed

This time the pipeline includes 40 random models over 4 steps. The best selected model has the following configuration:

example2$best_model$configuration
   step activation optimizer rater_embedding_size rated_embedding_size layers
12    2        elu      adam                   28                   23      3
   nodes regularization_L1 regularization_L2   dropout   mae
12   430           66.6712          53.36247 0.2254691 1.536

And this time the outcome of the recommending function is numeric:

example2$best_model$recommend("5", top_n = 5)
Top 5 predicted rating for rater 5 in case of numeric scale

 Rater  Rated  Predicted Rating
     5     29          1.836775
     5     11          1.836770
     5      7          1.836770
     5      3          1.836769
     5      8          1.836769

If you have some domain intuition about the correct values of some hyper-parameters, you can override the default search options anytime you want:

example3 <- janus(dummy, rating_label = "rating", rater_label = "actor", rated_label = "item", task = "classif", activations = "mish", rater_embedding_size = 5, rated_embedding_size = 5, layers = 1)
step  1 
time: 3.58 sec elapsed
time: 2.5 sec elapsed
time: 3.34 sec elapsed
time: 3.59 sec elapsed
time: 3.5 sec elapsed
time: 2.55 sec elapsed
time: 2.71 sec elapsed
time: 2.45 sec elapsed
time: 3.53 sec elapsed
time: 3.31 sec elapsed
step  2 
time: 2.36 sec elapsed
time: 2.43 sec elapsed
time: 2.61 sec elapsed
time: 5.8 sec elapsed
time: 2.6 sec elapsed
time: 2.58 sec elapsed
time: 2.65 sec elapsed
time: 2.75 sec elapsed
time: 2.59 sec elapsed
time: 2.64 sec elapsed
step  3 
time: 2.53 sec elapsed
time: 2.74 sec elapsed
time: 2.39 sec elapsed
time: 2.55 sec elapsed
time: 2.6 sec elapsed
time: 2.61 sec elapsed
time: 2.36 sec elapsed
time: 2.61 sec elapsed
time: 2.56 sec elapsed
time: 2.54 sec elapsed
coarse to fine: 86.17 sec elapsed

As you can see, in this case the search and the final configuration are based on the options of your choosing:

knitr::kable(head(example3$pipeline, 5), align = "ccc", caption = "This time we set some hyperparameters: activation, embedding and number of layers")
This time we set some hyperparameters: activation, embedding and number of layers

 step  activation  optimizer  rater_embedding_size  rated_embedding_size  layers  nodes  regularization_L1  regularization_L2    dropout        bac
    1  mish        rmsprop                       5                     5       1     76           84.11005           37.44519  0.7297728  0.5000000
    1  mish        sgd                           5                     5       1    454           96.52063           59.34469  0.6272155  0.5486111
    1  mish        adamax                        5                     5       1    421           61.47267           82.99442  0.7850897  0.5000000
    1  mish        rmsprop                       5                     5       1    398           31.06477           71.53775  0.2507421  0.5000000
    1  mish        rmsprop                       5                     5       1    198           21.17491           94.39060  0.8008507  0.5000000

For any best_model, you also get some standard error metrics and plots, such as the following:

example3$best_model$plot

The standard error metrics are collected within test_metrics and refer to the holdout testing frame (while the training and validation errors come from a repeated cross-validation scheme controlled by folds and reps; see the sketch below).

example3$best_model$test_metrics
       bac        avs        avp        avf       kend       ndcg 
 0.6152778  0.4000000  0.2222222  0.1333333 -0.4006168  0.6143525 
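
A hedged sketch of raising the cross-validation effort through those two parameters (folds and reps as named above; the values here are arbitrary):

example4 <- janus(dummy, rating_label = "rating", rater_label = "actor",
                  rated_label = "item", task = "classif",
                  folds = 5, reps = 3) # 5-fold cross-validation repeated 3 times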

And don’t forget: your recommendations will always be as good as your friends.

Enzoi.