“In ancient Roman religion and myth, Janus is the god of beginnings, gates, transitions, time, duality, doorways, passages, frames, and endings.” (Wikipedia)
“In the mathematical field of graph theory, a bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets U and V, that is every edge connects a vertex in U to one in V. Vertex sets U and V are usually called the parts of the graph. Equivalently, a bipartite graph is a graph that does not contain any odd-length cycles.” (Wikipedia)
“A friend is Janus-faced: he looks to the past and the future. He is the child of all my foregoing hours, the prophet of those to come, and the harbinger of a greater friend.” (Ralph Waldo Emerson)
“A good face, they say, is a letter of recommendation. O Nature, Nature, why art thou so dishonest, as ever to send men with these false recommendations into the World!” (Henry Fielding)
Your recommendations are as good as your friends
janus is a recommender system based on Embedding Neural Networks. Unlike recommender systems based on matrix factorization, janus uses neural networks with embedding layers, so you can project actors and items into a multidimensional space of any size and optimize the embeddings directly. Because back-propagation works on mini-batches (controlled by the batch size parameter), this avoids the critical performance issues typically associated with factorizing huge matrices.
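To make the embedding idea concrete, here is a minimal base-R sketch of a forward pass over one mini-batch. It is not the network janus builds: the embedding sizes, the single linear layer, and all variable names are assumptions chosen for illustration.

```r
set.seed(42)

n_actors <- 10; n_items <- 30   # sizes matching the dummy example used later
actor_dim <- 4; item_dim <- 3   # embedding sizes (arbitrary, for illustration)

# Embedding tables: one trainable row per actor and per item
actor_emb <- matrix(rnorm(n_actors * actor_dim, sd = 0.1), n_actors, actor_dim)
item_emb  <- matrix(rnorm(n_items * item_dim, sd = 0.1), n_items, item_dim)

# One mini-batch of (actor, item) index pairs
batch <- data.frame(actor = c(2, 6, 9), item = c(13, 19, 8))

# Lookup: each index selects a row, then both vectors are concatenated
x <- cbind(actor_emb[batch$actor, , drop = FALSE],
           item_emb[batch$item, , drop = FALSE])

# A single linear layer standing in for the hidden layers (assumption)
w <- matrix(rnorm(actor_dim + item_dim, sd = 0.1), ncol = 1)
b <- 0
pred <- as.vector(x %*% w + b)

dim(x)        # batch size by (actor_dim + item_dim)
length(pred)  # one predicted score per pair in the batch
```

Each step touches only the embedding rows indexed by the batch, which is why the work per update depends on the batch size rather than on the full actor-by-item matrix.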
The architecture of janus is quite simple, as depicted below. Only three features are required (the rating actors, the rated items and the rating values). The optimization is then guided by a Coarse-to-Fine (CTF) pipeline that progressively focuses on the hyper-parameters offering the best performance. You can easily manage the process by choosing n_steps (the number of steps), n_samp (the number of models sampled per step) and n_top (the number of models selected at each step to reduce the search range).
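The coarse-to-fine idea can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the janus implementation: `toy_error` is a hypothetical stand-in for a model's validation error, `ctf_search` is a hypothetical helper, and shrinking each range to the span of the top models is only one plausible way to "reduce the search range".

```r
set.seed(1)

# Toy objective standing in for a model's validation error (lower is better).
# Hypothetical: a real run would train a network for each configuration.
toy_error <- function(dropout, nodes) {
  (dropout - 0.3)^2 + ((nodes - 200) / 500)^2
}

ctf_search <- function(n_steps = 3, n_samp = 10, n_top = 3) {
  ranges <- list(dropout = c(0, 1), nodes = c(8, 512))  # initial coarse ranges
  history <- NULL
  for (step in seq_len(n_steps)) {
    # Sample n_samp random configurations inside the current ranges
    trial <- data.frame(
      step = step,
      dropout = runif(n_samp, ranges$dropout[1], ranges$dropout[2]),
      nodes = round(runif(n_samp, ranges$nodes[1], ranges$nodes[2]))
    )
    trial$error <- toy_error(trial$dropout, trial$nodes)
    history <- rbind(history, trial)
    # Keep the n_top best trials and shrink each range to span them
    top <- head(trial[order(trial$error), ], n_top)
    ranges <- list(dropout = range(top$dropout), nodes = range(top$nodes))
  }
  list(history = history, best = history[which.min(history$error), ])
}

res <- ctf_search(n_steps = 3, n_samp = 10, n_top = 3)
nrow(res$history)  # n_steps * n_samp configurations evaluated in total
res$best           # best configuration found across all steps
```

With this scheme the total number of trained models is n_steps times n_samp, so the three arguments trade search breadth against computing time.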
The back-end is based on TensorFlow, so some plots and model representations are pretty standard.
Figure 1: The process flow of janus
How-To: some examples
Let’s start with a dummy dataset containing three features: the rating actors, the rated items and the rating values (each can be numeric, factor or character). In our example, 10 actors randomly rate 30 items with a value between 1 and 5.
Our dummy set: 10 raters, 30 items and rating scale 1 to 5
actor  item  rating
    2    13       4
    6    19       2
    6     6       3
    9     8       3
   10    25       3
    5    10       1
    1    25       1
    6    21       3
    9     7       5
    4     6       1
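A data frame of this shape can be simulated in base R. The sketch below is an assumption about how such a set might be built: the name `dummy_set`, the seed, and the number of simulated rows are illustrative, so the values will not match the table above.

```r
set.seed(123)

n_obs <- 100  # number of ratings to simulate (an arbitrary choice)
dummy_set <- data.frame(
  actor  = sample(1:10, n_obs, replace = TRUE),  # 10 rating actors
  item   = sample(1:30, n_obs, replace = TRUE),  # 30 rated items
  rating = sample(1:5,  n_obs, replace = TRUE)   # rating scale 1 to 5
)

# Optional clean-up: keep a single rating per actor-item pair
dummy_set <- dummy_set[!duplicated(dummy_set[, c("actor", "item")]), ]

head(dummy_set, 10)
```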
The minimal set of required parameters is the data frame, the feature names and the task (classif for a classification scale, regr for a numeric one). The default behavior is a CTF optimization that samples 10 random models during each phase using the default parameters.
So, our best model used thousands and thousands of parameters to give us the following recommending function. Which items can we recommend to rating actor number 5? Look at the Rated column.
example1$best_model$recommend(5, top_n = 10)
Top rated items for rater 5
Rater  Rated  Predicted Rating  Probability
    5     20                 5    0.2052065
    5     10                 5    0.2050587
    5     28                 5    0.2039816
    5      5                 5    0.2038402
    5     16                 5    0.2028807
    5     21                 5    0.2027042
    5      1                 5    0.2018995
    5      4                 4    0.2101161
    5     29                 4    0.2085713
    5     17                 4    0.2084644
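The rows above appear to be ordered by predicted rating first and by probability second. Assuming that is the rule, the ranking can be sketched in base R as follows. `top_items` is a hypothetical helper, not part of janus, and the sample values are copied from the table in shuffled order.

```r
# Candidate predictions for rater 5, copied from the table above
# and listed in arbitrary order
preds <- data.frame(
  rater = 5,
  rated = c(4, 20, 17, 10, 29, 28),
  predicted_rating = c(4, 5, 4, 5, 4, 5),
  probability = c(0.2101161, 0.2052065, 0.2084644,
                  0.2050587, 0.2085713, 0.2039816)
)

# Hypothetical helper: rank by predicted rating, break ties by probability
top_items <- function(preds, top_n = 3) {
  ranked <- preds[order(-preds$predicted_rating, -preds$probability), ]
  head(ranked, top_n)
}

top_items(preds, top_n = 3)
```

Under this rule item 4 ranks below item 20 even though its probability is higher, because the probability refers to a lower predicted class.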
The recommend function returns the top_n predicted items together with their predicted rating and probability. The output changes if we treat the rating scale as numeric rather than ordinal. Let’s see another example: this time we also change the number of phases (from 3 to 4) and the selector size for the CTF process (from the top 3 in each phase to the top 4).
This time the pipeline includes 40 random models over 4 steps. The best model has the following configuration:
example2$best_model$configuration
   step activation optimizer rater_embedding_size rated_embedding_size layers
12    2        elu      adam                   28                   23      3
   nodes regularization_L1 regularization_L2   dropout   mae
12   430           66.6712          53.36247 0.2254691 1.536
And this time the outcome of the recommending function is numeric:
example2$best_model$recommend("5", top_n = 5)
Top 5 predicted rating for rater 5 in case of numeric scale
Rater  Rated  Predicted Rating
    5     29          1.836775
    5     11          1.836770
    5      7          1.836770
    5      3          1.836769
    5      8          1.836769
If you have some kind of domain intuition on the correct values of some hyper-parameters, you can override the default search options anytime you want:
As you can see, in this case the search and the final configuration are based on the options of your choosing:
knitr::kable(head(example3$pipeline, 5), align = "ccc", caption = "This time we set some hyperparameters: activation, embedding and number of layers")
This time we set some hyperparameters: activation, embedding and number of layers
step  activation  optimizer  rater_embedding_size  rated_embedding_size  layers  nodes  regularization_L1  regularization_L2  dropout    bac
1     mish        rmsprop    5                     5                     1       76     84.11005           37.44519           0.7297728  0.5000000
1     mish        sgd        5                     5                     1       454    96.52063           59.34469           0.6272155  0.5486111
1     mish        adamax     5                     5                     1       421    61.47267           82.99442           0.7850897  0.5000000
1     mish        rmsprop    5                     5                     1       398    31.06477           71.53775           0.2507421  0.5000000
1     mish        rmsprop    5                     5                     1       198    21.17491           94.39060           0.8008507  0.5000000
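In the pipeline above the activation, both embedding sizes and the number of layers stay constant while the other hyper-parameters vary. One way to picture this is a search space where a single value pins a hyper-parameter and a vector is sampled. The sketch below illustrates that idea in base R. The `space` list and the `sample_config` helper are hypothetical and do not reproduce the janus arguments.

```r
set.seed(7)

# Hypothetical search space: a single value pins a hyper-parameter,
# a vector of options or a numeric range is sampled at random
space <- list(
  activation = "mish",                                # fixed by the user
  embedding_size = 5,                                 # fixed by the user
  layers = 1,                                         # fixed by the user
  optimizer = c("adam", "sgd", "rmsprop", "adamax"),  # sampled from options
  dropout = c(0, 1)                                   # sampled from a range
)

sample_config <- function(space) {
  lapply(space, function(v) {
    if (length(v) == 1) return(v)              # user override: keep as is
    if (is.character(v)) return(sample(v, 1))  # categorical choice
    runif(1, min(v), max(v))                   # numeric range
  })
}

# Draw five random configurations and stack them into one data frame
configs <- do.call(rbind, lapply(1:5, function(i) {
  as.data.frame(sample_config(space))
}))
configs
```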
For any best_model, you also get some standard error metrics and plots, such as the following:
example3$best_model$plot
The standard error metrics are collected within test_metrics and refer to the holdout testing frame, while the training and validation errors come from a repeated cross-validation controlled by the folds and reps settings.
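The difference between the holdout frame and repeated cross-validation can be sketched with plain index vectors. This is an illustration, not the janus internals: the row count and the `holdout` share are assumed values, while `folds` and `reps` mirror the settings named above.

```r
set.seed(99)

n <- 100        # number of rating rows (arbitrary)
holdout <- 0.1  # share of rows reserved for final testing (assumed value)
folds <- 3
reps <- 2

# Holdout: rows never seen during model selection
test_idx  <- sample(n, size = round(n * holdout))
train_idx <- setdiff(seq_len(n), test_idx)

# Repeated k-fold: reshuffle the fold assignment once per repetition
cv_plan <- lapply(seq_len(reps), function(r) {
  fold_id <- sample(rep(seq_len(folds), length.out = length(train_idx)))
  split(train_idx, fold_id)  # each element is one validation fold
})

length(test_idx)       # size of the holdout frame
lengths(cv_plan[[1]])  # validation fold sizes in the first repetition
```

Training and validation errors are averaged over the folds times reps splits, whereas the holdout rows are scored only once with the final model.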