Effect of Interaction Terms in Usage-Based RE

This exercise examines how the dimension of the interaction terms affects the usage RE model. Models are trained on device-wise aggregated input with the interaction dimension set to 0, 1, 2, 5 and 8, and the results are compared here.

The input to the usage RE is a matrix built from all observed and non-observed device-content usage.

This matrix is very sparse, so there is not enough data to estimate interactions between variables directly and independently. Factorization machines can still estimate interactions well in this setting because they break the independence of the interaction parameters by factorizing them.
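To make the factorization idea concrete, here is a minimal sketch of a second-order factorization machine score, in which the weight of each feature pair (i, j) is the dot product of two k-dimensional factor vectors rather than a free parameter. The function names and the toy numbers are illustrative assumptions; this is not the production RE code or the libFM implementation.

```python
from itertools import combinations


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def fm_predict_naive(x, w0, w, V):
    """FM score using the explicit sum over all feature pairs i < j.

    x  : feature values (mostly zeros for sparse input)
    w0 : global bias
    w  : unary (one-way) coefficients, one per feature
    V  : factor matrix, one k-dimensional row per feature
    """
    linear = w0 + dot(w, x)
    pairwise = sum(
        dot(V[i], V[j]) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )
    return linear + pairwise


def fm_predict_fast(x, w0, w, V):
    """Same score, computed in O(k * n) with the usual FM identity:
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
    """
    k = len(V[0]) if V else 0  # k = 0 means no pairwise term at all
    linear = w0 + dot(w, x)
    pairwise = 0.0
    for f in range(k):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        pairwise += 0.5 * (sum(terms) ** 2 - sum(t * t for t in terms))
    return linear + pairwise


# Toy example: 4 features, k = 2 (all numbers are made up).
x = [1.0, 0.0, 1.0, 1.0]
w0 = 0.1
w = [0.2, -0.1, 0.05, 0.3]
V = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1], [-0.2, 0.5]]

score_naive = fm_predict_naive(x, w0, w, V)
score_fast = fm_predict_fast(x, w0, w, V)
```

Because every pair shares the per-feature factor vectors, a pair that never co-occurs in the sparse training matrix still receives an interaction estimate. With dimension 0 the rows of `V` are empty and the score reduces to the bias plus the unary terms.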

The default interaction dimension in the libFM package is 8. Here, we vary it from 0 to 8.
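As a rough sketch of how such a sweep could be set up, the snippet below builds one libFM command line per interaction dimension. libFM takes the dimension through `-dim 'k0,k1,k2'`, where k2 is the size of the pairwise factors. The binary path, the file names and the regression task are assumptions made for illustration; the actual settings used for this report are not stated.

```python
# Interaction dimensions compared in this report.
DIMS = [0, 1, 2, 5, 8]


def libfm_args(k, train="train.libfm", test="test.libfm", task="r"):
    """Build a libFM argument list for pairwise dimension k.

    The binary path, file names and task are hypothetical placeholders.
    "1,1,k" keeps the global bias and the unary terms and sets the
    pairwise factor dimension to k.
    """
    return [
        "./libFM",
        "-task", task,
        "-train", train,
        "-test", test,
        "-dim", "1,1,{}".format(k),
        "-out", "pred_dim{}.txt".format(k),
    ]


commands = [libfm_args(k) for k in DIMS]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` once the paths point at real files; writing a separate prediction file per dimension keeps the score distributions easy to compare afterwards.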

Distribution of predicted score for each value of dimension:

Distribution of unary model coefficients:

Pairwise interactions matrices:

From the normalised model pairwise interaction coefficients, interaction matrices are created for each model feature.
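One plausible way to build such a matrix, shown here only as a sketch, is to take the dot product of every pair of factor vectors and scale the result by its largest absolute entry. The normalisation actually used for the plots is not described in this report, so the scaling below is an assumption, as are the function names and the toy factors.

```python
def interaction_matrix(V):
    """Return the n x n matrix M with M[i][j] = <v_i, v_j>.

    V holds one k-dimensional factor vector per model feature.
    """
    n = len(V)
    return [
        [sum(a * b for a, b in zip(V[i], V[j])) for j in range(n)]
        for i in range(n)
    ]


def normalise(M):
    """Scale M so that its largest absolute entry is 1 (assumed scheme)."""
    peak = max(abs(v) for row in M for v in row)
    if peak == 0:
        return [row[:] for row in M]
    return [[v / peak for v in row] for row in M]


# Toy factors for 3 features with k = 2 (made-up numbers).
V = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
M = normalise(interaction_matrix(V))
```

Under this construction the matrix is symmetric, and a diagonal entry M[i][i] is the squared length of the factor vector of feature i. The factorization machine score itself only sums over pairs of distinct features, so a large diagonal value mainly shows that the feature has large factors.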

## [1] "dim 8"

## [1] "dim 5"

## [1] "dim 2"

## [1] "dim 1"

Current production RE is set to dim 1.

The interaction value is highest at index [34,34], which corresponds to c1_subject and implies a high correlation of this feature with itself. High interaction values are also observed between this feature and the coefficients of the other c1_subject features.

Adarsa

27 March 2017
