libFM model training uses the following usage-based features of the content in focus (CinF):

1. c1_total_timespent
2. c1_total_sessions
3. c1_avg_ts_session
4. c1_num_interactions
5. c1_mean_interactions_min
6. c1_total_devices
7. c1_avg_sess_device

These features are observed to drive the model's predictions strongly. The following exercise examines their effect.

A model is trained using the ‘als’ method without interaction terms, and scores are obtained from it.

The CinF usage-based features are then replaced with zeros, and the resulting scores are compared with the original scores.
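The suppression step can be sketched as follows. This is a minimal illustration in Python, assuming a purely linear scorer (bias plus one-way weights). The function names `score` and `suppress`, the weights, and the example row are hypothetical and are not taken from the RE implementation or its data.

```python
# Hypothetical sketch: score a row with a linear model, then zero selected features.
USAGE_FEATURES = [
    "c1_total_timespent",
    "c1_total_sessions",
    "c1_avg_ts_session",
    "c1_num_interactions",
    "c1_mean_interactions_min",
    "c1_total_devices",
    "c1_avg_sess_device",
]

def score(row, weights, bias=0.0):
    """Linear score: bias + sum of weight * value over the features in the row."""
    return bias + sum(weights.get(name, 0.0) * value for name, value in row.items())

def suppress(row, names=USAGE_FEATURES):
    """Return a copy of the row with the named features replaced by zero."""
    return {name: (0.0 if name in names else value) for name, value in row.items()}

# Illustrative values only (not taken from the report's data).
weights = {"c1_total_timespent": 0.5, "c1_total_sessions": -0.25, "other_feature": 1.0}
row = {"c1_total_timespent": 4.0, "c1_total_sessions": 2.0, "other_feature": 3.0}

full = score(row, weights)                  # all three features contribute
suppressed = score(suppress(row), weights)  # only other_feature contributes
print(full, suppressed)
```

Because `suppress` returns a copy, the same rows can be scored both ways and compared pairwise, which is what the before/after scatter plot requires.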

RE model: date 02-02-2017, method als, dim 1,1,0

Scatter plot of observed versus predicted c1_total_ts for the training data:

Model coefficients of the libFM model

Comparing model coefficients with a linear regression model:

An ‘als’ model with dim 1,1,0 is expected to behave similarly to a linear regression, since it has no interaction terms. For comparison, the input features used for libFM training are fitted here with an lm model.
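For reference, the standard second-order factorization machine can be written as below, where $w_0$ is the global bias, $w_i$ are the one-way weights, and $\mathbf{v}_i$ are the $k$-dimensional pairwise factors. Assuming the dim setting 1,1,0 means bias on, one-way terms on, and pairwise factor dimension $k = 0$ (the usual libFM reading of this option), the interaction sum vanishes and the model takes the same functional form as a linear regression:

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
\quad \xrightarrow{\; k = 0 \;} \quad
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
```

The fitted coefficients may still differ from the lm estimates, for example if regularization is applied or if the ALS iterations have not fully converged.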

Note: The current RE implementation does not have a mapping for categorical variable levels, so these are not included in the visualisation.

The model coefficient values fall in the range [-1.946000, 0.915500], with a median value of 0.040670.

Suppressing usage parameters

##                        name      value
## 48       c1_total_timespent  0.2381800
## 49        c1_total_sessions -0.0266452
## 50        c1_avg_ts_session  0.3054390
## 51      c1_num_interactions  0.2783930
## 52 c1_mean_interactions_min  0.0383813
## 53         c1_total_devices -0.2042400
## 54       c1_avg_sess_device  0.3000490

Scatter plot of observed versus predicted c1_total_ts for the model with suppressed features:

Scatter plot of predicted c1_total_ts before and after suppressing the usage parameters:

Observations:

After suppressing the usage features, the predicted scores fall in the range [-70.31, 66.02].

The mean of the observed c1_total_ts is 5.228, and the mean predicted by libFM is 5.229. The mean prediction drops to 2.963 when the usage features are suppressed.
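Because the model has no interaction terms, the drop in the mean prediction can be read as the average contribution of the usage features. As a sketch, write $U$ for the set of seven usage features, $w_j$ for their coefficients, and $\bar{x}_j$ for their means over the training data (symbols introduced here for illustration), and assume the features enter the score exactly as the values being zeroed. Zeroing them then removes exactly their linear terms:

```latex
\bar{\hat{y}}_{\text{full}} - \bar{\hat{y}}_{\text{suppressed}}
  = \sum_{j \in U} w_j \, \bar{x}_j
  = 5.229 - 2.963
  = 2.266
```

Under these assumptions, the usage features account for roughly 43% of the mean predicted score (2.266 / 5.229).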

Further, the predicted values are seen to fall into ‘bands’. The observations are divided into 3 groups based on the value of the predicted score, and the distribution of each c1 usage parameter within these groups is plotted below.
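The grouping step could look something like the sketch below. The report does not state how the three groups were cut, so the two cut points, the band labels, the helper names `assign_band` and `summarize_by_band`, and the example records are all hypothetical.

```python
from statistics import mean

def assign_band(score, low_cut, high_cut):
    """Label a predicted score as 'low', 'mid', or 'high' using two cut points."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "mid"
    return "high"

def summarize_by_band(records, feature, low_cut, high_cut):
    """Group records by score band and return {band: (count, mean of feature)}."""
    groups = {}
    for rec in records:
        band = assign_band(rec["predicted"], low_cut, high_cut)
        groups.setdefault(band, []).append(rec[feature])
    return {band: (len(vals), mean(vals)) for band, vals in groups.items()}

# Illustrative records and cut points only (not taken from the report's data).
records = [
    {"predicted": -40.0, "c1_total_sessions": 2.0},
    {"predicted": -35.0, "c1_total_sessions": 4.0},
    {"predicted": 1.0, "c1_total_sessions": 10.0},
    {"predicted": 3.0, "c1_total_sessions": 20.0},
    {"predicted": 50.0, "c1_total_sessions": 7.0},
]
summary = summarize_by_band(records, "c1_total_sessions", low_cut=-20.0, high_cut=20.0)
print(summary)
```

Repeating the summary for each of the seven usage parameters gives a per-band view that can be set beside the corresponding coefficient shown with each plot.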

##                  name   value
## 48 c1_total_timespent 0.23818

##                 name      value
## 49 c1_total_sessions -0.0266452

##                 name    value
## 50 c1_avg_ts_session 0.305439

##                   name    value
## 51 c1_num_interactions 0.278393

##                        name     value
## 52 c1_mean_interactions_min 0.0383813

##                name    value
## 53 c1_total_devices -0.20424

##                  name    value
## 54 c1_avg_sess_device 0.300049