Alternating Least Squares (ALS)

After watching the Spark conference video on how Spotify uses Spark, I became interested in how ALS can be used to make predictions for users.

From the video, prior experience, and new research, I found several advantages to ALS:

It makes predictions for all entries,
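It solves each alternating step exactly, so the cost never increases (the joint problem is non-convex, so a global minimum is not guaranteed),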
It is easily parallelizable,
Surprisingly, least squares can fit non-linear data.
- This is only really surprising because I spent so much time doing OLS, which is linear, and I always associate any least squares method with it.
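
One way to read the last point: least squares only needs the model to be linear in the coefficients being solved for, not in the data. The sketch below uses simulated data (the variable names and the quadratic relationship are made up for illustration) and compares a straight-line fit with a fit that includes a squared term. ALS relies on a similar idea: the prediction is a product of two unknowns, but it is linear in one factor once the other is held fixed.

```r
# Simulated curved data: y depends on x^2, plus noise (hypothetical example)
set.seed(1)
x <- seq(-3, 3, length.out = 50)
y <- 1 + 2 * x^2 + rnorm(50, sd = 0.5)

# Both models are fitted by ordinary least squares
fit_line <- lm(y ~ x)            # straight line
fit_quad <- lm(y ~ x + I(x^2))   # still linear in the coefficients

# Compare how much variance each model explains
c(line = summary(fit_line)$r.squared,
  quad = summary(fit_quad)$r.squared)
```

The quadratic model should explain far more of the variance, even though both fits use the same least squares machinery.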

How it works

ALS works by using matrix factorization. Matrix factorization takes a matrix $A_{m \times n}$ and finds two other matrices, $U_{m \times k}$ and $P_{k \times n}$, whose product $UP$ approximately equals $A$.
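
To make "approximately equal" concrete, one common way to write the problem is as a regularized least squares objective over the observed entries only. The notation here is an assumption for illustration: $\Omega$ is the set of observed positions in $A$, $u_i$ is row $i$ of $U$ written as a column vector, $p_j$ is column $j$ of $P$, and $\lambda$ is a penalty weight. The exact form of the penalty varies between implementations.

$$
\min_{U,\,P} \; \sum_{(i,j) \in \Omega} \left( a_{ij} - u_i^{\top} p_j \right)^2 + \lambda \left( \lVert U \rVert_F^2 + \lVert P \rVert_F^2 \right)
$$

With $P$ held fixed, this is an ordinary ridge regression in each $u_i$, and with $U$ held fixed it is a ridge regression in each $p_j$, which is where the "alternating" in the name comes from.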

This is done by first initializing $U$ and $P$ with random values, then holding one constant while updating the other and comparing the result to $A$.

Once the updates no longer lead to further improvements, the process is stopped and the final matrix, $A' = UP$, is compared to the existing values in $A$.
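
The steps above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the recommenderlab implementation: the function name `als_sketch`, the toy ratings matrix, the ridge penalty `lambda`, and the fixed number of iterations (rather than a convergence check) are all hypothetical choices made for the example.

```r
# Minimal ALS sketch (hypothetical helper, not recommenderlab's code).
# A is a ratings matrix with NA for unobserved entries.
als_sketch <- function(A, k = 2, lambda = 0.1, n_iter = 20) {
  m <- nrow(A); n <- ncol(A)
  observed <- !is.na(A)
  # Random starting values for both factor matrices
  U <- matrix(runif(m * k), nrow = m, ncol = k)   # m x k
  P <- matrix(runif(k * n), nrow = k, ncol = n)   # k x n
  cost <- numeric(n_iter)
  for (it in seq_len(n_iter)) {
    # Step 1: hold P fixed, solve a ridge regression for each row of U
    for (i in seq_len(m)) {
      idx <- which(observed[i, ])
      if (length(idx) == 0) next
      Pi <- P[, idx, drop = FALSE]
      U[i, ] <- solve(Pi %*% t(Pi) + lambda * diag(k), Pi %*% A[i, idx])
    }
    # Step 2: hold U fixed, solve a ridge regression for each column of P
    for (j in seq_len(n)) {
      idx <- which(observed[, j])
      if (length(idx) == 0) next
      Uj <- U[idx, , drop = FALSE]
      P[, j] <- solve(t(Uj) %*% Uj + lambda * diag(k), t(Uj) %*% A[idx, j])
    }
    # Penalized squared error on the observed entries only
    resid <- (A - U %*% P)[observed]
    cost[it] <- sum(resid^2) + lambda * (sum(U^2) + sum(P^2))
  }
  list(U = U, P = P, A_hat = U %*% P, cost = cost)
}

# Toy 4 x 4 ratings matrix with missing entries (made-up numbers)
A <- matrix(c( 5,  4, NA,  1,
               4, NA,  1,  1,
               1,  1,  5, NA,
              NA,  1,  4,  5), nrow = 4, byrow = TRUE)

set.seed(42)
fit <- als_sketch(A, k = 2, lambda = 0.1, n_iter = 20)
round(fit$A_hat, 2)   # filled-in matrix, including the NA positions
fit$cost              # should never increase from one iteration to the next
```

Because each inner update is an exact ridge solution, the recorded cost is non-increasing, which mirrors the pattern in the verbose output further down.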

In the video, much of the talk was dedicated to the problems of implementing this with Spark.

This is very interesting but beyond the scope of this discussion. I was going to show my own PySpark implementation, but it turns out that Spark did not like my computer. Instead, I'm using R's recommenderlab.

This is a plug-and-play library that does it all for you. Full disclosure: the code below is copied verbatim from this website. I included it as an example.

library(recommenderlab)

## Loading required package: Matrix

## Loading required package: arules

## 
## Attaching package: 'arules'

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

## Loading required package: proxy

## 
## Attaching package: 'proxy'

## The following object is masked from 'package:Matrix':
## 
##     as.matrix

## The following objects are masked from 'package:stats':
## 
##     as.dist, dist

## The following object is masked from 'package:base':
## 
##     as.matrix

## Loading required package: registry

data(MovieLense)

scheme <- evaluationScheme(MovieLense, method="split", train=0.9, given=-5, goodRating=4)

accuracy_table <- function(scheme, algorithm, parameter){
  r <- Recommender(getData(scheme, "train"), algorithm, parameter = parameter)
  p <- predict(r, getData(scheme, "known"), type="ratings")                      
  acc_list <- calcPredictionAccuracy(p, getData(scheme, "unknown"))
  total_list <- c(algorithm =algorithm, acc_list)
  total_list <- total_list[sapply(total_list, function(x) !is.null(x))]
  return(data.frame(as.list(total_list)))
}

table_random <- accuracy_table(scheme, algorithm = "RANDOM", parameter = NULL)
table_ubcf <- accuracy_table(scheme, algorithm = "UBCF", parameter = list(nn=50))
table_ibcf <- accuracy_table(scheme, algorithm = "IBCF", parameter = list(k=50))
table_pop <- accuracy_table(scheme, algorithm = "POPULAR", parameter = NULL)
table_ALS_1 <- accuracy_table(scheme, algorithm = "ALS", 
                              parameter = list( normalize=NULL, lambda=0.1, n_factors=200, 
                                                n_iterations=10, seed = 1234, verbose = TRUE))

## Used parameters:
## normalize     =  NULL
## lambda    =  0.1
## n_factors     =  200
## n_iterations  =  10
## min_item_nr   =  1
## seed  =  1234
## verbose   =  TRUE
## [1] "0th iteration: cost function = 234567.802886753"
## [1] "1th iteration, step 1: cost function = 219201.419286132"
## [1] "1th iteration, step 2: cost function = 209173.640547462"
## [1] "2th iteration, step 1: cost function = 200499.060344742"
## [1] "2th iteration, step 2: cost function = 193980.29851724"
## [1] "3th iteration, step 1: cost function = 188151.683064291"
## [1] "3th iteration, step 2: cost function = 183569.54912126"
## [1] "4th iteration, step 1: cost function = 179278.335628184"
## [1] "4th iteration, step 2: cost function = 174936.790473661"
## [1] "5th iteration, step 1: cost function = 169734.421170487"
## [1] "5th iteration, step 2: cost function = 165096.595903737"
## [1] "6th iteration, step 1: cost function = 161320.261699575"
## [1] "6th iteration, step 2: cost function = 157742.500282274"
## [1] "7th iteration, step 1: cost function = 154681.297374595"
## [1] "7th iteration, step 2: cost function = 152102.965757727"
## [1] "8th iteration, step 1: cost function = 150044.960735819"
## [1] "8th iteration, step 2: cost function = 148390.160518515"
## [1] "9th iteration, step 1: cost function = 147033.14083349"
## [1] "9th iteration, step 2: cost function = 145918.817904675"
## [1] "10th iteration, step 1: cost function = 144973.288106876"
## [1] "10th iteration, step 2: cost function = 144181.90082624"

rbind(table_random, table_pop, table_ubcf, table_ibcf, table_ALS_1)

##   algorithm              RMSE               MSE               MAE
## 1    RANDOM  1.35221499402814  1.82848539007453  1.05056156601822
## 2   POPULAR   1.0089118662917   1.0179031539442 0.795988545380187
## 3      UBCF  1.09697369627405  1.20335129031716 0.890989007984894
## 4      IBCF  1.39990648238184  1.95973815941471 0.985714285714286
## 5       ALS 0.951419431307476 0.905198934269441 0.757327141291289
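
For reference, the three columns are standard error metrics. The sketch below computes them by hand on a pair of made-up vectors (`actual` and `predicted` are hypothetical values, not taken from MovieLense), which also shows why the RMSE column is the square root of the MSE column.

```r
# Hypothetical held-out ratings and predictions, for illustration only
actual    <- c(4, 3, 5, 2, 1)
predicted <- c(3.5, 3, 4, 2.5, 2)

err  <- predicted - actual
mse  <- mean(err^2)       # mean squared error
rmse <- sqrt(mse)         # root mean squared error
mae  <- mean(abs(err))    # mean absolute error

c(RMSE = rmse, MSE = mse, MAE = mae)
```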

It is interesting to see that the most accurate method is in fact ALS, which has the lowest RMSE, MSE, and MAE of the five.

Also, I am now going to spend the rest of my day trying to reinstall Spark.

ALS in R

Kai Lukowiak

2018-06-21
