1 Implementing a Recommender System on Spark

In our previous assignment, we experimented with accuracy measures and incorporated serendipity into our recommender system. The dataset we used for that assignment had, in its original form, approximately 1.5 million ratings, which we scaled down considerably.

In this assignment, we use Spark to work with the full dataset, then compare the resulting model's performance with that of the models we built in Project 4 on the scaled-down dataset.

## Parsed with column specification:
## cols(
##   userId = col_double(),
##   movieId = col_double(),
##   rating = col_double(),
##   timestamp = col_double()
## )

1.2 Create Recommender Model in Spark

After loading the data frame into Spark, we built a recommender system with the ml_als function from sparklyr, which fits collaborative filtering models using Alternating Least Squares (ALS).

We then used the model to generate predictions and computed accuracy metrics (RMSE, MSE, and MAE). A later section compares these with the same metrics from the Project 4 models, run exactly as they were in that project.

##                Length Class             Mode       
## pipeline_model  5     ml_pipeline_model list       
## formula         1     -none-            character  
## dataset         2     tbl_spark         list       
## pipeline        5     ml_pipeline       list       
## model          11     ml_als_model      list       
## .jobj           2     spark_jobj        environment
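
The workflow described in this section can be sketched roughly as follows. This is a minimal illustration, not the code used to produce the output above: the function name fit_als_sketch, the 80/20 split, the seed, and the cold_start_strategy setting are assumptions, since the original split and tuning parameters are not shown. It assumes sparklyr is installed and that sc is an open connection (for example, one created with spark_connect(master = "local")).

```r
# Sketch only. `sc` is an open sparklyr connection; `ratings` is a local
# data frame with userId, movieId and rating columns (as in the column
# specification shown earlier). All defaults here are assumptions.
fit_als_sketch <- function(sc, ratings, train_frac = 0.8, seed = 42) {
  # Copy the local data frame into Spark.
  ratings_tbl <- sparklyr::sdf_copy_to(
    sc, ratings, name = "ratings", overwrite = TRUE
  )

  # Hypothetical train/test split; the original proportions are not stated.
  splits <- sparklyr::sdf_random_split(
    ratings_tbl,
    training = train_frac,
    test = 1 - train_frac,
    seed = seed
  )

  # Fit ALS. The formula takes the form rating ~ user + item.
  # "drop" removes test rows for users or items unseen in training,
  # which would otherwise yield NaN predictions and NaN metrics.
  model <- sparklyr::ml_als(
    splits$training,
    rating ~ userId + movieId,
    cold_start_strategy = "drop"
  )

  # Predict ratings for the held-out rows.
  predictions <- sparklyr::ml_predict(model, splits$test)

  # Compute each accuracy metric on the Spark side.
  metrics <- c("rmse", "mse", "mae")
  scores <- vapply(metrics, function(m) {
    sparklyr::ml_regression_evaluator(
      predictions,
      label_col = "rating",
      prediction_col = "prediction",
      metric_name = m
    )
  }, numeric(1))

  list(model = model, scores = scores)
}
```

Calling summary() on the returned model should give a listing similar to the one above, and scores holds the three metrics as a named numeric vector.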

1.4 Compare the UBCF and SVD Recommender Models to Spark Model

Now we can compare the results from the ALS model built using Spark with the UBCF and SVD models we created in Project 4:

Model         RMSE       MSE        MAE
ALS           0.6484466  0.4204830  0.4863415
UBCF-Cosine   0.8843518  0.7820781  0.6635512
SVD           0.8926244  0.7967784  0.6693957

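The ALS model has lower RMSE, MSE, and MAE than both the UBCF and SVD models. Note that the ALS model was trained on the full dataset, while the UBCF and SVD models were trained on the scaled-down dataset, so the difference reflects the amount of data as well as the algorithm.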
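
For reference, the three metrics can be computed from actual and predicted ratings as in the sketch below. The function name accuracy_metrics and the example ratings are hypothetical and are not taken from the dataset. Since RMSE is the square root of MSE, the two columns in the table carry the same information (for ALS, 0.6484466 squared is approximately 0.4204830).

```r
# Minimal sketch of the three accuracy metrics, in base R.
accuracy_metrics <- function(actual, predicted) {
  err <- actual - predicted
  mse <- mean(err^2)                 # mean squared error
  c(RMSE = sqrt(mse),                # root mean squared error
    MSE  = mse,
    MAE  = mean(abs(err)))           # mean absolute error
}

# Hypothetical ratings, for illustration only.
actual    <- c(4, 3, 5, 2)
predicted <- c(3.5, 3, 4, 2.5)

m <- accuracy_metrics(actual, predicted)
print(m)  # MSE is 0.375 and MAE is 0.5 for these values
```

Lower values are better for all three metrics. RMSE penalizes large errors more heavily than MAE does, which is why it is never smaller than MAE.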