Research Question
For this discussion item, please watch the following talk and summarize what you found to be the most important or interesting points. The first half will cover some of the mathematical techniques covered in this unit’s reading and the second half some of the data management challenges in an industrial-scale recommendation system.
Discussion
For this week’s discussion, I found an article on Medium by Victor which discusses ALS Implicit Collaborative Filtering. I’ve highlighted a few items of interest from the article about implicit data recommendation systems, feature reduction, and alternating least squares.
Explicit vs. Implicit
Explicit data recommender systems rely on a user’s direct input or rating of a product, whereas implicit data recommender systems rely on a user’s activity level with the product or service. How many times have they listened to a song? How many repeat purchases of a product have they made? Implicit data is far easier to gather than explicit data.
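To make the idea of implicit data concrete, here is a minimal sketch that turns a log of listening events into a user-by-song matrix of play counts. The users, songs, and the `plays` log are made up for illustration and do not come from the article.

```python
from collections import Counter

# Hypothetical listening log: one (user, song) pair per play event.
plays = [
    ("ann", "so_what"), ("ann", "so_what"), ("ann", "take_five"),
    ("bob", "take_five"), ("bob", "take_five"), ("bob", "take_five"),
]

# Implicit feedback: count how often each user played each song.
play_counts = Counter(plays)

users = sorted({user for user, _ in plays})
songs = sorted({song for _, song in plays})

# User-by-song matrix of play counts. A 0 only means "never played";
# unlike a low explicit rating, it does not say the user disliked the song.
R = [[play_counts[(user, song)] for song in songs] for user in users]

for user, row in zip(users, R):
    print(user, dict(zip(songs, row)))
```

No one was asked to rate anything here; the matrix falls out of ordinary activity, which is why this kind of data is easy to collect.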
Feature Reduction
A user matrix may include hundreds of different features. For example, a user may listen to music with trumpets in it, a saxophone solo, or a jazz track. Our goal here is to use matrix factorization to reduce the number of dimensions to a small set of latent taste factors. “If we can express each user as a vector of their taste values, and at the same time express each item as a vector of what tastes they represent. You can see we can quite easily make a recommendation.” [1]
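As a rough illustration of the quoted idea, the sketch below scores each item for a user by taking the dot product of the user's taste vector with the item's taste vector and recommends the highest-scoring items. The two taste dimensions, the vectors, and the function names are all hypothetical; in practice the vectors would be learned by the factorization rather than written by hand.

```python
# Hypothetical two-dimensional taste space: [brass, jazz].
user_tastes = {
    "ann": [0.9, 0.2],
    "bob": [0.1, 0.8],
}
item_tastes = {
    "trumpet_fanfare": [1.0, 0.1],
    "sax_ballad": [0.3, 0.9],
    "jazz_trio": [0.0, 1.0],
}


def score(user_vec, item_vec):
    """Dot product: large when the user's tastes line up with the item's."""
    return sum(u * i for u, i in zip(user_vec, item_vec))


def recommend(user, n=1):
    """Return the n items with the highest score for the given user."""
    ranked = sorted(
        item_tastes,
        key=lambda item: score(user_tastes[user], item_tastes[item]),
        reverse=True,
    )
    return ranked[:n]


print(recommend("ann"))
print(recommend("bob", n=2))
```

Each user and item is described by just two numbers here, instead of one column per instrument or genre.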
ALS
The alternating least squares (ALS) algorithm factorizes a given matrix $R$ into two factors $U$ and $V$ such that $R \approx U^{T}V$. The unknown row dimension is given as a parameter to the algorithm and is called latent factors. [2]
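The alternating part can be sketched in a few lines of plain Python. This is a simplified illustration, not the Flink implementation or the implicit-feedback variant from the article. It assumes that every entry of R is observed, that a small ridge penalty `lam` is added to keep each least-squares problem well posed, and that the factors are stored as one length-`k` vector per user and per item (the columns of U and V in the notation above). All function and parameter names are made up for this example.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def update(rows, fixed, k, lam):
    """Hold `fixed` constant and solve one ridge regression per row of `rows`."""
    # The normal-equation matrix (sum of f f^T + lam * I) is shared by all rows.
    A = [[sum(f[p] * f[q] for f in fixed) + (lam if p == q else 0.0)
          for q in range(k)] for p in range(k)]
    return [solve(A, [sum(f[p] * row[j] for j, f in enumerate(fixed))
                      for p in range(k)]) for row in rows]


def als(R, k=2, lam=0.1, iters=20, seed=0):
    """Alternate between solving for user factors U and item factors V."""
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(k)] for _ in R]
    V = [[rng.random() for _ in range(k)] for _ in R[0]]
    R_T = [list(col) for col in zip(*R)]
    for _ in range(iters):
        U = update(R, V, k, lam)    # fix V, least squares for U
        V = update(R_T, U, k, lam)  # fix U, least squares for V
    return U, V


def predict(U, V):
    """Reconstruct the matrix: entry (i, j) is the dot product of U[i] and V[j]."""
    return [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]


def squared_error(R, U, V):
    """Sum of squared differences between R and its reconstruction."""
    P = predict(U, V)
    return sum((r - p) ** 2 for r_row, p_row in zip(R, P)
               for r, p in zip(r_row, p_row))


# Small made-up 5 x 4 matrix, factorized with k = 2 latent factors.
R = [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4], [0, 1, 5, 4]]
U, V = als(R, k=2)
print(round(squared_error(R, U, V), 3))
```

With one factor matrix held fixed, solving for the other is an ordinary least-squares problem, which is why the algorithm alternates between the two. Recommender implementations usually restrict the sums to observed entries or weight them by confidence, which this sketch leaves out.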
References
1. ALS Implicit Collaborative Filtering, by Victor, Aug. 23, 2017. https://medium.com/radon-dev/als-implicit-collaborative-filtering-5ed653ba39fe
2. Alternating Least Squares. https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/libs/ml/als.html