For this project, the recommender system I built is an item-item collaborative filtering system that incorporates the additional contextual element of “popularity”, or, per my updated assessment, an item “bias”.
My initial take was to measure “popularity” by the number of observations per unique user. After further research and analysis this turned out to be misleading: a large set of observations does not necessarily mean a given item, in this case a joke, is more popular than others overall; it only means the item is more prominent within that user’s own vector of ratings. (A per-joke rating count, sketched just after the data-preparation code below, gives a more global view.)
While this dataset may be classified as a “toy” dataset, the implementation can be leveraged for other data, e.g. subscription or sales data, to determine which services and products are most popular. The audience for this is broad, but the immediate impact is with Marketing or Sales. The dataset I chose for this project is the Jester dataset, where columns represent different jokes and rows represent ratings by different users. These ratings are captured at various frequencies and will serve as the source of this additional context.
require(recommenderlab)
# Jester data contains user ratings over 100 jokes
jesterdf <- read.csv('jester-data-1.csv')
# drop the first column, which stores the number of jokes each user rated rather than a rating
jesterdf[,1] <- NULL
# a value of 99 marks an unrated joke; set these to NA to expose the true sparseness of the data
jesterdf[jesterdf==99] <- NA
# name the columns 1 through 100, one per joke
colnames(jesterdf) <- seq_len(100)
# Convert to matrix
R <- as.matrix(jesterdf)
r <- as(R, "realRatingMatrix")
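Tying back to the earlier point on popularity, here is a minimal sketch (using the jesterdf data frame prepared above) that counts how many users actually rated each joke; this per-item count is the more global popularity signal alluded to earlier.
# number of users who rated each joke (a simple per-item popularity count)
jokecounts <- colSums(!is.na(jesterdf))
# the most frequently rated jokes
head(sort(jokecounts, decreasing = TRUE))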
# Recommenderlab cosine similarity, item-item CF (computed over the first 100 users)
simjester <- similarity(r[1:100,],method="cosine", which="items")
simjester <- as.matrix(simjester)
head(simjester)
## 1 2 3 4 5 6 7
## 1 0.0000000 0.4395643 0.4249140 0.3727608 0.2558314 0.1405758 0.3475624
## 2 0.4395643 0.0000000 0.2864470 0.3497866 0.1878873 0.1998267 0.3336379
## 3 0.4249140 0.2864470 0.0000000 0.5247078 0.2860770 0.1845423 0.2996709
## 4 0.3727608 0.3497866 0.5247078 0.0000000 0.3064756 0.2655054 0.2899915
## 5 0.2558314 0.1878873 0.2860770 0.3064756 0.0000000 0.1607973 0.1218406
## 6 0.1405758 0.1998267 0.1845423 0.2655054 0.1607973 0.0000000 0.1321282
## 8 9 10 11 12 13
## 1 0.40644398 0.33205506 0.34727011 0.4269052 0.2831891 0.08257706
## 2 0.24448036 0.27231250 0.16798512 0.3611745 0.2656873 0.39555597
## 3 0.17917971 0.34154414 0.31375016 0.6060165 0.2985383 0.11749435
## 4 0.14844152 0.47620406 0.22758891 0.3854993 0.1528981 0.18986393
## 5 0.35675808 0.18179176 0.24846694 0.2381203 0.1580268 0.17903892
## 6 -0.06501615 0.04532291 0.03876957 0.2740760 0.3760787 -0.02516843
## 14 15 16 17 18 19
## 1 0.2857586 0.01289400 -0.03953702 0.12366778 -0.02994520 0.13734865
## 2 0.3511557 0.04377305 -0.08610570 0.02257109 -0.02904188 0.10576472
## 3 0.3276654 0.12499783 0.07193521 0.11352312 0.05436360 0.13047187
## 4 0.2206366 0.16512569 0.17870407 0.14961327 0.07799061 0.05426031
## 5 0.2895551 0.05977706 0.02274650 0.23433193 0.22873900 0.10835878
## 6 0.3973681 -0.31616385 -0.18765631 -0.02947378 0.08322846 0.35955349
## 20 21 22 23 24 25 26
## 1 0.11606097 0.2210128 0.2482934 0.2599006 0.16210053 0.4048808 0.2986862
## 2 0.11461843 0.3088328 0.2548945 0.3707202 0.03826959 0.3294228 0.2570957
## 3 0.14806698 0.2451156 0.3155065 0.3481317 0.17594039 0.5068059 0.3982167
## 4 0.18292135 0.1640771 0.2961456 0.2193293 0.49687969 0.4175197 0.2884400
## 5 0.22941384 0.2109126 0.1636257 0.2886296 0.16404395 0.1889658 0.2366035
## 6 0.09595459 0.4662852 0.2334745 0.2142393 0.10543768 0.3255688 0.3630010
## 27 28 29 30 31 32
## 1 0.12683472 0.19981773 0.27375831 0.2292546 0.2606423 0.18041813
## 2 0.30515702 0.06838973 0.33131422 0.3152154 0.3111069 0.11392642
## 3 0.20066362 0.21903917 0.34248093 0.2759196 0.1841701 0.16327904
## 4 0.01173081 0.22072033 0.03745812 0.4279334 0.1542700 0.05040128
## 5 0.23098300 0.23879201 0.02613059 0.1166838 0.1998264 0.11726045
## 6 0.39467846 0.38272300 0.20608795 0.4617996 0.5038228 0.28917730
## 33 34 35 36 37 38
## 1 0.29325128 0.19253712 0.2128451 0.17528977 0.3760118 0.3899436
## 2 0.36645562 0.20122689 0.3049941 0.20085579 0.4026337 0.3559577
## 3 0.47709841 0.17936166 0.3044496 0.22395073 0.4364265 0.4402060
## 4 0.33630609 0.17056791 0.1391569 -0.02559309 0.6506767 0.3188761
## 5 0.09090592 0.02838061 0.1784218 0.16863152 0.2043580 0.2873129
## 6 0.08394651 0.26315642 0.4262442 0.32590646 0.1763755 0.3058636
## 39 40 41 42 43 44
## 1 0.09453386 0.5441656 0.45216461 0.02554126 0.268413785 0.02736405
## 2 0.17468833 0.3919674 0.41824684 0.15193438 -0.004024159 0.22985128
## 3 0.17285544 0.5919134 0.36165841 0.07703658 0.193420271 0.28790565
## 4 0.18260868 0.4601645 0.42743487 0.09195502 0.367132780 0.45879147
## 5 0.36933484 0.3257997 0.19731037 0.13942454 0.403343799 0.16527443
## 6 0.57760069 0.1526484 0.06109139 0.44258597 0.147383607 -0.02943206
## 45 46 47 48 49 50
## 1 0.1625393 0.2597376 0.15924750 0.3010008 0.17703014 0.11646299
## 2 0.1897275 0.2779266 0.16736304 0.2670573 0.16469678 0.21653487
## 3 0.2830701 0.2935667 0.09837036 0.3644929 0.09451148 0.22305423
## 4 0.1879673 0.1340365 0.19014108 0.3077853 0.04462822 -0.04350821
## 5 0.1354051 0.2973150 0.15407386 0.2657065 0.16731833 0.02089771
## 6 0.2939979 0.2817491 0.55793000 0.3451230 0.44064491 0.32668096
## 51 52 53 54 55 56
## 1 0.3134830 0.2170295 0.0818743987 0.2189972 0.2946886 0.27338008
## 2 0.2914467 0.3531921 0.1629546657 0.2525394 0.1099182 0.30132271
## 3 0.4530512 0.4657236 0.0862863191 0.1401217 0.1695407 0.19371951
## 4 0.5057642 0.4822371 0.0002037858 0.1674058 0.2924497 0.08281139
## 5 0.2339323 0.2017145 0.0388014708 0.2511922 0.0846692 0.19585495
## 6 0.2541159 0.4687612 0.4518692966 0.2572189 0.3349934 0.37878770
## 57 58 59 60 61 62
## 1 0.14626342 0.04164413 0.1509995 0.40128770 0.3093822 0.22905094
## 2 0.29022855 0.19329812 0.2301673 0.25815658 0.2694100 0.22933454
## 3 0.25449781 0.19271670 0.2418131 0.45581364 0.1744140 0.05202851
## 4 0.62339587 0.57619340 0.2989209 0.43448909 0.0570256 0.01666584
## 5 0.10845809 0.22080214 0.1734944 0.14532625 0.2572304 0.14985245
## 6 0.08098882 0.11025491 0.3822097 0.06871624 0.4642782 0.40682182
## 63 64 65 66 67 68 69
## 1 0.2985771 0.28994530 0.3398867 0.2086863 0.4115091 0.10704452 0.16055010
## 2 0.2516777 0.30899047 0.2706617 0.1765883 0.1685053 0.13786008 0.10467567
## 3 0.1862233 0.27059213 0.4053198 0.1567393 0.2593603 0.09551333 0.10537713
## 4 0.2604404 0.46957827 0.2626264 0.1979441 0.5058753 0.09517487 0.05018873
## 5 0.0997726 0.17997809 0.2327499 0.1892915 0.1314505 0.19837197 0.03461529
## 6 0.3691181 0.01281013 0.3243530 0.4518212 0.3113032 0.50262643 0.44025461
## 70 71 72 73 74 75
## 1 0.1978082 -0.04525169 0.163433046 0.10576745 0.2114968 -0.03032197
## 2 0.2101386 0.07433689 0.179936490 0.16414152 0.1888246 -0.13758150
## 3 0.3071758 0.03464057 -0.004967610 0.09316952 0.2925640 0.09861653
## 4 0.3249120 0.09367013 0.001110791 0.13847712 0.3357485 0.25339948
## 5 0.0481126 0.17667442 0.052040950 0.14334627 0.2509061 0.11269178
## 6 0.4307216 0.17277061 0.285405238 0.18047785 0.1131694 -0.00271214
## 76 77 78 79 80 81
## 1 0.23195895 0.07667396 0.03323488 0.02104895 -0.01787629 0.4458802
## 2 0.02414987 0.24648903 0.10129019 0.11099796 0.11750118 0.1001534
## 3 0.37545023 -0.11391475 0.06170825 -0.04594455 -0.11289195 0.3884740
## 4 0.32582805 0.08454319 0.11514861 0.06206323 0.07117609 0.3416116
## 5 0.13813701 0.16241818 -0.06730685 0.05102862 0.17058804 0.2311745
## 6 0.27085098 0.23179076 0.33666903 0.16684135 0.33366799 0.2302327
## 82 83 84 85 86 87
## 1 0.16360731 0.2693224 0.18906989 0.16701265 0.37221959 0.02690194
## 2 -0.04989387 0.2694905 0.04632873 0.16644727 0.07038068 -0.04154067
## 3 0.10819431 0.1306376 0.06193234 0.06001095 0.12851688 0.09781543
## 4 0.23270034 0.2625194 0.24472692 0.11198194 0.19349913 0.09394957
## 5 -0.03305863 0.3270794 0.10662290 0.19738218 0.11003719 0.09792109
## 6 0.35147616 0.2613340 0.40026671 0.22136663 0.08756471 0.36336141
## 88 89 90 91 92 93
## 1 0.23525454 0.12517598 -0.01875062 0.24093068 0.003185035 0.230227603
## 2 0.11012968 0.09252558 0.11498090 0.20436111 0.115461476 0.153656620
## 3 0.12022548 0.08327957 0.09947107 0.03224014 0.052803767 -0.055363540
## 4 -0.05217219 -0.11251237 0.28923929 -0.13819448 0.064785890 -0.173953728
## 5 -0.01349688 0.06201759 0.22994716 0.06387780 0.025897510 -0.006095602
## 6 0.31635519 0.22677687 0.23375409 0.16025132 0.519361897 0.348347915
## 94 95 96 97 98 99
## 1 0.31475117 0.1814723 -0.099482559 0.40786880 0.06666792 0.14069068
## 2 0.09388688 0.2169415 -0.104498210 0.24165438 0.04511819 0.19975144
## 3 -0.03709201 0.1699300 0.003613705 0.12284761 0.11730475 0.04118937
## 4 0.23370115 0.1176007 -0.027966041 -0.01465989 0.09398252 0.07980947
## 5 0.27260665 0.2413827 -0.038614510 0.26861235 0.07159713 0.19005827
## 6 0.16109969 0.2942596 0.471269408 0.01251777 0.06541690 0.23891101
## 100
## 1 0.046733950
## 2 0.301178225
## 3 0.255774101
## 4 0.237323533
## 5 0.008345025
## 6 0.212482029
The code and results above represent a standard approach to item-item cosine similarity. They do not take into account any item biases, which may be rooted in the sparseness of the source data.
To add this context, I took a “post-processing” approach: the item-item similarity results are adjusted with an item bias. The inspiration for this approach can be found here: https://www.ics.uci.edu/~welling/teaching/CS77Bwinter12/presentations/course_Ricci/13-Item-to-Item-Matrix-CF.pdf
In this specific case, the biased score is the sum of three terms: the overall mean of the scores, a per-item bias (conventionally the item’s mean minus the overall mean; the implementation below simply uses the item’s mean similarity as the bias term), and the “original” score for that recommendation.
# Calculate the overall average similarity score returned from the item-item CF
mu <- mean(simjester)
# Biased score =
# the overall average score + the bias term for a given item (joke) + the original CF score
addbias <- function(x){
  # per-item bias term: here, the mean of this joke's similarity scores
  biasjoke <- mean(x)
  return (mu + biasjoke + x)
}
# Apply to each column (joke) of the similarity matrix
biasedcf <- apply(simjester, 2, addbias)
head(biasedcf)
## 1 2 3 4 5 6 7
## 1 0.3997880 0.8292184 0.8259842 0.7791076 0.6146656 0.5854921 0.7034263
## 2 0.8393523 0.3896541 0.6875171 0.7561333 0.5467215 0.6447430 0.6895018
## 3 0.8247020 0.6761011 0.4010702 0.9310546 0.6449112 0.6294586 0.6555348
## 4 0.7725487 0.7394407 0.9257780 0.4063468 0.6653098 0.7104217 0.6458554
## 5 0.6556193 0.5775414 0.6871472 0.7128223 0.3588342 0.6057136 0.4777045
## 6 0.5403637 0.5894808 0.5856125 0.6718522 0.5196316 0.4449163 0.4879921
## 8 9 10 11 12 13 14
## 1 0.6974659 0.7031238 0.7432828 0.8801314 0.7560569 0.3832134 0.7829817
## 2 0.5355022 0.6433813 0.5639978 0.8144007 0.7385551 0.6961923 0.8483788
## 3 0.4702016 0.7126129 0.7097628 1.0592427 0.7714061 0.4181306 0.8248885
## 4 0.4394634 0.8472728 0.6236016 0.8387255 0.6257659 0.4905002 0.7178597
## 5 0.6477800 0.5528605 0.6444796 0.6913465 0.6308946 0.4796752 0.7867782
## 6 0.2260057 0.4163917 0.4347822 0.7273022 0.8489465 0.2754679 0.8945913
## 15 16 17 18 19 20 21
## 1 0.1935617 0.08914754 0.3733988 0.2599572 0.5002753 0.4445197 0.6726410
## 2 0.2244407 0.04257885 0.2723021 0.2608605 0.4686914 0.4430772 0.7604610
## 3 0.3056655 0.20061977 0.3632541 0.3442660 0.4933985 0.4765257 0.6967437
## 4 0.3457933 0.30738862 0.3993443 0.3678930 0.4171870 0.5113801 0.6157052
## 5 0.2404447 0.15143105 0.4840629 0.5186414 0.4712854 0.5578726 0.6625407
## 6 -0.1354962 -0.05897176 0.2202572 0.3731309 0.7224801 0.4244133 0.9179133
## 22 23 24 25 26 27 28
## 1 0.6737974 0.7120839 0.5178486 0.8313906 0.7621944 0.5641136 0.6297076
## 2 0.6803985 0.8229036 0.3940177 0.7559326 0.7206039 0.7424359 0.4982796
## 3 0.7410105 0.8003150 0.5316885 0.9333157 0.8617249 0.6379425 0.6489290
## 4 0.7216495 0.6715126 0.8526278 0.8440295 0.7519482 0.4490097 0.6506102
## 5 0.5891297 0.7408129 0.5197921 0.6154756 0.7001117 0.6682619 0.6686818
## 6 0.6589784 0.6664226 0.4611858 0.7520787 0.8265092 0.8319573 0.8126128
## 29 30 31 32 33 34 35
## 1 0.6620469 0.6827970 0.7396790 0.5327705 0.7022169 0.5775355 0.6688026
## 2 0.7196028 0.7687578 0.7901436 0.4662788 0.7754212 0.5862253 0.7609516
## 3 0.7307695 0.7294619 0.6632068 0.5156314 0.8860640 0.5643601 0.7604071
## 4 0.4257467 0.8814758 0.6333066 0.4027537 0.7452717 0.5555663 0.5951144
## 5 0.4144192 0.5702261 0.6788631 0.4696128 0.4998715 0.4133790 0.6343793
## 6 0.5943765 0.9153419 0.9828595 0.6415297 0.4929121 0.6481548 0.8822017
## 36 37 38 39 40 41 42
## 1 0.5824075 0.7983185 0.8482447 0.5406927 0.9938437 0.8776984 0.4434588
## 2 0.6079735 0.8249404 0.8142587 0.6208472 0.8416455 0.8437806 0.5698520
## 3 0.6310684 0.8587332 0.8985070 0.6190143 1.0415915 0.7871922 0.4949542
## 4 0.3815246 1.0729835 0.7771772 0.6287676 0.9098426 0.8529686 0.5098726
## 5 0.5757492 0.6266647 0.7456140 0.8154937 0.7754778 0.6228441 0.5573421
## 6 0.7330242 0.5986822 0.7641646 1.0237596 0.6023264 0.4866251 0.8605035
## 43 44 45 46 47 48 49
## 1 0.6410100 0.3447546 0.5938242 0.6424343 0.6249081 0.7269965 0.5938424
## 2 0.3685720 0.5472419 0.6210124 0.6606232 0.6330236 0.6930530 0.5815091
## 3 0.5660165 0.6052962 0.7143550 0.6762633 0.5640309 0.7904886 0.5113238
## 4 0.7397290 0.7761821 0.6192522 0.5167332 0.6558016 0.7337810 0.4614405
## 5 0.7759400 0.4826650 0.5666900 0.6800116 0.6197344 0.6917022 0.5841306
## 6 0.5199798 0.2879585 0.7252828 0.6644458 1.0235906 0.7711186 0.8574572
## 50 51 52 53 54 55 56
## 1 0.4852381 0.7310282 0.6929911 0.5113929 0.6356848 0.6969639 0.6916589
## 2 0.5853100 0.7089919 0.8291536 0.5924732 0.6692271 0.5121936 0.7196016
## 3 0.5918294 0.8705965 0.9416851 0.5158048 0.5568094 0.5718161 0.6119984
## 4 0.3252669 0.9233095 0.9581987 0.4297223 0.5840935 0.6947250 0.5010902
## 5 0.3896728 0.6514776 0.6776761 0.4683200 0.6678799 0.4869446 0.6141338
## 6 0.6954561 0.6716611 0.9447227 0.8813878 0.6739066 0.7372688 0.7970666
## 57 58 59 60 61 62 63
## 1 0.4949690 0.3357947 0.5586818 0.7885616 0.7507186 0.6505538 0.6799003
## 2 0.6389342 0.4874487 0.6378496 0.6454305 0.7107464 0.6508374 0.6330010
## 3 0.6032034 0.4868672 0.6494955 0.8430876 0.6157503 0.4735314 0.5675466
## 4 0.9721015 0.8703440 0.7066032 0.8217630 0.4983619 0.4381687 0.6417637
## 5 0.4571637 0.5149527 0.5811767 0.5326002 0.6985667 0.5713553 0.4810959
## 6 0.4296944 0.4044055 0.7898920 0.4559902 0.9056145 0.8283247 0.7504414
## 64 65 66 67 68 69 70
## 1 0.6700788 0.7981338 0.6438846 0.8432428 0.5012333 0.5834664 0.6565034
## 2 0.6891240 0.7289087 0.6117867 0.6002390 0.5320489 0.5275920 0.6688337
## 3 0.6507256 0.8635668 0.5919377 0.6910941 0.4897021 0.5282934 0.7658709
## 4 0.8497118 0.7208734 0.6331425 0.9376091 0.4893636 0.4731050 0.7836072
## 5 0.5601116 0.6909969 0.6244899 0.5631842 0.5925607 0.4575316 0.5068077
## 6 0.3929436 0.7826000 0.8870196 0.7430369 0.8968152 0.8631709 0.8894168
## 71 72 73 74 75 76 77
## 1 0.2672764 0.5761672 0.4810021 0.5832801 0.2416117 0.6402964 0.4428509
## 2 0.3868649 0.5926706 0.5393761 0.5606079 0.1343522 0.4324873 0.6126660
## 3 0.3471686 0.4077665 0.4684041 0.6643473 0.3705502 0.7837877 0.2522622
## 4 0.4061982 0.4138449 0.5137117 0.7075318 0.5253332 0.7341655 0.4507201
## 5 0.4892025 0.4647751 0.5185809 0.6226894 0.3846255 0.5464744 0.5285951
## 6 0.4852987 0.6981394 0.5557125 0.4849527 0.2692215 0.6791884 0.5979677
## 78 79 80 81 82 83 84
## 1 0.4363756 0.3935318 0.3681778 0.8547861 0.5387875 0.7084985 0.6088596
## 2 0.5044309 0.4834808 0.5035553 0.5090593 0.3252863 0.7086666 0.4661184
## 3 0.4648490 0.3265383 0.2731621 0.7973799 0.4833745 0.5698137 0.4817220
## 4 0.5182893 0.4345461 0.4572302 0.7505175 0.6078805 0.7016955 0.6645166
## 5 0.3358339 0.4235115 0.5566421 0.6400804 0.3421215 0.7662555 0.5264126
## 6 0.7398097 0.5393242 0.7197221 0.6391386 0.7266563 0.7005101 0.8200564
## 85 86 87 88 89 90 91
## 1 0.5987377 0.6849132 0.3733497 0.6484184 0.4747104 0.3176705 0.5974835
## 2 0.5981723 0.3830742 0.3049070 0.5232936 0.4420600 0.4514020 0.5609139
## 3 0.4917360 0.4412104 0.4442631 0.5333894 0.4328140 0.4358922 0.3887929
## 4 0.5437070 0.5061927 0.4403973 0.3609917 0.2370221 0.6256604 0.2183583
## 5 0.6291072 0.4227308 0.4443688 0.3996670 0.4115520 0.5663683 0.4204306
## 6 0.6530917 0.4002583 0.7098091 0.7295191 0.5763113 0.5701752 0.5168041
## 92 93 94 95 96 97 98
## 1 0.3969547 0.6275499 0.6732772 0.5745587 0.2392467 0.7365393 0.3788008
## 2 0.5092312 0.5509789 0.4524130 0.6100278 0.2342311 0.5703248 0.3572510
## 3 0.4465735 0.3419587 0.3214341 0.5630164 0.3423430 0.4515181 0.4294376
## 4 0.4585556 0.2233686 0.5922272 0.5106870 0.3107632 0.3140106 0.4061154
## 5 0.4196672 0.3912267 0.6311327 0.6344691 0.3001148 0.5972828 0.3837300
## 6 0.9131316 0.7456702 0.5196258 0.6873459 0.8099987 0.3411882 0.3775498
## 99 100
## 1 0.4435053 0.4122682
## 2 0.5025661 0.6667125
## 3 0.3440040 0.6213084
## 4 0.3826241 0.6028578
## 5 0.4928729 0.3738793
## 6 0.5417256 0.5780163
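As a quick sanity check against the entries already shown above: the amount added to any entry in joke 2’s column should equal mu plus joke 2’s mean similarity, which is also exactly the diagonal entry biasedcf[2,2], since the original diagonal simjester[2,2] is 0.
# difference between a biased and an original entry for joke 2
biasedcf[1,2] - simjester[1,2]   # 0.8292184 - 0.4395643 = 0.3896541
# matches the diagonal entry, because the original diagonal is 0
biasedcf[2,2]                    # 0.3896541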
Now that we have both results, we can take a subset of each to compare how much impact adding the bias context had.
# subset sample of the original
sampleoriginalcf <- simjester[1:5,1:5]
heatmap(data.matrix(sampleoriginalcf))
# subset sample including the context bias
samplebiasedcf <- biasedcf[1:5,1:5]
heatmap(data.matrix(samplebiasedcf))
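Since the heatmaps rescale the values for display, it can also help to print the two 5x5 samples as raw numbers for a direct comparison (a small optional check):
# optional: raw numeric view of the two samples
round(sampleoriginalcf, 3)
round(samplebiasedcf, 3)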
For at least these samples there is no material difference, only some variance around jokes 4 and 5. Looking across the full matrices, the effect of adding the context is more noticeable.
# original
heatmap(data.matrix(simjester))
# context/bias
heatmap(data.matrix(biasedcf))
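To complement the heatmaps with a numeric summary, here is a minimal sketch of the per-joke shift. Because addbias() adds the same constant (the overall mean plus that joke’s mean similarity) to every entry in a column, the shift reduces to a single number per joke.
# per-joke shift introduced by the bias adjustment (constant within each column)
shift <- colMeans(biasedcf - simjester)
summary(shift)
# jokes whose similarity scores are pushed up the most by the bias term
head(sort(shift, decreasing = TRUE))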