For this project I chose to use the lecture-provided dataset, which has already been normalized by mean-centering. I implemented, I believe somewhat successfully, both a user-user and an item-item collaborative filtering approach.
# Create normalized data frame using provided mean-centered values
# (one vector per user, one entry per movie; NA = not rated)
matt     <- c(.75, NA, -1.25, NA, -1.25, 1.75)
mauricio <- c(.5, NA, -.5, -.5, .5, NA)
param    <- c(.5, .5, -2.5, NA, NA, 1.5)
shipra   <- c(NA, NA, 0, 1, NA, -1)
itemdf <- data.frame(matt, mauricio, param, shipra)
obsnames <- c("CaptainAmerica", "Deadpool", "Frozen", "JungleBook", "PitchPerfect2", "StarWarsForce")
rownames(itemdf) <- obsnames
itemdf
##                 matt mauricio param shipra
## CaptainAmerica  0.75      0.5   0.5     NA
## Deadpool          NA       NA   0.5     NA
## Frozen         -1.25     -0.5  -2.5      0
## JungleBook        NA     -0.5    NA      1
## PitchPerfect2  -1.25      0.5    NA     NA
## StarWarsForce   1.75       NA   1.5     -1
I used the cosine similarity implementation described in the following article: http://www.salemmarafi.com/code/collaborative-filtering-r/
The following function is used by both the user-user and item-item CF approaches:
# I found it necessary to handle NA values and to build the cross product explicitly here.
# This is not the most efficient way to go; recommenderlab and lsa do a great job of handling this for you.
getCosine <- function(x, y) {
  # na.rm = TRUE drops missing ratings from the dot product and from each norm
  this.cosine <- sum(x * y, na.rm = TRUE) / sqrt(sum(x * x, na.rm = TRUE) * sum(y * y, na.rm = TRUE))
  return(this.cosine)
}
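As a worked check of what getCosine computes, the sketch below applies it to the matt and mauricio vectors and repeats the same arithmetic by hand. With na.rm = TRUE the numerator only uses movies both users rated, while each norm uses every movie that user rated, which amounts to treating a missing mean-centered rating as 0. The names num, norm_matt, norm_mauricio and by_hand are introduced here purely for illustration.

```r
getCosine <- function(x, y) {
  sum(x * y, na.rm = TRUE) / sqrt(sum(x * x, na.rm = TRUE) * sum(y * y, na.rm = TRUE))
}

matt     <- c(.75, NA, -1.25, NA, -1.25, 1.75)
mauricio <- c(.5, NA, -.5, -.5, .5, NA)

# Numerator: only movies rated by both users contribute
# (CaptainAmerica, Frozen, PitchPerfect2)
num <- 0.75 * 0.5 + (-1.25) * (-0.5) + (-1.25) * 0.5    # 0.375

# Norms: every non-missing rating of each user contributes
norm_matt     <- sqrt(0.75^2 + 1.25^2 + 1.25^2 + 1.75^2)  # sqrt(6.75)
norm_mauricio <- sqrt(4 * 0.5^2)                          # 1

by_hand <- num / (norm_matt * norm_mauricio)
by_hand                    # approximately 0.1443376
getCosine(matt, mauricio)  # same value
```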
The following illustrates the user-user CF:
# placeholder result matrix, one row and one column per user
userusercf <- matrix(NA, nrow = ncol(itemdf), ncol = ncol(itemdf))
for (i in 1:ncol(itemdf)) {
  for (j in 1:ncol(itemdf)) {
    # [[ ]] extracts the column as a plain numeric vector
    userusercf[i, j] <- getCosine(itemdf[[i]], itemdf[[j]])
  }
}
colnames(userusercf) <- colnames(itemdf)
rownames(userusercf) <- colnames(userusercf)
userusercf
##                matt   mauricio      param     shipra
## matt      1.0000000  0.1443376  0.7858379 -0.4762897
## mauricio  0.1443376  1.0000000  0.5000000 -0.3535534
## param     0.7858379  0.5000000  1.0000000 -0.3535534
## shipra   -0.4762897 -0.3535534 -0.3535534  1.0000000
# plot
heatmap(data.matrix(userusercf))
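The double loop can also be expressed with matrix algebra. The following is a minimal sketch, not the approach used in this write-up, and it rests on one assumption: replacing each NA with 0 gives the same result as the na.rm = TRUE handling in getCosine, because a zero contributes nothing to a dot product or a norm. The names ratings, filled, dots, norms and cosine_matrix are hypothetical.

```r
# Same mean-centered ratings as above (rows = movies, columns = users)
ratings <- cbind(
  matt     = c(.75, NA, -1.25, NA, -1.25, 1.75),
  mauricio = c(.5, NA, -.5, -.5, .5, NA),
  param    = c(.5, .5, -2.5, NA, NA, 1.5),
  shipra   = c(NA, NA, 0, 1, NA, -1)
)

# Assumption: a missing mean-centered rating is treated as 0,
# which mirrors the na.rm = TRUE behaviour of getCosine
filled <- ratings
filled[is.na(filled)] <- 0

dots  <- crossprod(filled)   # all pairwise dot products between user columns
norms <- sqrt(diag(dots))    # norm of each user column
cosine_matrix <- dots / outer(norms, norms)

round(cosine_matrix, 7)
```

Under the same assumption, swapping crossprod(filled) for tcrossprod(filled) should give the movie-by-movie similarities instead, since it works on the rows rather than the columns.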
The following shows the item-item CF, which reuses the generalized getCosine function:
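# transpose so that each movie becomes a column
item2 <- as.data.frame(t(itemdf))
# placeholder result matrix, one row and one column per movie
result2 <- matrix(NA, nrow = ncol(item2), ncol = ncol(item2))
for (i in 1:ncol(item2)) {
  for (j in 1:ncol(item2)) {
    result2[i, j] <- getCosine(item2[[i]], item2[[j]])
  }
}
colnames(result2) <- obsnames
rownames(result2) <- obsnames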
result2
##                CaptainAmerica   Deadpool      Frozen  JungleBook
## CaptainAmerica      1.0000000  0.4850713 -0.83280877 -0.21693046
## Deadpool            0.4850713  1.0000000 -0.88045091  0.00000000
## Frozen             -0.8328088 -0.8804509  1.00000000  0.07874992
## JungleBook         -0.2169305  0.0000000  0.07874992  1.00000000
## PitchPerfect2      -0.4954151  0.0000000  0.34334082 -0.16609096
## StarWarsForce       0.7963955  0.5970223 -0.83227733 -0.35599533
##                PitchPerfect2 StarWarsForce
## CaptainAmerica    -0.4954151     0.7963955
## Deadpool           0.0000000     0.5970223
## Frozen             0.3433408    -0.8322773
## JungleBook        -0.1660910    -0.3559953
## PitchPerfect2      1.0000000    -0.6467082
## StarWarsForce     -0.6467082     1.0000000
# plot
heatmap(data.matrix(result2))
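Since the user-user and item-item loops differ only in whether the data is transposed, they could be folded into one helper. The sketch below is one possible way to do that; pairwiseCosine, user_sim and item_sim are hypothetical names that do not appear in the original code, and the helper simply wraps the same getCosine call for every pair of columns.

```r
getCosine <- function(x, y) {
  sum(x * y, na.rm = TRUE) / sqrt(sum(x * x, na.rm = TRUE) * sum(y * y, na.rm = TRUE))
}

# Hypothetical helper: cosine similarity between every pair of columns of m
pairwiseCosine <- function(m) {
  m <- as.matrix(m)
  n <- ncol(m)
  out <- matrix(NA_real_, nrow = n, ncol = n,
                dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- getCosine(m[, i], m[, j])
    }
  }
  out
}

itemdf <- data.frame(
  matt     = c(.75, NA, -1.25, NA, -1.25, 1.75),
  mauricio = c(.5, NA, -.5, -.5, .5, NA),
  param    = c(.5, .5, -2.5, NA, NA, 1.5),
  shipra   = c(NA, NA, 0, 1, NA, -1),
  row.names = c("CaptainAmerica", "Deadpool", "Frozen",
                "JungleBook", "PitchPerfect2", "StarWarsForce")
)

user_sim <- pairwiseCosine(itemdf)     # 4 x 4, one row/column per user
item_sim <- pairwiseCosine(t(itemdf))  # 6 x 6, one row/column per movie
```

Deriving the matrix size and the row and column names from the input avoids hard-coding 4 and 6, so the same call should keep working if users or movies are added.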
The results of both the user-user and item-item CF approaches are consistent with the values provided in the lecture solution data set.