Param Singh

For this project I chose to use the lecture-provided dataset, which has already been normalized by mean-centering. I attempted, I believe somewhat successfully, to implement both a user-user and an item-item collaborative filtering approach.

# Create normalized data frame using provided mean-centered values
matt <- c(.75,NA,-1.25,NA,-1.25,1.75)
mauricio <- c(.5,NA,-.5,-.5,.5,NA)
param <- c(.5,.5,-2.5,NA,NA,1.5)
shipra <- c(NA,NA,0,1,NA,-1)
itemdf <- data.frame(matt,mauricio,param,shipra)
obsnames <- c("CaptainAmerica","Deadpool","Frozen","JungleBook","PitchPerfect2","StarWarsForce")
rownames(itemdf) <- obsnames
itemdf

##                 matt mauricio param shipra
## CaptainAmerica  0.75      0.5   0.5     NA
## Deadpool          NA       NA   0.5     NA
## Frozen         -1.25     -0.5  -2.5      0
## JungleBook        NA     -0.5    NA      1
## PitchPerfect2  -1.25      0.5    NA     NA
## StarWarsForce   1.75       NA   1.5     -1

I used the cosine similarity implementation approach described in the following article: http://www.salemmarafi.com/code/collaborative-filtering-r/
The following function is used by both the user-user and item-item collaborative filtering approaches.

# I found it necessary to handle NA values and to compute the dot product and norms explicitly here.
# Not the most efficient way to go; recommenderlab and lsa do a great job of handling this for you.
getCosine <- function(x,y){
  # Numerator: only pairs where both values are present contribute.
  # Denominator: each norm uses every non-NA value of its own vector.
  this.cosine <- sum(x*y, na.rm = TRUE)/(sqrt(sum(x*x, na.rm = TRUE) * sum(y*y, na.rm = TRUE)))
  return(this.cosine)
}
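As a quick sanity check on how the NA handling plays out, the calculation for one pair of users can be worked through by hand. This is only an illustrative sketch: it repeats the helper so that it runs on its own, and the intermediate names `numerator` and `denominator` are introduced here for illustration.

```r
# Same helper as above, repeated so this snippet runs on its own
getCosine <- function(x, y) {
  sum(x * y, na.rm = TRUE) /
    sqrt(sum(x * x, na.rm = TRUE) * sum(y * y, na.rm = TRUE))
}

matt     <- c(.75, NA, -1.25, NA, -1.25, 1.75)
mauricio <- c(.5, NA, -.5, -.5, .5, NA)

# Numerator: only the three movies both users rated contribute
numerator <- sum(matt * mauricio, na.rm = TRUE)   # 0.375
# Denominator: each norm uses every rating that user gave
denominator <- sqrt(sum(matt^2, na.rm = TRUE) * sum(mauricio^2, na.rm = TRUE))

numerator / denominator     # 0.1443376
getCosine(matt, mauricio)   # same value
```

Because the norms include ratings for movies the other user never rated, two users with few movies in common are pulled toward a similarity of zero.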

The following computes the user-user collaborative filtering similarity matrix:

# placeholder result matrix, one row and one column per user
userusercf <- matrix(NA, nrow=ncol(itemdf), ncol=ncol(itemdf))
for(i in seq_len(ncol(itemdf))){
  for(j in seq_len(ncol(itemdf))){
    # [[ ]] extracts each column as a plain numeric vector
    userusercf[i,j] <- getCosine(itemdf[[i]],itemdf[[j]])
  }
}
colnames(userusercf) <- colnames(itemdf)
rownames(userusercf) <- colnames(userusercf)
userusercf

##                matt   mauricio      param     shipra
## matt      1.0000000  0.1443376  0.7858379 -0.4762897
## mauricio  0.1443376  1.0000000  0.5000000 -0.3535534
## param     0.7858379  0.5000000  1.0000000 -0.3535534
## shipra   -0.4762897 -0.3535534 -0.3535534  1.0000000

# plot
heatmap(data.matrix(userusercf))
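The double loop can also be written as a single matrix operation. The sketch below assumes that replacing NA with 0 is acceptable, which holds for this particular cosine because a zero contributes nothing to any of the sums. `cosineMatrix` and `ratings` are hypothetical names introduced for this illustration, not part of the lecture material.

```r
# Rebuild the mean-centered ratings so the snippet is self-contained
ratings <- cbind(
  matt     = c(.75, NA, -1.25, NA, -1.25, 1.75),
  mauricio = c(.5, NA, -.5, -.5, .5, NA),
  param    = c(.5, .5, -2.5, NA, NA, 1.5),
  shipra   = c(NA, NA, 0, 1, NA, -1)
)

# cosineMatrix is a hypothetical helper: cosine similarity between columns
cosineMatrix <- function(m) {
  m[is.na(m)] <- 0              # a zero adds nothing to any sum of products
  dots  <- crossprod(m)         # t(m) %*% m: all pairwise column dot products
  norms <- sqrt(diag(dots))     # column norms come from the diagonal
  dots / outer(norms, norms)    # divide each dot product by both norms
}

round(cosineMatrix(ratings), 7)
```

Calling `cosineMatrix(t(ratings))` would give the item-item version, since transposing turns the movies into columns.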

The following shows the item-item collaborative filtering, which reuses the general-purpose getCosine function on the transposed data:

# transpose so that each movie becomes a column
item2 <- as.data.frame(t(itemdf))
result2 <- matrix(NA, nrow=ncol(item2), ncol=ncol(item2))
for(i in seq_len(ncol(item2))){
  for(j in seq_len(ncol(item2))){
    result2[i,j] <- getCosine(item2[[i]],item2[[j]])
  }
}
colnames(result2) <- obsnames
rownames(result2) <- obsnames
result2

##                CaptainAmerica   Deadpool      Frozen  JungleBook
## CaptainAmerica      1.0000000  0.4850713 -0.83280877 -0.21693046
## Deadpool            0.4850713  1.0000000 -0.88045091  0.00000000
## Frozen             -0.8328088 -0.8804509  1.00000000  0.07874992
## JungleBook         -0.2169305  0.0000000  0.07874992  1.00000000
## PitchPerfect2      -0.4954151  0.0000000  0.34334082 -0.16609096
## StarWarsForce       0.7963955  0.5970223 -0.83227733 -0.35599533
##                PitchPerfect2 StarWarsForce
## CaptainAmerica    -0.4954151     0.7963955
## Deadpool           0.0000000     0.5970223
## Frozen             0.3433408    -0.8322773
## JungleBook        -0.1660910    -0.3559953
## PitchPerfect2      1.0000000    -0.6467082
## StarWarsForce     -0.6467082     1.0000000

# plot
heatmap(data.matrix(result2))
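Besides the heatmap, one way to read the item-item matrix is to pull out the most similar other movie for each title. The following is a minimal sketch under the same zero-fill assumption as the cosine above. The names `ratings`, `itemSim`, and `nearest` are hypothetical and introduced only for this example.

```r
# Rebuild the mean-centered ratings: movies in rows, users in columns
ratings <- cbind(
  matt     = c(.75, NA, -1.25, NA, -1.25, 1.75),
  mauricio = c(.5, NA, -.5, -.5, .5, NA),
  param    = c(.5, .5, -2.5, NA, NA, 1.5),
  shipra   = c(NA, NA, 0, 1, NA, -1)
)
rownames(ratings) <- c("CaptainAmerica", "Deadpool", "Frozen",
                       "JungleBook", "PitchPerfect2", "StarWarsForce")

# Item-item cosine: transpose so movies are columns, then zero-fill NAs
items <- t(ratings)
items[is.na(items)] <- 0
dots    <- crossprod(items)
norms   <- sqrt(diag(dots))
itemSim <- dots / outer(norms, norms)

# Ignore each movie's similarity to itself, then take the row-wise maximum
diag(itemSim) <- NA
nearest <- colnames(itemSim)[apply(itemSim, 1, which.max)]
names(nearest) <- rownames(itemSim)
nearest
```

With only four users, these neighbors rest on very few shared ratings, so they are best read as an illustration of the mechanics.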

The results of both the user-user and item-item collaborative filtering approaches are consistent with the values provided in the lecture solution data set.

Param Singh - Project2

June 28, 2016