In this project we will use a movie ratings dataset. We will use the Singular Value Decomposition (SVD) matrix factorization method to estimate similarity and to build a recommender system from the ratings alone. Factorizing the rating matrix allows us to discover the most descriptive dimensions for predicting movie preferences. We can identify the first few most important dimensions from the matrix decomposition and explore the movies' locations in this new space.

Singular Value Decomposition (SVD)

The singular value decomposition (SVD) is a factorization of a real or complex matrix that generalizes the eigendecomposition of a square normal matrix to any M x N matrix. We will factorize our matrix A of M users and N movies into the product of three matrices:

\[U \in \mathbb{R}^{M \times M}\]

\[\Sigma \in \mathbb{R}^{M \times N}\]

\[V^T \in \mathbb{R}^{N \times N}\]

\[A = U \, \Sigma \, V^T\]
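As a quick sanity check of the factorization, the sketch below applies base R's svd() to a small made-up 3 x 4 matrix (the values are illustrative, not taken from the ratings data) and rebuilds the matrix from its factors. Note that svd() returns the compact form by default, so here U is M x min(M, N) and V is N x min(M, N) rather than the full square matrices.

```r
# Toy 3 x 4 "ratings" matrix (made-up values for illustration only)
A <- matrix(c(5, 4, 1, 1,
              4, 5, 1, 2,
              1, 1, 5, 4),
            nrow = 3, byrow = TRUE)

s <- svd(A)   # s$u is 3 x 3, s$d has length 3, s$v is 4 x 3

# Rebuild A from its factors: U * diag(d) * t(V)
A_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)

# Largest absolute reconstruction error; expected to be near machine precision
max_err <- max(abs(A - A_rebuilt))
print(max_err)
```

The singular values in s$d come back sorted from largest to smallest, which is what makes it possible to keep only the leading ones later on.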

Our movie matrix contains 610 users and 9,724 items/movies.

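library(dplyr)          # %>%, select, filter, arrange, inner_join
library(tidyr)          # spread
library(recommenderlab) # realRatingMatrix, Recommender, evaluationScheme
library(tictoc)         # tic, toc, tic.log

ratings <- read.csv("ratings.csv")
movies <- read.csv("movies.csv")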

m_matrix <- ratings %>%
  select(-timestamp) %>%
  spread(movieId, rating)

row.names(m_matrix) <- m_matrix[,1]

m_matrix <- m_matrix[-c(1)]
m_matrix <- as(as.matrix(m_matrix), "realRatingMatrix")

m_matrix
## 610 x 9724 rating matrix of class 'realRatingMatrix' with 100836 ratings.

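We will split our dataset into training (80%) and test (20%) sets. For each test user, 20 ratings are given to the recommender (the known set) and the remaining ratings are withheld for evaluation (the unknown set). Ratings of 3 or higher count as good.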

set.seed(100)
eval <- evaluationScheme(m_matrix, method = "split",
                         train = 0.8, given= 20, goodRating=3)
train <- getData(eval, "train")
known <- getData(eval, "known")
unknown <- getData(eval, "unknown")
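To make the known/unknown split concrete, here is a minimal base R sketch of the idea for a single test user. It is only an illustration of the protocol, not recommenderlab's internal code; the rating vector and the value given = 3 are made up so the example stays small.

```r
set.seed(1)

# Hypothetical ratings for one test user (NA = not rated); illustrative only
user_ratings <- c(m1 = 4, m2 = NA, m3 = 5, m4 = 3, m5 = NA, m6 = 2, m7 = 4)

given <- 3                              # the analysis above uses given = 20
rated <- which(!is.na(user_ratings))    # positions of the rated movies
keep  <- sample(rated, given)           # ratings that are shown to the model

known   <- user_ratings
unknown <- user_ratings
known[setdiff(rated, keep)] <- NA       # hide the withheld ratings
unknown[keep] <- NA                     # keep only the withheld ratings

print(known)
print(unknown)
```

The model only sees the known vector when predicting, and its predictions are then scored against the unknown vector.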

User-Based Collaborative Filtering

First, we will build a user-based collaborative filtering (UBCF) model.

tic("UBCF - Training")
model_ubcf <- Recommender(train, method = "UBCF")
toc(log = TRUE, quiet = TRUE)

tic("UBCF - Predicting")
predict_ubcf <- predict(model_ubcf, newdata = known, type = "ratings")
toc(log = TRUE, quiet = TRUE)

( accuracy_ubcf <- calcPredictionAccuracy(predict_ubcf, unknown) )
##      RMSE       MSE       MAE 
## 0.9237453 0.8533054 0.7062559
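For reference, the three reported metrics are usually defined as follows, where T is the set of withheld (user, movie) pairs, r_{ui} is the actual rating and \hat{r}_{ui} the predicted one. This is the standard textbook form; the symbols are introduced here for illustration.

\[\mathrm{MSE} = \frac{1}{|T|} \sum_{(u,i) \in T} \left(\hat{r}_{ui} - r_{ui}\right)^2\]

\[\mathrm{RMSE} = \sqrt{\mathrm{MSE}}\]

\[\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} \left|\hat{r}_{ui} - r_{ui}\right|\]

The output above is consistent with these definitions: 0.9237453 squared is approximately 0.8533054.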

Singular Value Decomposition (SVD) Model

We will build an SVD model and compare it with the UBCF model.

tic("SVD Model - Training")
model_svd <- Recommender(train, method = "SVD", parameter = list(k = 50))
toc(log = TRUE, quiet = TRUE)

tic("SVD Model - Predicting")
predict_svd <- predict(model_svd, newdata = known, type = "ratings")
toc(log = TRUE, quiet = TRUE)

( accuracy_Svd <- calcPredictionAccuracy(predict_svd, unknown) )
##      RMSE       MSE       MAE 
## 0.9290035 0.8630474 0.7110872

The RMSE for this model (0.929) is very close to that of the UBCF model (0.924); UBCF is marginally more accurate on this split.

Runtime

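Based on the table below, we see that:

UBCF takes less time to build the model (0.02 sec) but more time to make predictions (3.7 sec), because it is a memory-based method that defers the neighbor search to prediction time.

SVD takes more time to build the model (1.81 sec), but its predictions are fast (0.5 sec).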

log <- as.data.frame(unlist(tic.log(format = TRUE)))
colnames(log) <- c("Runtime")
knitr::kable(log, format = "html") %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
Runtime
UBCF - Training: 0.02 sec elapsed
UBCF - Predicting: 3.7 sec elapsed
SVD Model - Training: 1.81 sec elapsed
SVD Model - Predicting: 0.5 sec elapsed

Evaluation

Let's evaluate our predictions by looking at the first user. The table below lists the movies this user actually rated, sorted from lowest to highest rating.

movie_ratings <- as.data.frame(m_matrix@data[c("1"), ]) 
colnames(movie_ratings) <- c("Rating")
movie_ratings$movieId <- as.integer(rownames(movie_ratings))
movie_ratings <- movie_ratings %>% filter(Rating != 0) %>% 
  inner_join (movies, by="movieId") %>%
  arrange(Rating) %>%
  select(Movie = "title", Rating)
knitr::kable(movie_ratings, format = "html") %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
Movie Rating
Talented Mr. Ripley, The (1999) 1
Psycho (1960) 2
Toys (1992) 2
I Still Know What You Did Last Summer (1998) 2
Psycho (1998) 2
Mummy, The (1999) 2
From Dusk Till Dawn (1996) 3
Clerks (1994) 3
Pulp Fiction (1994) 3
Stargate (1994) 3
Blown Away (1994) 3
Mrs. Doubtfire (1993) 3
Mission: Impossible (1996) 3
Space Jam (1996) 3
Twister (1996) 3
Independence Day (a.k.a. ID4) (1996) 3
Escape to Witch Mountain (1975) 3
Pete’s Dragon (1977) 3
Shining, The (1980) 3
Batman Returns (1992) 3
Sneakers (1992) 3
Last of the Mohicans, The (1992) 3
McHale’s Navy (1997) 3
Men in Black (a.k.a. MIB) (1997) 3
I Know What You Did Last Summer (1997) 3
Starship Troopers (1997) 3
Return to Oz (1985) 3
Young Sherlock Holmes (1985) 3
Logan’s Run (1976) 3
Rocky Horror Picture Show, The (1975) 3
Encino Man (1992) 3
Sister Act (1992) 3
Toy Story (1995) 4
Grumpier Old Men (1995) 4
Heat (1995) 4
Braveheart (1995) 4
Ed Wood (1994) 4
Clear and Present Danger (1994) 4
Forrest Gump (1994) 4
Mask, The (1994) 4
Dazed and Confused (1993) 4
Jurassic Park (1993) 4
So I Married an Axe Murderer (1993) 4
Three Musketeers, The (1993) 4
Dances with Wolves (1990) 4
Batman (1989) 4
Silence of the Lambs, The (1991) 4
Rock, The (1996) 4
She’s the One (1996) 4
Ghost and Mrs. Muir, The (1947) 4
That Thing You Do! (1996) 4
Swingers (1996) 4
Platoon (1986) 4
Abyss, The (1989) 4
Apocalypse Now (1979) 4
Alien (1979) 4
Groundhog Day (1993) 4
Nosferatu (Nosferatu, eine Symphonie des Grauens) (1922) 4
Best Men (1997) 4
Grosse Pointe Blank (1997) 4
Con Air (1997) 4
Kiss the Girls (1997) 4
Wedding Singer, The (1998) 4
Welcome to Woop-Woop (1997) 4
Wild Things (1998) 4
Small Soldiers (1998) 4
Labyrinth (1986) 4
Lethal Weapon (1987) 4
Back to the Future Part III (1990) 4
Saving Private Ryan (1998) 4
Flight of the Navigator (1986) 4
Honey, I Shrunk the Kids (1989) 4
Sleeping Beauty (1959) 4
Song of the South (1946) 4
Tron (1982) 4
Legend (1985) 4
Beetlejuice (1988) 4
Willow (1988) 4
Few Good Men, A (1992) 4
Rush Hour (1998) 4
King Kong (1933) 4
Romancing the Stone (1984) 4
Howard the Duck (1986) 4
¡Three Amigos! (1986) 4
20 Dates (1998) 4
Dick Tracy (1990) 4
Star Wars: Episode I - The Phantom Menace (1999) 4
Superman (1978) 4
Dracula (1931) 4
Frankenstein (1931) 4
Big (1988) 4
13th Warrior, The (1999) 4
Total Recall (1990) 4
RoboCop (1987) 4
Being John Malkovich (1999) 4
Longest Day, The (1962) 4
Easy Rider (1969) 4
Teenage Mutant Ninja Turtles II: The Secret of the Ooze (1991) 4
Teenage Mutant Ninja Turtles III (1993) 4
Ladyhawke (1985) 4
Hook (1991) 4
Predator (1987) 4
Road Trip (2000) 4
Man with the Golden Gun, The (1974) 4
Big Trouble in Little China (1986) 4
Shaft (2000) 4
What About Bob? (1991) 4
Transformers: The Movie (1986) 4
Seven (a.k.a. Se7en) (1995) 5
Usual Suspects, The (1995) 5
Bottle Rocket (1996) 5
Rob Roy (1995) 5
Canadian Bacon (1995) 5
Desperado (1995) 5
Billy Madison (1995) 5
Dumb & Dumber (Dumb and Dumber) (1994) 5
Star Wars: Episode IV - A New Hope (1977) 5
Tommy Boy (1995) 5
Jungle Book, The (1994) 5
Fugitive, The (1993) 5
Schindler’s List (1993) 5
Tombstone (1993) 5
Pinocchio (1940) 5
Fargo (1996) 5
James and the Giant Peach (1996) 5
Wizard of Oz, The (1939) 5
Citizen Kane (1941) 5
Adventures of Robin Hood, The (1938) 5
Mr. Smith Goes to Washington (1939) 5
Winnie the Pooh and the Blustery Day (1968) 5
Three Caballeros, The (1945) 5
Sword in the Stone, The (1963) 5
Dumbo (1941) 5
Bedknobs and Broomsticks (1971) 5
Alice in Wonderland (1951) 5
Ghost and the Darkness, The (1996) 5
Willy Wonka & the Chocolate Factory (1971) 5
Monty Python’s Life of Brian (1979) 5
Reservoir Dogs (1992) 5
Basic Instinct (1992) 5
E.T. the Extra-Terrestrial (1982) 5
Monty Python and the Holy Grail (1975) 5
Star Wars: Episode V - The Empire Strikes Back (1980) 5
Princess Bride, The (1987) 5
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981) 5
Clockwork Orange, A (1971) 5
Star Wars: Episode VI - Return of the Jedi (1983) 5
Goodfellas (1990) 5
Blues Brothers, The (1980) 5
Full Metal Jacket (1987) 5
Henry V (1989) 5
Quiet Man, The (1952) 5
Terminator, The (1984) 5
Duck Soup (1933) 5
Back to the Future (1985) 5
Highlander (1986) 5
Young Frankenstein (1974) 5
Fantasia (1940) 5
Indiana Jones and the Last Crusade (1989) 5
Pink Floyd: The Wall (1982) 5
Austin Powers: International Man of Mystery (1997) 5
Face/Off (1997) 5
Conan the Barbarian (1982) 5
L.A. Confidential (1997) 5
Game, The (1997) 5
Big Lebowski, The (1998) 5
Newton Boys, The (1998) 5
All Quiet on the Western Front (1930) 5
Rocky (1976) 5
Goonies, The (1985) 5
Bambi (1942) 5
Black Cauldron, The (1985) 5
Great Mouse Detective, The (1986) 5
Negotiator, The (1998) 5
Jungle Book, The (1967) 5
Rescuers, The (1977) 5
Rocketeer, The (1991) 5
Indiana Jones and the Temple of Doom (1984) 5
Lord of the Rings, The (1978) 5
Charlotte’s Web (1973) 5
Secret of NIMH, The (1982) 5
American Tail, An (1986) 5
NeverEnding Story, The (1984) 5
Edward Scissorhands (1990) 5
American History X (1998) 5
Enemy of the State (1998) 5
Very Bad Things (1998) 5
Rushmore (1998) 5
Thin Red Line, The (1998) 5
Texas Chainsaw Massacre, The (1974) 5
Crocodile Dundee (1986) 5
Office Space (1999) 5
Planet of the Apes (1968) 5
Lock, Stock & Two Smoking Barrels (1998) 5
Matrix, The (1999) 5
Go (1999) 5
SLC Punk! (1998) 5
Superman II (1980) 5
Wolf Man, The (1941) 5
Run Lola Run (Lola rennt) (1998) 5
South Park: Bigger, Longer and Uncut (1999) 5
Ghostbusters (a.k.a. Ghost Busters) (1984) 5
Iron Giant, The (1999) 5
American Beauty (1999) 5
Excalibur (1981) 5
Gulliver’s Travels (1939) 5
Dirty Dozen, The (1967) 5
Goldfinger (1964) 5
From Russia with Love (1963) 5
Dr. No (1962) 5
Fight Club (1999) 5
Who Framed Roger Rabbit? (1988) 5
Live and Let Die (1973) 5
Thunderball (1965) 5
Spaceballs (1987) 5
Robin Hood (1973) 5
Dogma (1999) 5
Messenger: The Story of Joan of Arc, The (1999) 5
Green Mile, The (1999) 5
Wayne’s World (1992) 5
Scream 3 (2000) 5
JFK (1991) 5
Red Dawn (1984) 5
Good Morning, Vietnam (1987) 5
Grumpy Old Men (1993) 5
Gladiator (2000) 5
Blazing Saddles (1974) 5
Mad Max (1979) 5
Road Warrior, The (Mad Max 2) (1981) 5
Shaft (1971) 5
X-Men (2000) 5
MAS*H (a.k.a. MASH) (1970) 5

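Movies predicted by the SVD model for the first user

# Row 1 of predict_svd is the first user in the test split, which is not
# necessarily userId 1; rownames(predict_svd@data)[1] shows which user it is.
movie_ratings <- as.data.frame(predict_svd@data[c(1), ]) 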
colnames(movie_ratings) <- c("Rating")
movie_ratings$movieId <- as.integer(rownames(movie_ratings))
movie_ratings <- movie_ratings %>% arrange(desc(Rating)) %>% head(6) %>% 
  inner_join (movies, by="movieId") %>%
  select(Movie = "title")
knitr::kable(movie_ratings, format = "html") %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
Movie
Pocahontas (1995)
Lord of the Rings: The Fellowship of the Ring, The (2001)
Harry Potter and the Prisoner of Azkaban (2004)
Harry Potter and the Chamber of Secrets (2002)
Star Wars: Episode VI - Return of the Jedi (1983)
Men in Black (a.k.a. MIB) (1997)

Singular Value Decomposition (Manual)

movieMatrix <- as.matrix(normalize(m_matrix)@data)


movieSVD <- svd(movieMatrix)
rownames(movieSVD$u) <- rownames(movieMatrix)
rownames(movieSVD$v) <- colnames(movieMatrix)

As we saw earlier, our data has 610 users, so the decomposition yields 610 singular values. To make the result usable, we need to reduce the number of dimensions/concepts by setting the smallest singular values in the diagonal matrix Σ to 0. Following Leskovec et al. (Mining of Massive Datasets, 2014, p. 424), we will retain enough singular values to make up 90% of the energy of Σ, where the energy is the sum of the squared singular values.

n <- length(movieSVD$d)
total_energy <- sum(movieSVD$d^2)
for (i in (n-1):1) {
  energy <- sum(movieSVD$d[1:i]^2)
  if (energy/total_energy<0.9) {
    n_dims <- i+1
    break
  }
}

trim_mov_D <- movieSVD$d[1:n_dims]
trim_mov_U <- movieSVD$u[, 1:n_dims]
trim_mov_V <- movieSVD$v[, 1:n_dims]
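The same 90% cutoff can also be found without a loop. The sketch below is one possible vectorized alternative using cumsum(), shown on made-up singular values so the arithmetic is easy to follow; energy_share is a name introduced here for illustration.

```r
# Toy singular values in decreasing order (made up for illustration)
d <- c(10, 6, 3, 2, 1)

# Cumulative share of the total energy (sum of squared singular values)
energy_share <- cumsum(d^2) / sum(d^2)

# Smallest number of singular values whose energy share reaches 90%
n_dims <- which(energy_share >= 0.9)[1]

print(round(energy_share, 4))
print(n_dims)   # 2, since (100 + 36) / 150 is about 0.907
```

Unlike the loop, this form also returns a value when the first singular value alone already carries 90% of the energy.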

Our original matrix contained ratings for 610 users. After trimming the diagonal matrix Σ, we are left with 251 dimensions/concepts. The first two concepts have singular values of 76.2 and 43.6.

head(trim_mov_D)
## [1] 76.20047 43.62240 41.77917 39.37051 37.95619 36.54896
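To see what trimmed factors are good for, here is a minimal sketch on a made-up 3 x 4 matrix (not the ratings data). Keeping only the first k singular values and vectors gives a rank-k approximation of the original matrix, and its Frobenius error equals the square root of the discarded energy. The variable names are illustrative.

```r
# Toy 3 x 4 matrix (made-up values for illustration only)
A <- matrix(c(5, 4, 1, 1,
              4, 5, 1, 2,
              1, 1, 5, 4),
            nrow = 3, byrow = TRUE)

s <- svd(A)
k <- 2   # number of concepts to keep

# Rank-k approximation: U_k * diag(d_k) * t(V_k)
A_k <- s$u[, 1:k, drop = FALSE] %*%
  diag(s$d[1:k], nrow = k) %*%
  t(s$v[, 1:k, drop = FALSE])

# Frobenius norm of the approximation error
frob_err <- sqrt(sum((A - A_k)^2))

# Square root of the energy carried by the discarded singular values
discarded <- sqrt(sum(s$d[-(1:k)]^2))

print(c(frob_err = frob_err, discarded = discarded))
```

The two printed numbers should agree up to rounding, which is why the share of retained energy is a reasonable guide to how much of the matrix survives the trimming.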

Let's look at the 5 movies with the highest and lowest values for each of the first two concepts.

display_num <- 5

movies_df <- as.data.frame(trim_mov_V) %>% select(V1, V2)
movies_df$movieId <- as.integer(rownames(movies_df))

movie_samples <- movies_df %>% arrange(V1) %>% head(display_num)
movie_samples <- rbind(movie_samples, movies_df %>% arrange(desc(V1)) %>% head(display_num))
movie_samples <- rbind(movie_samples, movies_df %>% arrange(V2) %>% head(display_num))
movie_samples <- rbind(movie_samples, movies_df %>% arrange(desc(V2)) %>% head(display_num))
movie_samples <- movie_samples %>% inner_join(movies, by = "movieId") %>% 
  select(Movie = "title", Concept1 = "V1", Concept2 = "V2")
movie_samples$Concept1 <- round(movie_samples$Concept1, 10)
movie_samples$Concept2 <- round(movie_samples$Concept2, 10)

knitr::kable(movie_samples, format = "html") %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
Movie Concept1 Concept2
Pulp Fiction (1994) -0.1352741 -0.0097276
Star Wars: Episode IV - A New Hope (1977) -0.1182052 0.0267932
Star Wars: Episode V - The Empire Strikes Back (1980) -0.1156339 0.0233453
Godfather, The (1972) -0.1093219 -0.0117634
Fight Club (1999) -0.1067558 -0.0064984
Batman & Robin (1997) 0.0598986 -0.0060177
Batman Forever (1995) 0.0584163 0.0255171
Wild Wild West (1999) 0.0580364 0.0242007
Hollow Man (2000) 0.0563563 0.0003223
Nutty Professor, The (1996) 0.0541703 0.0156627
Charlie’s Angels: Full Throttle (2003) 0.0365662 -0.0589041
Transformers: Dark of the Moon (2011) 0.0175475 -0.0536945
Battlefield Earth (2000) 0.0337085 -0.0524135
Schindler’s List (1993) -0.0757228 -0.0522486
Shawshank Redemption, The (1994) -0.1056603 -0.0500263
Cannonball Run, The (1981) 0.0060784 0.0630204
Naked Gun: From the Files of Police Squad!, The (1988) -0.0103541 0.0588366
Blazing Saddles (1974) -0.0308651 0.0585352
Ace Ventura: Pet Detective (1994) 0.0255363 0.0585069
Beverly Hills Cop (1984) -0.0089759 0.0563334
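One way to turn these coordinates into a similarity estimate is cosine similarity between movies in concept space. The sketch below uses three rows copied from the table above and only the first two of the 251 concepts, so it is a rough illustration rather than the similarity a full model would produce.

```r
# Concept coordinates copied from the table above (first two concepts only)
coords <- rbind(
  "Pulp Fiction (1994)"                       = c(-0.1352741, -0.0097276),
  "Star Wars: Episode IV - A New Hope (1977)" = c(-0.1182052,  0.0267932),
  "Batman & Robin (1997)"                     = c( 0.0598986, -0.0060177)
)
colnames(coords) <- c("Concept1", "Concept2")

# Scale each row to unit length, then take pairwise dot products
unit <- coords / sqrt(rowSums(coords^2))
cos_sim <- unit %*% t(unit)

print(round(cos_sim, 3))
```

In this two-concept view, Pulp Fiction and Star Wars point in nearly the same direction, while Batman & Robin points the opposite way. That matches the table, where these movies sit at opposite ends of the first concept.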

Summary

Collaborative Filtering:

• CF is static and avoids the problems caused by dynamic user preferences.

• Not scalable. The worst-case complexity is O(mn), where m is the number of users and n is the number of items.

Singular Value Decomposition:

• SVD reduces dimensionality by extracting latent factors.

• Scalable and less affected by sparsity, because predictions come from a compact set of latent factors. However, explanations for the recommended items are not obvious.
