The goal of this assignment is to give you practice working with matrix factorization techniques.

Your task is to implement a matrix factorization method, such as singular value decomposition (SVD) or alternating least squares (ALS), in the context of a recommender system.

You may approach this assignment in a number of ways. You are welcome to start with an existing recommender system written by yourself or someone else. Remember as always to cite your sources, so that you can be graded on what you added, not what you found.

SVD can be thought of as a pre-processing step for feature engineering. You might easily start with thousands or millions of items, and use SVD to create a much smaller set of “k” latent features (e.g. 20 or 70).
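As a rough illustration of that reduction, the sketch below applies base R's `svd()` to a small, fully observed matrix and keeps only the first k components. The matrix values and the names `R`, `user_features` and `item_features` are made up for this example; it is not the code used later in this report.

```r
# Toy ratings matrix: 4 users x 5 items (values are made up for illustration)
R <- matrix(c(5, 4, 1, 1, 2,
              4, 5, 2, 1, 1,
              1, 2, 5, 4, 4,
              2, 1, 4, 5, 5),
            nrow = 4, byrow = TRUE)

s <- svd(R)   # R = U %*% diag(d) %*% t(V)
k <- 2        # number of latent features to keep

# User and item representations in the k-dimensional latent space
user_features <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k)  # 4 x k
item_features <- s$v[, 1:k, drop = FALSE]                               # 5 x k

# Rank-k approximation of the original matrix
R_k <- user_features %*% t(item_features)

dim(user_features)
sqrt(sum((R - R_k)^2))   # Frobenius norm of the approximation error
```

Each user is now described by k numbers instead of one number per item. The approximation error depends only on the singular values that were discarded, which is why a small k can work well when the leading singular values dominate.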

Notes/Limitations:
• SVD builds features that may or may not map neatly to items (such as movie genres or news topics). As in many areas of machine learning, the lack of explainability can be an issue.
• SVD requires that there are no missing values. There are various ways to handle this, including (1) imputation of missing values, (2) mean-centering values around 0, or (3) using a more advanced technique, such as stochastic gradient descent, to simulate SVD in populating the factored matrices.
• Calculating the SVD matrices can be computationally expensive, although calculating ratings once the factorization is completed is very fast. You may need to create a subset of your data for SVD calculations to be successfully performed, especially on a machine with a small RAM footprint.
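The missing-value point can be sketched in base R by combining options (1) and (2): subtract each user's mean rating, treat the missing entries as 0 (that is, as an average rating), factorize, and add the means back. The toy matrix is made up, and this is only one possible scheme; it is not necessarily what recommenderlab does internally.

```r
# Toy matrix with missing ratings (NA); values are made up
R <- matrix(c( 5,  4, NA,  1,
               4, NA,  2,  1,
              NA,  2,  5,  4,
               1,  1,  4, NA),
            nrow = 4, byrow = TRUE)

user_means <- rowMeans(R, na.rm = TRUE)
centered <- sweep(R, 1, user_means)   # subtract each user's mean rating
centered[is.na(centered)] <- 0        # a missing rating becomes "average"

s <- svd(centered)
k <- 2
approx_centered <- s$u[, 1:k, drop = FALSE] %*%
  diag(s$d[1:k], nrow = k) %*%
  t(s$v[, 1:k, drop = FALSE])

# Add the user means back to return to the original rating scale
predicted <- sweep(approx_centered, 1, user_means, "+")
round(predicted[is.na(R)], 2)         # estimates for the missing cells
```

The last line prints the model's estimates for the four cells that had no rating.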

This is a dataset about movies and their ratings.

This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. This dataset was generated on September 26, 2018.

We will be working with only two files from the dataset (ratings.csv and movies.csv).

#import required dataset
movies <- read.csv("https://raw.githubusercontent.com/maharjansudhan/DATA612/master/ml-latest-small/movies.csv")
head(movies)
##   movieId                              title
## 1       1                   Toy Story (1995)
## 2       2                     Jumanji (1995)
## 3       3            Grumpier Old Men (1995)
## 4       4           Waiting to Exhale (1995)
## 5       5 Father of the Bride Part II (1995)
## 6       6                        Heat (1995)
##                                        genres
## 1 Adventure|Animation|Children|Comedy|Fantasy
## 2                  Adventure|Children|Fantasy
## 3                              Comedy|Romance
## 4                        Comedy|Drama|Romance
## 5                                      Comedy
## 6                       Action|Crime|Thriller
ratings <- read.csv("https://raw.githubusercontent.com/maharjansudhan/DATA612/master/ml-latest-small/ratings.csv")
head(ratings)
##   userId movieId rating timestamp
## 1      1       1      4 964982703
## 2      1       3      4 964981247
## 3      1       6      4 964982224
## 4      1      47      5 964983815
## 5      1      50      5 964982931
## 6      1      70      3 964982400
#import required libraries
library(recommenderlab)  
## Loading required package: Matrix
## Loading required package: arules
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
## Loading required package: proxy
## 
## Attaching package: 'proxy'
## The following object is masked from 'package:Matrix':
## 
##     as.matrix
## The following objects are masked from 'package:stats':
## 
##     as.dist, dist
## The following object is masked from 'package:base':
## 
##     as.matrix
## Loading required package: registry
## Registered S3 methods overwritten by 'registry':
##   method               from 
##   print.registry_field proxy
##   print.registry_entry proxy
library(dplyr)         
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:arules':
## 
##     intersect, recode, setdiff, setequal, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)       
## 
## Attaching package: 'tidyr'
## The following objects are masked from 'package:Matrix':
## 
##     expand, pack, unpack
library(ggplot2)       
library(ggrepel)         
library(tictoc)
# Convert the dataset to a user-by-movie rating matrix
movieMatrix <- ratings %>% 
                select(-timestamp) %>% 
                spread(movieId, rating)
row.names(movieMatrix) <- movieMatrix[,1]
movieMatrix <- as.matrix(movieMatrix[-c(1)])
movieMatrix <- as(movieMatrix, "realRatingMatrix")
movieMatrix
## 610 x 9724 rating matrix of class 'realRatingMatrix' with 100836 ratings.

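So we have 610 users and 9724 movies, with 100836 ratings in total. The matrix has 9724 columns rather than 9742 because its columns are built from ratings.csv, so only movies with at least one rating appear.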

To do any kind of modeling, we need to split the dataset into training and test sets. We split the data below:

# Train/test split in 75-25 ratio
set.seed(123)
evaluation <- evaluationScheme(movieMatrix, method = "split", 
                         train = 0.75, given = 20, goodRating = 3)
train <- getData(evaluation, "train")
known <- getData(evaluation, "known")
unknown <- getData(evaluation, "unknown")

#Let’s compare two models UBCF and SVD

#UBCF Model

# UBCF Model - Training
model_UBCF <- Recommender(train, method = "UBCF")

#UBCF Model - Predicting
pred_UBCF <- predict(model_UBCF, newdata = known, type = "ratings")

acc_UBCF <- calcPredictionAccuracy(pred_UBCF, unknown)
acc_UBCF
##      RMSE       MSE       MAE 
## 0.9133433 0.8341961 0.7040390

#SVD Model

# SVD Model - Training
model_SVD <- Recommender(train, method = "SVD", parameter = list(k = 50))

# SVD Model - Predicting
pred_SVD <- predict(model_SVD, newdata = known, type = "ratings")

acc_SVD <- calcPredictionAccuracy(pred_SVD, unknown)
acc_SVD
##      RMSE       MSE       MAE 
## 0.9164460 0.8398733 0.7074397

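We can see that the RMSE, MSE and MAE values for SVD are slightly higher than those of the UBCF model. Since all three are error measures, this means SVD is marginally less accurate on this split.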
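For reference, the three measures can be computed by hand as in the sketch below. The `actual` and `predicted` vectors are made-up numbers, not values from the models above.

```r
# Hypothetical held-out ratings and model predictions (made-up numbers)
actual    <- c(4.0, 3.5, 5.0, 2.0, 4.5)
predicted <- c(3.6, 3.9, 4.4, 2.8, 4.1)

err  <- predicted - actual
mse  <- mean(err^2)      # mean squared error
rmse <- sqrt(mse)        # root mean squared error, on the rating scale
mae  <- mean(abs(err))   # mean absolute error

c(RMSE = rmse, MSE = mse, MAE = mae)
```

All three measure the distance between predicted and actual ratings, so lower is better. RMSE is the square root of MSE, which is consistent with the output above (for UBCF, 0.9133433 squared is approximately 0.8341961).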

#Input user value and predict the movies

The user ID that we pass in, here c("11"), selects the user whose rated movies are shown. Each time we change this value, the code displays the list of movies rated by that user, sorted from highest to lowest rating.

movie_rated <- as.data.frame(movieMatrix@data[c("11"), ]) 
colnames(movie_rated) <- c("Rating")
movie_rated$movieId <- as.integer(rownames(movie_rated))
movie_rated <- movie_rated %>% filter(Rating != 0) %>% 
                inner_join (movies, by="movieId") %>%
                arrange(desc(Rating)) %>%
                select(Movie = "title", Rating)
knitr::kable(movie_rated, format = "html") %>%
                kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
Movie Rating
Heat (1995) 5
Braveheart (1995) 5
Apollo 13 (1995) 5
Clear and Present Danger (1994) 5
Forrest Gump (1994) 5
Fugitive, The (1993) 5
Searching for Bobby Fischer (1993) 5
Silence of the Lambs, The (1991) 5
Top Gun (1986) 5
Last of the Mohicans, The (1992) 5
Contact (1997) 5
Amistad (1997) 5
Titanic (1997) 5
As Good as It Gets (1997) 5
Saving Private Ryan (1998) 5
Dead Man Walking (1995) 4
Hackers (1995) 4
Outbreak (1995) 4
Shawshank Redemption, The (1994) 4
True Lies (1994) 4
In the Line of Fire (1993) 4
Jurassic Park (1993) 4
Program, The (1993) 4
Terminator 2: Judgment Day (1991) 4
Mission: Impossible (1996) 4
Rock, The (1996) 4
Twister (1996) 4
Independence Day (a.k.a. ID4) (1996) 4
Star Wars: Episode VI - Return of the Jedi (1983) 4
Sling Blade (1996) 4
Breakdown (1997) 4
Con Air (1997) 4
G.I. Jane (1997) 4
Conspiracy Theory (1997) 4
Money Talks (1997) 4
Air Force One (1997) 4
Good Will Hunting (1997) 4
He Got Game (1998) 4
Armageddon (1998) 4
Lethal Weapon 4 (1998) 4
There’s Something About Mary (1998) 4
GoldenEye (1995) 3
Broken Arrow (1996) 3
Batman Forever (1995) 3
Die Hard: With a Vengeance (1995) 3
Waterworld (1995) 3
Maverick (1994) 3
Speed (1994) 3
Cliffhanger (1993) 3
Hot Shots! Part Deux (1993) 3
Menace II Society (1993) 3
Days of Thunder (1990) 3
Die Hard 2 (1990) 3
Under Siege (1992) 3
Dante’s Peak (1997) 3
Face/Off (1997) 3
Peacemaker, The (1997) 3
Jackal, The (1997) 3
Jane Austen’s Mafia! (1998) 3
Mortal Kombat (1995) 2
River Wild, The (1994) 2
Godzilla (1998) 2
Lethal Weapon 3 (1992) 2
Mars Attacks! (1996) 1
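Because the same lookup is repeated for each user, it could be wrapped in a small function. The sketch below is a base R version with a hypothetical name, `rated_movies()`, and made-up toy inputs. It assumes the ratings arrive as a named numeric vector (names are movie IDs, 0 means not rated), which is the shape the row extraction above is expected to return.

```r
# Hypothetical helper: list the movies one user has rated, best first.
# `user_ratings`: named numeric vector (names = movieId, 0 = not rated).
# `movies`: data frame with at least the columns movieId and title.
rated_movies <- function(user_ratings, movies) {
  rated <- data.frame(movieId = as.integer(names(user_ratings)),
                      Rating  = as.numeric(user_ratings))
  rated <- rated[rated$Rating != 0, ]              # keep rated movies only
  out <- merge(rated, movies, by = "movieId")      # attach titles
  out <- out[order(-out$Rating), c("title", "Rating")]
  names(out)[1] <- "Movie"
  rownames(out) <- NULL
  out
}

# Toy inputs (made up) to show the call
toy_movies  <- data.frame(movieId = c(1L, 2L, 3L),
                          title   = c("Toy Story (1995)", "Jumanji (1995)",
                                      "Grumpier Old Men (1995)"))
toy_ratings <- c("1" = 4, "2" = 0, "3" = 5)
rated_movies(toy_ratings, toy_movies)
```

With the real data, the call would look something like `rated_movies(movieMatrix@data["11", ], movies)`, assuming the objects defined earlier in this report.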

Let’s choose another user, this time user 50, to check that the lookup works for a different input.

movie_rated <- as.data.frame(movieMatrix@data[c("50"), ]) 
colnames(movie_rated) <- c("Rating")
movie_rated$movieId <- as.integer(rownames(movie_rated))
movie_rated <- movie_rated %>% filter(Rating != 0) %>% 
                inner_join (movies, by="movieId") %>%
                arrange(desc(Rating)) %>%
                select(Movie = "title", Rating)
knitr::kable(movie_rated, format = "html") %>%
                kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
Movie Rating
2001: A Space Odyssey (1968) 4.5
Lawrence of Arabia (1962) 4.5
Apocalypse Now (1979) 4.5
8 1/2 (8½) (1963) 4.5
Taxi Driver (1976) 4.0
Pulp Fiction (1994) 4.0
Blade Runner (1982) 4.0
Fargo (1996) 4.0
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964) 4.0
Godfather, The (1972) 4.0
Singin’ in the Rain (1952) 4.0
Vertigo (1958) 4.0
Citizen Kane (1941) 4.0
Monty Python and the Holy Grail (1975) 4.0
Brazil (1985) 4.0
Good, the Bad and the Ugly, The (Buono, il brutto, il cattivo, Il) (1966) 4.0
Clockwork Orange, A (1971) 4.0
Stalker (1979) 4.0
Chinatown (1974) 4.0
Shining, The (1980) 4.0
Akira (1988) 4.0
Seven Samurai (Shichinin no samurai) (1954) 4.0
Rosemary’s Baby (1968) 4.0
Jules and Jim (Jules et Jim) (1961) 4.0
Mirror, The (Zerkalo) (1975) 4.0
Conversation, The (1974) 4.0
Breathless (À bout de souffle) (1960) 4.0
That Obscure Object of Desire (Cet obscur objet du désir) (1977) 4.0
Spirited Away (Sen to Chihiro no kamikakushi) (2001) 4.0
Persona (1966) 4.0
Scenes From a Marriage (Scener ur ett äktenskap) (1973) 4.0
Wages of Fear, The (Salaire de la peur, Le) (1953) 4.0
Pierrot le fou (1965) 4.0
Fearless Vampire Killers, The (1967) 4.0
Topo, El (1970) 4.0
Holy Mountain, The (Montaña sagrada, La) (1973) 4.0
Neon Genesis Evangelion: The End of Evangelion (Shin seiki Evangelion Gekijô-ban: Air/Magokoro wo, kimi ni) (1997) 4.0
All Watched Over by Machines of Loving Grace (2011) 4.0
Shawshank Redemption, The (1994) 3.5
Silence of the Lambs, The (1991) 3.5
Ghost in the Shell (Kôkaku kidôtai) (1995) 3.5
African Queen, The (1951) 3.5
Die Hard (1988) 3.5
Monty Python’s Life of Brian (1979) 3.5
Wallace & Gromit: The Wrong Trousers (1993) 3.5
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981) 3.5
Amadeus (1984) 3.5
Annie Hall (1977) 3.5
Deer Hunter, The (1978) 3.5
Young Frankenstein (1974) 3.5
Great Dictator, The (1940) 3.5
Big Lebowski, The (1998) 3.5
Eyes Wide Shut (1999) 3.5
Conformist, The (Conformista, Il) (1970) 3.5
Goldfinger (1964) 3.5
Princess Mononoke (Mononoke-hime) (1997) 3.5
Easy Rider (1969) 3.5
Draughtsman’s Contract, The (1982) 3.5
Modern Times (1936) 3.5
Fellini Satyricon (1969) 3.5
Diary of a Chambermaid (Journal d’une femme de chambre, Le) (1964) 3.5
Santa Sangre (1989) 3.5
Burden of Dreams (1982) 3.5
Discreet Charm of the Bourgeoisie, The (Charme discret de la bourgeoisie, Le) (1972) 3.5
Triplets of Belleville, The (Les triplettes de Belleville) (2003) 3.5
Kill Bill: Vol. 1 (2003) 3.5
Funny Games (1997) 3.5
Aguirre: The Wrath of God (Aguirre, der Zorn Gottes) (1972) 3.5
Mon Oncle (My Uncle) (1958) 3.5
Kill Bill: Vol. 2 (2004) 3.5
Lola Montès (1955) 3.5
Lupin III: The Castle Of Cagliostro (Rupan sansei: Kariosutoro no shiro) (1979) 3.5
Hearts of Darkness: A Filmmakers Apocalypse (1991) 3.5
Man Who Planted Trees, The (Homme qui plantait des arbres, L’) (1987) 3.5
Trip to the Moon, A (Voyage dans la lune, Le) (1902) 3.5
Lives of Others, The (Das leben der Anderen) (2006) 3.5
Orchestra Rehearsal (Prova d’orchestra) (1978) 3.5
FLCL (2000) 3.5
Pervert’s Guide to Cinema, The (2006) 3.5
One Week (1920) 3.5
Argo (2012) 3.5
Wind Rises, The (Kaze tachinu) (2013) 3.5
Century of the Self, The (2002) 3.5
Dance of Reality, The (Danza de la realidad, La) (2013) 3.5
Victoria (2015) 3.5
Endless Poetry (2016) 3.5
HyperNormalisation (2016) 3.5
Call Me by Your Name (2017) 3.5
Band of Brothers (2001) 3.5
The Death of Louis XIV (2016) 3.5
Belladonna of Sadness (1973) 3.5
Themroc (1973) 3.5
The Darkest Minds (2018) 3.5
Toy Story (1995) 3.0
Twelve Monkeys (a.k.a. 12 Monkeys) (1995) 3.0
Die Hard: With a Vengeance (1995) 3.0
Forrest Gump (1994) 3.0
Apartment, The (1960) 3.0
My Fair Lady (1964) 3.0
Mary Poppins (1964) 3.0
Grand Day Out with Wallace and Gromit, A (1989) 3.0
Day the Earth Stood Still, The (1951) 3.0
Down by Law (1986) 3.0
Koyaanisqatsi (a.k.a. Koyaanisqatsi: Life Out of Balance) (1983) 3.0
Indiana Jones and the Last Crusade (1989) 3.0
Die Hard 2 (1990) 3.0
Batman Returns (1992) 3.0
Jaws (1975) 3.0
Mars Attacks! (1996) 3.0
Titanic (1997) 3.0
Pi (1998) 3.0
Village of the Damned (1960) 3.0
Rocky Horror Picture Show, The (1975) 3.0
Fight Club (1999) 3.0
General, The (1926) 3.0
Robin Hood (1973) 3.0
Toy Story 2 (1999) 3.0
Naked Gun: From the Files of Police Squad!, The (1988) 3.0
Requiem for a Dream (2000) 3.0
Memento (2000) 3.0
Monsters, Inc. (2001) 3.0
Party, The (1968) 3.0
Murder by Death (1976) 3.0
Finding Nemo (2003) 3.0
Pirates of the Caribbean: The Curse of the Black Pearl (2003) 3.0
Pink Panther, The (1963) 3.0
Pink Panther Strikes Again, The (1976) 3.0
Lost in Translation (2003) 3.0
Monty Python’s The Meaning of Life (1983) 3.0
Chitty Chitty Bang Bang (1968) 3.0
Jin Roh: The Wolf Brigade (Jin-Rô) (1998) 3.0
Safety Last! (1923) 3.0
Harry Potter and the Prisoner of Azkaban (2004) 3.0
Dresser, The (1983) 3.0
Murder on the Orient Express (1974) 3.0
Incredibles, The (2004) 3.0
Hair (1979) 3.0
Dog Days (Hundstage) (2001) 3.0
Old Boy (2003) 3.0
Holiday (Jour de fête) (1949) 3.0
Woyzeck (1979) 3.0
Wallace & Gromit in The Curse of the Were-Rabbit (2005) 3.0
Little Miss Sunshine (2006) 3.0
Borat: Cultural Learnings of America for Make Benefit Glorious Nation of Kazakhstan (2006) 3.0
Ratatouille (2007) 3.0
Paprika (Papurika) (2006) 3.0
Dark Knight, The (2008) 3.0
Illusionist, The (L’illusionniste) (2010) 3.0
My Afternoons with Margueritte (La tête en friche) (2010) 3.0
Contact High (2009) 3.0
Ro.Go.Pa.G. (1963) 3.0
Django Unchained (2012) 3.0
Wolf of Wall Street, The (2013) 3.0
Grand Budapest Hotel, The (2014) 3.0
Birdman: Or (The Unexpected Virtue of Ignorance) (2014) 3.0
Old Lady and the Pigeons, The (La vieille dame et les pigeons) (1997) 3.0
La cravate (1957) 3.0
Culture High, The (2014) 3.0
Mad Max: Fury Road (2015) 3.0
Kurt Cobain: Montage of Heck (2015) 3.0
BMX Bandits (1983) 3.0
Ghost in the Shell: Stand Alone Complex - The Laughing Man (2005) 3.0
Sicario (2015) 3.0
Big Short, The (2015) 3.0
The Nice Guys (2016) 3.0
The Beguiled (2017) 3.0
Stefan Zweig: Farewell to Europe (2016) 3.0
Get Me Roger Stone (2017) 3.0
Don Camillo in Moscow (1965) 3.0
A German Life (2016) 3.0
Self-criticism of a Bourgeois Dog (2017) 3.0
Der Herr Karl (1961) 3.0
Blade Runner 2049 (2017) 3.0
Male Hunt (1964) 3.0
Ant-Man and the Wasp (2018) 3.0
Hunchback of Notre Dame, The (1996) 2.5
Matilda (1996) 2.5
Fish Called Wanda, A (1988) 2.5
Top Gun (1986) 2.5
Grease (1978) 2.5
Hercules (1997) 2.5
Men in Black (a.k.a. MIB) (1997) 2.5
Truman Show, The (1998) 2.5
Breakfast Club, The (1985) 2.5
Mask of Zorro, The (1998) 2.5
Bambi (1942) 2.5
Karate Kid, The (1984) 2.5
Matrix, The (1999) 2.5
Harry Potter and the Sorcerer’s Stone (a.k.a. Harry Potter and the Philosopher’s Stone) (2001) 2.5
Bowling for Columbine (2002) 2.5
Harry Potter and the Chamber of Secrets (2002) 2.5
Journey to the Center of the Earth (1959) 2.5
Tanguy (2001) 2.5
Quo Vadis (1951) 2.5
Batman Begins (2005) 2.5
Brainstorm (2001) 2.5
Marie Antoinette (2006) 2.5
Prestige, The (2006) 2.5
Casino Royale (2006) 2.5
Simpsons Movie, The (2007) 2.5
Into the Wild (2007) 2.5
Kung Fu Panda (2008) 2.5
Mamma Mia! (2008) 2.5
Wave, The (Welle, Die) (2008) 2.5
Slumdog Millionaire (2008) 2.5
Afro Samurai: Resurrection (2009) 2.5
Men Who Stare at Goats, The (2009) 2.5
Shutter Island (2010) 2.5
Inception (2010) 2.5
Warrior (2011) 2.5
Adventures of Tintin, The (2011) 2.5
Skyfall (2012) 2.5
Samsara (2011) 2.5
Flight (2012) 2.5
Unintentional Kidnapping of Mrs. Elfriede Ott, The (Die Unabsichtliche Entführung der Frau Elfriede Ott) (2010) 2.5
Great Gatsby, The (2013) 2.5
Rush (2013) 2.5
Dampfnudelblues (2013) 2.5
The Lego Movie (2014) 2.5
Interstellar (2014) 2.5
The Fault in Our Stars (2014) 2.5
Whiplash (2014) 2.5
The Imitation Game (2014) 2.5
The Hateful Eight (2015) 2.5
The Dark Valley (2014) 2.5
Kung Fury (2015) 2.5
Macbeth (2015) 2.5
The Walk (2015) 2.5
The Revenant (2015) 2.5
Joy (2015) 2.5
Life Eternal (2015) 2.5
Requiem for the American Dream (2015) 2.5
Zootopia (2016) 2.5
Dunkirk (2017) 2.5
The Putin Interviews (2017) 2.5
Rick and Morty: State of Georgia Vs. Denver Fenton Allen (2016) 2.5
Pinocchio (1940) 2.0
101 Dalmatians (1996) 2.0
Shrek (2001) 2.0
Ice Age (2002) 2.0
Men in Black II (a.k.a. MIIB) (a.k.a. MIB 2) (2002) 2.0
8 Mile (2002) 2.0
SpongeBob SquarePants Movie, The (2004) 2.0
Robots (2005) 2.0
Chronicles of Narnia: The Lion, the Witch and the Wardrobe, The (2005) 2.0
Cars (2006) 2.0
Pursuit of Happyness, The (2006) 2.0
Indiana Jones and the Kingdom of the Crystal Skull (2008) 2.0
Krabat (2008) 2.0
Cloudy with a Chance of Meatballs (2009) 2.0
Avatar (2009) 2.0
Toy Story 3 (2010) 2.0
Karate Kid, The (2010) 2.0
Scott Pilgrim vs. the World (2010) 2.0
127 Hours (2010) 2.0
Hobo with a Shotgun (2011) 2.0
Kung Fu Panda 2 (2011) 2.0
Dictator, The (2012) 2.0
Looper (2012) 2.0
Hotel Transylvania (2012) 2.0
Life of Pi (2012) 2.0
Pacific Rim (2013) 2.0
Captain America: The Winter Soldier (2014) 2.0
The Theory of Everything (2014) 2.0
Fantastic Beasts and Where to Find Them (2016) 2.0
Black Mass (2015) 2.0
Spectre (2015) 2.0
The Jungle Book (2016) 2.0
Everest (2015) 2.0
Hotel Transylvania 2 (2015) 2.0
Hail, Caesar! (2016) 2.0
Kung Fu Panda 3 (2016) 2.0
Charlie and the Chocolate Factory (2005) 1.5
Ice Age 2: The Meltdown (2006) 1.5
I Am Legend (2007) 1.5
Johnny English Reborn (2011) 1.5
Muppets, The (2011) 1.5
Prometheus (2012) 1.5
Despicable Me 2 (2013) 1.5
Gravity (2013) 1.5
Fuck You, Goethe (Fack Ju Göhte) (2013) 1.5
Jurassic World (2015) 1.5
Deadpool (2016) 1.5
Batman v Superman: Dawn of Justice (2016) 1.5
Knock Knock (2015) 1.5
Assassin’s Creed (2016) 1.5
Kong: Skull Island (2017) 1.5
The Hitman’s Bodyguard (2017) 1.5
Surf Nazis Must Die (1987) 1.0
Day After Tomorrow, The (2004) 1.0
High School Musical (2006) 1.0
Hangover, The (2009) 1.0
Ice Age: Dawn of the Dinosaurs (2009) 1.0
2012 (2009) 1.0
Woman in Love (Rubbeldiekatz) (2011) 1.0
This Means War (2012) 1.0
Ice Age 4: Continental Drift (2012) 1.0
Good Day to Die Hard, A (2013) 1.0
Schlussmacher (2013) 1.0
Guardians of the Galaxy 2 (2017) 1.0
Minions (2015) 1.0
Sharknado 3: Oh Hell No! (2015) 1.0
F*ck You, Goethe 2 (2015) 1.0
Er ist wieder da (2015) 1.0
The Huntsman Winter’s War (2016) 1.0
Ice Age: Collision Course (2016) 1.0
The Fate of the Furious (2017) 1.0
Kingsman: The Golden Circle (2017) 1.0
Justice League (2017) 0.5
Death Note (2017) 0.5

Lastly, we can say that both models work reasonably well: each has an RMSE of about 0.91 on ratings that range from 0.5 to 5. Training does not take long because the dataset is small. In future I would like to work with a bigger dataset to see if there is any difference. On this split the RMSE, MSE and MAE for SVD are slightly higher than for UBCF. Because these are error measures, lower is better, so UBCF is marginally more accurate here, although the gap is very small (RMSE 0.9164 vs 0.9133). So we can conclude that SVD with k = 50 gives predictions of comparable quality to UBCF on this data.