DATA 612 Project 2 | Content-Based and Collaborative Filtering

For assignment 2, start with an existing dataset of user-item ratings, such as our toy books dataset, MovieLens, Jester [http://eigentaste.berkeley.edu/dataset/] or another dataset of your choosing. Implement at least two of these recommendation algorithms:

  • Content-Based Filtering
  • User-User Collaborative Filtering
  • Item-Item Collaborative Filtering

Data: Jester Online Joke Recommender System (Dataset 1)

About:

  • Data from 24,983 users who have rated 36 or more jokes, a matrix with dimensions 24983 X 101
  • Ratings are real values ranging from -10.00 to +10.00 (the value “99” corresponds to “null” = “not rated”).
  • One row per user
  • The first column gives the number of jokes rated by that user. The next 100 columns give the ratings for jokes 01 - 100.
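
The row layout described above can be illustrated with a short sketch. The report itself was produced in R; the Python below is only an illustration, and the function name `parse_user_row` and the four-joke example row are hypothetical (a real row has 100 joke columns).

```python
from typing import List, Optional, Tuple

NOT_RATED = 99.0  # sentinel the Jester files use for "null" / "not rated"


def parse_user_row(row: List[float]) -> Tuple[int, List[Optional[float]]]:
    """Split one raw row into (number of jokes rated, ratings with None for unrated)."""
    num_rated = int(row[0])  # first column: count of jokes this user rated
    ratings = [None if value == NOT_RATED else value for value in row[1:]]
    return num_rated, ratings


# Hypothetical shortened row: count column followed by four joke columns.
count, ratings = parse_user_row([2, -7.82, 99, 99, 4.37])
print(count, ratings)
```

Treating 99 as missing before any averaging matters, because a literal 99 would otherwise dominate ratings that only range from -10 to +10.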

Jester Dataset:

## [1] "Raw.Df:   c(24983, 101)"

Data Filtering

  • The dataset is large (24,983 x 101), so I remove some data to make it more manageable
  • Subset by the number of jokes rated by each user (the first column), using a randomly selected count of 80 jokes rated
  • Remove the columns that are entirely null (jokes that no user in the subset rated)
  • Remove the columns where the joke was rated by all users
  • The final subset is more manageable, with 145 rows (reviewers) and 25 jokes
## [1] "Subset.1 (Num Jokes Filter):   c(145, 101)"
## [1] "Subset.2(Null Eval):   c(145, 25)"
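
The filtering steps above could be sketched as follows. This is an illustrative Python sketch, not the R code behind the report; it assumes the user filter keeps rows whose rated-joke count equals the chosen value, and the helper names and toy data are hypothetical.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # one user's ratings, None = not rated


def filter_users(counts: List[int], rows: List[Row], target_count: int) -> List[Row]:
    """Keep users whose number of rated jokes equals target_count (assumed rule)."""
    return [row for count, row in zip(counts, rows) if count == target_count]


def drop_uninformative_columns(rows: List[Row]) -> Tuple[List[int], List[Row]]:
    """Drop columns rated by no user or by every user; return kept indices and new rows."""
    n_users = len(rows)
    n_cols = len(rows[0]) if rows else 0
    keep = []
    for j in range(n_cols):
        rated = sum(1 for row in rows if row[j] is not None)
        if 0 < rated < n_users:  # neither all-null nor rated by everyone
            keep.append(j)
    return keep, [[row[j] for j in keep] for row in rows]


# Toy data: three users, four jokes.
counts = [2, 3, 2]
rows = [
    [1.0, None, 2.0, None],
    [1.0, 2.0, 3.0, None],
    [0.5, -1.0, None, None],
]
subset = filter_users(counts, rows, target_count=2)
kept, reduced = drop_uninformative_columns(subset)
print(kept, reduced)
```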

EDA

  • Evaluate how many users rated each joke (display with a histogram)
  • Evaluate the average rating of each joke (display with a histogram)
  • Visualize the whole matrix of ratings with a heatmap built by image(), whose colors represent the ratings. Y-axis = selected users (50), X-axis = jokes (25)
  • Create a matrix from the subset2 dataset for later use
  • Create a realRatingMatrix
  • Find the count of occurrences of each individual rating value (regular and normalized)

Joke Histograms

  • Count of rating per joke (Num. Joke Ranked)
  • Mean joke rating (Average Joke Ranking - Joke)
  • Mean Joke rating per User (Average joke Ranking - User)
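
The three quantities behind these histograms can be computed as in the sketch below. It is illustrative Python rather than the report's R code, the function names are hypothetical, and unrated entries are assumed to be stored as None.

```python
from typing import List, Optional

Row = List[Optional[float]]  # one user's ratings, None = not rated


def ratings_per_joke(rows: List[Row]) -> List[int]:
    """Number of users who rated each joke (column)."""
    return [sum(1 for row in rows if row[j] is not None) for j in range(len(rows[0]))]


def mean_rating_per_joke(rows: List[Row]) -> List[Optional[float]]:
    """Mean rating of each joke over the users who rated it (None if unrated)."""
    means = []
    for j in range(len(rows[0])):
        rated = [row[j] for row in rows if row[j] is not None]
        means.append(sum(rated) / len(rated) if rated else None)
    return means


def mean_rating_per_user(rows: List[Row]) -> List[Optional[float]]:
    """Mean rating given by each user (row) over the jokes they rated."""
    means = []
    for row in rows:
        rated = [r for r in row if r is not None]
        means.append(sum(rated) / len(rated) if rated else None)
    return means


# Toy matrix: three users, three jokes.
toy = [
    [4.0, None, -2.0],
    [2.0, 6.0, None],
    [None, None, 1.0],
]
print(ratings_per_joke(toy), mean_rating_per_joke(toy), mean_rating_per_user(toy))
```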

Joke Heat Map

  • rating vs users

Joke Occurrences

  • Count how many times each distinct rating value occurs in the submatrix
  • Repeat the count after normalizing the submatrix
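
A minimal sketch of both steps follows, assuming that normalization means centering each user's ratings on that user's mean (the "center" setting that appears in the user-based model output). The Python below is illustrative only and its function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional

Row = List[Optional[float]]  # one user's ratings, None = not rated


def rating_occurrences(rows: List[Row]) -> Counter:
    """Count how often each distinct rating value appears, ignoring unrated cells."""
    return Counter(r for row in rows for r in row if r is not None)


def center_rows(rows: List[Row]) -> List[Row]:
    """Subtract each user's mean rating from that user's ratings."""
    centered = []
    for row in rows:
        rated = [r for r in row if r is not None]
        mean = sum(rated) / len(rated) if rated else 0.0
        centered.append([None if r is None else r - mean for r in row])
    return centered


# Toy matrix: two users, three jokes.
toy = [
    [4.0, None, -2.0],
    [2.0, 4.0, None],
]
raw_counts = rating_occurrences(toy)
normalized = center_rows(toy)
print(raw_counts, normalized, rating_occurrences(normalized))
```

After centering, every user's ratings average to zero, so the occurrence counts describe ratings relative to each user's own baseline rather than absolute scores.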

Modeling

  • Split the data frame into train and test sets at roughly 80/20
  • Fit the item-based model and the user-based model on the training data and predict for the test users
  • Print the predicted jokes
  • Evaluate
  • ROC curve
## [1] "Train_df:   c(117, 25)"
## [1] "Test_df:   c(28, 25)"
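
One way to obtain a split like this is sketched below. An exact 80% cut of 145 users would give 116/29, so the sketch assumes each user was assigned to the training set independently with probability 0.8; that assumption, the function name `split_users`, and the seed are not taken from the report.

```python
import random
from typing import List, Tuple


def split_users(n_users: int, train_prob: float = 0.8, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Assign each user index to train with probability train_prob, else to test."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for user in range(n_users):
        if rng.random() < train_prob:
            train.append(user)
        else:
            test.append(user)
    return train, test


train_idx, test_idx = split_users(145, train_prob=0.8, seed=1)
print(len(train_idx), len(test_idx))
```

With this scheme the set sizes vary from seed to seed and are only 80/20 on average.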

Item-based model

## Recommender of type 'IBCF' for 'realRatingMatrix' 
## learned using 117 users.
## Recommendations as 'topNList' with n = 10 for 28 users.
##  [1] 23  1 19 13 10 21 14 12  5  8
##  [1] "94" "72" "90" "84" "81" "92" "85" "83" "76" "79"
JokeNumber JokeText
94 Two atoms are walking down the street when one atom says to the other “Oh, my! I’ve lost an electron!” The second atom says “Are you sure” The first replies “I’m positive!”
72 On the first day of college, the Dean addressed the students, pointing out some of the rules: “The female dormitory will be out-of-bounds for all male students and the male dormitory to the female students. Anybody caught breaking this rule will be fined $20 the first time.” He continued, “Anybody caught breaking this rule the second time will be fined $60. Being caught a third time will cost you a fine of $180. Are there any questions ?” At this point, a male student in the crowd inquired: “How much for a season pass ?”
90 Q: How many programmers does it take to change a lightbulb? A: NONE! That’s a hardware problem….
84 Q: What is the difference between Mechanical Engineers and Civil Engineers? A: Mechanical Engineers build weapons, Civil Engineers build targets.
81 An Asian man goes into a New York CityBank to exchange 10,000 yen for American Currency. The teller gives him $72.00. The next month the Asian man goes into the same bank with 10,000 yen and receives $62.00. He asks, “How come? Only $62.00?” The teller says “Fluctuations-Fluctuations!” Whereupon the Asian man looks back at the teller and says “Fluk you Amelicans too!”
92 Early one morning a mother went to her sleeping son and woke him up. “Wake up, son. It’s time to go to school.” “But why, Mama? I don’t want to go to school.” “Give me two reasons why you don’t want to go to school.” “One, all the children hate me. Two, all the teachers hate me,” “Oh! that’s no reason. Come on, you have to go to school,” “Give me two good reasons WHY I should go to school?” “One, you are fifty-two years old. Two, you are the principal of the school.”
85 Q: How many Presidents does it take to screw in a light bulb? A: It depends upon your definition of screwing a light bulb.
83 What a woman says: “This place is a mess! C’mon, You and I need to clean up, Your stuff is lying on the floor and you’ll have no clothes to wear, if we don’t do laundry right now!” What a man hears: blah, blah, blah, blah, C’mon blah, blah, blah, blah, you and I blah, blah, blah, blah, on the floor blah, blah, blah, blah, no clothes blah, blah, blah, blah, RIGHT NOW!
76 There once was a man and a woman that both got in a terrible car wreck. Both of their vehicles were completely destroyed, but fortunately, no one was hurt. In thankfulness, the woman said to the man, ‘We are both okay, so we should celebrate. I have a bottle of wine in my car, let’s open it.’ So the woman got the bottle out of the car, and handed it to the man. The man took a really big drink, and handed the woman the bottle. The woman closed the bottle and put it down. The man asked, ‘Aren’t you going to take a drink?’ The woman cleverly replied, ‘No, I think I’ll just wait for the cops to get here.’
79 Q: Ever wonder why the IRS calls it Form 1040? A: Because for every $50 that you earn, you get 10 and they get 40.
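
The idea behind an item-based recommendation like the one above can be sketched in a few lines. This is a simplified Python illustration, not recommenderlab's IBCF implementation: it assumes cosine similarity between joke columns over co-rating users, uses every positively similar joke rather than a fixed number of nearest items, and all names and data are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one user's ratings, None = not rated


def item_cosine(rows: List[Row], a: int, b: int) -> float:
    """Cosine similarity between jokes a and b over users who rated both."""
    pairs = [(row[a], row[b]) for row in rows if row[a] is not None and row[b] is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_items(train_rows: List[Row], user_row: Row, n: int = 10) -> List[int]:
    """Rank the user's unrated jokes by a similarity-weighted mean of their own ratings."""
    scores = {}
    for target, known in enumerate(user_row):
        if known is not None:
            continue  # only score jokes the user has not rated
        num = den = 0.0
        for other, rating in enumerate(user_row):
            if rating is None:
                continue
            sim = item_cosine(train_rows, target, other)
            if sim > 0:  # ignore dissimilar jokes (a simplifying assumption)
                num += sim * rating
                den += sim
        if den > 0:
            scores[target] = num / den
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy training data: jokes 0 and 1 are liked together, joke 2 by the opposite crowd.
train = [
    [8.0, 7.0, -6.0],
    [6.0, 9.0, -8.0],
    [-7.0, -5.0, 9.0],
]
top = recommend_items(train, [9.0, None, None], n=2)
print(top)
```

Here the new user liked joke 0, so only joke 1, which is positively similar to it, receives a score.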

User-based model

## Available parameter (with default values):
## method    =  cosine
## nn    =  25
## sample    =  FALSE
## normalize     =  center
## verbose   =  FALSE
## Recommender of type 'UBCF' for 'realRatingMatrix' 
## learned using 117 users.
## Recommendations as 'topNList' with n = 10 for 28 users.
##  [1] "94" "92" "81" "82" "77" "85" "74" "79" "83" "91"
JokeNumber JokeText
94 Two atoms are walking down the street when one atom says to the other “Oh, my! I’ve lost an electron!” The second atom says “Are you sure” The first replies “I’m positive!”
92 Early one morning a mother went to her sleeping son and woke him up. “Wake up, son. It’s time to go to school.” “But why, Mama? I don’t want to go to school.” “Give me two reasons why you don’t want to go to school.” “One, all the children hate me. Two, all the teachers hate me,” “Oh! that’s no reason. Come on, you have to go to school,” “Give me two good reasons WHY I should go to school?” “One, you are fifty-two years old. Two, you are the principal of the school.”
81 An Asian man goes into a New York CityBank to exchange 10,000 yen for American Currency. The teller gives him $72.00. The next month the Asian man goes into the same bank with 10,000 yen and receives $62.00. He asks, “How come? Only $62.00?” The teller says “Fluctuations-Fluctuations!” Whereupon the Asian man looks back at the teller and says “Fluk you Amelicans too!”
82 Q: How do you keep a computer programmer in the shower all day long? A: Give them a shampoo with a label that says “rinse, lather, repeat”.
77 If pro- is the opposite of con- then congress must be the opposite of progress.
85 Q: How many Presidents does it take to screw in a light bulb? A: It depends upon your definition of screwing a light bulb.
74 Q: How many stalkers does it take to change a light bulb? A: Two. One to replace the bulb, and the other to watch it day and night.
79 Q: Ever wonder why the IRS calls it Form 1040? A: Because for every $50 that you earn, you get 10 and they get 40.
83 What a woman says: “This place is a mess! C’mon, You and I need to clean up, Your stuff is lying on the floor and you’ll have no clothes to wear, if we don’t do laundry right now!” What a man hears: blah, blah, blah, blah, C’mon blah, blah, blah, blah, you and I blah, blah, blah, blah, on the floor blah, blah, blah, blah, no clothes blah, blah, blah, blah, RIGHT NOW!
91 A Panda bear walks into a bar. Sits down at a table and orders a beer and a double cheeseburger. After he is finished eating, he pulls out a gun and rips the place with gunfire. Patrons scatter and dive under chairs and tables as the bear runs out the door. After ensuring that no one is hurt, the bartender races out the door, and calls after the bear “What the hell did you do that for?” The bear calls back, “I’m a Panda bear. Look it up in the dictionary.” The bartender returns, pulls out his dictionary. panda : “da, n. (Zo[“o]l.) A small Asiatic mammal (Ailurus fulgens) having fine soft fur. It is related to the bears, and inhabits the mountains of Northern India. Eats shoots and leaves.
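
The parameters listed for the user-based model (cosine similarity, nn = 25 neighbours, center normalization) suggest a procedure along the lines of the sketch below. It is a simplified Python illustration under those assumptions, not recommenderlab's UBCF code; the similarity-weighted average and every name in it are choices made for the example.

```python
import math
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # one user's ratings, None = not rated


def center(row: Row) -> Tuple[float, Row]:
    """Return the user's mean rating and the mean-centered ratings."""
    rated = [r for r in row if r is not None]
    mean = sum(rated) / len(rated) if rated else 0.0
    return mean, [None if r is None else r - mean for r in row]


def user_cosine(a: Row, b: Row) -> float:
    """Cosine similarity between two users over the jokes both rated."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_ubcf(train_rows: List[Row], user_row: Row, n: int = 10, nn: int = 25) -> List[int]:
    """Rank unrated jokes using the nn most similar training users."""
    user_mean, user_centered = center(user_row)
    neighbours = []
    for row in train_rows:
        _, row_centered = center(row)
        neighbours.append((user_cosine(user_centered, row_centered), row_centered))
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    neighbours = neighbours[:nn]
    scores = {}
    for item, known in enumerate(user_row):
        if known is not None:
            continue  # only score jokes the user has not rated
        num = den = 0.0
        for sim, row_centered in neighbours:
            if sim > 0 and row_centered[item] is not None:
                num += sim * row_centered[item]
                den += sim
        if den > 0:
            scores[item] = user_mean + num / den  # undo the centering
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy data: the first training user shares the new user's taste, the second opposes it.
train = [
    [9.0, -5.0, 7.0, -3.0],
    [-6.0, 6.0, -8.0, 8.0],
]
new_user = [8.0, -4.0, None, None]
print(recommend_ubcf(train, new_user, n=2))
```

Centering first means similarity reflects whether two users deviate from their own averages in the same direction, not whether they both tend to rate high or low.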

Evaluation

## Evaluation scheme using all-but-1 items
## Method: 'cross-validation' with 10 run(s).
## Good ratings: >=0.544000
## Data set: 145 x 25 rating matrix of class 'realRatingMatrix' with 725 ratings.
## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.03sec] 
##   2  [0sec/0.01sec] 
##   3  [0sec/0.02sec] 
##   4  [0sec/0.04sec] 
##   5  [0sec/0.05sec] 
##   6  [0sec/0.02sec] 
##   7  [0sec/0.02sec] 
##   8  [0sec/0.03sec] 
##   9  [0sec/0.03sec] 
##   10  [0sec/0.01sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.01sec] 
##   2  [0sec/0.02sec] 
##   3  [0sec/0.02sec] 
##   4  [0sec/0.02sec] 
##   5  [0.02sec/0sec] 
##   6  [0sec/0.02sec] 
##   7  [0sec/0sec] 
##   8  [0.02sec/0sec] 
##   9  [0sec/0.01sec] 
##   10  [0sec/0sec]
## List of evaluation results for 2 recommenders:
## Evaluation results for 10 folds/samples using method 'UBCF'.
## Evaluation results for 10 folds/samples using method 'IBCF'.
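
Each point on a ROC curve for a top-N recommender comes from a confusion count of recommended versus withheld "good" jokes. The sketch below shows one such point for a single user under an all-but-1 scheme; it is illustrative Python, not recommenderlab's evaluation code, and the function names, the toy item indices, and the way candidates are defined are assumptions. The threshold 0.544 is the "Good ratings" value printed above.

```python
from typing import Dict, Iterable, List, Tuple


def confusion_counts(recommended: Iterable[int],
                     held_out: Dict[int, float],
                     candidates: Iterable[int],
                     good_threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for one user's top-N list."""
    rec = set(recommended)
    cand = set(candidates)
    # A withheld joke counts as relevant only if its true rating is "good".
    good = {item for item, rating in held_out.items() if rating >= good_threshold}
    tp = len(rec & good)
    fp = len(rec - good)
    fn = len(good - rec)
    tn = len(cand - rec - good)
    return tp, fp, fn, tn


def roc_point(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr


# Toy case: 25 jokes, jokes 0-3 shown to the model, joke 7 withheld with rating 3.2.
candidates: List[int] = list(range(4, 25))
counts = confusion_counts([7, 10, 12], {7: 3.2}, candidates, good_threshold=0.544)
print(counts, roc_point(*counts))
```

Repeating this for several list lengths n and averaging over users and folds gives the points that make up the curve.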