library(tidyverse)       # attaches readr, dplyr, tidyr, tibble and ggplot2, among others
library(sqldf)
library(tinytex)
library(recommenderlab)
library(kableExtra)
library(gridExtra)

Final Project Goal


The goal for the final project is for you to build out a recommender system using a large dataset (e.g., 1M+ ratings, or 10k+ users and 10k+ items). There are three deliverables, with separate dates:
[1] Planning Document. Find an interesting dataset and describe the system you plan to build out. If you would like to use one of the datasets you have already worked with, you should add a unique element or incorporate additional data (e.g., explicit features you scrape from another source, such as image analysis of movie posters). The overall goal, however, is to produce quality recommendations by extracting insights from a large dataset. You may do so using Spark or another distributed computing method, OR by effectively applying one of the more advanced mathematical techniques we have covered. There is no preference for one over the other, as long as the recommender works!
[2] Implementation. In this final project deliverable, you’ll build out the system that you describe in the planning document.

Final Project Dataset

The dataset used for this final project is the goodbooks-10k dataset by zygmuntz. It contains 6 million ratings for ten thousand of the most popular books. It also includes books marked "to read" by users, book metadata (author, year, etc.), and tags/shelves/genres. The dataset therefore meets the project's requirement of having enough users and items to develop a robust recommender system. The dataset is available on GitHub: https://github.com/zygmuntz/goodbooks-10k
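To make the size requirement concrete, here is a minimal base R sketch that counts ratings, distinct users, and distinct books in a ratings table. The helper `meets_scale_requirement()` and its default thresholds are hypothetical: they encode one reading of the project requirement (1M+ ratings, or 10k+ users and 10k+ items) and assume the `user_id`, `book_id` and `rating` columns of goodbooks-10k's `ratings.csv`.

```r
# Hypothetical helper: checks a ratings table against the project's size
# requirement, read here as (>= 1M ratings) OR (>= 10k users AND >= 10k items).
# Assumes columns user_id, book_id, rating as in goodbooks-10k's ratings.csv.
meets_scale_requirement <- function(ratings,
                                    min_ratings = 1e6,
                                    min_users = 1e4,
                                    min_items = 1e4) {
  n_ratings <- nrow(ratings)
  n_users   <- length(unique(ratings$user_id))
  n_items   <- length(unique(ratings$book_id))
  list(
    n_ratings = n_ratings,
    n_users   = n_users,
    n_items   = n_items,
    ok        = n_ratings >= min_ratings ||
                (n_users >= min_users && n_items >= min_items)
  )
}

# Toy example (made-up data): three ratings is far below the threshold.
toy_small <- data.frame(user_id = c(1, 1, 2),
                        book_id = c(10, 20, 10),
                        rating  = c(5, 4, 3))
str(meets_scale_requirement(toy_small))
```

Applied to the `BookData` table loaded in the Data Exploration section, `meets_scale_requirement(BookData)` would report the actual counts for the full dataset.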

Data Exploration

To start, data exploration was done on the books dataset. The histogram of ratings below suggests a few things: the rating distribution is skewed left, with most users giving high ratings of 4 or 5, and ratings of 1 are comparatively rare. Additionally, the scrollable table below lists the 1,000 most-rated books, with The Hunger Games and Harry Potter leading the pack. The subsequent bar plot shows the 15 most-rated books.
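The shape claims above can also be checked numerically. A minimal base R sketch, run on made-up toy ratings rather than the real data: `rating_shares()` gives the share of ratings at each star level, and `sample_skewness()` (the moment-based skewness, a swapped-in measure that the original analysis does not use) is negative for a left-skewed distribution. Both helper names are hypothetical.

```r
# Share of ratings at each star level. factor(levels = 1:5) keeps levels
# with zero counts, so the result always has five entries named "1".."5".
rating_shares <- function(rating) {
  counts <- table(factor(rating, levels = 1:5))
  prop.table(counts)
}

# Moment-based skewness: negative when the long tail is on the low side.
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Toy ratings (made up), mostly 4s and 5s with a few low values.
toy_ratings <- c(5, 5, 4, 4, 4, 3, 5, 2, 4, 5)
shares <- rating_shares(toy_ratings)
shares
sum(shares[c("4", "5")])        # share of high (4 or 5) ratings
sample_skewness(toy_ratings)    # negative => left skew
```

On the real data the same calls would be `rating_shares(BookData$rating)` and `sample_skewness(BookData$rating)`.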

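# Lists the example datasets bundled with recommenderlab (exploratory only)
data_package <- data(package = "recommenderlab")

# Default look for the bar and column geoms used in later plots
update_geom_defaults('bar', list(color = 'blue', fill = 'skyblue', alpha = .5))
update_geom_defaults('col', list(color = 'blue', fill = 'skyblue', alpha = .5))

# Ratings (user_id, book_id, rating) and book metadata from the goodbooks-10k repository
BookData <- read_csv(url("https://raw.githubusercontent.com/zygmuntz/goodbooks-10k/master/ratings.csv"))

BookDataMeta <- read_csv(url("https://raw.githubusercontent.com/zygmuntz/goodbooks-10k/master/books.csv"))

# Recommendation methods that recommenderlab registers for real-valued rating matrices
recommender_models <- recommenderRegistry$get_entries(dataType = "realRatingMatrix")

# Minimal plot theme: no panel background or grid lines
plottheme <-
  theme(panel.background = element_blank(),
        panel.grid = element_blank())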

# Ratings are integers from 1 to 5, so use one bin per rating value
BookData %>%
  ggplot() +
  geom_histogram(mapping = aes(x = rating), binwidth = 1,
                 alpha = 0.5, color = "blue", fill = 'skyblue') +
  plottheme

# Wide user x book matrix: one row per user_id, one column per book_id.
# Convert to a plain data frame before setting row names, since row names
# on a tibble are deprecated; column_to_rownames() also drops user_id.
BookDataMatrix <- BookData %>%
  pivot_wider(names_from = book_id, values_from = rating) %>%
  as.data.frame() %>%
  column_to_rownames("user_id")
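Because each user rates only a small fraction of the ten thousand books, the wide table above is mostly `NA`. A sparse matrix stores only the observed ratings. The sketch below assumes the `Matrix` package (a recommended package shipped with R) and uses a hypothetical helper, `to_sparse_ratings()`, on toy data to build such a matrix from `(user_id, book_id, rating)` triplets. recommenderlab's `realRatingMatrix` keeps its ratings in a `dgCMatrix`, so a matrix like this could be wrapped with `new("realRatingMatrix", data = m)`; that step is not run here.

```r
library(Matrix)  # recommended package shipped with R

# Hypothetical helper: sparse user x book rating matrix from triplets.
# Only observed ratings are stored; unrated cells are implicit zeros.
# Note: sparseMatrix() sums duplicate (user, book) pairs, so the input
# should contain at most one rating per pair.
to_sparse_ratings <- function(ratings) {
  users <- sort(unique(ratings$user_id))
  books <- sort(unique(ratings$book_id))
  sparseMatrix(
    i = match(ratings$user_id, users),
    j = match(ratings$book_id, books),
    x = ratings$rating,
    dims = c(length(users), length(books)),
    dimnames = list(as.character(users), as.character(books))
  )
}

# Toy triplets (made up): 3 users, 3 books, 4 ratings.
toy_triplets <- data.frame(user_id = c(1, 1, 2, 3),
                           book_id = c(10, 20, 10, 30),
                           rating  = c(5, 4, 3, 2))
m <- to_sparse_ratings(toy_triplets)
m
nnzero(m)  # number of stored ratings
```

The design choice here is memory: the dense table allocates a cell for every user and book pair, while the sparse form grows only with the number of ratings.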

# Number of ratings per title; keep the 1,000 most-rated titles in a scrollable table
left_join(BookData,
          BookDataMeta[c('book_id', 'authors', 'title', 'language_code')],
          by = 'book_id') %>%
  count(title, name = 'Number_of_Ratings') %>%
  slice_max(Number_of_Ratings, n = 1000) %>%
  arrange(desc(Number_of_Ratings)) %>%
  kable() %>%
  kable_styling(
    bootstrap_options = c("hover", "condensed", "responsive"),
    full_width = FALSE) %>%
  scroll_box(height = "300px")
title Number_of_Ratings
The Hunger Games (The Hunger Games, #1) 22806
Harry Potter and the Sorcerer’s Stone (Harry Potter, #1) 21850
To Kill a Mockingbird 19088
Twilight (Twilight, #1) 16931
The Great Gatsby 16604
Catching Fire (The Hunger Games, #2) 16549
Mockingjay (The Hunger Games, #3) 15953
Harry Potter and the Prisoner of Azkaban (Harry Potter, #3) 15855
Harry Potter and the Chamber of Secrets (Harry Potter, #2) 15657
The Hobbit 15558
Harry Potter and the Goblet of Fire (Harry Potter, #4) 15523
Harry Potter and the Deathly Hallows (Harry Potter, #7) 15304
Harry Potter and the Order of the Phoenix (Harry Potter, #5) 15258
Harry Potter and the Half-Blood Prince (Harry Potter, #6) 15081
1984 14693
The Catcher in the Rye 14472
The Girl with the Dragon Tattoo (Millennium, #1) 14382
Animal Farm 14328
Lord of the Flies 13556
Angels & Demons (Robert Langdon, #1) 13451
Pride and Prejudice 13445
The Lion, the Witch, and the Wardrobe (Chronicles of Narnia, #1) 13089
The Da Vinci Code (Robert Langdon, #2) 13072
The Help 12727
The Kite Runner 12698
The Fellowship of the Ring (The Lord of the Rings, #1) 12530
Gone Girl 12105
The Diary of a Young Girl 12101
Of Mice and Men 11921
Divergent (Divergent, #1) 11780
The Lovely Bones 11677
Romeo and Juliet 11578
Memoirs of a Geisha 11304
The Fault in Our Stars 11264
The Giver (The Giver, #1) 10949
A Game of Thrones (A Song of Ice and Fire, #1) 10692
Little Women (Little Women, #1) 10649
The Book Thief 10394
Water for Elephants 10361
Fahrenheit 451 10312
Jane Eyre 10308
Life of Pi 10175
The Time Traveler’s Wife 9977
The Hitchhiker’s Guide to the Galaxy (Hitchhiker’s Guide to the Galaxy, #1) 9960
New Moon (Twilight, #2) 9712
Eclipse (Twilight, #3) 9620
Brave New World 9612
Wuthering Heights 9584
Breaking Dawn (Twilight, #4) 9433
Charlotte’s Web 9391
The Adventures of Huckleberry Finn 9323
The Girl on the Train 9090
The Curious Incident of the Dog in the Night-Time 9035
The Alchemist 8916
Ender’s Game (Ender’s Saga, #1) 8849
The Secret Life of Bees 8383
Where the Sidewalk Ends 8297
Insurgent (Divergent, #2) 8263
The Golden Compass (His Dark Materials, #1) 8192
Slaughterhouse-Five 8179
A Wrinkle in Time (A Wrinkle in Time Quintet, #1) 8109
Eat, Pray, Love 7885
Frankenstein 7727
Fifty Shades of Grey (Fifty Shades, #1) 7724
Gone with the Wind 7714
The Scarlet Letter 7667
A Thousand Splendid Suns 7662
The Secret Garden 7626
The Lightning Thief (Percy Jackson and the Olympians, #1) 7606
The Girl Who Played with Fire (Millennium, #2) 7563
The Handmaid’s Tale 7543
The Grapes of Wrath 7458
The Adventures of Tom Sawyer 7214
My Sister’s Keeper 7192
Sense and Sensibility 7110
Eragon (The Inheritance Cycle, #1) 7040
The Giving Tree 7002
Catch-22 6944
Hamlet 6899
The Shining (The Shining #1) 6861
A Clash of Kings (A Song of Ice and Fire, #2) 6854
The Little Prince 6848
The Road 6779
The Notebook (The Notebook, #1) 6717
The Girl Who Kicked the Hornet’s Nest (Millennium, #3) 6702
Dracula 6615
Jurassic Park (Jurassic Park, #1) 6601
Wicked: The Life and Times of the Wicked Witch of the West (The Wicked Years, #1) 6598
A Tale of Two Cities 6592
Anne of Green Gables (Anne of Green Gables, #1) 6536
City of Bones (The Mortal Instruments, #1) 6504
The Two Towers (The Lord of the Rings, #2) 6478
Night (The Night Trilogy #1) 6466
Macbeth 6421
One Flew Over the Cuckoo’s Nest 6414
The Picture of Dorian Gray 6406
Where the Wild Things Are 6401
The Poisonwood Bible 6343
The Return of the King (The Lord of the Rings, #3) 6331
The Host (The Host, #1) 6313
Great Expectations 6305
The Odyssey 6301
Allegiant (Divergent, #3) 6300
The Old Man and the Sea 6273
A Storm of Swords (A Song of Ice and Fire, #3) 6237
Bridget Jones’s Diary (Bridget Jones, #1) 6229
The Joy Luck Club 6212
The Outsiders 6190
The Firm (Penguin Readers, Level 5) 6135
The Princess Bride 6102
All the Light We Cannot See 6082
Holes (Holes, #1) 6039
Charlie and the Chocolate Factory (Charlie Bucket, #1) 6034
The Perks of Being a Wallflower 5990
The Stand 5973
Me Before You (Me Before You, #1) 5897
Dune (Dune Chronicles #1) 5864
The Glass Castle 5847
A Time to Kill 5844
Interview with the Vampire (The Vampire Chronicles, #1) 5764
A Feast for Crows (A Song of Ice and Fire, #4) 5705
American Gods (American Gods, #1) 5681
The Pillars of the Earth (The Kingsbridge Series, #1) 5672
Middlesex 5615
Emma 5602
Room 5554
The Color Purple 5531
The Five People You Meet in Heaven 5495
One Hundred Years of Solitude 5492
The Count of Monte Cristo 5489
Tuesdays with Morrie 5467
The Martian 5408
The Stranger 5365
The Red Tent 5358
Bossypants 5346
Divine Secrets of the Ya-Ya Sisterhood 5326
The Lord of the Rings (The Lord of the Rings, #1-3) 5316
The Chronicles of Narnia (Chronicles of Narnia, #1-7) 5306
Angela’s Ashes (Frank McCourt, #1) 5302
Outlander (Outlander, #1) 5297
The Goldfinch 5275
Lolita 5274
The Memory Keeper’s Daughter 5257
Green Eggs and Ham 5249
Me Talk Pretty One Day 5199
The Bell Jar 5178
Girl with a Pearl Earring 5171
The Night Circus 5154
A Dance with Dragons (A Song of Ice and Fire, #5) 5086
A Clockwork Orange 5076
Anna Karenina 5073
Atonement 5052
The Maze Runner (Maze Runner, #1) 5014
A Christmas Carol 5006
Into the Wild 4921
The Guernsey Literary and Potato Peel Pie Society 4904
A Midsummer Night’s Dream 4898
Miss Peregrine’s Home for Peculiar Children (Miss Peregrine’s Peculiar Children, #1) 4868
It 4853
Flowers for Algernon 4810
Matilda 4810
The Other Boleyn Girl (The Plantagenet and Tudor Novels, #9) 4759
Watership Down (Watership Down, #1) 4753
Les Misérables 4750
The Devil Wears Prada (The Devil Wears Prada, #1) 4723
Alice’s Adventures in Wonderland & Through the Looking-Glass 4722
Looking for Alaska 4682
Crime and Punishment 4679
Freakonomics: A Rogue Economist Explores the Hidden Side of Everything (Freakonomics, #1) 4658
Siddhartha 4610
The Devil in the White City: Murder, Magic, and Madness at the Fair That Changed America 4606
Unbroken: A World War II Story of Survival, Resilience, and Redemption 4577
East of Eden 4576
In Cold Blood 4467
The Metamorphosis 4463
A Tree Grows in Brooklyn 4460
Love in the Time of Cholera 4459
Fifty Shades Darker (Fifty Shades, #2) 4452
City of Glass (The Mortal Instruments, #3) 4451
’Salem’s Lot 4433
City of Ashes (The Mortal Instruments, #2) 4415
Dead Until Dark (Sookie Stackhouse, #1) 4409
Deception Point 4405
Heart of Darkness 4377
Bridge to Terabithia 4295
Persuasion 4274
The Client 4260
Paper Towns 4247
Ready Player One 4239
The Gunslinger (The Dark Tower, #1) 4229
Atlas Shrugged 4227
Rebecca 4220
The Lost Symbol (Robert Langdon, #3) 4213
Misery 4199
Fifty Shades Freed (Fifty Shades, #3) 4185
The Sisterhood of the Traveling Pants (Sisterhood, #1) 4180
The Sun Also Rises 4161
Carrie 4118
The Voyage of the Dawn Treader (Chronicles of Narnia, #3) 4080
The Pelican Brief 4067
The Sea of Monsters (Percy Jackson and the Olympians, #2) 4056
Digital Fortress 4041
Moby-Dick or, The Whale 4034
The Battle of the Labyrinth (Percy Jackson and the Olympians, #4) 4021
James and the Giant Peach 4017
The Ocean at the End of the Lane 4015
And Then There Were None 4011
The Husband’s Secret 4003
Where the Red Fern Grows 4000
Wild: From Lost to Found on the Pacific Crest Trail 3998
Sarah’s Key 3995
Treasure Island 3959
Dark Places 3950
The Name of the Wind (The Kingkiller Chronicle, #1) 3948
Fight Club 3917
A Light in the Attic 3916
The Art of Racing in the Rain 3889
The Last Olympian (Percy Jackson and the Olympians, #5) 3881
The Cat in the Hat 3871
A Prayer for Owen Meany 3868
Artemis Fowl (Artemis Fowl, #1) 3851
She’s Come Undone 3850
Beloved 3842
Never Let Me Go 3837
Watchmen 3825
Confessions of a Shopaholic (Shopaholic, #1) 3792
The Immortal Life of Henrietta Lacks 3777
A Walk to Remember 3760
Eleanor & Park 3756
Pet Sematary 3748
11/22/63 3737
Running with Scissors 3724
The Thorn Birds 3721
World War Z: An Oral History of the Zombie War 3720
The Titan’s Curse (Percy Jackson and the Olympians, #3) 3717
The Iliad 3715
The Tipping Point: How Little Things Can Make a Big Difference 3709
Big Little Lies 3705
The Nightingale 3666
The Time Machine 3659
The Fountainhead 3656
Number the Stars 3646
The Call of the Wild 3625
Neverwhere 3612
Inferno (Robert Langdon, #4) 3608
Uglies (Uglies, #1) 3600
On the Road 3596
Cutting for Stone 3575
The Crucible 3560
The Shack 3559
Sharp Objects 3549
Matched (Matched, #1) 3538
Cat’s Cradle 3532
Good Omens: The Nice and Accurate Prophecies of Agnes Nutter, Witch 3532
The Shadow of the Wind (The Cemetery of Forgotten Books, #1) 3531
Extremely Loud and Incredibly Close 3519
If I Stay (If I Stay, #1) 3512
Goodnight Moon 3507
The Silence of the Lambs (Hannibal Lecter, #2) 3480
Into Thin Air: A Personal Account of the Mount Everest Disaster 3479
The Subtle Knife (His Dark Materials, #2) 3479
The Magician’s Nephew (Chronicles of Narnia, #6) 3461
White Oleander 3449
Outliers: The Story of Success 3428
Oliver Twist 3421
Othello 3414
The Cuckoo’s Calling (Cormoran Strike, #1) 3413
Marley and Me: Life and Love With the World’s Worst Dog 3412
Island of the Blue Dolphins (Island of the Blue Dolphins, #1) 3409
Where’d You Go, Bernadette 3403
The Graveyard Book 3401
Orphan Train 3378
Blink: The Power of Thinking Without Thinking 3371
Clockwork Angel (The Infernal Devices, #1) 3366
One for the Money (Stephanie Plum, #1) 3365
The Light Between Oceans 3363
Prince Caspian (Chronicles of Narnia, #2) 3353
Snow Flower and the Secret Fan 3346
The Nanny Diaries (Nanny, #1) 3331
Oh, The Places You’ll Go! 3321
Like Water for Chocolate 3314
I Know This Much Is True 3312
Alice in Wonderland 3298
Three Cups of Tea: One Man’s Mission to Promote Peace … One School at a Time 3284
The Wonderful Wizard of Oz (Oz, #1) 3275
All Quiet on the Western Front 3273
The Ultimate Hitchhiker’s Guide to the Galaxy 3270
Stardust 3250
The Rosie Project (Don Tillman, #1) 3244
The World According to Garp 3233
How the Grinch Stole Christmas! 3231
Harry Potter and the Cursed Child - Parts One and Two (Harry Potter, #8) 3230
For Whom the Bell Tolls 3222
A Farewell to Arms 3204
Thirteen Reasons Why 3186
Ella Enchanted 3177
Cold Mountain 3174
The Runaway Jury 3168
Winnie-the-Pooh (Winnie-the-Pooh, #1) 3161
Bel Canto 3144
City of Fallen Angels (The Mortal Instruments, #4) 3140
The Good Earth (House of Earth, #1) 3137
Coraline 3132
The Lost Hero (The Heroes of Olympus, #1) 3128
Vampire Academy (Vampire Academy, #1) 3125
The Unbearable Lightness of Being 3120
Fried Green Tomatoes at the Whistle Stop Cafe 3113
The Bad Beginning (A Series of Unfortunate Events, #1) 3092
The Bourne Identity (Jason Bourne, #1) 3091
The Amber Spyglass (His Dark Materials, #3) 3075
Beautiful Creatures (Caster Chronicles, #1) 3067
I Know Why the Caged Bird Sings 3067
The Three Musketeers 3062
The Very Hungry Caterpillar Board Book 3057
Midnight in the Garden of Good and Evil 3037
The Scorch Trials (Maze Runner, #2) 3031
A Walk in the Woods 3029
Foundation (Foundation #1) 3026
The Wise Man’s Fear (The Kingkiller Chronicle, #2) 3026
The Andromeda Strain 3004
A Confederacy of Dunces 2999
Along Came a Spider (Alex Cross, #1) 2985
Eldest (The Inheritance Cycle, #2) 2963
Little Bee 2962
The Thirteenth Tale 2959
The Clan of the Cave Bear (Earth’s Children, #1) 2931
The Strange Case of Dr. Jekyll and Mr. Hyde 2926
And the Mountains Echoed 2925
The Eye of the World (Wheel of Time, #1) 2924
Inkheart (Inkworld, #1) 2919
The Name of the Rose 2910
The Selection (The Selection, #1) 2903
The Tell-Tale Heart and Other Writings 2897
The BFG 2889
Dress Your Family in Corduroy and Denim 2885
The Velveteen Rabbit 2882
Things Fall Apart (The African Trilogy, #1) 2878
Something Borrowed (Darcy & Rachel, #1) 2877
Do Androids Dream of Electric Sheep? 2864
The Casual Vacancy 2863
Speak 2854
Gulliver’s Travels 2848
The Lorax 2847
Still Alice 2836
A Child Called “It” (Dave Pelzer #1) 2834
The Importance of Being Earnest 2831
Their Eyes Were Watching God 2825
The Historian 2780
The Godfather 2772
The Horse and His Boy (Chronicles of Narnia, #5) 2769
Dear John 2746
City of Lost Souls (The Mortal Instruments, #5) 2744
Naked 2744
The Green Mile 2744
The Invention of Wings 2735
Cinder (The Lunar Chronicles, #1) 2734
Mansfield Park 2725
The Canterbury Tales 2723
The Witches 2713
Fear and Loathing in Las Vegas 2712
A Discovery of Witches (All Souls Trilogy, #1) 2705
Breakfast of Champions 2704
The Namesake 2700
Much Ado About Nothing 2696
The Boy in the Striped Pajamas 2694
The Hunt for Red October (Jack Ryan Universe, #4) 2689
Nineteen Minutes 2654
Robinson Crusoe 2653
Northanger Abbey 2651
Wonder 2648
Kiss the Girls (Alex Cross, #2) 2644
Beowulf 2643
The Prince of Tides 2643
Black Beauty 2638
The Paris Wife 2634
Hotel on the Corner of Bitter and Sweet 2631
Stranger in a Strange Land 2629
The Final Empire (Mistborn, #1) 2626
The No. 1 Ladies’ Detective Agency (No. 1 Ladies’ Detective Agency, #1) 2622
Speaker for the Dead (Ender’s Saga, #2) 2618
The Complete Stories and Poems 2618
The Pact 2616
Under the Dome 2609
Is Everyone Hanging Out Without Me? (And Other Concerns) 2606
Hush, Hush (Hush, Hush, #1) 2601
A Separate Peace 2599
Where the Heart Is 2592
Graceling (Graceling Realm, #1) 2588
The Phantom Tollbooth 2587
Cujo 2583
Marked (House of Night, #1) 2583
The Silver Chair (Chronicles of Narnia, #4) 2558
Station Eleven 2556
The Brief Wondrous Life of Oscar Wao 2553
Delirium (Delirium, #1) 2549
I, Robot (Robot #0.1) 2549
Under the Tuscan Sun 2545
Fallen (Fallen, #1) 2541
Snow Falling on Cedars 2533
The Secret History 2522
Tess of the D’Urbervilles 2516
The Amazing Adventures of Kavalier & Clay 2513
A Million Little Pieces 2511
What Alice Forgot 2507
We Were Liars 2503
The Screwtape Letters 2497
The Last Lecture 2488
Beautiful Disaster (Beautiful, #1) 2487
Living Dead in Dallas (Sookie Stackhouse, #2) 2485
Little House on the Prairie (Little House, #2) 2483
Red Dragon (Hannibal Lecter, #1) 2483
A Little Princess 2480
Dragonfly in Amber (Outlander, #2) 2472
Are You There God? It’s Me, Margaret 2471
The Last Song 2471
Needful Things 2470
A Heartbreaking Work of Staggering Genius 2463
The Color of Magic (Discworld, #1; Rincewind #1) 2463
A Streetcar Named Desire 2457
Brisingr (The Inheritance Cycle, #3) 2455
The Cider House Rules 2448
The Death Cure (Maze Runner, #3) 2446
A Man Called Ove 2441
The Son of Neptune (The Heroes of Olympus, #2) 2437
Storm Front (The Dresden Files, #1) 2431
The God of Small Things 2427
Timeline 2426
The Last Battle (Chronicles of Narnia, #7) 2412
Shiver (The Wolves of Mercy Falls, #1) 2411
Fangirl 2409
The War of the Worlds 2404
The Tales of Beedle the Bard 2403
Death of a Salesman 2400
War and Peace 2398
Hatchet (Brian’s Saga, #1) 2392
Club Dead (Sookie Stackhouse, #3) 2387
Clockwork Prince (The Infernal Devices, #2) 2380
The Mists of Avalon (Avalon, #1) 2380
Yes Please 2378
Dead to the World (Sookie Stackhouse, #4) 2375
Tuck Everlasting 2375
Cloud Atlas 2373
The Brothers Karamazov 2369
David Copperfield 2364
Madame Bovary 2361
An Abundance of Katherines 2357
Christine 2352
The Dead Zone 2348
Firestarter 2344
Franny and Zooey 2343
The Rainmaker 2339
King Lear 2333
Congo 2325
The Magicians (The Magicians #1) 2311
Peter Pan 2310
Life After Life 2309
Good in Bed (Cannie Shapiro, #1) 2307
Defending Jacob 2297
The Things They Carried 2278
Uncle Tom’s Cabin 2278
I Am Number Four (Lorien Legacies, #1) 2265
The Vampire Lestat (The Vampire Chronicles, #2) 2263
Zen and the Art of Motorcycle Maintenance: An Inquiry Into Values 2258
Guns, Germs, and Steel: The Fates of Human Societies 2255
2001: A Space Odyssey (Space Odyssey, #1) 2253
Jonathan Strange & Mr Norrell 2251
Neuromancer 2243
Anansi Boys 2238
Don Quixote 2236
Candide 2235
Go Set a Watchman 2233
The Virgin Suicides 2233
Go Ask Alice 2232
The Drawing of the Three (The Dark Tower, #2) 2231
A Short History of Nearly Everything 2230
High Fidelity 2229
Legend (Legend, #1) 2225
The Taming of the Shrew 2225
Frostbite (Vampire Academy, #2) 2219
The Great Hunt (Wheel of Time, #2) 2217
Pretties (Uglies, #2) 2208
Dead as a Doornail (Sookie Stackhouse, #5) 2207
The Adventures of Sherlock Holmes 2203
The Restaurant at the End of the Universe (Hitchhiker’s Guide, #2) 2200
The Elite (The Selection, #2) 2195
One Day 2192
The Martian Chronicles 2189
The Prince 2189
Oryx and Crake (MaddAddam, #1) 2181
Sphere 2181
The Corrections 2177
The Wind in the Willows 2175
The Tale of Peter Rabbit 2174
When You Are Engulfed in Flames 2172
Heidi 2170
Fall of Giants (The Century Trilogy, #1) 2157
The Pearl 2147
A Wizard of Earthsea (Earthsea Cycle, #1) 2143
Little House in the Big Woods (Little House, #1) 2141
Definitely Dead (Sookie Stackhouse, #6) 2139
In the Woods (Dublin Murder Squad, #1) 2132
The Kitchen House 2131
Stargirl (Stargirl, #1) 2123
The Language of Flowers 2122
Perfume: The Story of a Murderer 2121
Odd Thomas (Odd Thomas, #1) 2118
The Reader 2104
Shadow Kiss (Vampire Academy, #3) 2098
The Dragon Reborn (Wheel of Time, #3) 2096
All Together Dead (Sookie Stackhouse, #7) 2095
Fast Food Nation: The Dark Side of the All-American Meal 2095
The Prophet 2094
The Hound of the Baskervilles 2093
American Psycho 2082
Voyager (Outlander, #3) 2081
Julius Caesar 2075
The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change 2073
Can You Keep a Secret? 2065
Everything Is Illuminated 2056
Jonathan Livingston Seagull 2054
Throne of Glass (Throne of Glass, #1) 2051
Snow Crash 2050
The Tempest 2049
Bared to You (Crossfire, #1) 2048
Prey 2045
State of Wonder 2040
Different Seasons 2035
Everything I Never Told You 2032
Maus I: A Survivor’s Tale: My Father Bleeds History (Maus, #1) 2027
A Study in Scarlet 2024
Thinner 2021
1st to Die (Women’s Murder Club, #1) 2019
Blood Promise (Vampire Academy, #4) 2013
The Well of Ascension (Mistborn, #2) 2007
Invisible Man 2004
The Shipping News 2003
The Red Pyramid (Kane Chronicles, #1) 2000
Clockwork Princess (The Infernal Devices, #3) 1999
Mrs. Frisby and the Rats of NIMH (Rats of NIMH, #1) 1998
The Waste Lands (The Dark Tower, #3) 1998
From Dead to Worse (Sookie Stackhouse, #8) 1996
The City of Ember (Book of Ember, #1) 1991
The Silkworm (Cormoran Strike, #2) 1990
Choke 1987
The Passage (The Passage, #1) 1981
The One (The Selection, #3) 1980
From the Mixed-Up Files of Mrs. Basil E. Frankweiler 1977
Before I Go to Sleep 1976
Wizard’s First Rule (Sword of Truth, #1) 1975
A Visit from the Goon Squad 1973
Bag of Bones 1971
V for Vendetta 1965
The Sound and the Fury 1960
After You (Me Before You, #2) 1957
Dead and Gone (Sookie Stackhouse, #9) 1955
The Mark of Athena (The Heroes of Olympus, #3) 1952
Are You My Mother? 1951
The Forgotten Garden 1949
The Storied Life of A.J. Fikry 1948
Murder on the Orient Express (Hercule Poirot, #10) 1947
Freedom 1946
The 5th Wave (The 5th Wave, #1) 1944
Kafka on the Shore 1938
Mrs. Dalloway 1932
Roots: The Saga of an American Family 1930
Spirit Bound (Vampire Academy, #5) 1925
Lonesome Dove 1920
Under the Banner of Heaven: A Story of Violent Faith 1919
Harry Potter Boxset (Harry Potter, #1-7) 1915
The Omnivore’s Dilemma: A Natural History of Four Meals 1915
Love You Forever 1912
The Merchant of Venice 1904
Cell 1903
The House of the Spirits 1902
The Little House Collection (Little House, #1-9) 1897
Beautiful Ruins 1895
The Silmarillion (Middle-Earth Universe) 1890
If You Give a Mouse a Cookie 1889
Norwegian Wood 1887
The Awakening 1886
Patriot Games (Jack Ryan Universe, #2) 1885
Where She Went (If I Stay, #2) 1885
World Without End (The Kingsbridge Series, #2) 1885
Interpreter of Maladies 1884
Alexander and the Terrible, Horrible, No Good, Very Bad Day 1877
Wizard and Glass (The Dark Tower, #4) 1875
The Art of War 1873
Life, the Universe and Everything (Hitchhiker’s Guide, #3) 1872
Twelfth Night 1871
Walden 1869
Midwives 1866
The Polar Express 1865
The Lincoln Lawyer (Mickey Haller, #1; Harry Bosch Universe, #16) 1863
On Writing: A Memoir of the Craft 1862
Scarlet (The Lunar Chronicles, #2) 1854
The Wind-Up Bird Chronicle 1853
The Hero of Ages (Mistborn, #3) 1849
Shōgun (Asian Saga, #1) 1847
A Brief History of Time 1845
The Lucky One 1834
Breakfast at Tiffany’s 1830
Dreamcatcher 1829
Madeline 1826
Who Moved My Cheese? 1826
City of Heavenly Fire (The Mortal Instruments, #6) 1821
Corduroy 1821
A Great and Terrible Beauty (Gemma Doyle, #1) 1820
The Remains of the Day 1819
Twilight: The Complete Illustrated Movie Companion 1818
Twenty Thousand Leagues Under the Sea 1817
Mere Christianity 1815
Crossed (Matched, #2) 1813
In Her Shoes 1812
The Chamber 1810
Starship Troopers 1809
The Complete Grimm’s Fairy Tales 1809
Night Shift 1806
Last Sacrifice (Vampire Academy, #6) 1798
Ender’s Shadow (Ender’s Shadow, #1) 1792
The Shadow Rising (Wheel of Time, #4) 1789
The Way of Kings (The Stormlight Archive, #1) 1789
Red Queen (Red Queen, #1) 1785
P.S. I Love You 1782
The Hobbit: Graphic Novel 1782
Dead in the Family (Sookie Stackhouse, #10) 1780
Message in a Bottle 1779
Nickel and Dimed: On (Not) Getting By in America 1778
Fantastic Beasts and Where to Find Them 1776
Around the World in Eighty Days 1775
Dark Lover (Black Dagger Brotherhood, #1) 1774
Kitchen Confidential: Adventures in the Culinary Underbelly 1774
How to Win Friends and Influence People 1767
Killing Floor (Jack Reacher, #1) 1767
Assassin’s Apprentice (Farseer Trilogy, #1) 1765
The Jungle 1764
House of Sand and Fog 1755
The Blind Assassin 1755
Dolores Claiborne 1752
Xenocide (Ender’s Saga, #3) 1742
Insomnia 1741
The Queen of the Damned (The Vampire Chronicles, #3) 1735
Safe Haven 1734
Daughter of Smoke & Bone (Daughter of Smoke & Bone, #1) 1733
Drums of Autumn (Outlander, #4) 1733
The Angel Experiment (Maximum Ride, #1) 1732
Steve Jobs 1731
The Phantom of the Opera 1731
The Bean Trees (Greer Family, #1) 1730
Empire Falls 1729
Inheritance (The Inheritance Cycle, #4) 1727
Specials (Uglies, #3) 1727
Gerald’s Game 1720
The Hiding Place: The Triumphant True Story of Corrie Ten Boom 1718
The Hunchback of Notre-Dame 1717
Oedipus Rex (The Theban Plays, #1) 1716
Pippi Longstocking 1714
The Absolutely True Diary of a Part-Time Indian 1713
The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics 1710
A Portrait of the Artist as a Young Man 1705
The Bonesetter’s Daughter 1705
Postmortem (Kay Scarpetta, #1) 1704
A Room with a View 1703
Flowers in the Attic (Dollanganger, #1) 1701
The Bluest Eye 1700
Hopeless (Hopeless, #1) 1699
Mr. Mercedes (Bill Hodges Trilogy, #1) 1696
The English Patient 1694
Plain Truth 1691
Reading Lolita in Tehran 1684
The Undomestic Goddess 1682
The Dark Half 1681
About a Boy 1680
In the Garden of Beasts: Love, Terror, and an American Family in Hitler’s Berlin 1680
Olive Kitteridge 1673
Evermore (The Immortals, #1) 1670
The Trial 1669
Holy Bible: King James Version 1668
Half Broke Horses 1666
The Secret (The Secret, #1) 1665
The Alienist (Dr. Laszlo Kreizler, #1) 1662
Lamb: The Gospel According to Biff, Christ’s Childhood Pal 1659
The Age of Innocence 1658
Man’s Search for Meaning 1656
The Dark Tower (The Dark Tower, #7) 1655
The Witch of Blackbird Pond 1649
The Hours 1647
The Princess Diaries (The Princess Diaries, #1) 1647
Girl, Interrupted 1646
Heaven is for Real: A Little Boy’s Astounding Story of His Trip to Heaven and Back 1643
My Ántonia 1638
Orange Is the New Black 1635
Mort (Death, #1; Discworld, #4) 1634
Wolves of the Calla (The Dark Tower, #5) 1633
Paradise Lost 1632
One Fish, Two Fish, Red Fish, Blue Fish 1628
As I Lay Dying 1623
Doctor Sleep (The Shining, #2) 1621
Pandemonium (Delirium, #2) 1619
White Fang 1619
The Tommyknockers 1616
Something Blue (Darcy & Rachel, #2) 1615
Julie and Julia: 365 Days, 524 Recipes, 1 Tiny Apartment Kitchen: How One Girl Risked Her Marriage, Her Job, and Her Sanity to Master the Art of Living 1611
Because of Winn-Dixie 1608
The Girl Who Loved Tom Gordon 1608
Shanghai Girls (Shanghai Girls #1) 1606
Before I Fall 1601
The Exorcist 1596
The Lies of Locke Lamora (Gentleman Bastard, #1) 1596
The Complete Sherlock Holmes 1591
Betrayed (House of Night, #2) 1589
The True Story of the 3 Little Pigs 1589
The Elegance of the Hedgehog 1587
Foundation and Empire (Foundation #2) 1579
The Storyteller 1576
Desperation 1575
Song of Susannah (The Dark Tower, #6) 1572
Cress (The Lunar Chronicles, #3) 1568
His Dark Materials (His Dark Materials #1-3) 1568
Major Pettigrew’s Last Stand 1566
Blindness 1565
The Westing Game 1565
1Q84 1563
Fool Moon (The Dresden Files, #2) 1562
No Country for Old Men 1562
The House of Hades (The Heroes of Olympus, #4) 1562
Walk Two Moons 1561
Wool Omnibus (Silo, #1) 1557
The God Delusion 1556
Mr. Penumbra’s 24-Hour Bookstore (Mr. Penumbra’s 24-Hour Bookstore, #1) 1555
Curious George 1553
Slammed (Slammed, #1) 1553
Will Grayson, Will Grayson 1553
Sabriel (Abhorsen, #1) 1552
The Marriage Plot 1552
Aesop’s Fables 1548
This is Where I Leave You 1547
Untamed (House of Night, #4) 1545
Preludes & Nocturnes (The Sandman #1) 1543
The Master and Margarita 1543
Remember Me? 1541
The Neverending Story 1539
Anna and the French Kiss (Anna and the French Kiss, #1) 1538
Quiet: The Power of Introverts in a World That Can’t Stop Talking 1535
Skeleton Crew 1535
Gathering Blue (The Giver, #2) 1534
Crown of Midnight (Throne of Glass, #2) 1532
Watchers 1531
Seabiscuit: An American Legend 1530
Reflected in You (Crossfire, #2) 1529
The Glass Menagerie 1527
Shutter Island 1526
Americanah 1525
Wolf Hall (Thomas Cromwell, #1) 1525
Nine Stories 1521
Hamlet: Screenplay, Introduction And Film Diary 1514
So Long, and Thanks for All the Fish (Hitchhiker’s Guide to the Galaxy, #4) 1513
Hyperion (Hyperion Cantos, #1) 1511
Daughter of Fortune 1508
The Talisman (The Talisman, #1) 1507
Crescendo (Hush, Hush, #2) 1506
The Bourne Supremacy (Jason Bourne, #2) 1503
Waiting for Godot 1501
Guards! Guards! (Discworld, #8) 1500
Inferno (The Divine Comedy #1) 1496
Brave New World / Brave New World Revisited 1494
People of the Book 1493
Journey to the Center of the Earth (Extraordinary Voyages, #3) 1491
The Once and Future King (The Once and Future King #1-4) 1490
Stiff: The Curious Lives of Human Cadavers 1487
Sophie’s World 1486
The Five Love Languages: How to Express Heartfelt Commitment to Your Mate 1485
Falling Up 1484
Contact 1483
The Accidental Tourist 1483
The Fires of Heaven (Wheel of Time, #5) 1482
Presumed Innocent 1479
Landline 1477
Four Past Midnight 1474
Red Rising (Red Rising, #1) 1474
The Fiery Cross (Outlander, #5) 1472
Second Foundation (Foundation #3) 1469
I Am Legend and Other Stories 1463
The Sense of an Ending 1462
Jaws 1459
Eye of the Needle 1457
Lord of Chaos (Wheel of Time, #6) 1456
Persepolis: The Story of a Childhood (Persepolis, #1) 1454
The Eyes of the Dragon 1452
The Deep End of the Ocean (Cappadora Family, #1) 1451
The Complete Fairy Tales 1447
A Wind in the Door (A Wrinkle in Time Quintet, #2) 1445
Year of Wonders 1445
Firefly Lane (Firefly Lane, #1) 1438
The Prince and the Pauper 1435
The White Tiger 1433
The Republic 1431
Shadow of Night (All Souls Trilogy, #2) 1430
A is for Alibi (Kinsey Millhone, #1) 1428
Dirk Gently’s Holistic Detective Agency (Dirk Gently #1) 1427
The Year of Magical Thinking 1426
Left Behind (Left Behind, #1) 1425
A Fine Balance 1424
The Short Second Life of Bree Tanner: An Eclipse Novella (Twilight, #3.5) 1423
Just Listen 1422
The Bone Collector (Lincoln Rhyme, #1) 1420
The Partner 1418
The Plague 1415
All Creatures Great and Small (All Creatures Great and Small, #1-2) 1414
The Lost World (Jurassic Park, #2) 1414
The Unlikely Pilgrimage of Harold Fry (Harold Fry, #1) 1413
The Reptile Room (A Series of Unfortunate Events, #2) 1409
Grave Peril (The Dresden Files, #3) 1406
Two for the Dough (Stephanie Plum, #2) 1405
Animal Farm / 1984 1404
Chosen (House of Night, #3) 1400
J.R.R. Tolkien 4-Book Boxed Set: The Hobbit and The Lord of the Rings 1400
Howl’s Moving Castle (Howl’s Moving Castle, #1) 1396
Anthem 1394
Calvin and Hobbes 1394
Cloudy With a Chance of Meatballs 1394
The Testament 1393
The Truth About Forever 1391
Career of Evil (Cormoran Strike, #3) 1388
Dead Reckoning (Sookie Stackhouse, #11) 1384
Harold and the Purple Crayon 1383
The Interestings 1383
Darkly Dreaming Dexter (Dexter, #1) 1382
Eats, Shoots & Leaves: The Zero Tolerance Approach to Punctuation 1382
A Crown of Swords (Wheel of Time, #7) 1381
The Street Lawyer 1380
Middlemarch 1379
White Teeth 1378
Pride and Prejudice and Zombies (Pride and Prejudice and Zombies, #1) 1375
Shatter Me (Shatter Me, #1) 1373
Words of Radiance (The Stormlight Archive, #2) 1373
Abraham Lincoln: Vampire Hunter 1370
Ulysses 1369
The History of Love 1368
Fire (Graceling Realm, #2) 1366
Men Are from Mars, Women Are from Venus 1365
One Plus One 1362
The Girls 1362
The Art of Fielding 1361
Cold Sassy Tree 1357
Prodigy (Legend, #2) 1356
Torment (Fallen, #2) 1355
Sophie’s Choice 1350
Tales of a Fourth Grade Nothing (Fudge, #1) 1348
Tell the Wolves I’m Home 1348
The Gathering Storm (Wheel of Time, #12) 1348
A People’s History of the United States 1347
Holidays on Ice 1346
The Swiss Family Robinson 1346
Obsidian (Lux, #1) 1345
House Rules 1338
Equal Rites (Discworld, #3; Witches #1) 1336
Vanity Fair 1335
Maniac Magee 1334
Three to Get Deadly (Stephanie Plum, #3) 1334
We Need to Talk About Kevin 1333
Cannery Row 1332
Exodus 1329
The Good Girl 1329
Easy (Contours of the Heart, #1) 1328
Let’s Pretend This Never Happened: A Mostly True Memoir 1327
The Dinner 1327
Anne of Avonlea (Anne of Green Gables, #2) 1326
The Red Badge of Courage 1325
The Invention of Hugo Cabret 1323
The Tenth Circle 1322
The Man Who Mistook His Wife for a Hat and Other Clinical Tales 1320
Chocolat (Chocolat, #1) 1319
Are You There, Vodka? It’s Me, Chelsea 1318
John Adams 1318
Luckiest Girl Alive 1316
Summer Knight (The Dresden Files, #4) 1316
The Scarlet Pimpernel 1315
To the Lighthouse 1315
Stuart Little 1314
Dubliners 1313
Steelheart (The Reckoners, #1) 1311
Truly Madly Guilty 1308
The Divine Comedy 1307
The Monster at the End of this Book 1307
Nights in Rodanthe 1303
Hollow City (Miss Peregrine’s Peculiar Children, #2) 1301
Brown Bear, Brown Bear, What Do You See? 1300
The Circle 1300
The Constant Princess (The Plantagenet and Tudor Novels, #6) 1298
Leaving Time 1296
The Witching Hour (Lives of the Mayfair Witches, #1) 1296
Dune Messiah (Dune Chronicles #2) 1293
Heir of Fire (Throne of Glass, #3) 1293
The Path of Daggers (Wheel of Time, #8) 1293
Guess How Much I Love You 1291
The Silver Linings Playbook 1291
Born to Run: A Hidden Tribe, Superathletes, and the Greatest Race the World Has Never Seen 1289
Royal Assassin (Farseer Trilogy, #2) 1289
Midnight Sun (Twilight, #1.5) 1286
Rose Madder 1286
The Girl with All the Gifts 1286
The Rescue 1285
The Wide Window (A Series of Unfortunate Events, #3) 1285
The White Queen (The Plantagenet and Tudor Novels, #2) 1284
The Sweetness at the Bottom of the Pie (Flavia de Luce, #1) 1283
Trainspotting 1283
Midnight’s Children 1281
The Nest 1280
We Were the Mulvaneys 1280
Schindler’s List 1278
Four to Score (Stephanie Plum, #4) 1277
A Court of Thorns and Roses (A Court of Thorns and Roses, #1) 1276
The Brethren 1276
The Girls’ Guide to Hunting and Fishing 1276
1776 1274
Clear and Present Danger (Jack Ryan Universe, #6) 1274
Helter Skelter: The True Story of the Manson Murders 1274
Stone Soup 1274
Dragonflight (Dragonriders of Pern, #1) 1271
Let the Great World Spin 1267
The Mysterious Affair at Styles (Hercule Poirot, #1) 1267
The Story of Edgar Sawtelle 1266
Hearts in Atlantis 1262
The Indian in the Cupboard (The Indian in the Cupboard, #1) 1261
The Chosen 1259
The Long Walk 1259
Duma Key 1258
Far from the Madding Crowd 1258
Reconstructing Amelia 1258
The Beach House 1258
The Blade Itself (The First Law, #1) 1258
Mystic River 1257
The Alchemyst (The Secrets of the Immortal Nicholas Flamel, #1) 1257
Elantris (Elantris, #1) 1256
Reached (Matched, #3) 1256
The Raven Boys (The Raven Cycle, #1) 1256
The Guardian 1254
The Last of the Mohicans (The Leatherstocking Tales #2) 1254
The Hundred-Year-Old Man Who Climbed Out of the Window and Disappeared 1253
Old Man’s War (Old Man’s War, #1) 1252
The Purpose Driven Life: What on Earth Am I Here for? 1251
A Breath of Snow and Ashes (Outlander, #6) 1250
The Arctic Incident (Artemis Fowl, #2) 1250
The Essential Calvin and Hobbes: A Calvin and Hobbes Treasury 1249
Unwind (Unwind, #1) 1248
Wicked Lovely (Wicked Lovely, #1) 1248
Rendezvous with Rama (Rama, #1) 1247
The Complete Anne of Green Gables Boxed Set (Anne of Green Gables, #1-8) 1247
Death Masks (The Dresden Files, #5) 1245
Silence (Hush, Hush, #3) 1245
America (The Book): A Citizen’s Guide to Democracy Inaction 1242
Batman: The Dark Knight Returns (The Dark Knight Saga, #1) 1242
Sh*t My Dad Says 1242
The Heart is a Lonely Hunter 1242
Shadow and Bone (Shadow and Bone, #1) 1239
The Secret Keeper 1239
Children of the Mind (Ender’s Saga, #4) 1237
Hunted (House of Night, #5) 1237
Stones from the River 1237
Suzanne’s Diary for Nicholas 1237
Winter’s Heart (Wheel of Time, #9) 1233
The Man in the High Castle 1232
Childhood’s End 1231
Bloodlines (Bloodlines, #1) 1227
The Eyre Affair (Thursday Next, #1) 1227
A Long Way Gone: Memoirs of a Boy Soldier 1226
The Round House 1222
The Mist 1219
One Thousand White Women: The Journals of May Dodd (One Thousand White Women#1) 1218
Prodigal Summer 1216
Roll of Thunder, Hear My Cry (Logans, #4) 1216
The Amulet of Samarkand (Bartimaeus, #1) 1216
The Iron King (The Iron Fey, #1) 1216
I’ve Got Your Number 1215
Towers of Midnight (Wheel of Time, #13) 1212
Hot Six (Stephanie Plum, #6) 1211
Lady Chatterley’s Lover 1210
Anne of the Island (Anne of Green Gables, #3) 1209
Let’s Explore Diabetes with Owls 1209
The Wedding (The Notebook, #2) 1209
The Pilot’s Wife 1207
The Aeneid 1206
The Autobiography of Malcolm X 1203
Perfect Chemistry (Perfect Chemistry, #1) 1202
Guilty Pleasures (Anita Blake, Vampire Hunter, #1) 1201
High Five (Stephanie Plum, #5) 1201
Tender Is the Night 1201
Gone (Gone, #1) 1200
The Forever War (The Forever War, #1) 1199
A Swiftly Tilting Planet (A Wrinkle in Time Quintet, #3) 1197
Tempted (House of Night, #6) 1196
Ethan Frome 1195
The Alloy of Law (Mistborn, #4) 1195
The Little Engine That Could 1193
The Bonfire of the Vanities 1190
Walking Disaster (Beautiful, #2) 1190
The Invisible Man 1189
Blood Rites (The Dresden Files, #6) 1187
Fates and Furies 1187
Lover Eternal (Black Dagger Brotherhood, #2) 1185
left_join(BookData, BookDataMeta[c('book_id','authors','title','language_code')], by = "book_id") %>% 
  count(title) %>%                 # number of ratings per title
  slice_max(n, n = 15) %>%         # keep the 15 most-rated titles
  ggplot(aes(x = fct_reorder(title, -n), y = n)) +
  geom_col() +
  plottheme +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Books", y = "Number of Ratings")

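# Mean rating of each book (columns) and each user (rows), ignoring missing ratings
average_book_ratings <- colMeans(BookDataMatrix, na.rm = TRUE)

average_user_ratings <- rowMeans(BookDataMatrix, na.rm = TRUE)

p1 <- ggplot(data.frame(avg = average_book_ratings), aes(x = avg)) +
  geom_histogram(bins = 30, alpha = 0.5) + plottheme +
  coord_cartesian(xlim = c(1, 5)) + labs(x = "Average rating per book", y = "Count")
p2 <- ggplot(data.frame(avg = average_user_ratings), aes(x = avg)) +
  geom_histogram(bins = 30, alpha = 0.5) + plottheme +
  coord_cartesian(xlim = c(1, 5)) + labs(x = "Average rating per user", y = "Count")
grid.arrange(p1, p2, ncol = 1)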

As shown in the figure above, looking more closely at the distribution of average ratings reveals the biases in our dataset. Few users have an average rating below 3, and few books have an average rating below 3. Users with an average rating below 3 have likely rated only a handful of books, and may have rated a book only when they disliked it.
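One way to check this explanation is to compare how many books the low-average users have rated with everyone else. The sketch below does this in base R on a small hypothetical ratings table laid out like the goodbooks-10k ratings file (user_id, book_id, rating); in the report the same steps would be applied to BookData. The object names and toy values are illustrative only.

```r
# Hypothetical toy ratings in long format (user_id, book_id, rating),
# standing in for BookData
ratings <- data.frame(
  user_id = c(1, 1, 2, 2, 2, 2, 3),
  book_id = c(10, 11, 10, 12, 13, 14, 11),
  rating  = c(2, 1, 4, 5, 4, 3, 5)
)

# Average rating and number of ratings for each user
avg_by_user <- tapply(ratings$rating, ratings$user_id, mean)
n_by_user   <- tapply(ratings$rating, ratings$user_id, length)

user_summary <- data.frame(
  user_id    = names(avg_by_user),
  avg_rating = as.vector(avg_by_user),
  n_ratings  = as.vector(n_by_user)
)
# Flag users whose average rating is below 3
user_summary$low_avg <- user_summary$avg_rating < 3

# Median number of ratings for low-average users versus all other users
aggregate(n_ratings ~ low_avg, data = user_summary, FUN = median)
```

On the full data, a much smaller median for the TRUE group would support the explanation; similar medians would not.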

Recommender Systems

About

The following paragraphs describe the recommender systems built in this final project, each of which was covered in the Data 612 coursework.
Item-based collaborative filtering (IBCF) recommends books that are similar to the books a user has already rated. Two books are considered similar when the same users give them similar ratings. The algorithm (1) computes the similarity between every pair of books, (2) keeps the k most similar books for each book, and (3) scores each book a user has not yet rated from that user's ratings of the similar books they have already read. The results of the top 20 most recommended books are shown in the following figure.
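A minimal base-R sketch of the item-based idea, on a made-up 4 x 4 rating matrix: item-item cosine similarity is computed over the users who rated both books, and a user's missing rating is predicted as a similarity-weighted average of that user's ratings of the k most similar books they have read. The matrix, the helper names and k = 2 are illustrative assumptions; recommenderlab's IBCF implementation may differ in details such as rating normalization.

```r
# Hypothetical user x book rating matrix (NA = not rated)
R <- matrix(c(5,  4, NA, 1,
              4,  5,  1, NA,
              1, NA,  5, 4,
              NA, 1,  4, 5),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), paste0("b", 1:4)))

# Cosine similarity between two books, using only users who rated both
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(0)
  sum(x[both] * y[both]) / (sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2)))
}

# Item-item similarity matrix
item_sim <- outer(seq_len(ncol(R)), seq_len(ncol(R)),
                  Vectorize(function(i, j) cosine_sim(R[, i], R[, j])))
dimnames(item_sim) <- list(colnames(R), colnames(R))

# Predict user u's rating of book i (a column index) from the k most similar
# books that u has already rated
predict_ibcf <- function(R, item_sim, u, i, k = 2) {
  rated <- which(!is.na(R[u, ]) & seq_len(ncol(R)) != i)
  sims  <- item_sim[i, rated]
  top   <- rated[order(sims, decreasing = TRUE)][seq_len(min(k, length(rated)))]
  w     <- item_sim[i, top]
  sum(w * R[u, top]) / sum(abs(w))
}

predict_ibcf(R, item_sim, u = "u1", i = 3)
```

Because u1 dislikes b4 and b3 is most similar to b4, the predicted rating of b3 for u1 comes out low.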
User-based collaborative filtering (UBCF) recommends books to a user based on how similar that user is to other users. The algorithm measures the similarity between each pair of users; the neighborhood of most similar users can then be chosen with k-nearest neighbors or with a similarity threshold. Each neighbor's rating of a book is weighted by that neighbor's similarity coefficient, and the weighted ratings are combined to rank the books the target user has not yet rated. We can apply the same code and thought process to the recommender function with method UBCF, and the most recommended books can be found in the following figure.
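The user-based prediction step can be sketched the same way. This base-R example assumes Pearson correlation as the user similarity, keeps only neighbors with positive similarity, and predicts a rating as the similarity-weighted average of the k nearest neighbors' ratings. The toy matrix and helper names are hypothetical, and recommenderlab's UBCF may normalize ratings or pick neighbors differently.

```r
# Hypothetical user x book rating matrix (NA = not rated)
R <- matrix(c(5,  4, NA, 1,
              4,  5,  1, NA,
              1, NA,  5, 4,
              NA, 1,  4, 5,
              5,  5,  2, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("u", 1:5), paste0("b", 1:4)))

# Pearson correlation between two users over the books both have rated
pearson_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (sum(both) < 2) return(0)
  xc  <- x[both] - mean(x[both])
  yc  <- y[both] - mean(y[both])
  den <- sqrt(sum(xc^2) * sum(yc^2))
  if (den == 0) return(0)
  sum(xc * yc) / den
}

# Predict user u's rating of book i (row and column indices) as the
# similarity-weighted average of the k most similar users who rated i
predict_ubcf <- function(R, u, i, k = 2) {
  others <- setdiff(which(!is.na(R[, i])), u)
  sims   <- vapply(others, function(v) pearson_sim(R[u, ], R[v, ]), numeric(1))
  keep   <- sims > 0                      # ignore dissimilar users
  if (!any(keep)) return(NA_real_)
  others <- others[keep]
  sims   <- sims[keep]
  top    <- order(sims, decreasing = TRUE)[seq_len(min(k, length(sims)))]
  sum(sims[top] * R[others[top], i]) / sum(sims[top])
}

predict_ubcf(R, u = 1, i = 3)
```

Here only u5 rates books the way u1 does, so u1's predicted rating of b3 is u5's rating of it.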
Singular value decomposition (SVD) is a common dimensionality-reduction technique that identifies latent semantic factors for information retrieval. A difficulty with this technique is that user-item matrices are sparse: most ratings are missing, while classical SVD needs a complete matrix. Unfortunately, filling in the missing values of a huge matrix can be expensive and even misleading, because the imputed values are not real ratings. Regularized models, which minimize the error on the observed ratings plus a penalty on the size of the latent factors, can help with this.
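The sketch below illustrates the regularized approach in base R: a small matrix factorization fitted by stochastic gradient descent that only ever looks at the observed ratings, so nothing has to be imputed. It is similar in spirit to Funk SVD (listed as SVDF in recommenderlab's registry) but is not recommenderlab's SVD code; the toy matrix, the learning rate lr, the penalty lambda and the number of factors k are illustrative assumptions.

```r
# Hypothetical user x book rating matrix (NA = not rated)
R <- matrix(c(5,  4, NA, 1,
              4,  5,  1, NA,
              1, NA,  5, 4,
              NA, 1,  4, 5,
              5,  5,  2, 1),
            nrow = 5, byrow = TRUE)

# Fit r_ui ~ mu + p_u . q_i by stochastic gradient descent on the observed
# ratings only, with an L2 penalty (lambda) on the user and item factors
fit_mf <- function(R, k = 2, lambda = 0.05, lr = 0.01, epochs = 1000, seed = 1) {
  set.seed(seed)
  obs <- which(!is.na(R), arr.ind = TRUE)                 # (row, col) of each observed rating
  P   <- matrix(rnorm(nrow(R) * k, sd = 0.1), nrow(R), k) # user factors
  Q   <- matrix(rnorm(ncol(R) * k, sd = 0.1), ncol(R), k) # item factors
  mu  <- mean(R, na.rm = TRUE)                            # global mean rating
  for (epoch in seq_len(epochs)) {
    for (idx in sample(nrow(obs))) {                      # visit ratings in random order
      u   <- obs[idx, 1]
      i   <- obs[idx, 2]
      err <- R[u, i] - (mu + sum(P[u, ] * Q[i, ]))
      p_u <- P[u, ]                                       # old value, used in Q's update
      P[u, ] <- P[u, ] + lr * (err * Q[i, ] - lambda * P[u, ])
      Q[i, ] <- Q[i, ] + lr * (err * p_u    - lambda * Q[i, ])
    }
  }
  list(P = P, Q = Q, mu = mu, pred = mu + P %*% t(Q))     # full matrix of predictions
}

fit <- fit_mf(R)
round(fit$pred, 2)   # the missing cells now hold predicted ratings
```

Larger values of lambda shrink the factors toward zero, trading a worse fit on the observed ratings for less overfitting on sparse data.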

Model Setup

There are many different recommender systems and several ways to judge which models perform better. Common evaluation metrics include root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC). Here we present RMSE, MSE and MAE together with ROC curves for all models. For both IBCF and UBCF, three similarity measures are evaluated, each of which compares two rating vectors in a different way: cosine similarity, Pearson correlation, and Jaccard similarity. In addition, a random (“guessing”) recommender is included as a baseline against which model performance can be compared. The ROC plot at the end of the Model Evaluation section summarizes the results of this comparison.
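For reference, the three similarity measures can be written in a few lines of base R. These are the textbook definitions computed over the books both users rated (the same functions work on two item columns for IBCF); recommenderlab's internal handling, for example normalizing ratings first or how it treats real-valued ratings for Jaccard, may differ. The two rating vectors are hypothetical.

```r
# Two hypothetical users' ratings of the same five books (NA = not rated)
x <- c(5, 4, NA, 1, 3)
y <- c(4, NA, 2, 2, 3)

# Cosine similarity over the books both users rated
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  sum(x[both] * y[both]) / (sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2)))
}

# Pearson correlation over the books both users rated (ratings are mean-centred)
pearson_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  cor(x[both], y[both])
}

# Jaccard similarity of the sets of books each user rated (rating values ignored)
jaccard_sim <- function(x, y) {
  a <- !is.na(x)
  b <- !is.na(y)
  sum(a & b) / sum(a | b)
}

c(cosine = cosine_sim(x, y), pearson = pearson_sim(x, y), jaccard = jaccard_sim(x, y))
# cosine ~0.973, pearson 1, jaccard 0.6
```

Cosine rewards agreement in raw rating levels, Pearson only in how ratings move relative to each user's mean, and Jaccard only in which books were rated at all.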

# List the recommender methods that recommenderlab registers for realRatingMatrix data
recmodels <- recommenderRegistry$get_entries(dataType = "realRatingMatrix")
kable(names(recmodels)) %>%
  kable_styling(full_width = FALSE,
                position = "center",
                bootstrap_options = c("hover", "condensed", "responsive")) %>% 
  add_header_above(header = "Available Recommender Models")
Available Recommender Models
x
ALS_realRatingMatrix
ALS_implicit_realRatingMatrix
IBCF_realRatingMatrix
LIBMF_realRatingMatrix
POPULAR_realRatingMatrix
RANDOM_realRatingMatrix
RERECOMMEND_realRatingMatrix
SVD_realRatingMatrix
SVDF_realRatingMatrix
UBCF_realRatingMatrix

The ratings for the books are taken from the BookDataMatrix rating matrix. To work with a smaller dataset, we keep only books with more than 1,500 ratings and users with more than 175 ratings. A rating of 4 is used as the threshold for a positive rating: since most ratings fall between 3 and 5, only books rated 4 or higher are considered good recommendations. The models are evaluated with 8-fold cross-validation, so each model is trained on roughly 7/8 of the users and tested on the remaining 1/8 (the 80% training fraction only applies to a simple train/test split). For each test user, 10 ratings are given to the model and the remaining ratings are withheld for evaluation. We compute 10 recommendations per user, and RMSE is calculated from the predicted ratings of the withheld books. When calculating the ROC curve, we vary the number of recommendations produced by the models and see how the models perform.
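The pieces of this protocol can be mimicked in base R to make them concrete: users are assigned to 8 folds, and each test user's ratings are split into 10 “known” ratings and the withheld “unknown” ratings, with withheld ratings of 4 or more marked as good. This is a simplified sketch under our own assumptions, not recommenderlab's evaluationScheme() code; the user counts and the split_given() helper are hypothetical.

```r
set.seed(42)

# 8-fold cross-validation: each user gets a fold label; each fold is the test
# set once while the other 7 folds form the training set
n_users <- 40
k_folds <- 8
folds   <- sample(rep(seq_len(k_folds), length.out = n_users))

# "Given-n" split for one test user: reveal `given` random ratings to the
# model (known) and withhold the rest (unknown) for scoring
split_given <- function(user_ratings, given = 10, good_rating = 4) {
  rated <- which(!is.na(user_ratings))
  if (length(rated) <= given) stop("user needs more than `given` ratings")
  known_idx   <- rated[sample.int(length(rated), given)]
  unknown_idx <- setdiff(rated, known_idx)
  known <- unknown <- rep(NA_real_, length(user_ratings))
  known[known_idx]     <- user_ratings[known_idx]
  unknown[unknown_idx] <- user_ratings[unknown_idx]
  # withheld ratings at or above the threshold count as "good" books
  list(known = known, unknown = unknown, good = which(unknown >= good_rating))
}

# Hypothetical test user who rated 15 of 20 books
user_ratings <- c(sample(1:5, 15, replace = TRUE), rep(NA, 5))
s <- split_given(user_ratings)
```

Filtering to users with more than 175 ratings guarantees that every test user has far more than the 10 given ratings, so each one keeps plenty of withheld ratings to score.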

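# Treat user and book IDs as categorical labels
BookData$user_id <- as.factor(BookData$user_id)
BookData$book_id <- as.factor(BookData$book_id)

# Convert the (user_id, book_id, rating) data frame into a sparse user x book rating matrix
BookDataMatrix <- as(data.frame(BookData), 'realRatingMatrix')

# Keep users with more than 175 ratings and books with more than 1,500 ratings
ratings_books <- BookDataMatrix[rowCounts(BookDataMatrix) > 175,  
                                colCounts(BookDataMatrix) > 1500] 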

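train_percent <- 0.8    # only used by method = "split"; ignored for cross-validation
kept_items <- 10        # ratings per test user given to the model ("known")
rating_threshold <- 4   # ratings >= 4 count as positive ("good") ratings
n_eval <- 8             # number of cross-validation folds
no_recommendations <- 10

eval_sets <- evaluationScheme(data = ratings_books, method = "cross-validation", 
                              train = train_percent, given = kept_items,
                              goodRating = rating_threshold, k = n_eval)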


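# Training users' ratings (first fold), used to fit the models
recc_train <- getData(eval_sets, 'train')
# The given ratings of each test user, passed to predict()
recc_known <- getData(eval_sets, 'known')
# The withheld ratings of each test user, used to score the predictions
recc_unknown <- getData(eval_sets, 'unknown')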

Model Evaluation

We start by predicting the ratings that test users gave to their withheld books. The table below measures the accuracy of these predictions with root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE). UBCF with Jaccard similarity has the lowest error on all three metrics, followed closely by UBCF with cosine similarity, UBCF with Pearson correlation, and SVD (RMSE between 0.963 and 0.970). ALS is slightly worse (RMSE 0.987), and all three IBCF models are clearly worse (RMSE between 1.27 and 1.33).
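The three error measures are simple to state directly. A minimal base-R sketch, assuming the errors are pooled over all withheld ratings (our reading of byUser = FALSE in calcPredictionAccuracy()); the predictions and true ratings are hypothetical. Since MSE is just the square of RMSE, those two columns always rank the models in the same order.

```r
# RMSE, MSE and MAE pooled over every withheld rating that has a prediction
accuracy_metrics <- function(predicted, actual) {
  ok  <- !is.na(predicted) & !is.na(actual)   # skip books with no prediction
  err <- predicted[ok] - actual[ok]
  c(RMSE = sqrt(mean(err^2)), MSE = mean(err^2), MAE = mean(abs(err)))
}

# Hypothetical predictions and true ratings for four withheld books
accuracy_metrics(predicted = c(4, 3, 5, NA), actual = c(5, 3, 3, 4))
# RMSE ~1.291, MSE ~1.667, MAE 1
```

RMSE penalizes the single 2-point miss more heavily than MAE does, which is why the two can disagree about close models.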

UBCF_pearson_eval<-Recommender(recc_train,method = "UBCF", parameter = list(method = "pearson"))
UBCF_jaccard_eval<-Recommender(recc_train,method = "UBCF", parameter = list(method = "jaccard"))
UBCF_cosine_eval<-Recommender(recc_train,method = "UBCF", parameter = list(method = "cosine"))
IBCF_pearson_eval<-Recommender(recc_train,method = "IBCF", parameter = list(method = "pearson"))
IBCF_jaccard_eval<-Recommender(recc_train,method = "IBCF", parameter = list(method = "jaccard"))
IBCF_cosine_eval<-Recommender(recc_train,method = "IBCF", parameter = list(method = "cosine"))
SVD_eval<-Recommender(recc_train,method = "SVD")
ALS_eval<-Recommender(recc_train,method = "ALS")


recommendations <- function(eval){
  # Top-N recommendation lists for the test users' known ratings
  recc_predicted <- predict(object = eval, newdata = recc_known, n = no_recommendations)

  # Map each user's recommended item indices back to book_id labels
  recc_matrix <- lapply(recc_predicted@items, function(x) colnames(ratings_books)[x])

  # Count how many users each book was recommended to
  table_top <- as.data.frame(table(unlist(recc_matrix)), stringsAsFactors = FALSE)
  names(table_top) <- c("book_id", "number_of_items_top")

  # Join on book_id as character on both sides to avoid factor-level mismatches,
  # then keep the 20 most frequently recommended books
  table_top %>%
    left_join(mutate(BookDataMeta, book_id = as.character(book_id)), by = "book_id") %>%
    slice_max(number_of_items_top, n = 20) %>%
    ggplot(mapping = aes(x = fct_reorder(title, -number_of_items_top), y = number_of_items_top)) +
    geom_col(aes(fill = number_of_items_top), color = 'black', alpha = 0.5) +
    theme(axis.text.x = element_text(angle = 90),
          legend.position = 'none',
          panel.grid = element_blank(),
          panel.background = element_blank()) +
    labs(x = "Book Title",
         y = "Number of Recommendations",
         title = "Top 20 Book Recommendations")
}
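ModelErrors <- function(eval){
  # Predicted ratings (not top-N lists) for the test users' known data
  recc_predicted <- predict(object = eval, newdata = recc_known, type = "ratings")
  # Pool the errors over all withheld ratings
  calcPredictionAccuracy(recc_predicted, recc_unknown, byUser = FALSE)
}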

rbind(
  UBCF_pearson= ModelErrors(UBCF_pearson_eval),
  UBCF_jaccard= ModelErrors(UBCF_jaccard_eval),
  UBCF_cosine= ModelErrors(UBCF_cosine_eval),
  IBCF_pearson= ModelErrors(IBCF_pearson_eval),
  IBCF_jaccard= ModelErrors(IBCF_jaccard_eval),
  IBCF_cosine= ModelErrors(IBCF_cosine_eval),
  SVD= ModelErrors(SVD_eval),
  ALS= ModelErrors(ALS_eval))
##                   RMSE       MSE       MAE
## UBCF_pearson 0.9682550 0.9375178 0.7700317
## UBCF_jaccard 0.9631783 0.9277125 0.7605164
## UBCF_cosine  0.9678766 0.9367851 0.7634025
## IBCF_pearson 1.2998168 1.6895237 0.8932116
## IBCF_jaccard 1.2653267 1.6010517 0.9252416
## IBCF_cosine  1.3281864 1.7640792 0.9325961
## SVD          0.9703118 0.9415049 0.7685797
## ALS          0.9872877 0.9747370 0.7869963


Finally, we can look at the ROC curve and tune the SVD and ALS models with different normalization techniques to see whether this improves model accuracy. “Center” normalization subtracts each user's mean rating, so that every user's ratings have a mean of 0. “Z-score” normalization additionally divides by each user's standard deviation, so that a rating is expressed as the number of standard deviations it lies from that user's mean.
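Both normalizations act on each user's row of the rating matrix. A base-R sketch of the two transformations on a hypothetical 3 x 4 matrix, ignoring missing ratings; recommenderlab's normalize() is assumed to work along the same lines but is not reproduced here.

```r
# Hypothetical ratings for three users (rows) and four books (columns)
R <- matrix(c(5, 3, NA, 4,
              2, NA, 4, 3,
              1, 5, 3, NA),
            nrow = 3, byrow = TRUE)

normalize_ratings <- function(R, method = c("center", "Z-score")) {
  method <- match.arg(method)
  user_mean <- rowMeans(R, na.rm = TRUE)
  # R - user_mean recycles the vector down the columns, so row u loses user_mean[u]
  out <- R - user_mean
  if (method == "Z-score") {
    # undefined (NA or 0) for users with fewer than two ratings or identical ratings
    user_sd <- apply(R, 1, sd, na.rm = TRUE)
    out <- out / user_sd
  }
  out
}

normalize_ratings(R, "center")
normalize_ratings(R, "Z-score")
```

Centering removes differences in how generously users rate; Z-scoring also removes differences in how widely they spread their ratings.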
The ROC curve below is consistent with the RMSE results: the singular value decomposition and user-based collaborative filtering models perform best. Using the function we created, called “recommendations”, we can produce a figure showing the top 20 book recommendations for any of our models; in this case we use the best-performing one, as shown in the final figure.
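Each point on an ROC curve corresponds to one recommendation-list length N. The sketch below computes one user's points in base R from made-up predicted scores and good/not-good labels (withheld rating at or above the threshold). It is a simplified illustration: here the candidates are only the withheld books, whereas recommenderlab is assumed to rank all books outside the known ratings and to average the counts over users and folds, so its numbers will differ.

```r
# True positive rate and false positive rate of a top-N list, for several N
roc_points <- function(scores, good, ns) {
  ranked <- order(scores, decreasing = TRUE)
  t(sapply(ns, function(n) {
    top <- ranked[seq_len(min(n, length(ranked)))]
    rec <- seq_along(scores) %in% top          # TRUE if the book is recommended
    tp <- sum(rec & good);  fp <- sum(rec & !good)
    fn <- sum(!rec & good); tn <- sum(!rec & !good)
    c(n = n, TPR = tp / (tp + fn), FPR = fp / (fp + tn))
  }))
}

# Hypothetical predicted scores and "good" labels for six candidate books
scores <- c(4.8, 4.1, 3.9, 3.2, 2.5, 2.0)
good   <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
roc_points(scores, good, ns = c(1, 2, 4, 6))
```

Plotting TPR against FPR across N traces the curve; a model whose curve rises faster than the RANDOM baseline places good books nearer the top of its lists.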

models_to_evaluate <- list(
  UBCF_pearson = list(name = "UBCF", param = list(method = "pearson")),
  UBCF_jaccard = list(name = "UBCF", param = list(method = "jaccard")),
  UBCF_cosine  = list(name = "UBCF", param = list(method = "cosine")),
  IBCF_pearson = list(name = "IBCF", param = list(method = "pearson")),
  IBCF_jaccard = list(name = "IBCF", param = list(method = "jaccard")),
  IBCF_cosine  = list(name = "IBCF", param = list(method = "cosine")),
  SVD_Z        = list(name = "SVD", param = list(normalize = "Z-score")),
  SVD_Center   = list(name = "SVD", param = list(normalize = "center")),
  ALS_Z        = list(name = "ALS", param = list(normalize = "Z-score")),
  ALS_Center   = list(name = "ALS", param = list(normalize = "center")),
  random       = list(name = "RANDOM", param = NULL)
)


n_recommendations <- c(1, 5, seq(10, 100, 10))

list_results <- evaluate(x = eval_sets, method = models_to_evaluate, n= n_recommendations)
## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.12sec] 
##   2  [0sec/0.14sec] 
##   3  [0sec/0.13sec] 
##   4  [0.02sec/0.11sec] 
##   5  [0sec/0.11sec] 
##   6  [0sec/0.12sec] 
##   7  [0sec/0.11sec] 
##   8  [0sec/0.11sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.13sec] 
##   2  [0sec/0.11sec] 
##   3  [0sec/0.11sec] 
##   4  [0sec/0.12sec] 
##   5  [0sec/0.11sec] 
##   6  [0sec/0.12sec] 
##   7  [0sec/0.11sec] 
##   8  [0.01sec/0.11sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.1sec] 
##   2  [0sec/0.11sec] 
##   3  [0sec/0.11sec] 
##   4  [0sec/0.11sec] 
##   5  [0sec/0.12sec] 
##   6  [0sec/0.12sec] 
##   7  [0sec/0.13sec] 
##   8  [0sec/0.12sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.5sec/0.01sec] 
##   2  [0.5sec/0.02sec] 
##   3  [0.5sec/0.01sec] 
##   4  [0.48sec/0.02sec] 
##   5  [0.49sec/0.01sec] 
##   6  [0.5sec/0.02sec] 
##   7  [0.5sec/0.02sec] 
##   8  [0.48sec/0.02sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.47sec/0.01sec] 
##   2  [0.45sec/0.01sec] 
##   3  [0.47sec/0.02sec] 
##   4  [0.47sec/0.01sec] 
##   5  [0.47sec/0.01sec] 
##   6  [0.47sec/0.02sec] 
##   7  [0.52sec/0.01sec] 
##   8  [0.47sec/0.02sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.47sec/0.01sec] 
##   2  [0.52sec/0.02sec] 
##   3  [0.47sec/0.02sec] 
##   4  [0.48sec/0.02sec] 
##   5  [0.49sec/0sec] 
##   6  [0.5sec/0.02sec] 
##   7  [0.48sec/0.02sec] 
##   8  [0.49sec/0.01sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.05sec/0.03sec] 
##   2  [0.05sec/0.03sec] 
##   3  [0.06sec/0.03sec] 
##   4  [0.07sec/0.03sec] 
##   5  [0.06sec/0.03sec] 
##   6  [0.07sec/0.03sec] 
##   7  [0.06sec/0.03sec] 
##   8  [0.03sec/0.03sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.05sec/0.03sec] 
##   2  [0.05sec/0.03sec] 
##   3  [0.05sec/0.03sec] 
##   4  [0.05sec/0.03sec] 
##   5  [0.04sec/0.03sec] 
##   6  [0.04sec/0.02sec] 
##   7  [0.07sec/0.03sec] 
##   8  [0.04sec/0.02sec] 
## ALS run fold/sample [model time/prediction time]
##   1  [0sec/11.02sec] 
##   2  [0sec/10.9sec] 
##   3  [0sec/12.15sec] 
##   4  [0sec/10.94sec] 
##   5  [0sec/11.76sec] 
##   6  [0sec/11.3sec] 
##   7  [0sec/11.3sec] 
##   8  [0sec/11.04sec] 
## ALS run fold/sample [model time/prediction time]
##   1  [0sec/11.17sec] 
##   2  [0sec/11.48sec] 
##   3  [0sec/10.87sec] 
##   4  [0sec/11.18sec] 
##   5  [0sec/10.99sec] 
##   6  [0sec/11.15sec] 
##   7  [0sec/11.23sec] 
##   8  [0sec/11.04sec] 
## RANDOM run fold/sample [model time/prediction time]
##   1  [0sec/0.03sec] 
##   2  [0sec/0.01sec] 
##   3  [0sec/0.01sec] 
##   4  [0sec/0.03sec] 
##   5  [0sec/0.03sec] 
##   6  [0sec/0.03sec] 
##   7  [0sec/0.03sec] 
##   8  [0sec/0.03sec]
avg_matrices <- lapply(list_results, avg)  # each model's results averaged over the 8 folds


plot(list_results, annotate = 1, legend = "topleft") 
title("ROC curve")

recommendations(UBCF_jaccard_eval)