Requirements: The goal of the final project is to build out a recommender system using a large data set (e.g., 1M+ ratings, or 10k+ users and 10k+ items). The dimensions of the raw Beer Advocate data file are shown below.
Data set: The Beer Advocate data set provides reviews of a variety of beers spanning more than 10 years. It includes approximately 1.5 million reviews, each scoring a beer on five “aspects”: appearance, aroma, palate, taste, and overall impression. Each review includes product and user information, followed by these five ratings and a plain-text review. Source: BeerAdvocate.
## [1] "Raw.Df Dimensions: c(1586614, 13)"
## [1] "Distinct Reviewers 33388"
## [1] "Distinct Beers 66055"
To make this a more actionable and complete data set:
- Filter out reviews that have no reviewer name
- Keep only beers that have been reviewed more than 100 times
- Keep only reviewers who have written more than 50 reviews
- Create summaries of the filtered data set by beer and by user

This should yield a more actively reviewed data set and, in turn, more meaningful recommendations.
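The filtering steps above can be sketched in base R roughly as follows. This is a minimal illustration on a toy data frame, not the original code: `filter_reviews` is a hypothetical helper, the `review_overall` column name and the order in which the filters are applied are assumptions, and the thresholds are lowered so the toy data survives the filter.

```r
# Toy stand-in for the raw reviews (the real file has about 1.59M rows).
raw.df <- data.frame(
  review_profilename = c("ann", "ann", "ann", "bob", "bob", "", "cat"),
  beer_beerid        = c(1, 2, 3, 1, 2, 1, 1),
  review_overall     = c(4, 3.5, 5, 4, 2, 3, 4.5),  # assumed column name
  stringsAsFactors   = FALSE
)

# Hypothetical helper: applies the three filters in the order listed above.
filter_reviews <- function(df, min_beer_reviews = 100, min_user_reviews = 50) {
  # 1. Drop reviews with a missing or empty reviewer name.
  df <- df[!is.na(df$review_profilename) & df$review_profilename != "", ]
  # 2. Keep beers reviewed more than `min_beer_reviews` times.
  beer_n <- table(df$beer_beerid)
  keep_beers <- names(beer_n)[beer_n > min_beer_reviews]
  df <- df[as.character(df$beer_beerid) %in% keep_beers, ]
  # 3. Keep reviewers with more than `min_user_reviews` reviews.
  user_n <- table(df$review_profilename)
  keep_users <- names(user_n)[user_n > min_user_reviews]
  df[df$review_profilename %in% keep_users, ]
}

# Lowered thresholds for the toy data; the report uses 100 and 50.
filtered <- filter_reviews(raw.df, min_beer_reviews = 1, min_user_reviews = 1)

# Per-user summary: mean overall rating for each reviewer.
user.summary <- aggregate(review_overall ~ review_profilename,
                          data = filtered, FUN = mean)
```

Because the reviewer filter runs after the beer filter here, a reviewer's count only includes reviews of beers that survived step 2. Reversing the order would give slightly different counts.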
## [1] "Filtered Dimensions: c(643604, 12)"
## [1] "Distinct Reviewers 1763"
## [1] "Distinct Beers 2227"
| reviewer_ID | review_profilename | overall.rt.mean | aroma.rt.mean | appreance.rt.mean | palate.rt.mean | taste.rt.mean | num.beer.reviewed |
|---|---|---|---|---|---|---|---|
| 129 | BuckeyeNation | 3.914141 | 3.761031 | 3.931951 | 3.750665 | 3.793727 | 1881 |
| 64 | mikesgroove | 4.084906 | 3.837907 | 3.964265 | 3.946541 | 3.910520 | 1749 |
| 161 | BEERchitect | 3.886252 | 3.859741 | 3.890567 | 3.735820 | 3.864365 | 1622 |
| 108 | brentk56 | 3.906600 | 3.937422 | 4.069116 | 3.954234 | 3.951744 | 1606 |
| 43 | northyorksammy | 3.805243 | 3.727528 | 3.785268 | 3.662921 | 3.716916 | 1602 |
| 189 | WesWes | 3.961615 | 3.965917 | 3.896095 | 3.889146 | 3.870946 | 1511 |
| beer_beerid | beer_name | overall.rt.mean | aroma.rt.mean | appreance.rt.mean | palate.rt.mean | taste.rt.mean | num.beer.reviewed |
|---|---|---|---|---|---|---|---|
| 2093 | 90 Minute IPA | 4.125526 | 4.191094 | 4.175666 | 4.166550 | 4.279804 | 1426 |
| 412 | Old Rasputin Russian Imperial Stout | 4.171418 | 4.185674 | 4.353528 | 4.211333 | 4.318603 | 1403 |
| 1904 | Sierra Nevada Celebration Ale | 4.184345 | 4.073884 | 4.235552 | 4.076810 | 4.165325 | 1367 |
| 1093 | Two Hearted Ale | 4.353481 | 4.286917 | 4.183627 | 4.153405 | 4.329763 | 1307 |
| 4083 | Stone Ruination IPA | 4.152140 | 4.333463 | 4.175486 | 4.185214 | 4.322568 | 1285 |
| 680 | Brooklyn Black Chocolate Stout | 4.031915 | 4.105989 | 4.270686 | 4.162727 | 4.181245 | 1269 |
## [1] "The dimensions of the user-item matrix: c(1763, 2203)"
| | # 100 | #9 | 10 Commandments | 1000 IBU | 10th Anniversary Double India Pale Ale | 120 Minute IPA | 12th Anniversary Undercover Investigation Shut-Down Ale | 14’ER ESB | 1554 Enlightened Black Ale | 15th Anniversary Wood Aged |
|---|---|---|---|---|---|---|---|---|---|---|
| 1fastz28 | 0 | 0.0 | 0 | 0 | 0.0 | 3.0 | 0 | 3 | 3.5 | 0 |
| 4DAloveofSTOUT | 0 | 0.0 | 0 | 0 | 0.0 | 3.5 | 0 | 0 | 0.0 | 0 |
| 99bottles | 0 | 0.0 | 0 | 0 | 0.0 | 3.0 | 0 | 0 | 0.0 | 0 |
| 9InchNails | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0 |
| aaronh | 0 | 2.0 | 0 | 0 | 3.5 | 3.5 | 0 | 0 | 5.0 | 0 |
| AaronHomoya | 0 | 3.5 | 0 | 0 | 0.0 | 4.0 | 0 | 0 | 0.0 | 0 |
| AaronRed | 0 | 3.0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0 |
| aasher | 0 | 2.5 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 4.0 | 0 |
| abcsofbeer | 0 | 3.0 | 0 | 0 | 0.0 | 4.0 | 0 | 4 | 0.0 | 0 |
| abrand | 0 | 0.0 | 4 | 0 | 0.0 | 0.0 | 0 | 0 | 4.5 | 0 |
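A user-item matrix like the one previewed above, with reviewers as rows, beer names as columns, and 0 for unrated beers, could be built along these lines. This is a base R sketch using `tapply` on toy data; the original analysis may well have used `dcast` from data.table instead, and the `review_overall` column name and the choice to average duplicate ratings are assumptions.

```r
# Toy long-format reviews; "bob" rated "Hex" twice.
reviews <- data.frame(
  review_profilename = c("ann", "ann", "bob", "bob", "bob"),
  beer_name          = c("Hex", "Choklat", "Hex", "Hex", "Uff-da"),
  review_overall     = c(4, 3.5, 3, 5, 4.5),  # assumed column name
  stringsAsFactors   = FALSE
)

# Pivot to wide form: one row per reviewer, one column per beer name.
# Duplicate (reviewer, beer) pairs are averaged (an assumption).
user.beer.matrix <- tapply(
  reviews$review_overall,
  list(reviews$review_profilename, reviews$beer_name),
  mean
)

# Unrated combinations come back as NA; the preview above shows them as 0.
user.beer.matrix[is.na(user.beer.matrix)] <- 0
```

Keying the columns by beer name rather than beer ID merges beers that share a name. That might be why the matrix has 2203 columns while the filtered data has 2227 distinct beers, although the source does not confirm this.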
## model.name ID
## TP 7.38 TP
## FP 2.62 FP
## FN 354.24 FN
## TN 1828.76 TN
## precision 0.74 precision
## recall 0.03 recall
## TPR 0.03 TPR
## FPR 0.00 FPR
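The ratio rows in the table above follow the standard confusion-matrix definitions, which the sketch below spells out. `confusion_metrics` is a hypothetical helper, not part of the original analysis. Note that TP + FP = 10, which is consistent with a list of ten recommendations per user. The reported counts look like averages over users (an assumption), so ratios recomputed from the averaged counts need not match the reported rows exactly.

```r
# Hypothetical helper: standard metrics from the four confusion-matrix counts.
confusion_metrics <- function(tp, fp, fn, tn) {
  c(
    precision = tp / (tp + fp),  # share of recommended beers that were relevant
    recall    = tp / (tp + fn),  # share of relevant beers that were recommended
    TPR       = tp / (tp + fn),  # true positive rate, identical to recall
    FPR       = fp / (fp + tn)   # share of irrelevant beers that were recommended
  )
}

# Plugging in the (apparently averaged) counts from the table above.
reported <- round(confusion_metrics(tp = 7.38, fp = 2.62,
                                    fn = 354.24, tn = 1828.76), 2)
```

High precision with very low recall is what one would expect when only ten items are recommended out of a few hundred relevant ones.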
## [1] "Beers abcsofbeer may enjoy (below list):"
## beer_name
## 1 Belhaven Twisted Thistle IPA
## 2 Hex
## 3 Sless' Oatmeal Stout
## 4 Menace De Dieu
## 5 Wild Ride IPA
## 6 Gumballhead
## 7 Atwater Vanilla Java Porter
## 8 Uff-da
## 9 Brooklyn Local 1
## 10 Choklat
## List of 3
## $ d: num [1:1763] 1759 610 412 383 357 ...
## $ u: num [1:1763, 1:1763] -0.0203 -0.0106 -0.0163 -0.0106 -0.0276 ...
## $ v: num [1:2203, 1:1763] -0.0123 -0.0374 -0.0148 -0.0075 -0.0105 ...
Sum the squares of the singular values, `sum(beer_svd$d^2)`, to measure the total variability in the matrix.
The model can retain 90% of that variability by keeping the first 650 singular values.
Assign the metadata from the original `user.beer.matrix`.
## [1] "The model can retain 90% of the variability by keeping: 650 singular values"
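One common way to arrive at such a cutoff is to take the cumulative share of the squared singular values and keep the smallest count that reaches the target. The sketch below shows this on a small random matrix. `n_singular_values` is a hypothetical helper, and the toy matrix stands in for the real 1763 x 2203 one, so the resulting `k` is not comparable to 650.

```r
# Hypothetical helper: smallest k whose leading singular values
# account for at least `target` of the total squared "energy".
n_singular_values <- function(d, target = 0.90) {
  energy <- cumsum(d^2) / sum(d^2)  # cumulative share of variability
  which(energy >= target)[1]
}

# Toy stand-in for the user-item matrix.
set.seed(42)
toy <- matrix(runif(30 * 20, min = 0, max = 5), nrow = 30)
toy_svd <- svd(toy)

k <- n_singular_values(toy_svd$d, target = 0.90)
```

For example, singular values of 3, 2, and 1 have squared values 9, 4, and 1, so the first two cover 13/14 (about 93%) of the total and `n_singular_values(c(3, 2, 1))` gives 2.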
## [1] 650 650
## [1] 1763 650
## [1] 650 2203
## [1] 1763 2203
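The four dimension pairs above match the pieces of a rank-k reconstruction: a k x k diagonal matrix, the first k left singular vectors, the transposed first k right singular vectors, and their product. A sketch under that reading is shown below. `truncated_reconstruct` is a hypothetical helper, and copying the row and column names across is a guess at what "assign metadata" refers to.

```r
# Hypothetical helper: rank-k approximation of a matrix via SVD.
truncated_reconstruct <- function(m, k) {
  s     <- svd(m)
  d_k   <- diag(s$d[1:k], nrow = k, ncol = k)  # k x k
  u_k   <- s$u[, 1:k, drop = FALSE]            # users x k
  v_k_t <- t(s$v[, 1:k, drop = FALSE])         # k x beers
  pred  <- u_k %*% d_k %*% v_k_t               # users x beers
  dimnames(pred) <- dimnames(m)                # carry over reviewer and beer names
  pred
}

# Toy user-item matrix (0 = unrated).
toy <- matrix(
  c(4,   0, 3,
    0,   5, 0,
    3.5, 0, 4,
    0,   2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ann", "bob", "cat", "dan"), c("Hex", "Choklat", "Uff-da"))
)

pred_rank1 <- truncated_reconstruct(toy, k = 1)  # lossy approximation
pred_full  <- truncated_reconstruct(toy, k = 3)  # full rank, recovers toy
```

Passing `nrow` and `ncol` to `diag` keeps the k = 1 case from being read as a request for an identity matrix.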
## [1] "min values: -2.06704979248077"
## [1] "max values: 6.70776345237745"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.1711 0.7400 0.6166 5.0000
## [1] "RMSE of SVD Model; 0.44"
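The raw reconstruction ranges from about -2.07 to 6.71, while the summary that follows is bounded by 0 and 5, which suggests the predictions were clipped to the rating scale before scoring. A sketch of that step and of an RMSE calculation is shown below. `clip_ratings` and `rmse_observed` are hypothetical helpers, and scoring only the entries a user actually rated is an assumption; the source does not say which entries the reported 0.44 covers.

```r
# Hypothetical helper: force predictions onto the 0-5 rating scale.
clip_ratings <- function(pred, lo = 0, hi = 5) {
  pmin(pmax(pred, lo), hi)  # keeps the matrix dimensions of `pred`
}

# Hypothetical helper: RMSE over rated entries only (assumption: 0 = unrated).
rmse_observed <- function(actual, pred) {
  rated <- actual > 0
  sqrt(mean((actual[rated] - pred[rated])^2))
}

# Toy example: two users, two beers.
actual <- matrix(c(4, 0, 3, 5), nrow = 2)
pred   <- matrix(c(4.5, 1, -1, 6.2), nrow = 2)

clipped <- clip_ratings(pred)
rmse    <- rmse_observed(actual, clipped)
```

Including the unrated zeros in the error would instead reward a model for predicting values near 0 on a sparse matrix, so the two choices can give very different numbers.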
Store the top 10 predicted ratings in a `df_total` data frame.
Provide context by displaying the randomly selected user's average ratings from the base data set.
| Rating_Pred | beer_name | review_profilename | reviewer_ID |
|---|---|---|---|
| 5.00 | Gulden Draak (Dark Triple) | 1fastz28 | 1076 |
| 5.00 | Hoegaarden Original White Ale | 1fastz28 | 1076 |
| 5.00 | Pranqster | 1fastz28 | 1076 |
| 5.00 | Samuel Smith’s Nut Brown Ale | 1fastz28 | 1076 |
| 4.94 | Nut Brown Ale | 1fastz28 | 1076 |
| 4.90 | La Chouffe | 1fastz28 | 1076 |
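A table like the one above can be produced by sorting one user's row of the prediction matrix and keeping the highest-scoring beers. The sketch below is one way to do that. `top_n_for_user` is a hypothetical helper, and hiding beers the user has already rated is an assumption; the original code may not do this.

```r
# Hypothetical helper: top-n predicted beers for one reviewer.
top_n_for_user <- function(pred, actual, user, n = 10) {
  scores <- pred[user, ]
  scores[actual[user, ] > 0] <- NA                 # hide already-rated beers (assumption)
  top <- head(sort(scores, decreasing = TRUE), n)  # sort() drops the NAs
  data.frame(
    Rating_Pred        = round(unname(top), 2),
    beer_name          = names(top),
    review_profilename = user,
    stringsAsFactors   = FALSE
  )
}

# Toy prediction and rating matrices with matching names.
beers  <- c("A", "B", "C", "D")
users  <- c("ann", "bob")
pred   <- matrix(c(4.9, 3.2, 4.1, 2.0,
                   1.0, 4.4, 2.5, 3.9),
                 nrow = 2, byrow = TRUE, dimnames = list(users, beers))
actual <- matrix(c(5, 0, 0, 0,
                   0, 4, 0, 0),
                 nrow = 2, byrow = TRUE, dimnames = list(users, beers))

recs <- top_n_for_user(pred, actual, user = "ann", n = 2)
```

In the toy data, "ann" has already rated beer A, so her list starts with the next-highest predictions.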
## [1] "Top 10 Beer Recommendations for User: axeman9182 found below"
## Rating_Pred beer_name review_profilename
## 1 5.00 Festina Pêche axeman9182
## 2 5.00 La Fin Du Monde axeman9182
## 3 5.00 Temptation axeman9182
## 4 4.99 Racer 5 India Pale Ale axeman9182
## 5 4.98 Bell's Hopslam Ale axeman9182
## 6 4.98 Consecration axeman9182
## 7 4.97 Bourbon County Brand Vanilla Stout axeman9182
## 8 4.95 Terrapin Rye Pale Ale axeman9182
## 9 4.93 Founders CBS Imperial Stout axeman9182
## 10 4.90 Supplication axeman9182
## reviewer_ID
## 1 3908
## 2 3908
## 3 3908
## 4 3908
## 5 3908
## 6 3908
## 7 3908
## 8 3908
## 9 3908
## 10 3908
## [1] "The Average Beer Rating for User in The Base Dataset:"
## # A tibble: 0 x 8
## # Groups: reviewer_ID [0]
## # ... with 8 variables: reviewer_ID <int>, review_profilename <fct>,
## # overall.rt.mean <dbl>, aroma.rt.mean <dbl>, appreance.rt.mean <dbl>,
## # palate.rt.mean <dbl>, taste.rt.mean <dbl>, num.beer.reviewed <int>
This project gave me a better understanding of recommender systems built for user discovery. More specifically, I had the opportunity to learn how to produce and tune recommendations with a collaborative filtering model and a singular value decomposition (SVD) model. In future iterations of this project, I would like to build a user interface for interacting with the model.
https://www.rdocumentation.org/packages/data.table/versions/1.12.2/topics/dcast.data.table
http://rstudio-pubs-static.s3.amazonaws.com/183710_5f68cc1f7bce4921843e19ce0d5a9264.html
http://langvillea.people.cofc.edu/DISSECTION-LAB/Emmie%27sLSI-SVDModule/p5module.html
https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/