Project-Proposal
The goal for your final project is for you to build out a recommender system using a large dataset (e.g., 1M+ ratings, or 10k+ users and 10k+ items). There are three deliverables, with separate dates:
[1] Planning Document: Find an interesting dataset and describe the system you plan to build out. If you would like to use one of the datasets you have already worked with, you should add a unique element or incorporate additional data (e.g., explicit features you scrape from another source, like image analysis on movie posters). The overall goal, however, will be to produce quality recommendations by extracting insights from a large dataset. You may do so using Spark or another distributed computing method, OR by effectively applying one of the more advanced mathematical techniques we have covered. There is no preference for one over the other, as long as your recommender works! The planning document should be written up and published as a notebook on GitHub or on RPubs. Please submit the link in the Unit 4 folder, due Thursday, July 5.
I found an interesting dataset to work with for the final project.
Dataset
The dataset is a collection of users and their recorded behavior toward different games (purchases and hours played). The goal is to create a recommender system that recommends games to users based on their gaming habits and preferences.
Context
Steam is the world’s most popular PC Gaming hub, with over 6,000 games and a community of millions of gamers. With a massive collection that includes everything from AAA blockbusters to small indie titles, great discovery tools are a highly valuable asset for Steam. How can we make them better?
Metadata
This dataset is a list of user behaviors, with columns: user-id, game-title, behavior-name, value. The behaviors included are ‘purchase’ and ‘play’. The value indicates the degree to which the behavior was performed - in the case of ‘purchase’ the value is always 1, and in the case of ‘play’ the value represents the number of hours the user has played the game.
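To make the column layout concrete, here is a minimal Python sketch of how the behavior rows might be collapsed into one record per user and game. It is an illustration under assumptions, not the notebook's own code: the function name `summarize_behaviors` and the keys `purchased` and `hours` are hypothetical, and the sample rows are the first rows of the file as printed in this document.

```python
from collections import defaultdict

# Sample rows taken from the first lines of steam-200k.csv shown in this document:
# (user-id, game-title, behavior-name, value)
rows = [
    (151603712, "The Elder Scrolls V Skyrim", "purchase", 1.0),
    (151603712, "The Elder Scrolls V Skyrim", "play", 273.0),
    (151603712, "Fallout 4", "purchase", 1.0),
    (151603712, "Fallout 4", "play", 87.0),
    (151603712, "Spore", "purchase", 1.0),
    (151603712, "Spore", "play", 14.9),
]


def summarize_behaviors(rows):
    """Collapse behavior rows into one record per (user, game) pair.

    Hypothetical helper: 'purchased' and 'hours' are illustrative names.
    """
    summary = defaultdict(lambda: {"purchased": False, "hours": 0.0})
    for user_id, game_title, behavior, value in rows:
        record = summary[(user_id, game_title)]
        if behavior == "purchase":
            # For 'purchase' rows the value is always 1, so a flag is enough.
            record["purchased"] = True
        elif behavior == "play":
            # For 'play' rows the value is the number of hours played.
            record["hours"] += value
    return dict(summary)


interactions = summarize_behaviors(rows)
for (user_id, game_title), record in interactions.items():
    print(user_id, game_title, record)
```

A structure like this gives one implicit-feedback signal per user and game, which is one possible starting point for a recommender; how hours played should be turned into a preference score is left open here.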
## Downloading steam-video-games.zip to /localdisk/Data-612/Final
##
## Archive: steam-video-games.zip
## inflating: steam-200k.csv
##          V1                         V2       V3    V4 V5
## 1 151603712 The Elder Scrolls V Skyrim purchase   1.0  0
## 2 151603712 The Elder Scrolls V Skyrim     play 273.0  0
## 3 151603712                  Fallout 4 purchase   1.0  0
## 4 151603712                  Fallout 4     play  87.0  0
## 5 151603712                      Spore purchase   1.0  0
## 6 151603712                      Spore     play  14.9  0
## [1] 200000 5
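The default names V1 to V5 in the output above suggest the file was read without a header row, and the fifth column is 0 in every row shown. As a hedged sketch (not the code used to produce the output), the file could be read in Python with the standard `csv` module and given descriptive names. The names in `COLUMNS`, the helper `load_steam_rows`, the label `unused` for the fifth column, and the quoting of game titles in the sample text are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical column names; the file itself appears to have no header row.
# The meaning of the fifth column is not documented here, so it is labeled
# "unused" purely as an assumption.
COLUMNS = ["user_id", "game_title", "behavior_name", "value", "unused"]


def load_steam_rows(fileobj):
    """Read headerless behavior rows into a list of dicts with typed fields."""
    rows = []
    for raw in csv.reader(fileobj):
        if not raw:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, raw))
        record["user_id"] = int(record["user_id"])
        record["value"] = float(record["value"])
        del record["unused"]  # dropped because it is 0 in every row shown
        rows.append(record)
    return rows


# Small in-memory sample mirroring rows printed above (on-disk quoting assumed).
sample = io.StringIO(
    '151603712,"The Elder Scrolls V Skyrim",purchase,1.0,0\n'
    '151603712,"The Elder Scrolls V Skyrim",play,273.0,0\n'
    '151603712,"Spore",play,14.9,0\n'
)
rows = load_steam_rows(sample)
print(len(rows), rows[0])
```

For the real file, the same function could be called on `open("steam-200k.csv", newline="", encoding="utf-8")`; before dropping the fifth column across all 200,000 rows, it would be worth checking that it really is constant.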