I love music and I am trying to learn data science. It just so happens that I have also been tracking my listening history with last.fm for the last four years, which means that I have a decently large dataset. Since I’m lazy, I used this website (https://mainstream.ghan.nl/export.html) to pull the data out of last.fm instead of writing my own parser.
Now on to the actual data science: the first thing to do is import the data and clean it up into a usable state.
[1] 77147
It looks like I have scrobbled 77,147 songs (including repeat listens) since May 2015.
[1] 0
[1] 509
[1] 0
Only 509 observations are missing an album, and none are missing an artist or a track. Not bad: that works out to only about 0.66% of the observations missing some sort of crucial information.
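The same health check can be sketched in Python. This is only an illustration under assumptions: the helper name `count_missing` and the sample rows are made up, and it assumes each scrobble is a dict with `artist`, `album`, and `track` keys where a missing value shows up as `None` or an empty string.

```python
def count_missing(rows, fields=("artist", "album", "track")):
    """Count, per field, how many rows have no value for that field."""
    counts = {field: 0 for field in fields}
    for row in rows:
        for field in fields:
            if not row.get(field):  # None and "" both count as missing
                counts[field] += 1
    return counts


# Hypothetical sample rows, not taken from the real export.
sample_rows = [
    {"artist": "Artist A", "album": "Album X", "track": "Song 1"},
    {"artist": "Artist A", "album": "", "track": "Song 2"},
    {"artist": "Artist B", "album": None, "track": "Song 3"},
]

missing = count_missing(sample_rows)
print(missing)

# Share of all scrobbles that lack an album, from the counts reported above.
share_missing = 509 / 77147 * 100
print(f"{share_missing:.2f}%")
```

Treating empty strings as missing is a design choice for this sketch; an export that marks gaps differently would need a different test.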
Now that we have the cleaned-up data, we can make some rudimentary observations about my listening history, such as the artists that I have listened to the most, along with the number of different songs that I have listened to from each of them.
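As a rough Python sketch of that summary: group the scrobbles by artist, count the total plays, and count the different tracks. The function name `top_artists` and the sample rows are hypothetical, and the rows are assumed to be dicts with `artist` and `track` keys.

```python
from collections import Counter, defaultdict


def top_artists(rows):
    """Return (artist, total plays, distinct tracks), most played first."""
    plays = Counter()
    tracks = defaultdict(set)
    for row in rows:
        plays[row["artist"]] += 1
        tracks[row["artist"]].add(row["track"])
    return [(artist, total, len(tracks[artist]))
            for artist, total in plays.most_common()]


# Hypothetical sample rows, not taken from the real export.
sample_rows = [
    {"artist": "Artist A", "track": "Song 1"},
    {"artist": "Artist A", "track": "Song 1"},
    {"artist": "Artist A", "track": "Song 2"},
    {"artist": "Artist B", "track": "Song 3"},
]

for artist, total, songs in top_artists(sample_rows):
    print(artist, total, songs)
```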
We can also plot the number of scrobbles per month on a nice colorful graph:

It’s extremely odd that February and March of 2019 have almost exactly the same listen count: February with 2374 scrobbles and March with 2375 scrobbles.
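The monthly counts behind such a graph can be sketched in Python as below. It assumes the scrobble times are Unix timestamps in seconds and buckets them by UTC month, so counts near a month boundary could shift under a different time zone. The function name and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def scrobbles_per_month(unix_timestamps):
    """Count scrobbles per (year, month), reading timestamps as UTC."""
    counts = Counter()
    for ts in unix_timestamps:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[(moment.year, moment.month)] += 1
    return dict(sorted(counts.items()))


# Hypothetical timestamps: two in February 2019 and one in March 2019.
sample = [1549000000, 1550000000, 1552000000]
monthly = scrobbles_per_month(sample)
print(monthly)
```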
Comparison with my Spotify library
I have been using the Spotify library function as a rudimentary form of “liking” songs ever since I started tracking my music with last.fm: every time I like a song, it goes into my Spotify library, which thankfully records the time it was added as well.
I wrote a simple Python script that queries the Spotify API for a list of all my saved tracks and then saves it to a CSV file.
import csv
import logging

import requests

## Query the Spotify API and get data

logging.basicConfig(level=logging.DEBUG)  # to show request logs

url = "https://api.spotify.com/v1/me/tracks"
# To get the auth token, I just copy-pasted the token that I got from testing
# out the developer console.
headers = {'Authorization': 'Bearer <Auth Token>'}

# Ask for a single track first, just to learn the total number of saved tracks.
first_req = requests.get(url, headers=headers, params={'limit': 1, 'market': 'US'})
first_req.raise_for_status()  # fail early on an expired or invalid token
total_tracks = first_req.json()['total']
def paginate(total, per_page):
    """Fetch every saved track, per_page items at a time."""
    # Ceiling division, so an exact multiple of per_page costs no extra request.
    num_requests = (total + per_page - 1) // per_page
    items = []

    for i in range(num_requests):
        offset = per_page * i
        params = {'limit': per_page, 'offset': offset, 'market': 'US'}

        req = requests.get(url, headers=headers, params=params)
        req.raise_for_status()

        items.extend(req.json()['items'])

    return items

# the Spotify API returns at most 50 tracks per page
saved_tracks = paginate(total_tracks, 50)
## Save data from the API to CSV

formatted_data = []

for track in saved_tracks:
    track_info = {}
    track_info['timestamp'] = track['added_at']
    # only the first credited artist is kept
    track_info['artist'] = track['track']['artists'][0]['name']
    track_info['album'] = track['track']['album']['name']
    track_info['track'] = track['track']['name']

    formatted_data.append(track_info)

# newline='' stops the csv module from writing blank rows on Windows;
# utf-8 keeps non-ASCII artist and track names intact.
with open('spotify-saved.csv', 'w', newline='', encoding='utf-8') as save_file:
    fields = ['timestamp', 'artist', 'album', 'track']
    savewriter = csv.DictWriter(save_file, fieldnames=fields, quoting=csv.QUOTE_ALL)
    savewriter.writeheader()

    for row in formatted_data:
        savewriter.writerow(row)
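An alternative to computing offsets by hand is to follow the `next` field that each page of the response carries: it holds the URL of the following page, or null on the last one. Below is a minimal sketch of that idea, not the script I actually ran. The names `fetch_all` and `get_page` are hypothetical, and the fake pages stand in for real API responses so the example runs without a token.

```python
def fetch_all(first_url, get_page):
    """Collect items from every page by following each page's 'next' link."""
    items = []
    page_url = first_url
    while page_url:
        page = get_page(page_url)
        items.extend(page["items"])
        page_url = page.get("next")  # None on the last page ends the loop
    return items


# Stand-in for the network: two hypothetical pages keyed by URL.
fake_pages = {
    "page-1": {"items": ["a", "b"], "next": "page-2"},
    "page-2": {"items": ["c"], "next": None},
}

all_items = fetch_all("page-1", fake_pages.__getitem__)
print(all_items)
```

With the real API, `get_page` could be something like `lambda u: requests.get(u, headers=headers).json()`, assuming the token is still valid for every request.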
Now that we have a nicely formatted CSV file with all the tracks that I have saved to my Spotify library, we can do some analysis on it and compare it with my listening history.
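One possible first comparison, sketched in Python under stated assumptions: read the saved tracks back in, parse the timestamps, and look up how often each saved track appears in the listening history. The in-memory CSV mirrors the layout of spotify-saved.csv, but its rows and the `scrobbles` list are invented, and matching on exact artist and track names is a simplification that would miss differently spelled titles.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical contents in the same layout as spotify-saved.csv.
saved_csv = io.StringIO(
    '"timestamp","artist","album","track"\n'
    '"2019-03-01T12:00:00Z","Artist A","Album X","Song 1"\n'
    '"2019-03-02T08:30:00Z","Artist B","Album Y","Song 3"\n'
)

saved = []
for row in csv.DictReader(saved_csv):
    # Assumes timestamps shaped like 2019-03-01T12:00:00Z.
    row["timestamp"] = datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    saved.append(row)

# Hypothetical listening history as (artist, track) pairs.
scrobbles = [("Artist A", "Song 1"), ("Artist A", "Song 1"), ("Artist C", "Song 9")]
play_counts = Counter(scrobbles)

# Plays per saved track, matched on exact artist and track name.
plays_of_saved = {
    (row["artist"], row["track"]): play_counts[(row["artist"], row["track"])]
    for row in saved
}
print(plays_of_saved)
```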