The Data

Recently, Medium contributor Sebastian Wolf released a package that extracts music-listening data from last.fm. He wrote about it here

I decided to use this package to explore my music listening habits. In particular, I wanted to look at differences in listening habits between weekdays and weekends

Let’s start by downloading the package “analyze_last_fm”

And load our dependencies:

An API Key from last.fm is required to use this package. It can be requested here

The 32-character key needs to be saved as a data-object, as per below

api_key <- "insert api key here"

Thereafter, we can retrieve last.fm data for any specified user. My username is Perkski, which I thought was cool back in 2008

Let’s get some data:

We can now extract the dataframes for each year, and bind into a single dataframe
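As a rough sketch of the binding step, assuming each year has already been pulled into its own dataframe with identical columns. The object names and the second row below are made-up placeholders, not output from the package:

```r
# Hypothetical stand-ins for the per-year dataframes (toy rows only)
scrobbles_2016 <- data.frame(artist = "The Doors", track = "Light My Fire",
                             uts = 1483180654, stringsAsFactors = FALSE)
scrobbles_2017 <- data.frame(artist = "The War on Drugs", track = "Red Eyes",
                             uts = 1500000000, stringsAsFactors = FALSE)

# Collect the yearly dataframes in a list, then stack them row-wise.
# rbind() requires every dataframe to have the same column names.
yearly <- list(scrobbles_2016, scrobbles_2017)
scrobbles <- do.call(rbind, yearly)

nrow(scrobbles)
```

Keeping the yearly dataframes in a list means adding another year is a one-line change.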

The dataframe summarises artist, track, album and time of listening. It appears New Year’s Eve 2016 was a bit of a 70s/80s themed affair!

artist | track | album | uts | datetext
The Rolling Stones | Start Me Up | Jump Back - The Best of the Rolling Stones ’71 - ’93 (Remastered 2009) | 1483181285 | 31 Dec 2016, 10:48
The Rolling Stones | Start Me Up | Jump Back - The Best of the Rolling Stones ’71 - ’93 (Remastered 2009) | 1483181284 | 31 Dec 2016, 10:48
Whitesnake | Here I Go Again ’87 (2003 Remaster) | Best of Whitesnake | 1483181072 | 31 Dec 2016, 10:44
Whitesnake | Here I Go Again ’87 (2003 Remaster) | Best of Whitesnake | 1483181071 | 31 Dec 2016, 10:44
The Doors | Light My Fire | The Very Best of The Doors | 1483180654 | 31 Dec 2016, 10:37
The Doors | Light My Fire | The Very Best of The Doors | 1483180653 | 31 Dec 2016, 10:37

Data Cleaning

Cleaning this dataframe required three broad steps:

  • Removal of non-music scrobbles
  • Removal of duplicate scrobbles
  • Correction of timezone

I’ll first remove non-music scrobbles

The aforementioned duplicates appear to be a quirk of my last.fm account. I believe it to be a glitch caused by using two devices to scrobble music. Occasionally, a single song-listen will be duplicated, once from each device

My first approach to remedying this problem is to remove any scrobble that shares an identical date-time with another. These are clear duplicates, and removing them drops ~ 4000 rows
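A minimal sketch of this first pass, assuming the raw column names shown in the table above. Matching on artist and track as well as the date-time is my own assumption here, so that two different songs scrobbled within the same minute are not mistaken for duplicates:

```r
# Toy rows copied from the raw table above (album and uts omitted)
scrobbles <- data.frame(
  artist   = c("The Rolling Stones", "The Rolling Stones", "Whitesnake", "Whitesnake"),
  track    = c("Start Me Up", "Start Me Up",
               "Here I Go Again '87", "Here I Go Again '87"),
  datetext = c("31 Dec 2016, 10:48", "31 Dec 2016, 10:48",
               "31 Dec 2016, 10:44", "31 Dec 2016, 10:44"),
  stringsAsFactors = FALSE
)

# duplicated() flags every repeat of an earlier artist/track/date-time row,
# so negating it keeps exactly one copy of each scrobble
is_dup  <- duplicated(scrobbles[, c("artist", "track", "datetext")])
deduped <- scrobbles[!is_dup, ]

nrow(deduped)
```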

The data-set still has a large number of duplicates with non-identical date-times. I rarely listen to the same song back to back, so I am confident these are duplicates

Although a little crude, my solution is to combine artists/tracks into a single column, then delete the first instance of any songs that occur in succession. This is not perfect as it will also delete any legitimate back-to-back listens of a song. However, it gives a reasonably strong approximation of my actual listening habits
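The idea can be sketched in base R as follows. This is an illustration on toy rows rather than the exact code used, and the variable names are placeholders:

```r
# Toy rows: the first two are a back-to-back repeat of the same song
scrobbles <- data.frame(
  artist = c("The Doors", "The Doors", "Whitesnake", "The Doors"),
  track  = c("Light My Fire", "Light My Fire",
             "Here I Go Again '87", "Light My Fire"),
  stringsAsFactors = FALSE
)

# Combine artist and track into a single key per scrobble
key <- paste(scrobbles$artist, scrobbles$track, sep = " - ")
n   <- length(key)

# TRUE where a row is immediately followed by the same song;
# the last row has no successor, so it is never flagged
followed_by_same <- c(key[-n] == key[-1], FALSE)

# Drop the first instance of each successive pair
cleaned <- scrobbles[!followed_by_same, ]

nrow(cleaned)
```

Only adjacent repeats are removed. The final "Light My Fire" row survives because another song sits between it and the earlier listen.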

This step removes ~ 1000 further rows, and creates a much more accurate data-set

The final problem is that the data is captured in Coordinated Universal Time (UTC). As such, the listening times do not match my local time

Before remediating this, some re-formatting of the date-time data is required. The goals here are to:

  • Create separate date and time columns
  • Convert date and time from factors into appropriately formatted data types
  • Convert UTC to Melbourne/Sydney EST

Let’s duplicate the date-time variable, then separate into date and time columns
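A sketch of how the raw `datetext` strings could be parsed and split, assuming the format shown in the raw table ("31 Dec 2016, 10:48"). Note that `%b` matches month abbreviations in the current locale, so this assumes an English locale:

```r
# Two date-time strings in the format last.fm returns
datetext <- c("31 Dec 2016, 10:48", "31 Dec 2016, 10:44")

# Parse into a proper date-time, explicitly tagged as UTC
dt_utc <- as.POSIXct(datetext, format = "%d %b %Y, %H:%M", tz = "UTC")

# Derive separate date and time columns from the parsed value
parsed <- data.frame(
  DateTime.UT = dt_utc,
  Date.UT     = as.Date(dt_utc, tz = "UTC"),
  Time.UT     = format(dt_utc, "%H:%M"),
  stringsAsFactors = FALSE
)

parsed
```

If the column arrives as a factor, wrapping it in `as.character()` before parsing avoids converting the underlying integer codes by mistake.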

Convert our date-time data into Melbourne Eastern Standard Time …

  • 11 hours ahead of UTC (the daylight-saving offset; Melbourne standard time is UTC+10)
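For comparison, here is a small sketch of a fixed 11-hour shift next to a named time zone. The July timestamp is a made-up example, included only to show a date outside daylight saving. Using `"Australia/Melbourne"` is an alternative to the fixed offset and assumes the system's time zone database is available:

```r
# One summer and one winter timestamp, both in UTC
dt_utc <- as.POSIXct(c("2016-12-31 10:48:00", "2016-07-01 10:48:00"), tz = "UTC")

# Fixed offset: always 11 hours ahead, regardless of the season
fixed <- format(dt_utc + 11 * 3600, "%Y-%m-%d %H:%M", tz = "UTC")

# Named time zone: the offset follows Melbourne's daylight-saving rules
local <- format(dt_utc, "%Y-%m-%d %H:%M", tz = "Australia/Melbourne")

data.frame(fixed, local)
```

The two approaches agree for the December listen (21:48) but differ by an hour in July, when Melbourne is on UTC+10. For an hour-of-day analysis spanning several years, that difference affects roughly half of the scrobbles.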

Re-name our columns

Re-examining our table, we now have more flexible date and time data, in the correct format, and correct timezone!

ID | Artist | Track | Album | DateTime.UT | Date.UT | Time.UT | DateTime.EST | Date.EST | Time.EST
1 | The Rolling Stones | Start Me Up | Jump Back - The Best of the Rolling Stones ’71 - ’93 (Remastered 2009) | 2016-12-31 10:48:00 | 2016-12-31 | 10:48 | 2016-12-31 21:48:00 | 2016-12-31 | 21:48:00
2 | Whitesnake | Here I Go Again ’87 (2003 Remaster) | Best of Whitesnake | 2016-12-31 10:44:00 | 2016-12-31 | 10:44 | 2016-12-31 21:44:00 | 2016-12-31 | 21:44:00
3 | The Doors | Light My Fire | The Very Best of The Doors | 2016-12-31 10:37:00 | 2016-12-31 | 10:37 | 2016-12-31 21:37:00 | 2016-12-31 | 21:37:00
4 | Prince & The Revolution | Purple Rain | Ultimate: Prince | 2016-12-31 10:27:00 | 2016-12-31 | 10:27 | 2016-12-31 21:27:00 | 2016-12-31 | 21:27:00
5 | DMA’s | lay down | Lay Down - Single | 2016-12-31 10:07:00 | 2016-12-31 | 10:07 | 2016-12-31 21:07:00 | 2016-12-31 | 21:07:00
6 | The War on Drugs | Red Eyes | Lost in the Dream | 2016-12-31 10:00:00 | 2016-12-31 | 10:00 | 2016-12-31 21:00:00 | 2016-12-31 | 21:00:00

Extract more information concerning day of the week

Now that we have accurate date and time data, we can extract two further features necessary to compare day-to-day listening habits for weekdays versus weekends

The first feature is the hour of the day, which will serve as a grouping variable

The way to do this is to round the listening time to the nearest hour

For example, a song listened to at 1:49:23PM will be rounded to 2:00:00, while a song listened to at 1:29:59PM will be rounded to 1:00:00

This variable will serve as our x-axis to look at listening patterns across the day in a later plot
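The rounding can be sketched with base R's `round()` method for date-times, using the two example times above. The date is arbitrary and the variable names are placeholders:

```r
# The two example listening times, on an arbitrary date
listened <- as.POSIXct(c("2016-12-31 13:49:23", "2016-12-31 13:29:59"),
                       tz = "Australia/Melbourne")

# round() with units = "hours" snaps each time to the nearest full hour
rounded <- round(listened, units = "hours")

# Keep only the hour label for use as a grouping variable
hour <- format(rounded, "%H:%M")

hour
```

One side effect to be aware of: a listen after 23:30 rounds up to 00:00 of the following day, so it is grouped with the midnight hour.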

The second feature to be extracted is a grouping variable distinguishing weekdays from weekends. This is easy enough to do by first extracting out the day of the week, then further separating Monday-Friday from Saturday-Sunday
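A minimal sketch of the weekday/weekend split. The 30 December date is added purely for contrast, and the ISO weekday number (`%u`) is used for the comparison because day names returned by `weekdays()` depend on the locale:

```r
# A Friday and a Saturday
dates <- as.Date(c("2016-12-30", "2016-12-31"))

# Day name for display (locale-dependent)
day_of_week <- weekdays(dates)

# %u gives the ISO weekday number: 1 = Monday ... 7 = Sunday
day_num  <- as.integer(format(dates, "%u"))
day_type <- ifelse(day_num >= 6, "Weekend", "Weekday")

data.frame(dates, day_of_week, day_type)
```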

We’ll preview our table one last time, in its complete state

ID | Artist | Track | Album | DateTime.UT | Date.UT | Time.UT | DateTime.EST | Date.EST | Time.EST | Hour | DayofWeek | DayType
1 | The Rolling Stones | Start Me Up | Jump Back - The Best of the Rolling Stones ’71 - ’93 (Remastered 2009) | 2016-12-31 10:48:00 | 2016-12-31 | 10:48 | 2016-12-31 21:48:00 | 2016-12-31 | 21:48:00 | 22:00 | Saturday | Weekend
2 | Whitesnake | Here I Go Again ’87 (2003 Remaster) | Best of Whitesnake | 2016-12-31 10:44:00 | 2016-12-31 | 10:44 | 2016-12-31 21:44:00 | 2016-12-31 | 21:44:00 | 22:00 | Saturday | Weekend
3 | The Doors | Light My Fire | The Very Best of The Doors | 2016-12-31 10:37:00 | 2016-12-31 | 10:37 | 2016-12-31 21:37:00 | 2016-12-31 | 21:37:00 | 22:00 | Saturday | Weekend
4 | Prince & The Revolution | Purple Rain | Ultimate: Prince | 2016-12-31 10:27:00 | 2016-12-31 | 10:27 | 2016-12-31 21:27:00 | 2016-12-31 | 21:27:00 | 21:00 | Saturday | Weekend
5 | DMA’s | lay down | Lay Down - Single | 2016-12-31 10:07:00 | 2016-12-31 | 10:07 | 2016-12-31 21:07:00 | 2016-12-31 | 21:07:00 | 21:00 | Saturday | Weekend
6 | The War on Drugs | Red Eyes | Lost in the Dream | 2016-12-31 10:00:00 | 2016-12-31 | 10:00 | 2016-12-31 21:00:00 | 2016-12-31 | 21:00:00 | 21:00 | Saturday | Weekend

Create and visualise plots

We are now ready to explore our data. To begin, I’ve created a basic theme for our plots

Before directly contrasting weekdays with weekends, it’s worth a look at the top 20 artists I listened to between 2016 and 2018. Code and plot below
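The counting behind a top-artists chart can be sketched in a few lines of base R. The vector below is made-up toy data standing in for the Artist column:

```r
# Toy stand-in for the Artist column of the cleaned dataframe
artists <- c("The War on Drugs", "The Doors", "The War on Drugs",
             "Whitesnake", "The War on Drugs", "The Doors")

# Count scrobbles per artist, sort from most to least played,
# and keep at most the top 20
counts      <- sort(table(artists), decreasing = TRUE)
top_artists <- head(counts, 20)

top_artists
```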

Clearly I’ve been on a War on Drugs binge :D

Let’s now construct a line-plot for listening habits across the day, for weekdays only
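The line-plot is driven by a count of scrobbles per hour. A rough sketch of that aggregation, on made-up toy rows that use the Hour and DayType columns from the table above:

```r
# Toy rows standing in for the cleaned dataframe
scrobbles <- data.frame(
  Hour    = c("09:00", "12:00", "12:00", "15:00", "20:00", "20:00"),
  DayType = c("Weekday", "Weekday", "Weekday", "Weekday", "Weekend", "Weekend"),
  stringsAsFactors = FALSE
)

# Keep weekday listens only
weekday <- scrobbles[scrobbles$DayType == "Weekday", ]

# Count per hour; fixing the factor levels to all 24 hours makes
# hours with no listens show up as 0 rather than disappearing
all_hours <- sprintf("%02d:00", 0:23)
hourly    <- table(factor(weekday$Hour, levels = all_hours))

hourly[["12:00"]]
```

Including the empty hours matters for a line-plot, since otherwise the line would jump straight between the hours that happen to have data.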

There are some clear trends in my weekday listening habits …

  • Almost all music listening occurs between 9AM - 5PM (while I’m at work!)
  • Peak listening times are around 12PM and 3-4PM
  • A clear drop occurs around 12:30PM - 1:30PM … When I go on lunch!

And finally, a line-plot for listening habits across the day, for weekends

Weekends clearly produce a different pattern of music listening to weekdays

  • Peak listening time is between 7PM-8PM. I will hazard a guess this corresponds to music for ‘social purposes’ - dinner parties, guests over etc
  • Relatively speaking, a higher proportion of scrobbles occurs in the wee hours, with quite a few listens around 12AM … Party-time!

Conclusion

My personal music listening habits vary between weekdays and weekends. On weekdays, most listening is concentrated in the early afternoon, and almost all listening occurs within work-hours. On weekends, most listening occurs in the early to late evening, probably overlapping with social engagement