Comparing weekend versus weekday music listening habits via the last.fm API

The Data

Recently, Medium contributor Sebastian Wolf released a package that extracts music-listening data from last.fm. He wrote about it here

I decided to use this package to explore my music listening habits. In particular, I wanted look at differences in listening habits between weekdays and weekends

Let’s start by downloading the package “analyze_last_fm”

devtools::install_github("zappingseb/analyze_last_fm")

And load-in our dependencies :

library(analyzelastfm)
library(dplyr)
library(tidyr)
library(stringr)
library(data.table)
library(kableExtra)
library(anytime)
library(lubridate)
library(zoo)
library(ggplot2)
library(plotly)

An API Key from last.fm is required to use this package. It can be requested here

The 32-character key needs to be saved as a data-object, as per below

api_key <- “insert api key here”

Thereafter we can retrieve last.fm data for any specified user. My username is Perkski which I thought was cool back in 2008

Let’s get some data :

# Read in data for 2016, 2017 and 2018
last.fm.data.2018 <- UserData$new("Perkski", api_key, 2018)
last.fm.data.2017 <- UserData$new("Perkski", api_key, 2017)
last.fm.data.2016 <- UserData$new("Perkski", api_key, 2016)

We can now extract the dataframes for each year, and bind into a single dataframe

# Extract out datframes from last.fm environments
last.fm.df.2018 <- as.data.frame(as.list(last.fm.data.2018$data_table))
last.fm.df.2017 <- as.data.frame(as.list(last.fm.data.2017$data_table))
last.fm.df.2016 <- as.data.frame(as.list(last.fm.data.2016$data_table))

# Bind 2016 2017 and 2018
last.fm.df <- rbind(last.fm.df.2016, last.fm.df.2017, last.fm.df.2018)

The dataframe summarises artist, track, album and time of listening. It appears New Years Eve 2016 was a bit of a 70’s/80’s themed affair!

artist	track	album	uts	datetext
The Rolling Stones	Start Me Up	Jump Back - The Best of the Rolling Stones ’71 - ’93 (Remastered 2009)	1483181285	31 Dec 2016, 10:48
The Rolling Stones	Start Me Up	Jump Back - The Best of the Rolling Stones ’71 - ’93 (Remastered 2009)	1483181284	31 Dec 2016, 10:48
Whitesnake	Here I Go Again ’87 (2003 Remaster)	Best of Whitesnake	1483181072	31 Dec 2016, 10:44
Whitesnake	Here I Go Again ’87 (2003 Remaster)	Best of Whitesnake	1483181071	31 Dec 2016, 10:44
The Doors	Light My Fire	The Very Best of The Doors	1483180654	31 Dec 2016, 10:37
The Doors	Light My Fire	The Very Best of The Doors	1483180653	31 Dec 2016, 10:37

Data Cleaning

Cleaning this dataframe required three broad steps :

Removal of non-music scrobbles,
Removal of duplicate scrobbles
Correction of timezone

I’ll first remove non-music scrobbles

# Remove non-music scrobbles
last.fm.df <- last.fm.df[ !(last.fm.df$artist %in% 
c("<Sconosciuto>", "1 Hour of Relaxing Zelda", "2814", "Valhalla DSP",
  "Valhalla DSP Plugin Presets", "inclair Broadcast Group",
  "GoldenEye", "Larry David Is OK With Women Who Only Love Fame",
  "Outlaw King", "Game Of Thrones Season 6 Episode 10 Music",
  "ValhallaVintageVerb Ambient Guitar Jam", "Valhalla Shimmer Reverb",
  "Garageband Quick Tip", "Perkot", "Ozzy Man Reviews: Iguana vs Snakes",
  "The Simpsons")), ]

The aforementioned duplicates appear to be a quirk of my last.fm account. I believe it to be a glitch caused by using two devices to scrobble music. Occasionally, a single song-listen will be duplicated, once from each device

My first approach to remedying this problem is to remove any scrobble for which there exist identical date-times. These are clear duplicates, and removes ~ 4000 rows

# Remove duplicate tracks where scrobble has doubled
last.fm.df <- last.fm.df[!duplicated(last.fm.df[c("datetext")]),]

The data-set still has a large number of duplicates, with non-identical date-times. Knowing my listening habits I rarely listen to songs back to back, so I know these are duplications

Although a little crude, my solution is to combine artists/tracks into a single column, then delete the first instance of any songs that occur in succession. This is not perfect as it will also delete any legitimate back-to-back listens of a song. However, it gives a reasonably strong approximation of my actual listening habits

This step removes ~ 1000 further rows, and creates a much more accurate data-set

# Combine artist and song into single column 
last.fm.df <- last.fm.df %>% 
  mutate(artist.track = str_c(artist," - ",track))

# This will delete first instance of any two tracks occurring in succession
last.fm.df <- as.data.table(last.fm.df)[, .SD[1], by = rleid(artist.track)]

The final problem is that the data is captured in Universal Standard Time. As such, the listening times are not accurate

Before remediating this, some re-formatting of the date-time data is required. The goals here are to:

Create separate date and time columns
Convert date and time from factors into appropriately formatted data types
Convert UST to Melbourne/Sydney EST

Let’s duplicate the date-time variable, then separate into date and time columns

# Separate into time and date
last.fm.df <- separate(last.fm.df, datetext, 
                       into = c("Date.UT", "Time.UT"), 
                       sep = ",")

# convert data into more suitable format
last.fm.df$Date.UT = as.Date(last.fm.df$Date.UT, "%d %B %Y")

# Combine date and time back together
last.fm.df <- last.fm.df %>% 
  unite(DateTime.UT, c(Date.UT, Time.UT), sep = " ", remove = FALSE)

# change to time format 
last.fm.df$DateTime.UT <- as.POSIXct(last.fm.df$DateTime.UT, 
                                     format="%Y-%m-%d %H:%M")

Convert our date-time data into Melbourne Eastern Standard Time …

11 hours from UST

# Change to EST
last.fm.df$DateTime.EST <- last.fm.df$DateTime.UT + hours(11)

# Duplicate Date Time
last.fm.df$DateTime.EST2 = last.fm.df$DateTime.EST

# Separate into time and date
last.fm.df <- separate(last.fm.df, DateTime.EST2, 
                       into = c("Date.EST", "Time.EST"), 
                       sep = " ")

# Convert date from character format to date format
last.fm.df$Date.EST = as.Date(last.fm.df$Date.EST)

# Delete unrequired UST column  
last.fm.df <- subset(last.fm.df, select = -c(5))

Re-name our columns

# Change Names
colnames(last.fm.df) <- c("ID", "Artist", "Track", "Album", 
                          "DateTime.UT", "Date.UT", "Time.UT",
                          "DateTime.EST", "Date.EST", "Time.EST")

Re-examining our table, we now have more flexible date and time data, in the correct format, and correct timezone!

ID	Artist	Track	Album	DateTime.UT	Date.UT	Time.UT	DateTime.EST	Date.EST	Time.EST
1	The Rolling Stones	Start Me Up	Jump Back - The Best of the Rolling Stones ’71 - ’93 (Remastered 2009)	2016-12-31 10:48:00	2016-12-31	10:48	2016-12-31 21:48:00	2016-12-31	21:48:00
2	Whitesnake	Here I Go Again ’87 (2003 Remaster)	Best of Whitesnake	2016-12-31 10:44:00	2016-12-31	10:44	2016-12-31 21:44:00	2016-12-31	21:44:00
3	The Doors	Light My Fire	The Very Best of The Doors	2016-12-31 10:37:00	2016-12-31	10:37	2016-12-31 21:37:00	2016-12-31	21:37:00
4	Prince & The Revolution	Purple Rain	Ultimate: Prince	2016-12-31 10:27:00	2016-12-31	10:27	2016-12-31 21:27:00	2016-12-31	21:27:00
5	DMA’s	lay down	Lay Down - Single	2016-12-31 10:07:00	2016-12-31	10:07	2016-12-31 21:07:00	2016-12-31	21:07:00
6	The War on Drugs	Red Eyes	Lost in the Dream	2016-12-31 10:00:00	2016-12-31	10:00	2016-12-31 21:00:00	2016-12-31	21:00:00

Extract more information concerning day of the week

Now that we accurate date and time data, we can extract to further features necessary to compare day-to-day listening habits for weekdays versus weekends

The first feature will be to extract hour of the day as a grouping variable

The way to do this is to round the listening time to the nearest hour

For example, a song listened to at 1:49:23PM will be rounded to 2:00:00, while a song listened to at 1:29:59PM will be rounded to 1:00:00

This variable will serve as our x-axis to look at listening patterns across the day in a later plot

# Round clock-time to nearest hour

# Duplicate time of song scrobble columsn to create 'hour' column 
last.fm.df$Hour = last.fm.df$Time.EST
# Convert to time format
last.fm.df$Hour <- as.POSIXct(last.fm.df$Hour, format = "%H:%M:%S")
# Round to nearest hour
last.fm.df$Hour = format(round(last.fm.df$Hour, units = "hours"), format = "%H:%M")

The second feature to be extracted is a grouping variable distinguishing weekdays from weekends. This is easy enough to do by first extracting out the day of the week, then further separating Monday-Friday from Saturday-Sunday

# Day of the week
last.fm.df$DayofWeek <- weekdays(as.Date(last.fm.df$Date.EST))

# Weekend/Weekday
last.fm.df$DayType[
  last.fm.df$DayofWeek == "Saturday" |
    last.fm.df$DayofWeek == "Sunday"] <-
  "Weekend"
last.fm.df$DayType[is.na(last.fm.df$DayType)] <- "Weekday"

We’ll preview our table one last time, in its complete state

ID	Artist	Track	Album	DateTime.UT	Date.UT	Time.UT	DateTime.EST	Date.EST	Time.EST	Hour	DayofWeek	DayType
1	The Rolling Stones	Start Me Up	Jump Back - The Best of the Rolling Stones ’71 - ’93 (Remastered 2009)	2016-12-31 10:48:00	2016-12-31	10:48	2016-12-31 21:48:00	2016-12-31	21:48:00	22:00	Saturday	Weekend
2	Whitesnake	Here I Go Again ’87 (2003 Remaster)	Best of Whitesnake	2016-12-31 10:44:00	2016-12-31	10:44	2016-12-31 21:44:00	2016-12-31	21:44:00	22:00	Saturday	Weekend
3	The Doors	Light My Fire	The Very Best of The Doors	2016-12-31 10:37:00	2016-12-31	10:37	2016-12-31 21:37:00	2016-12-31	21:37:00	22:00	Saturday	Weekend
4	Prince & The Revolution	Purple Rain	Ultimate: Prince	2016-12-31 10:27:00	2016-12-31	10:27	2016-12-31 21:27:00	2016-12-31	21:27:00	21:00	Saturday	Weekend
5	DMA’s	lay down	Lay Down - Single	2016-12-31 10:07:00	2016-12-31	10:07	2016-12-31 21:07:00	2016-12-31	21:07:00	21:00	Saturday	Weekend
6	The War on Drugs	Red Eyes	Lost in the Dream	2016-12-31 10:00:00	2016-12-31	10:00	2016-12-31 21:00:00	2016-12-31	21:00:00	21:00	Saturday	Weekend

Create and visualise plots

We are now ready to explore our data. To begin, I’ve created a basic theme for our plots

# Minimalistic theme for visualisation 
theme_plot_text <-
  theme(
    plot.title = element_text(size = 10, hjust = 0.5),
    axis.title = element_text(size = 10),
    legend.title = element_text(size = 10),
    axis.text = element_text(size = 7))

Before directly contrasting weekdays with weekends, it’s worth a look at the top 20 artists I listened to between 2016 and 2018. Code and plot below

# Top Artists
Top.Artist <-
  filter(
    last.fm.df) %>% 
      group_by(Artist) %>% 
      summarise(Artist.Listens = n()) %>%
      arrange(desc(Artist.Listens)) %>% 
      top_n(20) %>%
      ggplot(
        aes(x = reorder(Artist, -Artist.Listens), 
            y = Artist.Listens,
            fill = Artist.Listens,
            group = 1)) +
      geom_bar(stat = "identity") +
      labs(x=NULL, y=NULL) +
      scale_fill_gradient2(low = "#DD7E84", mid = "#C72833" , high = "#77181E",
                             midpoint = 600) +
      labs(title = "Total Artist Listens") + 
      theme_bw() +
      theme_plot_text +
      labs(fill = 'Artist Listens') +
      theme(axis.text.x=element_text(angle = -45, hjust = 0))

Clearly I’ve been on a War on Drugs binge :D

Let’s now construct a line-plot for listening habits across the day, for weekdays only

# Weekday
Weekday <-
filter(
  last.fm.df, DayType == "Weekday")  %>% 
  group_by(Hour) %>% 
  summarise(Hourly.Listens = n()) %>%
  ggplot(
    aes(x = Hour, 
        y = Hourly.Listens,
        colour = Hourly.Listens,
        group = 1)) +
  geom_point() +
  geom_line() +
  ylab("Total song listens per hour") +
  xlab("Time of day") +
  scale_x_discrete(breaks=c("00:00", "03:00", "06:00", "09:00", 
                            "12:00", "15:00", "18:00", "21:00"), drop = FALSE) +
  scale_colour_gradient2(low = "#DD7E84", mid = "#C72833" , high = "#77181E",
                         midpoint = 1200) +
  labs(title = "Frequency of music listening across the day (Weekdays)") + 
  theme_bw() +
  theme_plot_text +
  labs(color = 'Hourly Listens')

There are some clear trends in my weekday listening habits …

Almost all music listening occurs between 9AM - 5PM (while I’m at work!)
Peak listening times are around 12PM and 3-4PM
A clear drop occurs around 1230PM - 130PM … When I go on lunch!

And finally, a line-plot for listening habits across the day, for weekends

# Weekend
Weekend <-
filter(
  last.fm.df, DayType == "Weekend")  %>% 
  group_by(Hour) %>% 
  summarise(Hourly.Listens = n()) %>%
  ggplot(
    aes(x = Hour, 
        y = Hourly.Listens,
        colour = Hourly.Listens,
        group = 1)) +
  geom_point() +
  geom_line() +
  ylab("Total song listens per hour") +
  xlab("Time of day") +
  scale_x_discrete(breaks=c("00:00", "03:00", "06:00", "09:00", 
                            "12:00", "15:00", "18:00", "21:00"), drop = FALSE) +
  scale_colour_gradient2(low = "#DD7E84", mid = "#C72833" , high = "#77181E",
                         midpoint = 200) +
  labs(title = "Frequency of music listening across the day (Weekends)") + 
  theme_bw() +
  theme_plot_text +
  labs(color = 'Hourly Listens')

Weekends clearly produce a different pattern of music listening to weekdays

Peak listening time is between 7PM-8PM. I will hasten a guess this corresponds to music for ‘social purposes’ - dinner parties, guests over etc
Relatively speaking, a higher proportion of scrobbles in the wee-hours with quite a few listens around 12AM … Party-time!

Conclusion

My personal music listening habits vary between weekdays and weekends. On weekdays, most listening is concentrated in the early afternoon, and almost all listening occurs within work-hours. On weekends, most listening occurs in the early to late evening, probably overlapping with social engagement