A Breakdown of Spotify Curated Playlists

Findings

Average Song Popularity by Year and Genre

The scatter plot below shows the average popularity score of songs in the Spotify curated playlists, by release year, for each genre. The code multiplies the mean track_popularity by 10 and uses the result for both marker color and marker size, so the plotted scores fall on a 0 to 1000 scale, with 1000 being the most popular. Pop stands out as the most popular genre. There is also a clear recency bias toward newer songs, which is to be expected, since listeners tend to play more recent releases.

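# Mean popularity per (Year, genre); sort newest first and keep the first 126 rows.
prod = (spotify.groupby(['Year', 'playlist_genre']).track_popularity.mean()
        .sort_index(ascending=False).head(126).reset_index(name='mean'))
# Scale by 10 so the value can also serve as the marker size.
prod['mean'] = prod['mean'] * 10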

plt.figure(figsize=(22,15))

plt.scatter(prod['playlist_genre'],prod['Year'], marker ='8', cmap = 'YlOrRd',
            c=prod['mean'], s = prod['mean'], edgecolors = 'black')
plt.title('Average Song Popularity by Year and Genre', fontsize = 20)
plt.xlabel('Genre', fontsize = 18, labelpad = 35)
plt.ylabel('Year', fontsize = 18,labelpad = 35 )

cbar = plt.colorbar()
cbar.set_label('Average Popularity', rotation=270, fontsize=18, color = 'black', labelpad = 35);
colorbar_ticks = [200,250,300,350,400,450,500,550];
cbar.set_ticks(colorbar_ticks);


plt.xticks(fontsize = 14, color = 'black');

y_ticks = list(range(int(prod['Year'].min()), 2021))
plt.yticks(y_ticks, fontsize = 18, color = 'black');
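
As a rough numerical check on the genre trend read off the chart, the per-genre mean can be computed directly. This is a minimal sketch on a small made-up frame: the rows are illustrative stand-ins rather than values from the data set, and only the column names `playlist_genre` and `track_popularity` come from the code above.

```python
import pandas as pd

# Toy stand-in for the `spotify` DataFrame (values are made up for illustration).
toy = pd.DataFrame({
    "playlist_genre": ["pop", "pop", "rock", "rock", "edm", "edm"],
    "track_popularity": [60, 50, 40, 30, 35, 25],
})

# Mean popularity per genre, highest first.
genre_mean = (
    toy.groupby("playlist_genre")["track_popularity"]
    .mean()
    .sort_values(ascending=False)
)
print(genre_mean)
```

Running the same groupby on the real frame would give one number per genre to set against the visual impression from the scatter plot.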

Trend of Songs Included in Spotify Curated Playlists by Release Date Year

The line plot below shows the number of songs included in the playlists by release year. More popular songs are more likely to be included, so the recency bias seen in the previous chart appears here as well. The data may also be skewed by how new music streaming is: many artists with older songs may not have released them on Spotify. This effect is likely compounded by the age range of Spotify users, since many people who would listen to older songs may not use Spotify at all.


# Count tracks per release year (groupby sorts the years in ascending order).
spotifyRY = spotify.groupby(['Year'])['track_name'].count().reset_index(name='Tracks_included')
# Drop the last row, i.e. the most recent year in the data.
spotifyRY = spotifyRY.iloc[:-1]

fig = plt.figure(figsize = (18,10));
ax= fig.add_subplot(1,1,1);

plt.plot(spotifyRY['Year'], spotifyRY['Tracks_included'], c = 'DarkOrange')
plt.title('Trend of Songs Included in Spotify Curated Playlists by Release Date Year', fontsize = 20)
plt.xlabel('Year', fontsize = 15, labelpad = 35)
plt.ylabel('Songs Included', fontsize = 15, labelpad = 35)
plt.grid(axis = 'y')

ax.yaxis.set_major_formatter(FuncFormatter(lambda x, p: format(int(x), ',')));
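
The `Year` column used above is not derived in this section. One way it could be built is sketched below, assuming the release date is stored as a string that starts with a four-digit year; the column name `track_album_release_date` and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Toy rows; the column name `track_album_release_date` is an assumption.
toy = pd.DataFrame({
    "track_name": ["a", "b", "c", "d"],
    "track_album_release_date": ["2019-06-14", "2019-01-01", "2012", "1998-11-20"],
})

# Taking the first four characters also handles year-only strings such as "2012".
toy["Year"] = toy["track_album_release_date"].str[:4].astype(int)

# Same aggregation as the line plot: number of tracks per release year.
tracks_per_year = (
    toy.groupby("Year")["track_name"].count().reset_index(name="Tracks_included")
)
print(tracks_per_year)
```

Slicing the string rather than parsing a full date is a deliberate choice here: it does not fail on entries that carry only a year.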

Top 10 Tracks by Number of Times Song Was Included in Different Spotify Curated Playlists

This bar chart shows the 10 songs that appear most often across the curated playlists. It is another way to gauge the most popular songs at the time (Jan. 2020): the more popular a song, the more playlists it is likely to appear on.


# Count how many rows (playlist entries) each track name has and keep the top 10.
trackpop = (spotify.groupby('track_name').track_popularity.count()
            .reset_index(name='times_included')
            .sort_values(by='times_included', ascending=False).head(10))

plt.figure(figsize = (16,9))
plt.bar(trackpop['track_name'], trackpop['times_included'], label = 'Times included', color= 'DarkOrange')
plt.title('Top 10 Tracks by Number of Times Song Was Included in Different Spotify Curated Playlists', fontsize = 17)
plt.xlabel('Song', fontsize = 15, labelpad = 25)
plt.ylabel('Times Included', fontsize = 15, labelpad = 25 )
plt.grid(axis = 'y')
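
Counting rows per track name treats different songs that share a title as one track, and counts a song twice if it appears twice in the same playlist. The sketch below contrasts that with counting distinct playlists per track; the `track_id` and `playlist_id` columns and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Toy rows; `track_id` and `playlist_id` are assumed column names.
toy = pd.DataFrame({
    "track_id": ["t1", "t1", "t1", "t2", "t2", "t3"],
    "track_name": ["Song A", "Song A", "Song A", "Song B", "Song B", "Song A"],
    "playlist_id": ["p1", "p2", "p2", "p1", "p3", "p4"],
})

# Row count per name: merges t1 and t3, which happen to share a title.
by_name_rows = toy.groupby("track_name").size()

# Distinct playlists per track id: ignores repeats within one playlist.
by_id_playlists = (
    toy.groupby(["track_id", "track_name"])["playlist_id"]
    .nunique()
    .sort_values(ascending=False)
)
print(by_name_rows)
print(by_id_playlists)
```

Whether the two counts differ much on the real data depends on how many titles are shared between tracks, which this section does not report.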

Number of Songs From Each Genre and Sub-Genre

The pie chart shows the share of each genre (outer ring) and its sub-genres (inner ring) among the songs in the data set. The data was clearly sampled evenly across the 6 genres: EDM, Rock, Rap, Latin, R&B, and Pop. I do not think this reflects the makeup of the entire Spotify catalog, but it does provide a good sample for evaluating trends across the genres and sub-genres.


genre = spotify.groupby(['playlist_genre'])['playlist_subgenre'].value_counts().reset_index(name='Total')

# Colormap indices: every 4th index goes to a genre (outer ring);
# the remaining indices go to the sub-genres (inner ring).
num_out_c = genre.playlist_genre.nunique()
out_c_ref = np.arange(num_out_c) * 4

num_in_c = genre.playlist_subgenre.nunique()
all_c_ref = np.arange(num_out_c + num_in_c)

in_c_ref = [each for each in all_c_ref if each not in out_c_ref]

fig = plt.figure(figsize = (12,12))
ax = fig.add_subplot(1,1,1);

# 'cet_glasbey_cool' is registered with matplotlib when the colorcet package is imported.
colormap = plt.get_cmap('cet_glasbey_cool')
outer_colors = colormap(out_c_ref);

total_songs = genre.Total.sum()

# Outer ring: one wedge per genre, labeled with its share and song count.
genre.groupby(['playlist_genre'])['Total'].sum().plot(
    kind = 'pie', ax = ax, radius = 1, colors = outer_colors, pctdistance = 0.85, labeldistance = 1.1,
    wedgeprops = dict(edgecolor = 'w'), textprops = {'fontsize':18},
    autopct = lambda p: '{:.1f}%\n({:.0f})'.format(p, (p/100)*total_songs), startangle = 90)

inner_colors = colormap(in_c_ref);

# Inner ring: one wedge per sub-genre, in the same genre order as the outer ring.
genre.Total.plot(
    kind = 'pie', ax = ax, radius = .7, colors = inner_colors, pctdistance = 0.47, labeldistance = .77,
    wedgeprops = dict(edgecolor = 'w'), rotatelabels = True,
    textprops = dict(rotation_mode = 'anchor', va='center', ha='center'),
    labels = genre.playlist_subgenre,
    autopct = '%1.1f%%', startangle = 90)


ax.yaxis.set_visible(False);
plt.title('Number of Songs From Each Genre and Sub-Genre', fontsize = 20);
ax.axis('equal');
plt.tight_layout()
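
The claim that the genres are evenly represented can also be checked without a chart by comparing each genre's share of rows. This is a minimal sketch on made-up rows; only the column name `playlist_genre` comes from the code above.

```python
import pandas as pd

# Toy rows: two per genre, plus one extra EDM row to make the split slightly uneven.
genres = ["edm", "rock", "rap", "latin", "r&b", "pop"]
toy = pd.DataFrame({"playlist_genre": genres * 2 + ["edm"]})

# Fraction of rows per genre, and the gap between the largest and smallest share.
share = toy["playlist_genre"].value_counts(normalize=True)
spread = share.max() - share.min()
print(share)
print(f"largest minus smallest share: {spread:.3f}")
```

A small spread on the real data would support the even-sampling reading of the pie chart.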

Number of Songs Included on Curated Playlists by Energy and Tempo

The heat map below splits the sample songs into 10 equal-width energy buckets and 10 equal-width tempo buckets. Songs with a higher energy rating and a tempo between 96 and 144 BPM are clearly far more likely to be included in the playlists. Many of these songs, with their higher energy and tempo, are perceived as “catchy” or “happy”, which likely increases their popularity.

labels = ["0-24", "24-48", "48-72", "72-96", "96-120", "120-144", "144-168", "168-192", "192-216", "216-240"];

spotify['bucket'] = pd.cut(spotify['tempo'],bins = [0, 24, 48, 72, 96, 120, 144, 168, 192, 216, 240],  labels= labels);

labels2 = ["0.0-0.1", "0.1-0.2", "0.2-0.3", "0.3-0.4", "0.4-0.5", "0.5-0.6", "0.6-0.7", "0.7-0.8", "0.8-0.9", "0.9-1.0"];

spotify['enbuck'] = pd.cut(spotify['energy'],bins = [0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1],  labels= labels2);

# Count songs in each (tempo bucket, energy bucket) cell; observed=False keeps empty cells.
hm = spotify.groupby(['bucket','enbuck'], observed=False)['track_name'].count().reset_index(name='new')

hm_df = pd.pivot_table(hm, index='bucket', columns='enbuck', values='new', observed=False)

fig = plt.figure(figsize = (15,12));
ax = fig.add_subplot(1,1,1);

comma_fmt = FuncFormatter(lambda x , p: format(int(x), ','));

ax = sns.heatmap(hm_df, ax = ax, linewidths = 0.2, annot = True, cmap = 'coolwarm', fmt = ',.0f',
                 square = True, annot_kws = {'size':11},
                 cbar_kws = {'format': comma_fmt, 'orientation':'vertical'})
plt.title('Number of Songs Included on Curated Playlists by Energy and Tempo', fontsize = 18, pad = 15)
plt.xlabel('Energy Rating', fontsize = 18, labelpad = 10);
plt.ylabel('Tempo', fontsize = 18, labelpad =10);
plt.yticks(rotation = 0, fontsize = 14);
plt.xticks(fontsize = 14);

ax.invert_yaxis();

cbar = ax.collections[0].colorbar;
cbar.set_label('Number of Songs', rotation = 270, color = 'black', fontsize = 14, labelpad = 25);
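
The bucketing behind the heat map can be illustrated on a handful of rows. This is a minimal sketch with made-up tempo and energy values. It generates the bucket labels from the bin edges and passes `include_lowest=True`, which the code above does not do, so that a value of exactly 0 lands in the first bucket instead of becoming missing.

```python
import pandas as pd

# Toy rows (values are made up for illustration).
toy = pd.DataFrame({
    "tempo": [100.0, 118.0, 125.0, 130.0, 60.0, 0.0],
    "energy": [0.85, 0.72, 0.91, 0.95, 0.35, 0.0],
})

# Ten equal-width bins for each feature, with labels built from the edges.
tempo_edges = list(range(0, 241, 24))
energy_edges = [i / 10 for i in range(11)]
tempo_labels = [f"{lo}-{hi}" for lo, hi in zip(tempo_edges[:-1], tempo_edges[1:])]
energy_labels = [f"{lo:.1f}-{hi:.1f}" for lo, hi in zip(energy_edges[:-1], energy_edges[1:])]

# Bins are right-inclusive, e.g. (96, 120]; include_lowest closes the first bin at 0.
toy["tempo_bin"] = pd.cut(
    toy["tempo"], bins=tempo_edges, labels=tempo_labels, include_lowest=True
).astype(str)
toy["energy_bin"] = pd.cut(
    toy["energy"], bins=energy_edges, labels=energy_labels, include_lowest=True
).astype(str)

# Number of songs per (tempo bucket, energy bucket) cell.
counts = toy.groupby(["tempo_bin", "energy_bin"]).size()
print(counts)
```

Converting the buckets to plain strings keeps the toy output limited to the cells that actually occur; the heat map code keeps them categorical so that empty cells are shown as well.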

11/12/2023

Introduction

Dataset