import os 
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'D:/Anaconda3/Library/plugins/platforms' 

Data Set

This public data set represents the top 50 songs on Spotify in 2019. The analysis below examines how popularity relates to songs, artists, and genres, with a further focus on the songs' beats per minute.

Popularity

Line Chart

The analysis begins with a Line Chart of the average popularity of songs by genre. Although the data set contains only 50 songs, it is interesting to note how many genres are represented and how much the average popularity varies across them.

import pandas as pd 
import matplotlib.pyplot as plt

df = pd.read_csv("U:/Spotify_Dataset2019.csv", usecols = ['Popularity', 'Genre'])

avg_popularity = df.groupby('Genre')['Popularity'].mean()
fig, ax = plt.subplots(figsize=(10,6))
avg_popularity.plot(kind='line', ax=ax)
ax.set_title('Average Popularity by Genre')
ax.set_ylabel('Popularity', fontsize=14)
ax.set_xlabel('Genre', fontsize=14)
# Place one tick per genre and label it with the genre name.
ax.set_xticks(list(range(len(avg_popularity))))
ax.set_xticklabels(list(avg_popularity.index), rotation=90)
plt.show()
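
Because the data set has only 50 songs spread over many genres, some averages rest on very few songs. The sketch below shows one way to report the song count next to each genre's mean. The rows in `sample` are hypothetical and are not taken from the Spotify data set; only the column names `Genre` and `Popularity` come from the analysis above.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the Spotify data set.
sample = pd.DataFrame({
    "Genre": ["pop", "pop", "latin", "dance pop", "dance pop", "dance pop"],
    "Popularity": [90, 84, 88, 80, 86, 92],
})

# Mean popularity per genre, plus how many songs each mean is based on.
summary = (
    sample.groupby("Genre")["Popularity"]
    .agg(mean_popularity="mean", song_count="count")
    .sort_values("mean_popularity", ascending=False)
)
print(summary)
```

A genre whose `song_count` is 1 contributes a single song's popularity as its "average", which is worth keeping in mind when reading the line chart.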

Scatter Plot

The Scatter Plot takes the analysis of popularity by genre a step further by showing how many artists fall at each combination of genre and popularity. While the Line Chart shows only the average for each genre, the Scatter Plot gives a more in-depth look at the individual values that make up those averages and offers more insight into why an average is as low or as high as it is.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("U:/Spotify_Dataset2019.csv", usecols = ['Popularity', 'Genre', 'Artist.Name'])

x = df.groupby(['Genre', 'Popularity'])['Artist.Name'].count().reset_index(name='Count of Artists')

plt.figure(figsize=(14,9))

plt.scatter(x['Popularity'], x['Genre'], c=x['Count of Artists'], s=100, edgecolors='black')

plt.title('Top 50 Songs by Artist Count of Genre and Popularity', fontsize=18)
plt.xlabel('Popularity', fontsize=14)
plt.ylabel('Genre', fontsize=14)

cbar = plt.colorbar()
cbar.set_label('Number of Artists', rotation=270, fontsize=14, color='black', labelpad=10)

my_colorbar_ticks = [1,2]
cbar.set_ticks(my_colorbar_ticks)

my_x_ticks = [*range(x['Popularity'].min(), x['Popularity'].max()+1,1)]
plt.xticks(my_x_ticks, fontsize=11, color='black')
plt.show()
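
The grouping behind the scatter plot can be checked on a small frame. This is a minimal sketch with hypothetical rows, not the Spotify data. It illustrates that `count()` tallies rows (songs), whereas `nunique()` would tally distinct artist names, and it shows one way to derive the colorbar ticks from the data instead of hard-coding them.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the Spotify data set.
sample = pd.DataFrame({
    "Genre": ["pop", "pop", "pop", "latin"],
    "Popularity": [88, 88, 91, 90],
    "Artist.Name": ["Artist A", "Artist A", "Artist B", "Artist C"],
})

grouped = sample.groupby(["Genre", "Popularity"])["Artist.Name"]

# count() tallies rows (songs); nunique() tallies distinct artist names.
counts = grouped.count().reset_index(name="Count of Artists")
distinct = grouped.nunique().reset_index(name="Distinct Artists")

# Derive one colorbar tick per integer count instead of hard-coding them.
low = int(counts["Count of Artists"].min())
high = int(counts["Count of Artists"].max())
colorbar_ticks = list(range(low, high + 1))

print(counts)
print(distinct)
print(colorbar_ticks)
```

If an artist has two songs with the same genre and popularity, the two tallies differ, so the choice between them depends on whether the chart is meant to count songs or artists.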

Beats Per Minute

Donut Chart

One hypothesis is that a song's beats per minute drive its popularity. As a first look, a Donut Chart shows each genre's share of the total beats per minute across all 50 songs, expressed as percentages.

import pandas as pd 
import matplotlib.pyplot as plt

df = pd.read_csv("U:/Spotify_Dataset2019.csv", usecols = ['Genre', 'Beats.Per.Minute'])

total_bpm_by_genre = df.groupby("Genre")["Beats.Per.Minute"].sum()
total_bpm = total_bpm_by_genre.sum()

labels = total_bpm_by_genre.index
sizes = total_bpm_by_genre.values

fig, ax = plt.subplots(figsize=(12,12))

ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=45, pctdistance=0.85, textprops={'fontsize':8})
# Draw a white circle over the center of the pie to create the donut hole.
center_circle = plt.Circle((0, 0), 0.5, fc='white', linewidth=0)
ax.add_artist(center_circle)

plt.text(0, 0, f"{total_bpm}\nTotal BPM", ha='center', va='center', fontsize=18)

plt.title("Total Beats per Minute by Genre", fontsize = 18)
plt.show()
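
The percentages in the donut are each genre's summed BPM divided by the overall total. The sketch below works through that arithmetic on hypothetical rows (not the Spotify data) and places the per-genre mean next to it, since a summed share grows with the number of songs in a genre as well as with their tempo.

```python
import pandas as pd

# Hypothetical rows for illustration only: every song has the same tempo.
sample = pd.DataFrame({
    "Genre": ["pop", "pop", "pop", "latin"],
    "Beats.Per.Minute": [100, 100, 100, 100],
})

bpm = sample.groupby("Genre")["Beats.Per.Minute"]

# Share of the total BPM per genre, as drawn by the donut chart.
total_by_genre = bpm.sum()
share_pct = (total_by_genre / total_by_genre.sum() * 100).round(1)

# Mean BPM per genre, which does not depend on how many songs a genre has.
mean_by_genre = bpm.mean()

print(share_pct)
print(mean_by_genre)
```

In this toy case "pop" takes three quarters of the donut only because it has three of the four songs, while the mean tempo is identical for both genres.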

Box Plot

Continuing the analysis of beats per minute by genre, a Box Plot shows the distribution within each genre and highlights any outliers.

import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("U:/Spotify_Dataset2019.csv", usecols = ['Genre', 'Beats.Per.Minute'])

sns.boxplot(x='Genre', y='Beats.Per.Minute', data=df)
plt.xticks(rotation=90)
plt.xlabel("Genre", fontsize=14)
plt.ylabel("Beats Per Minute", fontsize=14)
plt.title("Genre by Beats per Minute", fontsize=20)
plt.show()
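
To list the outliers instead of only seeing them as points, the same rule the box plot uses by default (values more than 1.5 times the interquartile range beyond the quartiles) can be applied per genre. This is a sketch under that assumption; the helper `iqr_outliers` and the rows in `sample` are hypothetical and are not part of the original analysis.

```python
import pandas as pd

def iqr_outliers(df, group_col, value_col, whis=1.5):
    """Return rows whose value lies more than whis * IQR outside its group's quartiles."""
    grouped = df.groupby(group_col)[value_col]
    # Look up each row's group quartiles by mapping the group label.
    q1 = df[group_col].map(grouped.quantile(0.25))
    q3 = df[group_col].map(grouped.quantile(0.75))
    iqr = q3 - q1
    too_low = df[value_col] < q1 - whis * iqr
    too_high = df[value_col] > q3 + whis * iqr
    return df[too_low | too_high]

# Hypothetical rows for illustration only; not taken from the Spotify data set.
sample = pd.DataFrame({
    "Genre": ["pop", "pop", "pop", "pop", "pop", "latin", "latin"],
    "Beats.Per.Minute": [100, 102, 104, 106, 190, 90, 96],
})

outliers = iqr_outliers(sample, "Genre", "Beats.Per.Minute")
print(outliers)
```

Genres with only one or two songs have little or no spread, so this rule says little about them.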

Conclusion

Bar Charts

Finally, two bar charts compare the top 15 tracks by beats per minute with the top 15 tracks by popularity. The song with the most beats per minute is not the most popular song. This comparison suggests that there is no direct correlation between a song's popularity and its beats per minute.

import pandas as pd 
import matplotlib.pyplot as plt

df = pd.read_csv("U:/Spotify_Dataset2019.csv", usecols = ['Track.Name', 'Beats.Per.Minute'])
# Keep the 15 tracks with the highest beats per minute.
df3 = df.sort_values(by='Beats.Per.Minute', ascending=False).head(15).reset_index(drop=True)

plt.bar("Track.Name", "Beats.Per.Minute", data=df3, color="blue")
plt.xlabel("Track Name", fontsize=14)
plt.xticks(rotation=90)
plt.ylabel("Beats Per Minute", fontsize=14)
plt.title("Top 15 Songs by Beats per Minute", fontsize=20)

plt.show()

import pandas as pd 
import matplotlib.pyplot as plt

df = pd.read_csv("U:/Spotify_Dataset2019.csv", usecols = ['Track.Name', 'Popularity'])
# Keep the 15 most popular tracks.
df3 = df.sort_values(by='Popularity', ascending=False).head(15).reset_index(drop=True)
plt.bar("Track.Name", "Popularity", data=df3, color="red")
plt.xlabel("Track Name", fontsize=14)
plt.xticks(rotation=90)
plt.ylabel("Popularity", fontsize=14)
plt.title("Top 15 Songs by Popularity", fontsize=20)

plt.show()
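
The bar charts compare the two rankings visually. One way to put a number on the relationship is the Pearson correlation coefficient between the two columns. The sketch below is a suggestion and not part of the original analysis: the helper `popularity_bpm_correlation` and the rows in `sample` are hypothetical, and only the column names come from the data set used above.

```python
import pandas as pd

def popularity_bpm_correlation(df):
    """Pearson correlation between the Popularity and Beats.Per.Minute columns."""
    return df["Popularity"].corr(df["Beats.Per.Minute"])

# Hypothetical rows for illustration only; not taken from the Spotify data set.
sample = pd.DataFrame({
    "Popularity": [95, 90, 88, 85, 80],
    "Beats.Per.Minute": [100, 180, 95, 140, 120],
})

pearson = popularity_bpm_correlation(sample)
print(f"Pearson r = {pearson:.3f}")
```

A coefficient near 0 indicates little linear relationship, while values near 1 or -1 indicate a strong one. With only 50 songs, any coefficient from the real data should be read cautiously.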