Introduction

This R Markdown document analyzes the Spotify Most Streamed Songs dataset, using Python (pandas, Matplotlib, and seaborn) for data manipulation and visualization.

Dataset

The code below loads the dataset and lists its columns:

import pandas as pd

# Load dataset
file_path = "C:/Users/leoan/OneDrive/Desktop/Spotify Most Streamed Songs.csv"
spotify_data = pd.read_csv(file_path)

# Display dataset columns
spotify_data.columns.tolist()
## ['track_name', 'artist(s)_name', 'artist_count', 'released_year', 'released_month', 'released_day', 'in_spotify_playlists', 'in_spotify_charts', 'streams', 'in_apple_playlists', 'in_apple_charts', 'in_deezer_playlists', 'in_deezer_charts', 'in_shazam_charts', 'bpm', 'key', 'mode', 'danceability_%', 'valence_%', 'energy_%', 'acousticness_%', 'instrumentalness_%', 'liveness_%', 'speechiness_%', 'cover_url']
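Before a column such as streams is used in sums or plots, it can help to see which entries would fail numeric conversion. The following is a minimal sketch on a small invented frame (the name sample and its values are hypothetical stand-ins, not rows from the real CSV); it assumes that any non-numeric entries are stored as strings.

```python
import pandas as pd

# Hypothetical stand-in for spotify_data; the values are invented for illustration.
sample = pd.DataFrame({
    "track_name": ["Song A", "Song B", "Song C"],
    "streams": ["1000", "2500", "not-a-number"],
})

# Coerce to numeric; entries that cannot be parsed become NaN.
coerced = pd.to_numeric(sample["streams"], errors="coerce")

# Rows whose original value was present but could not be parsed.
bad_rows = sample[coerced.isna() & sample["streams"].notna()]
print(bad_rows)
```

Running the same check against spotify_data would show how many tracks are silently excluded from stream totals once errors="coerce" is applied.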

Findings

1. Top Artists by Streams

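A bar chart of the 10 artists with the highest total streams. Tracks are grouped by the full artist credit string, so a collaboration counts as its own entry rather than toward each contributing artist.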

import matplotlib.pyplot as plt

# Ensure 'streams' is numeric
spotify_data["streams"] = pd.to_numeric(spotify_data["streams"], errors="coerce")

# Top artists by streams
top_artists = spotify_data.groupby("artist(s)_name")["streams"].sum().nlargest(10)

plt.figure(figsize=(10, 6))
top_artists.plot(kind="bar", color="skyblue")
plt.title("Top 10 Artists by Streams", fontsize=14)
plt.xlabel("Artist", fontsize=12)
plt.ylabel("Total Streams", fontsize=12)
plt.xticks(rotation=45, ha="right")
## (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), [Text(0, 0, 'The Weeknd'), Text(1, 0, 'Taylor Swift'), Text(2, 0, 'Ed Sheeran'), Text(3, 0, 'Harry Styles'), Text(4, 0, 'Bad Bunny'), Text(5, 0, 'Olivia Rodrigo'), Text(6, 0, 'Eminem'), Text(7, 0, 'Bruno Mars'), Text(8, 0, 'Arctic Monkeys'), Text(9, 0, 'Imagine Dragons')])
plt.show()
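Because the grouping above uses the full "artist(s)_name" string, a track credited to several artists is counted under the combined credit. One possible alternative, sketched here on invented data, is to split the credit and count the track for every listed artist. The comma separator and the names sample, per_artist, and totals are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows in the same shape as spotify_data (values invented).
sample = pd.DataFrame({
    "artist(s)_name": ["Artist A", "Artist A, Artist B", "Artist B"],
    "streams": [100, 50, 30],
})

# Split the credit string into a list, then give each artist its own row.
per_artist = (
    sample.assign(artist=sample["artist(s)_name"].str.split(","))
    .explode("artist")
)
per_artist["artist"] = per_artist["artist"].str.strip()

# Each collaborator is credited with the track's full stream count.
totals = per_artist.groupby("artist")["streams"].sum().sort_values(ascending=False)
print(totals)
```

Note that this credits the full stream count to every collaborator, so the per-artist totals no longer add up to the dataset total. Whether that is appropriate depends on the question being asked.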

2. Danceability Distribution

A histogram with a kernel density estimate showing the distribution of danceability scores across tracks.

import seaborn as sns

plt.figure(figsize=(10, 6))
sns.histplot(spotify_data["danceability_%"], bins=20, kde=True, color="green")
plt.title("Danceability Distribution", fontsize=14)
plt.xlabel("Danceability (%)", fontsize=12)
plt.ylabel("Frequency", fontsize=12)
plt.show()

4. Streams by Danceability

A scatter plot showing the relationship between streams and danceability.

plt.figure(figsize=(10, 6))
plt.scatter(spotify_data["danceability_%"], spotify_data["streams"], alpha=0.7, color="coral")
plt.title("Streams vs Danceability", fontsize=14)
plt.xlabel("Danceability (%)", fontsize=12)
plt.ylabel("Streams", fontsize=12)
plt.grid(True, alpha=0.5)
plt.show()
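A scatter plot shows the shape of the relationship but not its strength. As a rough complement, a rank correlation can be computed with pandas alone by correlating the ranks of the two columns, which is equivalent to Spearman's coefficient. The frame below is invented for illustration, and the names sample, clean, and rho are hypothetical.

```python
import pandas as pd

# Invented values in the same shape as the two plotted columns.
sample = pd.DataFrame({
    "danceability_%": [40, 55, 60, 75, 90],
    "streams": [5e8, 2e8, 9e8, 3e8, 1e8],
})

# Drop rows where either value is missing (e.g. streams coerced to NaN).
clean = sample.dropna(subset=["danceability_%", "streams"])

# Spearman's rho is the Pearson correlation of the ranks.
rho = clean["danceability_%"].rank().corr(clean["streams"].rank())
print(round(rho, 3))
```

A rank-based measure is used here because stream counts tend to be heavily skewed, which can distort a plain Pearson correlation.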

5. Mode Distribution

A pie chart showing the share of songs in major and minor modes.

mode_counts = spotify_data["mode"].value_counts()

plt.figure(figsize=(8, 8))
plt.pie(mode_counts, labels=mode_counts.index, autopct='%1.1f%%', startangle=90, colors=["gold", "lightblue"])
## ([<matplotlib.patches.Wedge object at 0x0000015F9F710AD0>, <matplotlib.patches.Wedge object at 0x0000015FA3685F70>], [Text(-1.067868878923177, -0.26392434034654233, 'Major'), Text(1.067868854212792, 0.26392444032764156, 'Minor')], [Text(-0.5824739339580964, -0.14395873109811397, '57.7%'), Text(0.5824739204797047, 0.143958785633259, '42.3%')])
plt.title("Distribution of Modes (Major vs Minor)", fontsize=14)
plt.show()
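The percentages drawn on the pie chart can also be obtained as numbers, which is convenient for quoting them in text. This is a small sketch on an invented series; modes and shares are hypothetical names.

```python
import pandas as pd

# Invented mode labels standing in for spotify_data["mode"].
modes = pd.Series(["Major", "Minor", "Major", "Major", "Minor"], name="mode")

# normalize=True returns fractions instead of counts; scale to percent.
shares = modes.value_counts(normalize=True).mul(100).round(1)
print(shares)
```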

6. Streams by Release Year

A heatmap showing total streams (in millions) and average energy for each release year.

import seaborn as sns
import matplotlib.pyplot as plt

# Prepare heatmap data
heatmap_data = spotify_data.groupby("released_year").agg(
    total_streams=("streams", "sum"),
    avg_energy=("energy_%", "mean")
).dropna()

heatmap_data["total_streams"] = heatmap_data["total_streams"] / 1e6  # Convert to millions

# Create a heatmap
plt.figure(figsize=(12, 6))
sns.heatmap(
    heatmap_data,
    annot=True,
    fmt=".1f",
    cmap="coolwarm",
    linewidths=0.5,
    cbar_kws={"label": "Values"}
)
plt.title("Yearly Total Streams (Millions) and Average Energy", fontsize=14)
plt.xlabel("Metrics")
plt.ylabel("Year")
plt.show()
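In the heatmap above, both columns share one color scale, so the much larger stream totals are likely to dominate the colors and leave the energy column nearly uniform. One possible remedy, sketched below on invented numbers, is to rescale each column to the range 0 to 1 before coloring. The names yearly and scaled are hypothetical, and the sketch assumes no column is constant (otherwise the division yields NaN).

```python
import pandas as pd

# Invented yearly aggregates in the same shape as heatmap_data.
yearly = pd.DataFrame(
    {"total_streams": [1200.0, 3400.0, 5600.0], "avg_energy": [60.0, 70.0, 65.0]},
    index=pd.Index([2020, 2021, 2022], name="released_year"),
)

# Min-max scale each column independently so both span 0..1.
scaled = (yearly - yearly.min()) / (yearly.max() - yearly.min())
print(scaled)
```

The scaled frame could then be passed to sns.heatmap as the data, with the original frame supplied through the annot argument so the cells still display the raw values.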

Conclusion

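The visualizations above highlight key aspects of the Spotify Most Streamed Songs dataset: top artists by streams, the distribution of danceability, the relationship between danceability and streams, the split between major and minor modes, and yearly stream totals alongside average energy. These insights provide a foundation for deeper exploration of the data.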