Analyzing Streaming and Concert Data

Author

J. Farmer

Introduction

Music is a daily part of millions of people’s lives. It is a multi-billion dollar industry that impacts the economy and society as a whole. When analyzing music trends, it is possible to learn about what society values. Millions of people go to concerts each year across the world, and millions more stream music daily. My goal with this project is to find out what factors people value when it comes to their music. What things do they consider when listening to music? Do concerts result in more music streams? How can we analyze the success of a song or artist? Throughout this document, I will answer these questions and more as I explore the Spotify top streamed songs of 2023. In doing this, I think we can learn more about the world around us, specifically when it comes to music.

About the Dataset

The dataset I am using is titled “Most Streamed Spotify Songs 2023 🎵📈🎧.” In this dataset, each row is a song. Each column is a different variable:

  • track_name represents the name of the song

  • artist(s)_name represents the name of artist(s) for the song

  • streams represents the total number of streams on Spotify

  • artist_count represents the number of artists who contributed to the song

  • released_year represents the year the song was released

    • There are also columns for month and day
  • in_spotify_playlists represents the number of Spotify playlists the song is in

    • There are also columns for Apple and Deezer
  • in_spotify_charts represents the rank of the song on Spotify charts

    • There are also columns for Apple, Deezer, and Shazam
  • bpm represents the beats per minute (tempo)

  • key represents the key of the song

  • mode represents the mode of the song (major or minor)

  • danceability_% represents how suitable the song is for dancing

  • valence_% represents positivity of the song

  • energy_% represents the perceived energy level of the song

  • acousticness_% represents the acoustic sound of the song

  • instrumentalness_% represents the amount of instrumental content in the song

  • liveness_% represents the presence of live performance elements

  • speechiness_% represents the amount of spoken words in the song

All of these variables give us good insight into each song in this dataset. We are able to get a good understanding of what makes a song unique. With this information, I can analyze this data and see if there is any relationship between the variables and the number of streams.

This dataset was found on Kaggle. To learn more about it, see: https://www.kaggle.com/code/adhamtarek147/most-streamed-spotify-songs-2023/notebook

A link to download the dataset can be found here: https://myxavier-my.sharepoint.com/:x:/g/personal/farmerj5_xavier_edu/EYuF6zb-PpZGuwQ98nf6p40BRTR9gJUe5N7Axu7Qgl3EWA?download=1

Below are some interesting summary statistics for this dataset. They include the total number of songs, the number of unique artists, the total, average, median, minimum, and maximum number of streams, and the top song along with its artist. In this case, the most streamed song in the dataset was “Blinding Lights” by The Weeknd with 3,703,895,074 streams!

       Statistic           Value
1    total_songs             953
2 unique_artists             645
3  total_streams    489458828542
4    avg_streams       514137425
5 median_streams       290530915
6    max_streams      3703895074
7    min_streams            2762
8       top_song Blinding Lights
9     top_artist      The Weeknd
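As a rough illustration of how statistics like these could be computed, here is a minimal Python sketch. It uses the dataset's column names (track_name, artist(s)_name, streams), but the three rows are made-up toy values, and counting unique artists by distinct artist(s)_name strings is an assumption about how the table above was produced.

```python
from statistics import mean, median

# Toy rows using the dataset's column names; the values are made up.
songs = [
    {"track_name": "Song A", "artist(s)_name": "Artist X", "streams": 300},
    {"track_name": "Song B", "artist(s)_name": "Artist X, Artist Y", "streams": 100},
    {"track_name": "Song C", "artist(s)_name": "Artist Z", "streams": 500},
]

def summarize(rows):
    """Compute the same kinds of statistics as the summary table."""
    streams = [row["streams"] for row in rows]
    top = max(rows, key=lambda row: row["streams"])  # most streamed song
    return {
        "total_songs": len(rows),
        # Assumption: each distinct artist(s)_name string counts as one artist.
        "unique_artists": len({row["artist(s)_name"] for row in rows}),
        "total_streams": sum(streams),
        "avg_streams": mean(streams),
        "median_streams": median(streams),
        "max_streams": max(streams),
        "min_streams": min(streams),
        "top_song": top["track_name"],
        "top_artist": top["artist(s)_name"],
    }

summary = summarize(songs)
for name, value in summary.items():
    print(f"{name:>15}  {value}")
```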

Descriptive Analysis with Visualizations

There are many different relationships we can explore with this dataset. We can use this information to gain an understanding of what factors result in more or fewer streams. I am looking to determine if there is a relationship between the factors in the dataset and streams. If a relationship exists, I want to learn more about the extent of that relationship.

Does the Number of Artists on a Song Affect Its Streams?

The first plot showed the relationship between total streams and whether a song had one artist or more than one. This was a good analysis; however, I want to take it a step further. Now I am going to take a look at the relationship between the number of artists on a song and streams. To do this, I created a box plot that shows the number of streams on the y-axis and the number of artists on the x-axis. This will help us see if the number of artists on a song changes the number of streams, what the median number of streams is for each group, and which songs are outliers.

Here, we get a good view of the streams as well as how many songs fall into each artist count. We see that songs with one artist have higher streams and that there are several outliers with very high streams. This is likely for the same reason as in the first chart: songs with one artist make up a majority of any given artist’s work. Songs with two and three artists also have outliers with a high number of streams, as songs by a duo or trio are quite popular among artists. From there, streams decrease until we get to songs with seven and eight artists. Here we see a higher median for songs with seven artists.
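The grouping behind a box plot like this one can be sketched in a few lines of Python. The sketch below groups songs by artist_count and reports how many songs are in each group along with the median streams, which is the center line a box plot draws. The (artist_count, streams) pairs are made-up toy values, not rows from the dataset.

```python
from collections import defaultdict
from statistics import median

# Toy (artist_count, streams) pairs; the values are made up.
toy_pairs = [(1, 900), (1, 400), (1, 500), (2, 300), (2, 700), (3, 200)]

def median_streams_by_artist_count(pairs):
    """Map each artist count to (number of songs, median streams)."""
    groups = defaultdict(list)
    for artist_count, streams in pairs:
        groups[artist_count].append(streams)
    return {
        count: (len(values), median(values))
        for count, values in sorted(groups.items())
    }

by_count = median_streams_by_artist_count(toy_pairs)
for count, (n_songs, med) in by_count.items():
    print(f"{count} artist(s): {n_songs} songs, median streams {med}")
```

Reporting the group size next to the median is useful here because groups with very few songs, such as those with seven or eight artists, can have medians that are driven by one or two tracks.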

What is the Impact of Energy, BPM, and Danceability on Streams?

Now we will analyze three variables in the dataset: energy, danceability, and BPM. Energy is the perceived energy level of the song, danceability is how suitable a song is for dancing, and BPM is the beats per minute. These variables do a good job of describing how the song feels when listening to it. The next three scatterplots show how these variables relate to streams. The y-axis is the streams, and the x-axis has each of the variables.

Each of these plots has a line of best fit. The slope of this line shows how much streams tend to change as the variable on the x-axis changes. For energy and BPM, the slope of the line is very close to zero. This means that streams barely change as energy or BPM changes. In this case, energy and BPM do not have any meaningful relationship to streams. An increase or decrease in these variables is associated with little to no change in streams. With danceability, the slope is slightly negative. This means that there is a small relationship between danceability and streams, but it is not drastic. Songs with higher danceability might get slightly fewer streams, but the difference is not meaningful.

These three charts suggest that danceability, energy, and BPM have little to no effect on streams. Artists likely do not need to prioritize these things when creating music, as there are more influential variables, which will be discussed throughout this document.
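To make the idea of a line of best fit concrete, here is a minimal Python sketch of an ordinary least-squares fit. The function name fit_line and the toy numbers are hypothetical; the plots in this document may have been produced with a plotting library's built-in smoother rather than code like this.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy example 1: streams do not change as the variable changes (flat line).
flat_slope, flat_intercept = fit_line([40, 60, 80], [500, 500, 500])

# Toy example 2: streams fall slightly as the variable rises (negative slope).
neg_slope, neg_intercept = fit_line([40, 60, 80], [600, 500, 400])

print(flat_slope, flat_intercept)
print(neg_slope, neg_intercept)
```

A slope near zero, as in the first toy example, is what the energy and BPM plots show. A small negative slope, as in the second, corresponds to the danceability plot.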

Does the Month a Song is Released Affect Its Streams?

You may not think about the reasoning behind when a song is released. You may think artists release music at random; this is far from the truth. I am going to explore why the timing of a release is anything but random. This bar plot shows the average number of streams for songs released each month of the year.

We see that songs released in September have higher average streams than songs released in any other month of the year. Why is this? After doing some research, I believe it is because of the Grammys. According to the Recording Academy, the deadline for the 2024 Grammys was in September of 2023. This could explain why September releases perform so well: artists may time major releases to land before the deadline. This suggests the month of September is a great time for artists to release their new work, so that they can be recognized for the following year’s Grammy Awards.

Read more here: https://www.recordingacademy.com/news/2024-grammys-online-entry-process-oep-explainer-webinar-how-to-submit-music-grammy-nominations-awards

After looking at the chart, we see that the month of January is a close second behind September. Why is this? According to Musik and Film, this may be because “after Christmas radio is starved for new releases.” This makes sense, as the month of December is dominated by Christmas music; there isn’t a high demand for new music that month. This could also explain why songs released in December have the second-lowest average streams.

Read more here: https://musikandfilm.com/why-is-january-great-for-your-music-release17799-2/
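The averages behind a bar plot like this one can be sketched in Python as follows. The column name released_month is an assumption (the dataset description only says there are columns for month and day), and the rows are made-up toy values.

```python
from collections import defaultdict

# Toy rows; "released_month" is an assumed column name and the values are made up.
toy_rows = [
    {"released_month": 1, "streams": 400},
    {"released_month": 1, "streams": 600},
    {"released_month": 9, "streams": 900},
    {"released_month": 12, "streams": 100},
]

def average_streams_by_month(rows):
    """Map each release month to the mean streams of songs released in it."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        month = row["released_month"]
        sums[month] += row["streams"]
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}

averages = average_streams_by_month(toy_rows)
best_month = max(averages, key=averages.get)  # month with the highest average
print(averages, best_month)
```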

How Many Songs are Released Each Month?

We already analyzed the average number of streams by the month a song is released. But how many songs are actually released each month? This matters because the volume of releases can influence the average. If a month has significantly more song releases, listeners may be divided in what they listen to, potentially lowering average streams per song. At the same time, months with fewer releases might see higher averages if competition is lower. This chart aims to address this question. The y-axis shows total songs released while the x-axis shows the month of the year. This chart will give us a good idea as to why some months have higher stream averages than others.

Here we see the months of January and May have the most songs released. As discussed, January likely has more releases because of the post-Christmas demand for new music. May has the second most released songs. After doing some research, I believe this is likely because of the start of summer. According to PlaylistPush, people tend to demand “pump-up” and “vibey” music as they participate in spring activities. After a steep decline, June and March come next. Something interesting to consider is the month of September. While it has the highest average number of streams, it has the second-lowest number of songs released. This may be attributed to the Grammy push artists make that month.
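Counting releases per month is a simple tally. The Python sketch below ranks months from most to fewest releases, breaking ties by month number. The list of month values is a made-up toy example, not the dataset's actual release months.

```python
from collections import Counter

# Toy release months (1 = January, ..., 12 = December); the values are made up.
toy_months = [1, 1, 1, 5, 5, 9]

def releases_per_month(months):
    """Return (month, song count) pairs, most releases first."""
    counts = Counter(months)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

ranking = releases_per_month(toy_months)
print(ranking)
```

Looking at this count next to the monthly average is what reveals cases like September, where a month has few releases but a high average.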

Does the Season a Song is Released Affect Its Streams?

After taking a look at individual months, let’s break it down into seasons. This will give a broader look at which time of year produces songs with the highest average streams. We know that the months of September and January have the two highest stream averages, but is that enough for these months to carry their whole seasons? To analyze this, I created another bar chart that has four bars, one for each season. Winter includes December, January, and February; spring includes March, April, and May; summer includes June, July, and August; and fall includes September, October, and November.

We see that fall takes the top spot, likely because of the Grammy influence in September. Winter is second, likely because of the post-Christmas music releases in the month of January. After this, we see summer and spring. It is unclear if anything is dragging these seasons down, but it is clear they don’t have the advantages that September and January give the fall and winter seasons.
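The month-to-season grouping described above can be written as a small lookup. This Python sketch uses the same season definitions given in this section; the function name season_for_month is hypothetical.

```python
def season_for_month(month):
    """Map a month number (1-12) to the season used in this analysis."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    if month in (9, 10, 11):
        return "Fall"
    raise ValueError(f"month must be between 1 and 12, got {month!r}")

# Every month of the year mapped to its season.
seasons = {month: season_for_month(month) for month in range(1, 13)}
print(seasons)
```

Once each song has a season label, the seasonal averages can be computed the same way as the monthly averages, grouping by season instead of month.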

Does the Number of Spotify Playlists a Song is in Affect that Song’s Streams?

Let’s take a look at Spotify playlists. This number in the dataset represents the number of Spotify playlists a song is in. Does this number affect streams at all? To analyze this, I created a scatterplot. Each dot on the plot represents a song in this dataset, meaning there are 953 songs on this chart. This should give a good representation and should be enough data to answer the question. On the y-axis, there is the total number of streams, and on the x-axis, there is the number of Spotify playlists the song is in.

We see that there is a clear positive correlation between the number of streams and the number of Spotify playlists a song is in. This could be for many reasons, but my interpretation is that people listen to playlists often. People have playlists for just about everything: driving, cleaning, showering, exercise, etc. If Spotify users use their playlists regularly, it makes sense that these songs would have more streams.
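One way to put a number on a correlation like this is the Pearson correlation coefficient, which ranges from -1 to 1. The Python sketch below computes it by hand; the playlist and stream values are made-up toy numbers, and the actual coefficient for the dataset is not reported in this document.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Toy values standing in for in_spotify_playlists and streams; made up.
toy_playlists = [100, 200, 300, 400]
toy_streams = [10, 25, 28, 45]

r = pearson_r(toy_playlists, toy_streams)
print(round(r, 3))
```

A value close to 1 means that songs in more playlists tend to have more streams. It does not show which one causes the other.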

Do Spotify or Apple Playlists Have a Greater Effect on Streams?

According to RouteNote, Spotify is the largest music streaming service and Apple Music is the seventh largest. However, the two are fierce competitors, especially in the United States. Which one has a greater effect on streams? To analyze this, I created two scatterplots, one for Spotify and one for Apple Music. Each has total streams on the y-axis and number of playlists for their respective services on the x-axis. Each plot also has a line of best fit for quick analysis.

We see that the line of best fit is slightly steeper for Apple than for Spotify. This suggests that each additional Apple playlist is associated with a larger increase in streams than each additional Spotify playlist. Keep in mind that slope measures the size of the change per playlist, not how tightly the points follow the line, and that the two services may have very different ranges of playlist counts. Both services have a steep slope, suggesting that a small change in the playlist count for either service is associated with a large change in the number of total streams for a song.
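When comparing slopes across two services, it helps to remember that a slope depends on the units of the x-axis, while a correlation coefficient does not. The Python sketch below illustrates this with made-up numbers: the two playlist columns follow exactly the same pattern, but one is on a scale 100 times smaller. The values and the function name slope_and_r are hypothetical.

```python
import math

def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson correlation) for the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Made-up toy values: same streams, same pattern, different playlist scales.
toy_streams = [100, 220, 290, 410]
toy_spotify_playlists = [1000, 2000, 3000, 4000]
toy_apple_playlists = [10, 20, 30, 40]

spotify_slope, spotify_r = slope_and_r(toy_spotify_playlists, toy_streams)
apple_slope, apple_r = slope_and_r(toy_apple_playlists, toy_streams)

print(spotify_slope, apple_slope)  # slopes differ by the scale factor
print(spotify_r, apple_r)          # correlations are the same
```

In this toy case the steeper slope comes entirely from the smaller playlist scale, so comparing correlation coefficients as well as slopes gives a fuller picture of which relationship is stronger.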

A Look at the Top 25 Artists

This dataset has 953 songs from 645 unique artists. I want to take a close look at the top artists. To do this, I am going to use the number of songs artists have in this dataset. So, the 25 artists with the most songs in the dataset will be analyzed. I feel this is a good metric to use because it doesn’t rely on artists who only have one or two hit songs, but rather focuses on overall, consistent success. These 25 artists have multiple songs in the dataset, meaning their success is well-rounded. For this chart, the y-axis represents the artist, and the x-axis represents the number of top songs they have in the dataset.

We see here that Taylor Swift overwhelmingly has the most songs in this dataset. This means that her songs stream very well, and she has many songs that do so. The Weeknd, SZA, Bad Bunny, and Harry Styles also do the same thing, just to a lesser degree than Taylor Swift. While some artists have more songs than others on the graph, they all have multiple consistently streamed songs, which is quite an accomplishment.

This list of the top 25 artists will be used for the next part of the analysis, where I explore the relationship between streams and concerts.
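A ranking like this can be built by tallying how often each artist name appears. The Python sketch below does that with a Counter. The artist names are made-up placeholders, and treating each artist(s)_name string as a single entry (rather than splitting collaborations into individual artists) is an assumption.

```python
from collections import Counter

# Toy artist(s)_name values, one per song; the names are placeholders.
toy_artist_names = [
    "Artist X", "Artist Y", "Artist X",
    "Artist Z", "Artist X", "Artist Y",
]

def top_artists(artist_names, n=25):
    """Return the n artists with the most songs as (artist, song count) pairs."""
    return Counter(artist_names).most_common(n)

top_two = top_artists(toy_artist_names, n=2)
print(top_two)
```

Under this assumption a collaboration such as “Drake, 21 Savage” is counted as its own entry, separate from either artist's solo songs.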

About the Secondary Data Source

Now, I am going to use an API to bring in a secondary data source to analyze the relationship between concerts and streams. The API I chose to use is from the website setlist.fm, which is a service that collects and shares setlists for concerts. A setlist is essentially a list of songs that an artist plays at a specific concert. Setlist.fm is a free service, but an API key is needed to access their data.

Check out the website here: https://www.setlist.fm/

In my analysis, I am not interested in the specific songs played at concerts. I am interested in how many concerts each artist (in the top 25 list) performed in the calendar year 2023. To do this, I used the API to pull the artist’s name, event date, city, country, and venue. This process consists of creating a function to pull every page for each artist, creating a function for getting each setlist, pulling the top 25 artists (as mentioned earlier), looping over them, and combining the results. Following this, I created an additional data frame with the artist name and the number of concerts they had in 2023. The last step was combining this dataset with the Top 25 Artist one I created earlier. This was a long and tedious process that took an exceptional amount of time because of how much troubleshooting and debugging needed to happen. However, I am proud of the result and am excited to share it with you!
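The “pull every page” step can be sketched as a small pagination loop. This Python sketch is not the code used for this project. It takes a hypothetical fetch_page function as a parameter, and it assumes each page of the response is a dictionary with the keys setlist, total, and itemsPerPage; those key names should be checked against the setlist.fm API documentation. A fake in-memory fetcher stands in for the real API so the example runs without an API key.

```python
import math

def fetch_all_setlists(fetch_page):
    """Collect setlists from every page.

    fetch_page(page_number) is a hypothetical callable returning a dict with
    the assumed keys 'setlist' (list), 'total' (int), and 'itemsPerPage' (int).
    """
    first = fetch_page(1)
    results = list(first.get("setlist", []))
    per_page = first.get("itemsPerPage") or len(results) or 1
    total_pages = math.ceil(first.get("total", 0) / per_page)
    for page in range(2, total_pages + 1):
        results.extend(fetch_page(page).get("setlist", []))
    return results

# Fake fetcher with five made-up concerts, two per page, instead of a real API call.
FAKE_CONCERTS = [{"id": i} for i in range(5)]

def fake_fetch_page(page, per_page=2):
    start = (page - 1) * per_page
    return {
        "setlist": FAKE_CONCERTS[start:start + per_page],
        "total": len(FAKE_CONCERTS),
        "itemsPerPage": per_page,
    }

all_setlists = fetch_all_setlists(fake_fetch_page)
print(len(all_setlists))
```

Passing the fetcher in as a parameter keeps the paging logic separate from the network call, which makes the loop easy to test and debug on its own.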

Which Artists Had the Most Concerts in 2023?

Of the top 25 artists mentioned earlier, which ones had the most concerts? As a reminder, these top 25 artists are the ones with the most top streamed songs in the year 2023. The chart below shows individual artists on the y-axis and the number of concerts they had in 2023 on the x-axis. With this, we will get a good visual that shows how much artists performed relative to each other.

We see that Adele had the most concerts, followed by Drake and Ed Sheeran. All of these artists had either major tours or residencies in the year 2023. Something interesting about this graph is Taylor Swift’s number of concerts. I wondered why her total wasn’t higher because of The Eras Tour. After doing some research, I found that her tour started in mid-March of 2023 and ended in December of 2024. This means a majority of her concerts were not in 2023, which explains why her concert count isn’t as high as one might expect. Something else that caught my eye is the three entries that are in the top 25 but did not have any concerts in 2023: BTS, Drake and 21 Savage, and Maneskin. How can that be? After doing some research, I found that all three have reasons for not touring.

Drake and 21 Savage formed a team when they released their album “Her Loss.” Their songs in the top 25 list were from this album. There are zero concerts because the two did not go on tour together. Drake did tour, and he did have 21 Savage on stage for some concerts, but not all of the songs on the tour were from the album the two of them released together.

After doing some research, I found that Maneskin did perform concerts in 2023, but they opened for other groups, including the Rolling Stones. They also performed at various music festivals. I believe these things do not count as a complete concert because these events were not solely their concerts, which is why their concert count may be zero in the dataset.

Lastly, BTS did not tour in 2023 because the group is on hiatus. This is because of mandatory military service in South Korea, where the band members are from.
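The per-artist concert counts, including the zeros discussed above, can be tallied with a sketch like the one below. It assumes each event is an (artist, event date) pair with the date written as day-month-year (for example 17-03-2023); that date format is an assumption, and the events are made up. Artists with no matching events are kept with a count of zero so they still appear when joined to the top 25 list.

```python
from collections import Counter
from datetime import datetime

# Toy (artist, event_date) pairs; names and dates are made up.
toy_events = [
    ("Artist X", "17-03-2023"),
    ("Artist X", "08-12-2024"),
    ("Artist Y", "01-07-2023"),
    ("Artist Y", "02-07-2023"),
]
toy_top_artists = ["Artist X", "Artist Y", "Artist Z"]

def concerts_in_year(events, artists, year=2023):
    """Count each listed artist's events in the given year (0 if none)."""
    counts = Counter(
        artist
        for artist, event_date in events
        # Assumed date format: day-month-year, e.g. "17-03-2023".
        if datetime.strptime(event_date, "%d-%m-%Y").year == year
    )
    return {artist: counts.get(artist, 0) for artist in artists}

concert_counts = concerts_in_year(toy_events, toy_top_artists)
print(concert_counts)
```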

What is the Relationship Between the Number of Top Streamed Songs and Number of Concerts?

I want to see what the relationship is between the number of top streamed songs and total concerts, both for the year 2023. I want to see how these variables relate to one another and see if there is any correlation. To do this, I created a scatterplot. The y-axis shows the number of top streamed songs while the x-axis shows the total concerts in 2023. This should give a good idea of whether performing more concerts has any correlation with streams or vice versa.

Here we see the relationship between artists who have the most top streamed songs and how many concerts they had in 2023. There are some interesting takeaways from this. The first artist I want to discuss is Adele, who had a residency in Las Vegas in 2023. This residency consisted of two weekend shows every week, and there were a total of 100 shows. This explains why she has such a high number of shows but fewer top streamed songs. The second artist is Taylor Swift. Clearly, she dominates the top streamed songs with 34 songs, the next closest being The Weeknd with 22. She is in the middle when it comes to concerts because most of her record-breaking Eras Tour took place in 2024. Lastly, I want to discuss Harry Styles. He is roughly in the middle for both categories. He has 17 top songs and performed 51 total concerts. These concerts are from his Love On Tour, which started in 2021 and ended in 2023.

Now, let’s add a line of best fit and see if we can learn anything from that.

This line of best fit is flat. This means that there is no meaningful relationship between how many concerts an artist performed and how many top songs they had. If an artist increased or decreased their number of concerts, it likely would not affect how many top songs they have.

Conclusion

After exploring and analyzing the factors that may influence streams, there are some valuable conclusions that can be made. This dataset has 953 total songs and 645 unique artists. The total number of streams for all songs in this dataset is 489,458,828,542. The songs in this dataset average 514,137,425 streams. The most streamed song in the dataset was “Blinding Lights” by The Weeknd.

Songs that have one artist average more streams than songs with multiple artists, likely because solo songs make up a majority of most artists’ discographies. We see that after solo songs, songs with two and three artists follow in terms of average streams. This is then followed by songs with seven, four, eight, five, and six artists. Songs released in September had the highest average streams, likely because of the Grammy deadline. This is followed by songs released in January, likely because of post-Christmas and New Year’s demand for new music. These two months carry their respective seasons as well, with fall and winter having the highest average streams for songs released in those seasons. This is followed by summer and spring.

When looking at energy, danceability, and BPM, we see that energy and BPM do not have any meaningful relationship to streams. An increase or decrease in these variables will have little to no effect on streams. With danceability, there is a small relationship with streams. Songs with increased danceability might get slightly fewer streams, but it will be a small difference.

The difference between Spotify and Apple Music playlists was also analyzed. For Spotify, the more playlists a song is in, the more streams the song tends to get. This makes sense, as many Spotify users use playlists on a regular basis. When comparing Spotify and Apple Music playlists, we see that each additional Apple playlist is associated with a larger increase in streams. However, both have a steep slope, which suggests that even a small change in the number of playlists a song is in is associated with a dramatic change in that song’s streams.

To analyze the overall performance of an artist, I took the top 25 artists in terms of how many songs they had in the dataset. This is a good measure of consistency and high performance. Using this metric, the top artist was Taylor Swift with 34 songs in the dataset. Considering her popular Eras Tour and loyal fanbase, this makes sense. After a sharp decline, the next artists are The Weeknd with 22 songs, SZA and Bad Bunny with 19 songs, and Harry Styles with 17 songs. There is quite the discrepancy here, as Harry Styles has only half as many songs in the dataset as Taylor Swift.

Following this, I used the setlist.fm API as a secondary data source. Here, my goal was to see if there was any relationship between the number of concerts a group performs and the number of songs they had in the dataset. The artists who had the most performances are Adele with 130, Drake with 127, Ed Sheeran with 100, Arctic Monkeys with 89, and IVE with 78. All of these artists were on tour for at least part of 2023, which explains their large number of concerts. The one exception to this is Adele, who had a residency in Las Vegas, where she performed twice every weekend. While Taylor Swift has the most songs in the dataset, she just missed the top five in terms of concerts with 67, putting her in 6th place. Her Eras Tour did not begin until March of 2023, and much of it took place in 2024, which explains why her number is lower than expected.

When looking at a scatterplot of concerts performed and songs in the dataset, we get a good visualization of the relationship between concerts and top streamed songs. When we add a line of best fit, we see that it is flat. This means there is not a meaningful relationship between concerts performed and how many top streamed songs an artist had. Any change in the number of concerts would likely have little to no effect on their number of top streamed songs.