library(tidyverse)
Top500_2013 <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/carricoc2_xavier_edu/EXm5n8GU6LZApl-wToMW32kBwOIok3l62D1_z9lwVWClcA?download=1")
Final Project: BAIS-462
Introduction
Music is more than just an entertainment industry. It’s a reflection of culture, identity, and history that can be traced back millennia. The great thing about music, and quite possibly why it is so universally beloved, is that everyone has a unique taste. The middle-aged farmer from Iowa loves to listen to Kenny Chesney and Willie Nelson, while a teenage girl from Spain listens to Billie Eilish and Ariana Grande. Music is so vast and expansive that there is always something for everyone to enjoy.
Music, however, has also become something of a science. Which artists are more talented? Which songs have more complex musical composition? What makes a great song or album great? When these questions began to be asked, music morphed into a debate topic, one that major publications chimed in on.
Few publications have had as much influence on shaping and ranking musical taste as Rolling Stone. In 2013, the publication company released an updated list of the “500 Greatest Albums of All Time,” a vast collection that spans decades, genres, and musical eras. This list serves not only as a celebration of musical diversity, but also as a bridge that exposes people to new music.
Music has been such a core aspect of my life. Many of my hobbies revolve around the idea of music, such as collecting vinyls, going to concerts, or watching YouTube videos about my favorite songs or artists. Because of this passion for music, I thought it would be beneficial to conduct an analysis on what Rolling Stone considers to be the “best of the best” throughout history, and who knows, even expand my music taste to eras that I have not previously explored.
Questions Driving Insight
As stated in the previous section, music has an expansive impact on culture. Because of this impact, I believe there are grounds for many questions that the data could answer. For the sake of clarity, each question posed will fit into one of the following categories:
Time and Era Specific Trends
This section explores how the distribution of albums across decades reflects changes in musical influence and cultural relevance over time.
Genre & Subgenre Analysis
Here, we examine which musical styles are most represented within the rankings and how well the diversity of genres is represented across the list.
Artist-Level Patterns
This section looks at which artists appear most frequently and how their albums are ranked, revealing patterns of critical acclaim and legacy.
Historical & Cultural Impact
We investigate how cultural context, identity, and historical moments may have influenced which albums were recognized and celebrated.
Comparative Insight
This section compares albums across dimensions like debut vs. legacy works or early vs. modern releases to uncover biases or trends in ranking criteria.
Data
To answer questions in any of the categories that have been introduced, data is necessary. A dataset provided via Kaggle will be used. In the data description section on the website, the creator of this dataset credits MusicBrainz and the Discogs API as the two sources they pulled from and referenced.
For this analysis, I extend an invitation to anyone who would like to follow along, or expand this analysis to uncover results that I have not. A link to the dataset can be found at https://myxavier-my.sharepoint.com/:x:/g/personal/carricoc2_xavier_edu/EXm5n8GU6LZApl-wToMW32kBwOIok3l62D1_z9lwVWClcA?download=1
The dataset holds a total of six variables, all of which will be used for this analysis. The first variable is Number, which is the ranking of each album in the Top 500, with 1 attributed to what Rolling Stone deems the greatest album of all time and 500 to the 500th. Next is the Year variable, which logs the year in which the album was released. The Album variable states the formal name of the album. Note that some albums, either because of how long the name is or because of other factors, are commonly referred to by abbreviations. A popular example of this is Kanye West’s “My Beautiful Dark Twisted Fantasy”, which is commonly known as “MBDTF”. This variable logs the formal names of albums, or the more popular name if an album has changed names. The Artist variable includes the name of either the solo artist that is credited with the album, or the band. For bands, it does not list each individual member, but rather the entire band’s name. For instance, an observation would be Queen, and not Freddie Mercury, Brian May, John Deacon, and Roger Taylor. Genre and Subgenre are similar variables, with Genre stating the broader genre the album is associated with, and Subgenre being slightly more specific. For example, a Genre may be Rock, and a Subgenre may be Hard Rock. Both of these last two variables were taken from the Discogs API.
In any form of data analysis, an important introductory step is to analyze and interpret the key summary statistics for the variables used. Summary statistics are commonly associated with Mean, Standard Deviation, Minimum Value, and Maximum Value. This dataset, however, works more with qualitative variables, so the traditional approach to summary statistics will not be applied. Rather, this section will be used to gather some baseline understanding of the data that we are working with. The code to run this analysis, as well as the interpretations, is below:
Top500_2013 %>%
  mutate(decade = floor(Year / 10) * 10) %>%
  count(decade, sort = TRUE)
# A tibble: 7 × 2
decade n
<dbl> <int>
1 1970 186
2 1960 105
3 1980 85
4 1990 72
5 2000 40
6 1950 10
7 2010 2
The first table shows each of the 7 decades represented in this dataset, along with how many albums in this Top 500 list fall within each. Using this output, we can see that the 1970’s dominate this list with 186 albums in the Rolling Stone Top 500, with the 1960’s in second with 105, and the 1980’s rounding out the top three with 85. The 2010’s are sparsely represented on this list with 2 albums, but that is not surprising, as the list was released in 2013.
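As a quick check on the decade bucketing and the shares behind these counts, here is a minimal sketch. The `decade_counts` tibble re-enters the counts printed above by hand (it is not read from the dataset), and `decade_of()` is a helper name introduced only for this illustration.

```r
library(dplyr)
library(tibble)

# Helper (hypothetical name): map a release year to the start of its decade
decade_of <- function(year) floor(year / 10) * 10

decade_of(c(1959, 1960, 1969, 1970, 2012))
# 1950 1960 1960 1970 2010

# Counts copied by hand from the decade table above
decade_counts <- tibble(
  decade = c(1970, 1960, 1980, 1990, 2000, 1950, 2010),
  n      = c(186, 105, 85, 72, 40, 10, 2)
)

# Share of the 500 albums that falls in each decade
decade_shares <- decade_counts %>%
  mutate(share = n / sum(n))

decade_shares
```

With these counts, the 1970’s account for 37.2% of the list, and the 1960’s and 1970’s together account for 58.2%.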
Top500_2013 %>%
  count(Genre, sort = TRUE) %>%
  slice_max(n, n = 10)
# A tibble: 11 × 2
Genre n
<chr> <int>
1 Rock 249
2 Funk / Soul 38
3 Hip Hop 29
4 Electronic, Rock 19
5 Rock, Pop 18
6 Rock, Blues 16
7 Folk, World, & Country 13
8 Rock, Folk, World, & Country 9
9 Blues 8
10 Electronic 7
11 Jazz 7
The next output shows the most common genres in this dataset (11 rows are returned rather than 10 because Electronic and Jazz tie at 7). From this output, we can conclude that Rock alone accounts for ~50% of all the albums in this dataset (249 of 500). One possible explanation is that Rock covers a very wide range of subgenres, so more music falls under this genre.
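The ~50% figure counts only albums labelled exactly “Rock”. As a rough sketch of how much larger the share gets when compound labels such as “Rock, Pop” are included, the code below re-enters the rows printed above by hand. Because only the displayed rows are used, the second figure is a lower bound; `genre_counts` and `total_albums` are names introduced for this illustration.

```r
library(dplyr)
library(tibble)
library(stringr)

# Rows copied by hand from the genre table above (displayed rows only)
genre_counts <- tribble(
  ~Genre,                          ~n,
  "Rock",                          249,
  "Funk / Soul",                    38,
  "Hip Hop",                        29,
  "Electronic, Rock",               19,
  "Rock, Pop",                      18,
  "Rock, Blues",                    16,
  "Folk, World, & Country",         13,
  "Rock, Folk, World, & Country",    9,
  "Blues",                           8,
  "Electronic",                      7,
  "Jazz",                            7
)
total_albums <- 500

# Albums whose genre label is exactly "Rock"
rock_only <- genre_counts %>% filter(Genre == "Rock") %>% pull(n)
rock_only_share <- rock_only / total_albums

# Albums whose genre label mentions Rock anywhere (displayed rows only)
rock_any <- genre_counts %>%
  filter(str_detect(Genre, "Rock")) %>%
  pull(n) %>%
  sum()
rock_any_share <- rock_any / total_albums

c(rock_only_share = rock_only_share, rock_any_share = rock_any_share)
```

On these rows, Rock alone is 49.8% of the list, and labels that include Rock add up to at least 62.2%.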
Top500_2013 %>%
  count(Subgenre, sort = TRUE) %>%
  slice_max(n, n = 10)
# A tibble: 10 × 2
Subgenre n
<chr> <int>
1 None 29
2 Pop Rock 22
3 Soul 13
4 Indie Rock 12
5 Alternative Rock 11
6 Classic Rock 10
7 Blues Rock 8
8 Rhythm & Blues, Soul 7
9 Country 6
10 Psychedelic Rock 6
The third output is similar to the second one, but instead of Genre, we are looking at Subgenre. Based on the results, we see that (excluding None) Pop Rock is the most popular subgenre, with 22 observations and Soul is second with 13.
Top500_2013 %>%
  count(Artist, sort = TRUE) %>%
  filter(n > 1)
# A tibble: 111 × 2
Artist n
<chr> <int>
1 Bob Dylan 10
2 The Beatles 10
3 The Rolling Stones 10
4 Bruce Springsteen 8
5 The Who 7
6 David Bowie 5
7 Elton John 5
8 Led Zeppelin 5
9 Radiohead 5
10 U2 5
# ℹ 101 more rows
Moving on, our fourth output lists all the artists in the dataset that have more than 1 album on the Rolling Stone Top 500 list. The first place spot is a three-way tie between Bob Dylan, The Beatles, and The Rolling Stones (that seems a little biased!). Each of these three has 10 total albums on the list. A fun little stat is that each of these artists makes up 2% of the list!
Top500_2013 %>%
  count(Year) %>%
  arrange(desc(n))
# A tibble: 56 × 2
Year n
<dbl> <int>
1 1970 26
2 1972 24
3 1973 23
4 1969 22
5 1968 21
6 1971 21
7 1967 20
8 1975 18
9 1977 18
10 1978 16
# ℹ 46 more rows
Our fifth and final key summary observation looks into which years have contributed the most to this list. From this result, 1970 contributed 26 albums, 1972 had 24, and 1973 pulled in 23 (talk about a Golden Era for music).
While these findings are not the typical summary statistics you would see in a quantitative analysis, they offer some additional insight into the data that we will be working with during this analysis on Rolling Stone’s Top 500 Albums of All Time.
Descriptive Analysis
With an established understanding of the data, and some introductory information about popular trends, we can begin to transition this analysis toward new and exciting information. In this section, the goal is to show visually the insights obtained through this analysis. Each visualization will be followed by an interpretation that describes what the chart is showing, and what that implies about trends and biases within the music industry, specifically at Rolling Stone.
Does Rolling Stone Show Bias Toward Older Albums?
The first question that I want to examine is whether Rolling Stone has any bias in ranking albums when it comes to age. This relationship between Release Year and Rank can be examined in two ways: first, how many albums within the Top 500 are from earlier decades, and second, whether older albums are ranked higher than newer albums. The appropriate code and visualization are below.
ggplot(Top500_2013, aes(x = Year, y = Number)) +
geom_point(alpha = 0.5, color = "blue") +
geom_smooth(method = "loess", se = FALSE, color = "red") +
scale_y_reverse() +
labs(title = "Do Older Albums Rank Higher?",
x = "Release Year", y = "Rolling Stone Rank") +
theme_minimal()
Looking at this graph, there is evidence that Rolling Stone may have a bias toward older albums. This shows up in two ways: 1. Rolling Stone has more albums from the 1960’s and 1970’s (291) than from the 1980’s onward (199). 2. The trend line shows that rankings get worse as release years get more recent. It is important to note, however, that this may not be entirely a bias. It is possible that higher quality music was made in this era, but since music is subjective to the listener, a bias is likely.
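One way to put a number on the trend line is a rank correlation between release year and list position. The sketch below is only an illustration: `toy_albums` contains invented values that are not taken from the dataset, and `year_rank_rho()` is a hypothetical helper name. Spearman correlation is used here as an assumption, since ranks are ordinal.

```r
# Toy values invented for illustration only; they are NOT from the dataset
toy_albums <- data.frame(
  Year   = c(1965, 1970, 1975, 1990, 2005),
  Number = c(10, 50, 120, 300, 450)
)

# Hypothetical helper: Spearman correlation between release year and rank.
# A positive value means later albums tend to have larger (worse) rank numbers.
year_rank_rho <- function(df) {
  cor(df$Year, df$Number, method = "spearman")
}

toy_rho <- year_rank_rho(toy_albums)
toy_rho
```

In the toy data the ranks worsen with every later year, so the correlation is exactly 1. On the real data the call would be `year_rank_rho(Top500_2013)`; a value near zero would suggest that the apparent bias comes more from how many older albums made the list than from where they are placed within it.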
Does Practice Make Perfect? The Evolution of Artist Rankings
The next question that I want to test relates to the “practice makes perfect” mantra. Essentially, I want to see if the ranking of albums for artists that appear multiple times on this list improves with each subsequent work. This will give an indication of whether the quality of an artist’s work, at least as Rolling Stone judges it, improves as they accumulate experience.
Top500_2013 %>%
  filter(Artist == "Bob Dylan" | Artist == "The Beatles" | Artist == "The Rolling Stones") %>%
  arrange(Artist, Year) %>%
  group_by(Artist) %>%
  mutate(album_order = row_number()) %>%
  ggplot(aes(x = album_order, y = Number, color = Artist)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_y_reverse() +
  labs(title = "Top 3 Most Frequent Artists: Rank Trajectories",
       x = "Album Release Order", y = "Rolling Stone Rank") +
  theme_minimal() +
  theme(legend.title = element_blank())
This graph shows the trend of album rankings for the top three artists (based on occurrences in the dataset) according to Rolling Stone. Recall, we know these artists from our introductory data work earlier. For clarification, the X-Axis represents the chronological order of album releases, and the Y-Axis shows the ranking of the album (reversed, so better ranks sit higher on the chart). What we see here is the opposite of what was anticipated. Normally, it would be assumed that in a profession like music, more practice allows for a refinement of skills and better quality of work. This graph shows an inverse relationship, as ranking actually starts high for the earlier works and declines (in general) over the release timeline. For example, the Beatles’ first appearance ranked in the top 50, and their last in the bottom 300. The Rolling Stones started in the 100’s, and ended in the 200’s. Lastly, Bob Dylan started around 100, and finished outside of the top 200.
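The direction of each trajectory can also be summarised per artist with a single number. The sketch below reuses the same `album_order` construction as the chart code, but on invented data: the artist names, years, and ranks in `toy_careers` are hypothetical and do not come from the dataset.

```r
library(dplyr)
library(tibble)

# Hypothetical artists and ranks, invented purely for illustration
toy_careers <- tibble(
  Artist = rep(c("Artist A", "Artist B"), each = 4),
  Year   = c(1963, 1965, 1968, 1970, 1971, 1973, 1975, 1979),
  Number = c(20, 45, 150, 310, 400, 250, 90, 30)
)

# Spearman correlation between release order and rank, per artist.
# Positive: later albums rank worse. Negative: later albums rank better.
career_trend <- toy_careers %>%
  arrange(Artist, Year) %>%
  group_by(Artist) %>%
  mutate(album_order = row_number()) %>%
  summarise(
    n_albums = n(),
    rho = cor(album_order, Number, method = "spearman"),
    .groups = "drop"
  )

career_trend
```

Here “Artist A” gets a value of 1 (each album ranks worse than the one before) and “Artist B” gets -1. Applied to the three real artists, values closer to 1 would back up the “earlier is better” pattern described above.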
Ranking Consistency: Which Genres Are More Stable?
The next question I want to examine is if some genres are more consistent in ranking than others. This could provide insight into what genres are typically more well-received in the court of public opinion, and which ones are harder to gain traction with.
Top500_2013 %>%
  group_by(Genre) %>%
  summarise(Number_SD = sd(Number), count = n()) %>%
  filter(count >= 15) %>%
  ggplot(aes(x = reorder(Genre, Number_SD), y = Number_SD, fill = Genre)) +
  geom_col() +
  coord_flip() +
  labs(title = "Ranking Volatility by Genre",
       x = "Genre", y = "Standard Deviation of Ranks") +
  theme_minimal() +
  theme(legend.position = "none")
This output only includes genres with at least 15 observations in this dataset. From this visualization, we can conclude that albums in the low Standard Deviation genres (Hip Hop, Rock-Pop, Funk/Soul) are ranked close together in the list, while albums in the high Standard Deviation genres (Rock-Blues, Rock, Electronic-Rock) are spread far apart. This could indicate one of two things: either the quality of music from the low SD genres is about the same for each album, or the curator of this list has a personal bias.
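To judge whether a genre’s standard deviation is “high” or “low”, it helps to have a baseline. The sketch below computes what `sd()` would return for a genre spread perfectly evenly across all 500 positions, and for one confined evenly to the top 100. These are reference points only, not results from the dataset.

```r
# Baseline: ranks spread evenly over the whole list
all_ranks <- 1:500
baseline_sd <- sd(all_ranks)
baseline_sd   # about 144.5

# Cross-check with the closed form for the sample SD of 1..n
n <- length(all_ranks)
closed_form <- sqrt(n * (n + 1) / 12)

# Reference point: ranks spread evenly over the top 100 only
top100_sd <- sd(1:100)
top100_sd     # about 29
```

A genre with an SD near 144 is spread across the list about as evenly as possible, so only values well below that point to real clustering.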
Which Artists Cross the Most Genres?
In this analysis, I want to see the artists that are found across multiple different genres.
Top500_2013 %>%
  group_by(Artist) %>%
  summarise(Genre_Count = n_distinct(Genre)) %>%
  filter(Genre_Count > 1) %>%
  slice_max(Genre_Count, n = 10, with_ties = FALSE) %>%
  ggplot(aes(x = reorder(Artist, Genre_Count), y = Genre_Count, fill = Artist)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Most Genre-Crossing Artists",
       x = "Artist", y = "Number of Genres Represented") +
  theme_minimal() +
  theme(legend.position = "none")
Looking at this output, we can see that “Various Artists” is found across 5 different genres. No, this isn’t the name of some underground, cross-cultural artist. This label covers movie soundtracks, Broadway shows, etc., so it makes sense that it shows the most genre swapping. One specific result that is worth highlighting is Prince. Due to his wide music portfolio, there is common debate about what genre he is best known for, which is fitting of his iconic nickname - “The Artist”, an indication he transcended genre.
The Top Subgenres Across Album Rank Tiers
If we split the Top 500 into three sections, what are the most popular subgenres in each section of the Rolling Stone list? Can any inferences be made about which subgenres could result in a higher ranking?
library(tidytext)  # provides reorder_within() and scale_x_reordered()

Top500_2013 %>%
  mutate(rank_group = case_when(
    Number <= 100 ~ "Top 100",
    Number <= 300 ~ "Mid (101–300)",
    TRUE ~ "Bottom 200"
  )) %>%
  count(rank_group, Subgenre) %>%
  group_by(rank_group) %>%
  slice_max(n, n = 5, with_ties = FALSE) %>%
  ungroup() %>%
  ggplot(aes(x = reorder_within(Subgenre, n, rank_group),
             y = n, fill = rank_group)) +
  geom_col(show.legend = FALSE, width = 0.7) +
  facet_wrap(~ rank_group, scales = "free_y") +
  scale_x_reordered() +
  coord_flip() +
  labs(title = "Top 5 Subgenres in Each Rank Tier",
       x = "Subgenre", y = "Number of Albums") +
  theme_minimal(base_size = 12) +
  theme(strip.text = element_text(face = "bold"))
Visualization 5 shows how subgenres are distributed across rank tiers (Top 100, Mid 101–300, and Bottom 200). A common theme among all three groups is that Rock is popular. Outside of the Top 100’s interest in Soul, all of the subgenres shown incorporate rock in some way. Another takeaway from this visualization is that the most popular subgenre in both the 101–300 and Bottom 200 groups is None, potentially indicating that a more diverse album that spans different subgenres may get a boost into the Top 100.
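Since `case_when()` checks its conditions in order, the tier boundaries are worth verifying directly. This small sketch wraps the same rule in a helper (`assign_tier()` is a hypothetical name used only here) and applies it to the boundary ranks and to the full range 1 to 500.

```r
library(dplyr)

# Hypothetical helper mirroring the tier rule used for the chart above
assign_tier <- function(rank) {
  case_when(
    rank <= 100 ~ "Top 100",
    rank <= 300 ~ "Mid (101–300)",
    TRUE        ~ "Bottom 200"
  )
}

# Ranks on either side of each boundary
boundary_ranks <- c(1, 100, 101, 300, 301, 500)
tiers <- assign_tier(boundary_ranks)
data.frame(rank = boundary_ranks, tier = tiers)

# How many of the 500 positions land in each tier
tier_sizes <- table(assign_tier(1:500))
tier_sizes
```

The tiers hold 100, 200, and 200 positions, so raw album counts are not directly comparable between the Top 100 panel and the other two; dividing each count by the tier size would put them on the same footing.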
Tracking Genre Diversity Across Decades
This section explores how the variety of musical genres represented in Rolling Stone’s Top 500 albums has changed over time. By measuring the number of unique genres per decade, we can potentially identify when musical diversity peaked and how it has evolved since.
Top500_2013 %>%
  mutate(Decade = floor(Year / 10) * 10) %>%
  group_by(Decade) %>%
  summarise(Genre_Diversity = n_distinct(Genre)) %>%
  ggplot(aes(x = Decade, y = Genre_Diversity)) +
  geom_line(linewidth = 2, color = "purple") +  # linewidth replaces the deprecated size for lines
  geom_point(size = 4, color = "blue") +
  labs(title = "Genre Diversity Over Time",
       x = "Decade", y = "Unique Genres") +
  theme_minimal()
This visualization measures the variety of genres across decades, using the number of unique Genre values as the deciding factor. The higher the number of unique genres, the more “diverse” the music is. This graph shows that diversity was at its highest in the 1970’s. As time progresses beyond the 70’s, however, we see that the diversity of genres goes down. Keep in mind that the 1970’s also contribute the most albums (186), so part of this peak may simply reflect the larger sample.
How Genre Composition Shifts Across Rank Quartiles
Pushing forward, another analysis that can be done is on whether or not some genres are disproportionately favored in higher or lower quartiles. This question is primarily asked because of previous analysis uncovering that Rock is a very common and popular genre within Rolling Stone’s rankings.
Top500_2013 %>%
  mutate(Genre = fct_lump(factor(Genre), n = 8)) %>%
  mutate(rankquartile = ntile(Number, 4)) %>%
  count(rankquartile, Genre) %>%
  group_by(rankquartile) %>%
  mutate(percent = n / sum(n)) %>%
  ggplot(aes(x = factor(rankquartile), y = percent, fill = Genre)) +
  geom_bar(stat = "identity") +
  labs(title = "Genre Composition by Rank Quartile",
       x = "Rank Quartile (1 = Best)", y = "Proportion") +
  scale_y_continuous(labels = scales::percent_format()) +
  theme_minimal()
This visualization shows how genre representation shifts across the four rank quartiles in the Rolling Stone Top 500 list. Genres that make up a larger share of Quartile 1 (such as Rock and Hip Hop) are more critically favored and more likely to place albums in the upper quartile. In contrast, genres more common in lower quartiles may be less consistently praised. The stable presence of the Rock genre across all quartiles suggests broad critical appeal. The “Other” category highlights the presence of less common genres.
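For reference, this is how `ntile()` divides 500 ranks into quartiles. The sketch runs it on the plain sequence 1 to 500 rather than on the dataset, which is enough to show the bucket sizes and cut points.

```r
library(dplyr)

# Quartile assignment for ranks 1..500
quartile <- ntile(1:500, 4)

# Each quartile holds 125 ranks
quartile_sizes <- table(quartile)
quartile_sizes

# First and last rank inside each quartile
quartile_bounds <- tapply(1:500, quartile, range)
quartile_bounds
```

Quartile 1 covers ranks 1 to 125, quartile 2 covers 126 to 250, and so on, so every bar in the chart is built from the same number of albums.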
Which Years Had the Most Top 100 Albums?
Our eighth analysis looks deeper into the years that produced the most albums in Rolling Stone’s Top 100.
Top500_2013 %>%
  filter(Number <= 100) %>%
  count(Year) %>%
  filter(n >= 2) %>%
  ggplot(aes(x = Year, y = n)) +
  geom_col(fill = "darkorchid1") +
  labs(title = "Breakout Years in Top 100 Albums",
       x = "Year", y = "Top 100 Albums Released") +
  theme_minimal()
Looking at these results, the bars are heavily concentrated on the left side of the chart, with a majority of the top albums coming in the late 1960’s and early 1970’s. What is jarring, however, is the lack of Top 100 albums in an era many consider to be a “Golden Era” for music: in the 1980’s we see a real shortage of Top 100 albums. This could further support our hypothesis that Rolling Stone holds a bias for older music.
Does Prolific Output Correlate with Better Rankings?
The ninth analysis will help answer the question: Do more prolific artists score better rankings, or are fewer albums better?
Top500_2013 %>%
  group_by(Artist) %>%
  summarise(count = n(), AVG_rank = mean(Number)) %>%
  filter(count >= 2) %>%
  ggplot(aes(x = count, y = AVG_rank)) +
  geom_point(alpha = 1, color = "black") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  scale_y_reverse() +
  labs(title = "Do More Albums Mean Better Average Rank?",
       x = "Number of Albums", y = "Average Rank") +
  theme_minimal()
What can be seen in the graph above is that there seems to be a positive, linear relationship between average album ranking and the number of albums an artist has within the top 500. However, I refuse to generalize that “more is better” based on these results. Some artists with only 2 or 3 albums in the dataset have a better average ranking than artists with 7, 8, or even 10 albums in the top 500. Because of this, more is not necessarily better in the case of average album ranking.
The Lifespan of Genres in the Rolling Stone Top 500
Our final analysis, thanks to some additional help from the R subreddit, will look into the lifespan of each genre within this dataset. The objective here is to see when a genre broke through and gained respect, and when it “fell out” of public preference.
Top500_2013 %>%
  # Split multi-genre labels into one row per genre.
  # Note: this also splits the label "Folk, World, & Country" at its commas.
  separate_rows(Genre, sep = ",\\s*") %>%
  group_by(Genre) %>%
  summarise(
    First_Year = min(Year),
    Last_Year = max(Year),
    Lifespan = Last_Year - First_Year,
    Count = n()
  ) %>%
  filter(Count >= 5) %>% # Filter to more prominent genres
  ggplot(aes(x = reorder(Genre, -Lifespan), ymin = First_Year, ymax = Last_Year)) +
  geom_linerange(color = "steelblue", linewidth = 2) +
  geom_point(aes(y = First_Year), color = "forestgreen", size = 2) +
  geom_point(aes(y = Last_Year), color = "firebrick", size = 2) +
  coord_flip() +
  labs(
    title = "Genre Lifespan in the Rolling Stone Top 500",
    subtitle = "Years Between First and Last Album Appearance by Genre",
    x = "Genre", y = "Album Release Year"
  ) +
  theme_minimal()
Looking at this visualization, we can clearly see that Rock has the longest lifespan out of all the genres. Jazz and Pop entered in the same year, 1955! Hip Hop is our youngest genre, which makes sense, as this genre was part of a major, global cultural shift.
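One detail of the genre split is worth seeing in isolation. Splitting on every comma also breaks up “Folk, World, & Country”, which appears to be a single label in the Discogs genre scheme. The sketch below shows the effect on two toy rows (the album names are hypothetical; the genre strings appear in the tables earlier in this report) and one possible workaround, which is my own suggestion and not part of the original analysis.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Toy rows: hypothetical album names, genre labels as seen in the dataset
toy_genres <- tibble(
  Album = c("Album A", "Album B"),
  Genre = c("Rock, Blues", "Folk, World, & Country")
)

# Splitting on every comma breaks the compound label into three pieces
naive_split <- toy_genres %>%
  separate_rows(Genre, sep = ",\\s*")
naive_split$Genre
# "Rock" "Blues" "Folk" "World" "& Country"

# Possible workaround: rewrite the compound label before splitting
protected_split <- toy_genres %>%
  mutate(Genre = str_replace(Genre, fixed("Folk, World, & Country"),
                             "Folk / World / Country")) %>%
  separate_rows(Genre, sep = ",\\s*")
protected_split$Genre
# "Rock" "Blues" "Folk / World / Country"
```

With the naive split, “Folk”, “World”, and “& Country” would each show up as their own genre in the lifespan chart, provided they pass the `Count >= 5` filter.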
Comparison To Newest Rolling Stone Top 500 List
I hope you are all still along for the ride, because the next stage is about to start. While the information and conclusions obtained in the last section were expansive, there is one key limitation: Data.
The data that we have been using up to this point is from a Rolling Stone article from 2013. This was 12 years ago, and so much more incredible music, made by incredible artists, has been released in this window. To get the full scope of how music has evolved since 2013, our analysis should expand to the most recent Rolling Stone Top 500 Album Ranking List.
In 2023, Rolling Stone published an updated ranking of the Top 500 Albums of All Time. The only problem is that there is no Kaggle dataset that has all of these albums in a nice, clean package for us to examine.
To alleviate this challenge, we can scrape the needed information using some popular R web-scraping techniques, and then analyze the similarities and differences between the two lists. Feel free to reference the code below to follow along with this analysis:
Web Scraping Code for Rolling Stone 2023 Top 500 Albums of All Time
library(rvest)  # read_html(), html_nodes(), html_table()

Wikipedia_Top500_URL <- "https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Albums/500"
Wikepedia_Page <- read_html(Wikipedia_Top500_URL)

# Pull every wikitable on the page and keep the first one
Wiki_Table <- Wikepedia_Page %>%
  html_nodes("table.wikitable") %>%
  html_table(fill = TRUE)
DataFrame <- Wiki_Table[[1]]
colnames(DataFrame) <- c("Rank", "Album", "Artist", "Contributors")

# Drop empty rows and trim stray whitespace
DataFrame <- DataFrame %>%
  filter(!is.na(Rank) & Rank != "")
DataFrame$Rank <- str_trim(DataFrame$Rank)
DataFrame$Album <- str_trim(DataFrame$Album)
DataFrame$Artist <- str_trim(DataFrame$Artist)
DataFrame$Contributors <- str_trim(DataFrame$Contributors)

# Keep only albums that appear in both lists (exact match on Album and Artist)
Merged_Data <- merge(DataFrame, Top500_2013, by = c("Album", "Artist"), suffixes = c("_new", "_original"))
Merged_Data <- Merged_Data %>%
  mutate(Rank_new = as.numeric(Rank),
         Rank_original = as.numeric(Number)) %>%
  filter(!is.na(Rank_new) & !is.na(Rank_original))

# Negative Rank_diff = moved up the list; positive = moved down
Merged_Data <- Merged_Data %>%
  mutate(Rank_diff = Rank_new - Rank_original)
Running this code will add new elements to your R Environment. The primary source used in the next two visualizations is the “Merged_Data” table. This data frame holds information on albums that appear in both the 2013 and 2023 Rolling Stone Top 500 Albums of All Time. The key addition to this new data frame is a variable named “Rank_diff” (new rank minus original rank), which shows how far an album rose or fell in the 10 years between articles: a negative value means the album moved up the list, and a positive value means it moved down. With this additional insight, the following 2 visualizations were created to highlight the differences between the two lists:
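To make the merge and the sign of “Rank_diff” concrete, here is a small sketch on invented data. The album and artist names are placeholders and the ranks are made up; only the `merge()` and subtraction steps mirror the code above.

```r
# Toy data with hypothetical album names; ranks are invented for illustration
old_list <- data.frame(
  Album  = c("Album A", "Album B", "Album C"),
  Artist = c("Artist X", "Artist Y", "Artist Z"),
  Number = c(100, 20, 300)
)
new_list <- data.frame(
  Album  = c("Album A", "Album B", "Album D"),
  Artist = c("Artist X", "Artist Y", "Artist W"),
  Rank   = c("50", "80", "10")   # scraped ranks arrive as text
)

# merge() keeps only albums present in both lists (an inner join),
# so Album C and Album D are dropped
both <- merge(new_list, old_list, by = c("Album", "Artist"))
both$Rank_new      <- as.numeric(both$Rank)
both$Rank_original <- as.numeric(both$Number)
both$Rank_diff     <- both$Rank_new - both$Rank_original

both[, c("Album", "Rank_original", "Rank_new", "Rank_diff")]
# Album A: 100 -> 50, Rank_diff = -50 (moved up the list)
# Album B:  20 -> 80, Rank_diff =  60 (moved down the list)
```

Because the join requires an exact match on both Album and Artist, any album whose title is spelled or punctuated differently between the two sources is silently left out of “Merged_Data”.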
Top 10 Albums with the Largest Rank Changes (2013 vs. 2023)
Rank_Changes <- Merged_Data %>%
  arrange(desc(abs(Rank_diff))) %>%
  slice_head(n = 10)
# Rank_diff < 0 means a smaller rank number in 2023, i.e. the album improved
ggplot(Rank_Changes, aes(x = reorder(Album, Rank_diff), y = Rank_diff, fill = Rank_diff < 0)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Top 10 Albums with Biggest Rank Changes",
       x = "Album",
       y = "Rank Difference (New Rank - Original Rank)") +
  scale_fill_manual(values = c("red", "green"),
                    labels = c("Rank Decreased", "Rank Improved")) +
  theme_minimal() +
  theme(legend.title = element_blank())
Comparison of Original vs. New Album Rankings
# Points below the dashed line improved (smaller rank number in 2023)
ggplot(Merged_Data, aes(x = Rank_original, y = Rank_new)) +
  geom_point(aes(color = Rank_diff < 0), alpha = 0.6, size = 2) +
  geom_abline(intercept = 0, slope = 1, color = "blue", linetype = "dashed") +
  labs(title = "New Rank vs Original Rank",
       x = "Original Rank",
       y = "New Rank",
       color = NULL) +
  scale_color_manual(values = c("red", "green"),
                     labels = c("Rank Decreased", "Rank Improved")) +
  theme_minimal()
Conclusion
As we’ve seen throughout this analysis, the Rolling Stone Top 500 Albums of All Time list is anything but set in stone (see what I did there?). Whether it’s a long-overdue boost for a beloved classic or a surprising slide for a former favorite (Looking at you Kanye West), this comparison shows how our collective taste in music evolves over time. Some albums stood the test of time, others got their flowers at last, and a few found themselves drifting down the charts. In the end, these rankings remind us that music is personal, cultural, and ever-changing. I end with a statement that no matter what the charts say, the best album is the one YOU keep on repeat. So keep streaming, mixing, spinning, or however else you listen to music, and know that music is “sic”.