Assignment 1

Author

Shanna Dubay

Introduction

Rolling Stone’s 500 Greatest Albums data set contains a ranked list of albums spanning many decades and multiple music genres. It provides ample opportunity to analyze how music has changed over time and to spot trends.

For my analysis, I created two visualizations of categorical data that apply different Gestalt principles. The first shows the number of albums on the list from each year, colored by decade, using proximity to highlight clusters. The second shows the percentage of albums in each genre over time, applying similarity to emphasize patterns. Together, they give insight into how music preferences and the music industry changed over several decades.

Code
library(tidyverse)  # loads readr, dplyr, and ggplot2

# Read the data; the locale has to be passed to read_csv() to take effect
df <- read_csv(
  "https://jsuleiman.com/datasets/Rolling_Stones_Top_500_Albums.csv",
  locale = locale(encoding = "ISO-8859-2", asciify = TRUE)
)

# Count albums per year, then label each year with its decade
b_album_counts <- df |>
  count(Year, name = "num_albums") |>
  mutate(Decade = (Year %/% 10) * 10)

# One shade of blue per decade, lightest (earliest) to darkest (latest)
blue_palette <- colorRampPalette(
  c("#00A8E8", "#0077B6", "#005691", "#003f7f", "#002654")
)(n_distinct(b_album_counts$Decade))

# Bar chart of yearly album counts, filled by decade
ggplot(b_album_counts, aes(x = Year, y = num_albums, fill = as.factor(Decade))) +
  geom_col() +
  scale_fill_manual(values = blue_palette, name = "Decade") +
  labs(title = "Number of Albums Released Per Year",
       x = "Year",
       y = "Number of Albums") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

In the first visual, the key distribution pattern is that album counts were highest around the 1970s and 1980s. I chose a bar chart so that bar height shows the album count, and a blue color scale that runs from the lightest blue in the earliest decades to the darkest blue in the latest. Grouping the years by decade was critical for showing album counts over time: it clusters the years together and shows a smoother distribution.
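To compare decades directly rather than by eye, the yearly counts can be summed within each decade. The sketch below is only an illustration: `toy_counts` is made-up data standing in for `b_album_counts`, and `decade_totals` and `total_albums` are hypothetical names, not part of the original analysis.

```r
library(dplyr)

# Hypothetical per-year counts standing in for b_album_counts
toy_counts <- tibble(
  Year = c(1967, 1969, 1971, 1975, 1979, 1983),
  num_albums = c(20, 22, 21, 18, 22, 6)
)

# Label each year with its decade, then sum the yearly counts per decade
decade_totals <- toy_counts |>
  mutate(Decade = (Year %/% 10) * 10) |>
  group_by(Decade) |>
  summarise(total_albums = sum(num_albums), .groups = "drop")

print(decade_totals)
```

Running the same pipeline on `b_album_counts` would give one total per decade, which could back up the visual impression from the bar chart with actual numbers.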

Code
library(stringr)

# Keep only the first genre listed for each album; label missing values "Unknown"
dfp <- df %>%
  mutate(Primary_Genre = ifelse(is.na(Genre) | Genre == "", "Unknown",
                                str_extract(Genre, "^[^,]+")))

# Count albums per decade and primary genre
genre_counts <- dfp %>%
  mutate(Decade = floor(Year / 10) * 10) %>%
  group_by(Decade, Primary_Genre) %>%
  summarise(Count = n(), .groups = "drop")

# Convert counts to each genre's share of its decade's albums
genre_percent <- genre_counts %>%
  group_by(Decade) %>%
  mutate(Percent = (Count / sum(Count)) * 100) %>%
  ungroup()

# One line per genre, with color distinguishing the genres
ggplot(genre_percent, aes(x = Decade, y = Percent, color = Primary_Genre)) +
  geom_point(size = 3, alpha = 0.8) +
  geom_line(aes(group = Primary_Genre), alpha = 0.5) +
  scale_color_viridis_d(option = "turbo") +
  labs(title = "Dominant Genre Percentage Over Decades",
       x = "Decade",
       y = "Percentage of Albums",
       color = "Genre") +
  theme_minimal()

The key distribution patterns in my second visual are the genre clusters, which use only the first genre listed for each album, because including every genre combination would create far too many categories and confuse my audience with seemingly random combinations. I chose a rainbow of colors for the genres because I see them as a spectrum of musical tastes. While I think the visual communicates how the genres’ percentages relate to one another across decades, I could have done a better job handling albums with mixed genres. I fear my audience will get the wrong impression that rock dominated the musical world, since many albums were listed as “Rock, Hip Hop, Jazz” or other combinations, which may have skewed the data in the chart.
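One possible way to reduce this skew is to count every genre listed for an album instead of only the first. The sketch below assumes genres are separated by commas, as in the example quoted above. `toy_albums` is made-up data and `genre_all` is a hypothetical name. Note that the percentages here are shares of genre labels per decade, not shares of albums, because an album with several genres is counted once for each.

```r
library(dplyr)
library(tidyr)

# Hypothetical albums; Genre mimics a comma-separated genre list
toy_albums <- tibble(
  Album = c("A", "B", "C"),
  Year  = c(1971, 1975, 1988),
  Genre = c("Rock, Blues", "Rock", "Hip Hop, Jazz")
)

genre_all <- toy_albums |>
  # Split each comma-separated list so every genre gets its own row
  separate_rows(Genre, sep = ",\\s*") |>
  mutate(Decade = (Year %/% 10) * 10) |>
  # Count how often each genre label appears in each decade
  count(Decade, Genre, name = "Count") |>
  group_by(Decade) |>
  mutate(Percent = Count / sum(Count) * 100) |>
  ungroup()

print(genre_all)
```

With this approach, an album tagged “Rock, Hip Hop, Jazz” would contribute to all three genres rather than to Rock alone. The trade-off is that albums with longer genre lists carry more weight in the totals.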

Conclusion

With these two visualizations, I was able to detect changes in the popularity of different music genres over the decades. The first chart showed me that the 1970s held the most albums, while the second showed that Rock accounted for a much higher percentage than any other genre for most of the period covered by the list, starting in the 1960s.

The visuals also raised other questions. What would the album rankings look like when broken down by genre? Are there years with greater genre diversity than others?

I would hope that future data sets will include streaming and sales figures, as well as albums associated with movies. I also wonder whether the data set could be biased because the magazine leans toward rock.