Assignment 1

Author

Alex St.Hilaire

Code
library(tidyverse)
library(wesanderson)
top_albums <- read_csv(
  "https://jsuleiman.com/datasets/Rolling_Stones_Top_500_Albums.csv",
  locale = locale(encoding = "ISO-8859-2", asciify = TRUE)
)

Introduction

The data set provided is the Rolling Stone Top 500 Albums list, with each row giving an album’s ranking, name, release year, artist, main genre, and subgenre. It immediately stood out to me that Rock, in some form, made up a major share of the observations, and this became clear once I started working on the first visualization.

Subgenre would have been an interesting variable to visualize as well, but it has 290 unique values in a data set of 500 observations. An interesting visualization for someone with more RStudio experience might be to show how many times each individual category of music appears within Subgenre.
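As a rough sketch of that idea, the snippet below splits each Subgenre entry into its individual labels and counts how often each label appears. It runs on a small made-up tibble (`toy_albums`, which is not the real data) and assumes that Subgenre stores several labels separated by commas; that assumption has not been checked against the actual file.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for top_albums; the album names and subgenres are illustrative only.
toy_albums <- tibble(
  Album    = c("Album A", "Album B", "Album C"),
  Subgenre = c("Blues Rock, Psychedelic Rock", "Folk Rock", "Blues Rock, Folk Rock")
)

subgenre_counts <- toy_albums |>
  # Assumption: labels are comma-separated; give each label its own row
  separate_rows(Subgenre, sep = ",\\s*") |>
  # Count how many albums carry each individual label, most frequent first
  count(Subgenre, sort = TRUE)

print(subgenre_counts)
```

If the real column follows the same comma-separated pattern, swapping `toy_albums` for `top_albums` should give a table that could feed a bar chart of the most common labels.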

First Categorical Analysis

The first Gestalt principle I wanted to demonstrate was Similarity, together with Preattentive Processing. I plotted this information knowing that I wanted to show how often the “Rock” genre appears in the data set. Initially, I had three categories: “Rock”; “RockPlus”, which was any genre that included Rock and another genre; and “Other”, which is non-Rock. I then combined “Rock” and “RockPlus” to really drive home the message that Rock rules all.
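The three-category scheme described above, and its collapse into two categories, could be sketched roughly as follows. The Genre values are made up for illustration, and the column names `RockGroup` and `RockOrOther` are hypothetical. Note that this sketch treats any mention of Rock alongside another genre as “RockPlus”, which may be broader than the pattern first tried in the report.

```r
library(dplyr)

# Made-up Genre values for illustration only; not rows from the real data set.
toy_genres <- tibble(
  Genre = c("Rock", "Rock, Pop", "Electronic, Rock", "Jazz", "Hip Hop")
)

toy_genres <- toy_genres |>
  mutate(
    RockGroup = case_when(
      Genre == "Rock"                    ~ "Rock",      # Rock and nothing else
      grepl("Rock", Genre, fixed = TRUE) ~ "RockPlus",  # Rock plus another genre
      TRUE                               ~ "Other"      # no mention of Rock
    ),
    # Collapse Rock and RockPlus into one group, as in the two-color fill
    RockOrOther = if_else(RockGroup == "Other", "Other", "Rock")
  )

print(toy_genres)
```

Storing the grouping in a column first keeps the `aes()` call short, since the fill can then be mapped to a column name instead of a nested `ifelse()`.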

Code
top_albums |>
  ggplot(aes(Year, fill = ifelse(grepl("Rock", Genre),"Rock","Other")))+ #fill ifelse
## COMMENTED OUT CODE  ggplot(aes(Year, fill = ifelse(Genre == "Rock", "Rock", ifelse(grepl("Rock,", Genre),"RockPlus","Other")))) ##
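  # annotate() draws the shaded band once; geom_rect() with constant aes() values would redraw it for every row
  annotate("rect", xmin = 1960, xmax = 1980, ymin = 0, ymax = Inf, fill = "grey85")+ #geom_rect syntax & error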
  geom_bar()+
  scale_fill_manual(values= wes_palette("GrandBudapest1"))+#Karthik Ram github
  theme(line = element_blank(),
        rect = element_blank(),
        plot.background = element_rect("whitesmoke"),
        legend.title = element_blank(),
        legend.position = c(0.8,0.8))+#position syntax
  guides(fill = guide_legend(reverse = TRUE))+
  labs(
    title = "Top 500 Greatest Albums of All Time",
    subtitle = "The 60's and 70's dominate through rock music",
    x = "",
    y = "")

Key Distribution Patterns & Visualization Choices

While building this plot, I also noticed just how many of the Top 500 albums come from the period 1960 to 1980. In fact, in a data set spanning 56 years (1955 through 2011), 58% of the observations fall in the 20-year period from 1960 to 1980.
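A share like this can be computed by taking the mean of a logical test on Year. The sketch below uses five made-up years, not the real data, so the result is not the 58% quoted above. It also assumes both endpoints are included; `between()` is inclusive, so 1960 through 1980 covers 21 calendar years, and the report does not say how the endpoints were handled.

```r
library(dplyr)

# Made-up release years for illustration only.
toy_years <- tibble(Year = c(1955, 1965, 1975, 1980, 2011))

share_1960_1980 <- toy_years |>
  # between() is inclusive at both ends; the mean of TRUE/FALSE is a proportion
  summarise(share = mean(between(Year, 1960, 1980))) |>
  pull(share)

print(share_1960_1980)  # 3 of the 5 toy years fall inside the window
```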

With this visualization, I merged the “Rock” genres into a single bright red color to show Similarity. I also placed a rectangle geom in the background to draw the eye to this period of time, further highlighting the classic rock period; this is an example of using Preattentive Processing.

Schwabish discusses the reduction of clutter as a guideline for better graphics, which I sought to emulate by removing unnecessary elements in the theme layer. I prefer clean, minimalist visuals that convey the message with just the right amount of “spice”.

Critical Evaluation

While bar charts aren’t very “exciting”, I do think this chart demonstrates the key findings: Rock is the most prevalent genre, and the 1960s and 1970s were an important period for music. I found a series of Wes Anderson inspired color palettes last year and was excited to use one for the first time. Although the chart uses only two of its colors, I find this palette very aesthetically pleasing.

Second Categorical Analysis

The second Gestalt principle I demonstrated was Enclosure, by drawing an enclosing ellipse over a scatter plot, as shown in the coursework videos. I knew I wanted a scatter plot of some kind because they’re so versatile and common.

Code
top_albums_mod <- top_albums
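# RVO: "Rock" when the Genre string mentions Rock, otherwise "Other"
top_albums_mod$RVO <- ifelse(grepl("Rock", top_albums_mod$Genre), "Rock", "Other")
# Legends: keep the artist's name for the four highlighted artists, otherwise "Other"
top_albums_mod$Legends <- ifelse(
  top_albums_mod$Artist %in% c("The Beatles", "The Rolling Stones", "Bob Dylan", "Bruce Springsteen"),
  top_albums_mod$Artist,
  "Other"
)
# assistance to modify new column using ifelse/grepl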
top_albums_mod$Legends <- factor(top_albums_mod$Legends, 
                                  levels = c("The Beatles",
                                             "The Rolling Stones",
                                             "Bruce Springsteen",
                                             "Bob Dylan",
                                             "Other")) #assistance in suggestion to factor data prior to plot
Code
library(ggforce)
ellipse_data <- data.frame(
   x0 = 1970,
   y0 = 2,
   a = 16, #horizontal
   b = 0.6, #vertical
   angle = 0
)
Code
ggplot(top_albums_mod, aes(x = Year, y = RVO))+
  geom_point(aes(color = Legends, shape = Legends),
             position = "jitter",
             alpha = 0.8)+
  scale_shape_manual(values = c("The Beatles" = 16,
                                "Bruce Springsteen" = 16,
                                "The Rolling Stones" = 18,
                                "Bob Dylan" = 15,
                                "Other" = 1))+ #assistance with getting shapes properly scaled
  scale_color_manual(values= c("The Beatles" = "cornflowerblue",
                               "Bob Dylan" = "mediumseagreen",
                               "The Rolling Stones" = "firebrick3",
                               "Bruce Springsteen" = "gold3",
                               "Other" = "grey50"))+
  theme(line = element_blank(),
        rect = element_blank(),
        plot.background = element_rect("snow1"),
        legend.position = "top",
        legend.title = element_blank(),
        axis.text.y = element_text(angle = 90))+ #angle syntax, but it suggested axis.title.y, whereas it actually is axis.text.y
  labs(title = "Legendary Classic Artists in Rock Music",
       subtitle = "Top 500 Greatest Albums",
       x= "",
       y= ""
       )+
  geom_ellipse(data = ellipse_data, 
               aes(x0 = x0, y0 = y0, a = a, b = b, angle = angle), 
               fill = "cadetblue3", color = "cadetblue3", alpha = 0.1,
               inherit.aes = FALSE) # class video & chatgpt for error assistance

Code
### change angle of y-axis label

Key Distribution Patterns and Visualization Choices

Upon reviewing the data a little more closely, it became obvious that The Beatles accounted for a large number of observations, which prompted me to examine the data set for other artists with a large number of appearances. I chose an arbitrary cutoff of 8 appearances, which seemed like an appropriately high number for this list, resulting in four highlighted artists. The plot could easily be expanded to show twice as many artists, but that would clutter the visualization.
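The search for frequently appearing artists can be sketched as a count followed by a filter. The artists below are placeholders, not the real data, and reading the cutoff as “at least 8 appearances” is an assumption; `min_appearances` and `appearances` are hypothetical names.

```r
library(dplyr)

# Placeholder artists for illustration only; not the real data set.
toy_artists <- tibble(
  Artist = c(rep("Artist A", 9), rep("Artist B", 8), rep("Artist C", 3))
)

min_appearances <- 8  # cutoff named in the text; ">=" is an assumed reading

frequent_artists <- toy_artists |>
  # One row per artist with the number of albums on the list
  count(Artist, name = "appearances", sort = TRUE) |>
  # Keep only artists at or above the cutoff
  filter(appearances >= min_appearances)

print(frequent_artists)
```

Lowering `min_appearances` is the quickest way to see how many more artists a looser cutoff would add to the legend.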

To highlight the chosen artists and order the legend the way I preferred, I ended up creating a duplicate data set to modify and convert to a factor prior to plotting. This approach would probably have made the initial bar plot easier as well.

I labelled these four artists as “classic legends” for this visualization and, using Enclosure, I show that their albums primarily fall in the period previously identified as being important: Rock music in the 1960s and 1970s. This enclosed period is also heavily populated with Other albums from the data set. I also used Preattentive Processing methods in this visualization by changing the color and shape of the points for the legendary artists and drawing “Other” artists as grey, hollow points, reducing their importance to the visualization.

Critical Evaluation

I do like the way this visualization looks, but I suspect it looks more interesting than it is informative. It does show a nice breakdown of Rock vs. Other music over the years on the x-axis. However, because the geom_point layer is jittered, a point’s exact vertical position carries no meaning; the y-axis shows only the separation of genres. There is probably a better way to show this data.
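One possible adjustment, offered only as a sketch and not as what the report did, is to control the jitter explicitly with `position_jitter()`, so that years stay exact on the x-axis and points spread only a little within each genre row. The data frame `toy_points` is made up, and the `height` and `seed` values are arbitrary choices.

```r
library(ggplot2)

# Made-up points for illustration only.
toy_points <- data.frame(
  Year = c(1965, 1965, 1971, 1971, 1994),
  RVO  = c("Rock", "Rock", "Rock", "Other", "Other")
)

jitter_plot <- ggplot(toy_points, aes(x = Year, y = RVO)) +
  geom_point(
    # width = 0 leaves Year untouched; height = 0.2 spreads points within a row;
    # seed makes the random offsets reproducible between renders
    position = position_jitter(width = 0, height = 0.2, seed = 1),
    alpha = 0.8
  )

print(jitter_plot)
```

With the default `position = "jitter"`, both x and y are shifted by a random amount, so a point’s plotted year can differ slightly from its true year.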

Conclusion

The two visualizations work well together to demonstrate that, according to this data set, the 1960s and 1970s were an important period for music, particularly with the formation of Rock music and its role as a driving force. Much of modern music stands on the shoulders of these giants.

It also demonstrates how much music has changed, and it raises the question of what counts as “Rock” music. It’s obvious that “Rock” dominates the Rolling Stone Top 500, but has the definition of Rock music changed substantially over the years, or has it always been broad? It surprised me that Bob Dylan was listed primarily as a Rock musician. Undoubtedly, his music is among the most covered in history, especially by Rock and Pop musicians, but a modern Bob Dylan would likely be called Indie or Folk rather than Rock.

It also raises the question of what other stories could be told from this data set: which non-Rock artists appear frequently, and do they appear in the same 20-year period or in another time?

Sourcing & Generative AI

ChatGPT was used frequently to assist in this report. It was used as an aid to learn the syntax of some function calls that were giving me difficulty, and also to decipher errors. Some of ChatGPT’s suggestions were inaccurate or otherwise discarded, and the Tidyverse Function Reference was used in conjunction with it to achieve the desired results. The ChatGPT log is provided below.

Tidyverse Function Reference
https://ggplot2.tidyverse.org/reference/index.html

Wes Anderson Palette

https://github.com/karthik/wesanderson#readme

ChatGPT Log

https://chatgpt.com/share/67b12931-c0b8-800d-a95f-611a3d1dd2d7