Rolling Stone Data Import

First, I import the Rolling Stone Top 500 albums and take a quick look at the structure using str() to see which variables are character variables. I’m interested in the artists: the data contains 289 distinct artists across 500 albums, so some artists have multiple albums on the list. For the second character variable, I chose genre, since I’m curious to see the genres of the artists with the most albums in the top 500.
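As a side note, the distinct counts can also be read off in a single call rather than from the tibble headers. The sketch below assumes dplyr's n_distinct() and uses a made-up toy_albums tibble (hypothetical rows, not the real data) so that it runs on its own.

```r
library(dplyr)

# Hypothetical toy rows standing in for top_albums (not the real data)
toy_albums <- tibble(
  Album  = c("A", "B", "C", "D", "E"),
  Artist = c("Artist 1", "Artist 1", "Artist 2", "Artist 3", "Artist 3"),
  Genre  = c("Rock", "Rock, Pop", "Jazz", "Rock", "Rock")
)

# Count distinct values per column, plus distinct Artist/Genre pairs
distinct_counts <- toy_albums %>%
  summarise(
    n_artists = n_distinct(Artist),
    n_genres  = n_distinct(Genre),
    n_pairs   = n_distinct(Artist, Genre)
  )

distinct_counts
```

Run against the real top_albums, the same summarise() call should report the row counts that the distinct() calls print in their headers.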

library(tidyverse)
library(ggplot2)
library(scales)
top_albums <- read_csv("http://jamessuleiman.com/teaching/datasets/Rolling_Stones_Top_500_Albums.csv",
    locale = locale(encoding = "ISO-8859-2",
    asciify = TRUE))
str(top_albums)
## tibble [500 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Number  : num [1:500] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Year    : num [1:500] 1967 1966 1966 1965 1965 ...
##  $ Album   : chr [1:500] "Sgt. Pepper's Lonely Hearts Club Band" "Pet Sounds" "Revolver" "Highway 61 Revisited" ...
##  $ Artist  : chr [1:500] "The Beatles" "The Beach Boys" "The Beatles" "Bob Dylan" ...
##  $ Genre   : chr [1:500] "Rock" "Rock" "Rock" "Rock" ...
##  $ Subgenre: chr [1:500] "Rock & Roll, Psychedelic Rock" "Pop Rock, Psychedelic Rock" "Psychedelic Rock, Pop Rock" "Folk Rock, Blues Rock" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Number = col_double(),
##   ..   Year = col_double(),
##   ..   Album = col_character(),
##   ..   Artist = col_character(),
##   ..   Genre = col_character(),
##   ..   Subgenre = col_character()
##   .. )
top_albums %>% distinct(Artist)
## # A tibble: 289 x 1
##    Artist                     
##    <chr>                      
##  1 The Beatles                
##  2 The Beach Boys             
##  3 Bob Dylan                  
##  4 Marvin Gaye                
##  5 The Rolling Stones         
##  6 The Clash                  
##  7 Elvis Presley              
##  8 Miles Davis                
##  9 The Velvet Underground     
## 10 The Jimi Hendrix Experience
## # … with 279 more rows
top_albums %>% distinct(Genre)
## # A tibble: 63 x 1
##    Genre                                    
##    <chr>                                    
##  1 Rock                                     
##  2 Rock, Pop                                
##  3 Funk / Soul                              
##  4 Rock, Blues                              
##  5 Jazz                                     
##  6 Jazz, Rock, Blues, Folk, World, & Country
##  7 Funk / Soul, Pop                         
##  8 Blues                                    
##  9 Pop                                      
## 10 Rock, Folk, World, & Country             
## # … with 53 more rows
top_albums %>% distinct(Artist, Genre)
## # A tibble: 342 x 2
##    Artist             Genre      
##    <chr>              <chr>      
##  1 The Beatles        Rock       
##  2 The Beach Boys     Rock       
##  3 Bob Dylan          Rock       
##  4 The Beatles        Rock, Pop  
##  5 Marvin Gaye        Funk / Soul
##  6 The Rolling Stones Rock       
##  7 The Clash          Rock       
##  8 Bob Dylan          Rock, Blues
##  9 Elvis Presley      Rock       
## 10 Miles Davis        Jazz       
## # … with 332 more rows
# Count albums for each Artist/Genre pair (one row per pair)
artist_genre2 <- top_albums %>%
  group_by(Artist, Genre) %>%
  mutate(count_album = n()) %>%
  ungroup() %>%
  distinct(Genre, Artist, count_album)

# The 20 Artist/Genre pairs with the most albums
artist_genre <- artist_genre2 %>%
  slice_max(order_by = count_album, n = 20, with_ties = FALSE)

# Bring in every genre for those artists; distinct() drops the repeated rows
# that appear when an artist has more than one pair in the top 20
artist_genre3 <- inner_join(artist_genre, artist_genre2, by = "Artist") %>%
  rename(Genre = Genre.y, count_album = count_album.y) %>%
  select(Genre, Artist, count_album) %>%
  distinct()

Using group_by, mutate, and slice_max

To find the artists with the most albums by genre, I group the data by artist and genre and count the number of albums in each group. I then keep only the unique rows (since mutate repeats the same count on every album row for a given artist and genre) and use slice_max to find the 20 artist and genre pairs with the most albums on the top 500 list. I noticed that some of these artists also had one or two albums in other genres. To capture those albums, I did an inner_join by artist back to the full set of counts, so that all of each artist’s albums are included, separated by genre.
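The steps above can be sketched on a small made-up dataset. This is only an illustration: toy_albums and its rows are hypothetical, n = 2 stands in for n = 20, and count() and semi_join() are swapped in as compact alternatives to the group_by/mutate/distinct and inner_join calls described above.

```r
library(dplyr)

# Hypothetical toy data standing in for top_albums
toy_albums <- tibble(
  Artist = c("Artist 1", "Artist 1", "Artist 1", "Artist 1",
             "Artist 2", "Artist 2", "Artist 3"),
  Genre  = c("Rock", "Rock", "Rock", "Rock, Pop",
             "Jazz", "Jazz", "Rock")
)

# Albums per Artist/Genre pair; count() gives one row per pair
pair_counts <- toy_albums %>%
  count(Artist, Genre, name = "count_album")

# Keep the two pairs with the most albums (stand-in for n = 20)
top_pairs <- pair_counts %>%
  slice_max(order_by = count_album, n = 2, with_ties = FALSE)

# Keep every genre row for the artists that made the cut
top_artist_genres <- pair_counts %>%
  semi_join(top_pairs, by = "Artist")

top_artist_genres
```

Note that count() sorts its output by the grouping columns, so ties in slice_max() may be broken differently than with rows kept in their original order. semi_join() only filters rows, so it cannot duplicate an artist that appears more than once among the top pairs.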

Note: I looked at 10 to 15 artists to start but saw minimal variability in genres, so I expanded the number of artists to 20 to display more diversity in genre.

Visualize Relationship with Dot Plot

I initially chose a bar chart to display the data but felt it used too much “ink.” I switched to a dot plot, which shows the same counts with far less ink.
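To make the comparison concrete, here is a minimal sketch of the two options on made-up counts. The toy_counts data frame and its values are hypothetical and only illustrate the difference between geom_col() and geom_point().

```r
library(ggplot2)

# Hypothetical counts, not taken from the real data
toy_counts <- data.frame(
  Artist = c("Artist 1", "Artist 2", "Artist 3"),
  count_album = c(5, 3, 2)
)

# Bar version: each value is drawn as a filled rectangle
bar_version <- ggplot(toy_counts, aes(x = Artist, y = count_album)) +
  geom_col() +
  coord_flip()

# Dot version: the same values drawn as single points
dot_version <- ggplot(toy_counts, aes(x = Artist, y = count_album)) +
  geom_point(size = 3) +
  coord_flip()
```

Printing bar_version and dot_version side by side shows the same three values. The dot version leaves room to add a second aesthetic, such as colour for genre, without stacking or dodging bars.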

artist_genre3 %>% 
  # Order artists by their total number of albums across genres
  mutate(Artist = fct_reorder(Artist, count_album, .fun = sum)) %>%
  ggplot(aes(x = Artist, y = count_album, colour = Genre)) +
  geom_point(size = 3, shape = 16, position = position_dodge(width = 0.5)) +
  scale_colour_brewer(palette = "Paired") +
  # count_album is numeric, so use a continuous scale with integer breaks
  scale_y_continuous(name = "Number of Albums on Top 500 List",
                     breaks = 1:10, limits = c(1, 10)) +
  theme_minimal() +
  labs(title = "Artists with the Most Albums on the Top 500 (by Album Genre)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()

Rock Artists Top the List

The dot plot indicates that the artists with the most albums on the top 500 list belong to the Rock genre (The Rolling Stones, Bruce Springsteen, The Who, The Beatles, and so on). Many albums on this list are some variation of the rock genre (for example, “Rock, Pop” and “Rock, Blues”). We don’t see much variety in genres until we look at artists who have 3 to 4 albums on the list, where genres like Jazz and Funk / Soul appear.