Code
library(tidyverse)
top_albums <- read_csv(
  "https://jsuleiman.com/datasets/Rolling_Stones_Top_500_Albums.csv",
  locale = locale(encoding = "ISO-8859-2", asciify = TRUE))
This dataset contains Rolling Stone's top 500 albums from 1950 to the present. Each record consists of six fields: Number, Year, Album, Artist, Genre, and Sub-genre.
For this assignment, I chose to analyze two things: the relationship between an album's ranking and the year it came out, and which genres the top 50 albums typically belong to.
A. Visualization
library(tidyverse)
top_albums <- read_csv(
  "https://jsuleiman.com/datasets/Rolling_Stones_Top_500_Albums.csv",
  locale = locale(encoding = "ISO-8859-2", asciify = TRUE))

# Scatter plot of release year against ranking, with rank 1 at the top
top_albums |>
  ggplot(aes(x = Year, y = Number)) +
  geom_point(aes(color = Number)) +
  labs(y = "Album Position", title = "Top 500 Albums") +
  scale_y_reverse()
B. Analysis and Reflection
Key Distribution Patterns:
The graph shows that not only did more of the highest-ranked albums come out before the 1980s, but a majority of all 500 albums did as well. I also believe this follows the proximity principle, since the points that sit close together read as a single cluster.
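The "majority before the 80's" reading can also be checked numerically rather than by eye. The following is a minimal sketch, assuming the `Year` column is numeric; `share_before()` is a hypothetical helper that is not part of the original analysis, and the `demo` rows are made up only to show the call.

```r
library(dplyr)

# Hypothetical helper: how many rows have a Year before the cutoff,
# and what fraction of all rows that is.
share_before <- function(albums, cutoff = 1980) {
  albums |>
    summarise(
      n_albums = n(),
      n_before = sum(Year < cutoff),
      share    = mean(Year < cutoff)
    )
}

# Made-up rows purely to demonstrate the call (not the real dataset).
demo <- tibble(Number = 1:4, Year = c(1967, 1971, 1979, 1991))
share_before(demo)

# With the real data loaded earlier: share_before(top_albums)
```

A share above 0.5 on the full dataset would support the claim; restricting the input to the first 50 rows would test the same claim for the highest-ranked albums only.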
Visualization Choices:
For this graph, I mapped color to album position so that the gradient goes from the lowest number to the highest. Darker points catch the eye much more quickly than lighter ones, which keeps the focus on where the top albums are sitting. I also added a quick scale_y_reverse() at the end of my code so that position 1 appears at the top of the plot.
Critical Evaluation:
I believe the graph's strength is how easily you can see where the clusters sit and what years the most popular albums came out. My struggle was deciding whether or not to use smaller intervals on the album position axis. I'm not sure if the scale is too broad, but with 500 units to show, I felt it was the best choice.
A. Visualization
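# Keep the 50 highest-ranked albums (the data is ordered by Number)
top_50 <- top_albums |>
  head(50)

# Note: with stat = "identity", each bar stacks Number / 25 for every album
# in that genre, so bar length is a rank-weighted total, not a plain count
ggplot(top_50, aes(x = Number / 25, y = Genre, fill = Genre)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Top 50 Album Genres", x = "Count", y = "Genre") +
  scale_x_continuous(breaks = seq(0, 50, by = 5)) +
  theme(axis.text.y = element_text(angle = 30, hjust = 1), legend.position = "none")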
B. Analysis and Reflection
Key Distribution Patterns:
This graph shows which genres are most common among the top 50 of the 500 top albums. Right off the bat we can see that Rock is by far the most popular genre, leading the second-place genre, Rock/Blues, by more than 5 times, while every other genre has a count under 3.
Visualization Choices:
For this graph, I put the genres on the y axis because some of the genre names are too long to fit on the x axis. I think I did a good job giving each genre a different color and giving the genre labels a slight tilt so they fit a little better.
Critical Evaluation:
This was definitely the more difficult of the two graphs. My biggest problem, and the most obvious one, is that I could not figure out how to sort the y axis by number of occurrences, so I had to leave it as it is. I also did not know how to separate combined genres. For example, there's one category that's "rock, pop" and another that's "rock/blues". The information is therefore slightly incorrect, because I could not take the "rock" part and count it under its own category. However, I still think my graph gets the message across well, showing how popular rock is in the top 50 albums of all time, and even how an album with ONLY rock can dominate the charts.
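Both problems described above (sorting the axis by number of occurrences and splitting combined genres) can be approached with tidyr and forcats. This is a minimal sketch under stated assumptions, not the method used for the graph above: the `demo` rows are made up, and the comma delimiter is a guess about how combined genres are written in the real `Genre` column, so the pattern may need adjusting.

```r
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)

# Made-up rows standing in for top_50 (not the real dataset).
demo <- tibble(
  Number = 1:5,
  Genre  = c("Rock", "Rock, Pop", "Rock, Blues", "Rock", "Jazz")
)

genre_counts <- demo |>
  # Assumption: combined genres are comma-separated; one row per genre.
  separate_rows(Genre, sep = "\\s*,\\s*") |>
  count(Genre, name = "n_albums") |>
  # Order factor levels by count so the largest bar is drawn at the top.
  mutate(Genre = fct_reorder(Genre, n_albums))

ggplot(genre_counts, aes(x = n_albums, y = Genre)) +
  geom_col() +
  theme_minimal() +
  labs(title = "Top 50 Album Genres", x = "Count", y = "Genre")
```

Because an album tagged with two genres is counted once under each, the counts can add up to more than the number of albums. Replacing `demo` with `top_50` would apply the same steps to the real data.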
I think my graphs work well together to tell a story, especially in the order they are in. Looking at the first graph, you'd think about how crazy it is that a majority of the best albums came out before the 80's, and to be honest, making that graph is what sparked the idea to see what genres the top 50 albums actually consisted of. As a follow-up question, if I had to make a third graph, I would try to see who the most popular artists are among the 500 top albums and who is making all of these rock albums.
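A possible starting point for that third graph is a simple count of albums per artist. This is a sketch only: the `demo` rows and artist names are placeholders, and the cutoff of 10 artists is an arbitrary choice.

```r
library(dplyr)

# Placeholder rows; with the real data, use top_albums instead of demo.
demo <- tibble(Artist = c("A", "B", "A", "C", "A", "B"))

top_artists <- demo |>
  # One row per artist, sorted from most to fewest albums.
  count(Artist, sort = TRUE, name = "n_albums") |>
  # Keep at most the 10 artists with the most albums.
  slice_head(n = 10)

top_artists
```

Filtering to rows whose Genre contains "Rock" before counting would narrow this to the artists behind the rock albums specifically.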