Assignment 1

Author

Brady Heath

Code
library(tidyverse)
library(ggplot2)
top_albums <- read_csv(
  "https://jsuleiman.com/datasets/Rolling_Stones_Top_500_Albums.csv",
    locale = locale(encoding = "ISO-8859-2", asciify = TRUE))
Code
glimpse(top_albums)
Rows: 500
Columns: 6
$ Number   <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ Year     <dbl> 1967, 1966, 1966, 1965, 1965, 1971, 1972, 1979, 1966, 1968, 1…
$ Album    <chr> "Sgt. Pepper's Lonely Hearts Club Band", "Pet Sounds", "Revol…
$ Artist   <chr> "The Beatles", "The Beach Boys", "The Beatles", "Bob Dylan", …
$ Genre    <chr> "Rock", "Rock", "Rock", "Rock", "Rock, Pop", "Funk / Soul", "…
$ Subgenre <chr> "Rock & Roll, Psychedelic Rock", "Pop Rock, Psychedelic Rock"…

After viewing the top_albums data set, I chose to analyze the relationship between genre and decade. To show the co

Code
colnames(top_albums)
[1] "Number"   "Year"     "Album"    "Artist"   "Genre"    "Subgenre"
Code
top_albums <- top_albums |> 
  mutate(decade = floor(Year / 10) * 10)

I converted each year to its decade, which makes my bar graph less cluttered.
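
As a quick check of the decade calculation, the sketch below applies the same `floor(Year / 10) * 10` expression to a few made-up years. The `sample_years` vector is a hypothetical example and is not taken from the data set.

```r
# Hypothetical sample years, chosen only to illustrate the bucketing.
sample_years <- c(1959, 1960, 1967, 1979, 1980, 2011)

# Same expression as in the mutate() call: round each year down to its decade.
sample_decades <- floor(sample_years / 10) * 10

print(data.frame(Year = sample_years, decade = sample_decades))
```

Every year from 1960 through 1969 lands in the 1960 bucket, so the x-axis needs one bar group per decade rather than one per year.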

Code
top_albums <- top_albums |> 
  count(decade, Genre) |> 
  rename(album_count = n) |> 
  arrange(decade, desc(album_count))

To keep my bar graph from becoming cluttered and hard to read, I consolidated the genre names.

Code
top_albums <- top_albums |> 
  mutate(
    # case_when() uses the first pattern that matches, so "Rock, Pop" becomes "Rock".
    # Genres that match none of the patterns become NA.
    Genre = case_when(
      str_detect(Genre, "Rock|Punk|Alternative") ~ "Rock",
      str_detect(Genre, "Pop") ~ "Pop",
      str_detect(Genre, "R&B|Soul|Funk") ~ "R&B/Soul",
      str_detect(Genre, "Jazz|Blues") ~ "Jazz/Blues",
      str_detect(Genre, "Electronic|EDM|House|Techno") ~ "Electronic",
      str_detect(Genre, "Country|Folk|Americana") ~ "Country/Folk"
    )
  )
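
Because the genre names are consolidated after `count()` has already run, several original genre labels can map to the same consolidated name within one decade. That can leave more than one row per decade and genre. The sketch below shows one possible alternative on a small made-up table: consolidate first, then count. The `demo_albums` table and its values are hypothetical, the `"Other"` fallback is an assumption rather than part of the original analysis, and `.default` assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for top_albums, for illustration only.
demo_albums <- tibble(
  Year  = c(1967, 1969, 1971, 1972, 1975, 1994),
  Genre = c("Rock", "Rock, Pop", "Funk / Soul", "Rock, Blues", "Jazz", "Hip Hop")
)

demo_counts <- demo_albums |>
  mutate(
    decade = floor(Year / 10) * 10,
    # The first matching pattern wins, so "Rock, Pop" becomes "Rock".
    Genre = case_when(
      str_detect(Genre, "Rock|Punk|Alternative") ~ "Rock",
      str_detect(Genre, "Pop") ~ "Pop",
      str_detect(Genre, "R&B|Soul|Funk") ~ "R&B/Soul",
      str_detect(Genre, "Jazz|Blues") ~ "Jazz/Blues",
      .default = "Other"  # assumption: group unmatched genres instead of leaving NA
    )
  ) |>
  # Counting after consolidation gives one row per decade and genre.
  count(decade, Genre, name = "album_count") |>
  arrange(decade, desc(album_count))

print(demo_counts)
```

With this ordering each bar in a dodged chart corresponds to exactly one row, so the bar heights reflect the full consolidated count.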
Code
ggplot(top_albums, aes(x = factor(decade), y = album_count, fill = Genre)) +
  geom_col(position = "dodge") +  
  theme_minimal() +
  labs(
    title = "Album Count by Genre Across Decades",
    x = "Decade",
    y = "Number of Albums",
    fill = "Genre"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 


Analysis & Reflection

  1. The bar chart shows that Rock has generally been the top genre overall, peaking in the 1970s with almost 120 albums.

  2. I ultimately chose a simple bar graph because bar graphs do a good job of comparing counts across categories, and I wanted to visualize the relationship between genre and decade. I focused on readability, which chapter 1 of the textbook emphasizes as an important aspect of making visualizations more appealing.

  3. I would say my visualization did a good job of communicating my objective, the relationship between decade and genre, but the simple bar graph does have its limitations. Bar graphs are very good at showing comparisons, but when too many factors come into play they can become too cluttered and hard to read.
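
One common way to reduce clutter when many genres share a dodged bar chart is to give each genre its own panel. The sketch below does this with `facet_wrap()` on a small made-up table. The `demo_counts` table and its numbers are hypothetical and do not come from the Rolling Stone data.

```r
library(ggplot2)

# Hypothetical counts, for illustration only.
demo_counts <- data.frame(
  decade      = c(1960, 1960, 1970, 1970, 1980, 1980),
  Genre       = c("Rock", "Pop", "Rock", "Pop", "Rock", "Pop"),
  album_count = c(10, 3, 14, 4, 8, 5)
)

facet_plot <- ggplot(demo_counts, aes(x = factor(decade), y = album_count)) +
  geom_col() +
  # One panel per genre instead of side-by-side bars within each decade.
  facet_wrap(~ Genre) +
  theme_minimal() +
  labs(x = "Decade", y = "Number of Albums")

print(facet_plot)
```

The trade-off is that comparing genres within a single decade takes more effort, since the bars now sit in separate panels.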

Second Visualization

- In my second visualization I will show the proximity between Albums and Decades.

Code
library(tidyverse)
library(ggplot2)
top_albums <- read_csv(
  "https://jsuleiman.com/datasets/Rolling_Stones_Top_500_Albums.csv",
    locale = locale(encoding = "ISO-8859-2", asciify = TRUE))
Code
colnames(top_albums)
[1] "Number"   "Year"     "Album"    "Artist"   "Genre"    "Subgenre"
Code
top_albums <- top_albums |> 
  mutate(decade = floor(Year / 10) * 10)
Code
decade_counts <- top_albums %>%
  group_by(decade) %>%
  summarise(Count = n(), .groups = "drop")
Code
ggplot(decade_counts, aes(x = as.factor(decade), y = Count)) +
  geom_bar(stat = "identity", fill = "pink", color = "black") +
  labs(
    title = "Number of Albums by Decade",
    x = "Decade",
    y = "Number of Albums"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Analysis & Reflection

  1. The bar chart shows the same trend, with the peak in the 1970s.

  2. I chose the bar chart because it is an easy-to-read visualization that lets the reader see the proximity I am examining, in this case the closeness between Decade and Albums.

  3. For the closeness between Decade and Albums, I believe the bar chart is the better option. It does an excellent job of portraying the similarities you are trying to visualize, whereas other chart types may be more confusing to read and unpleasant for a reader trying to make sense of the data.

Conclusion

My two visualizations look at different categorical comparisons yet show similar results. My first bar chart shows the relationship between Genre and Decade, while my second bar chart shows the number of albums per Decade. Both visualizations show that the Rolling Stone Top 500 albums peaked in the 1970s.