music ratings

This first chunk loads all of the required libraries.

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.3
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(gganimate)
library(magick)

## Warning: package 'magick' was built under R version 4.0.4

## Linking to ImageMagick 6.9.11.57
## Enabled features: cairo, freetype, fftw, ghostscript, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fontconfig, x11

This chunk sets the working directory and reads in the dataset.

# Point R at the folder that holds albums.csv (this path is specific to the author's machine)
setwd("C:/Users/noahz/Desktop/Data 110 R/Datasets/Hate crime data sets")
albums <- read_csv("albums.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   id = col_double(),
##   artist_id = col_double(),
##   album_title = col_character(),
##   genre = col_character(),
##   year_of_pub = col_double(),
##   num_of_tracks = col_double(),
##   num_of_sales = col_double(),
##   rolling_stone_critic = col_double(),
##   mtv_critic = col_double(),
##   music_maniac_critic = col_double()
## )

This begins the cleaning. It condenses the three critic-score columns into a single variable.

# Columns 8:10 are rolling_stone_critic, mtv_critic and music_maniac_critic
condensed_scores <- pivot_longer(albums, cols = 8:10, names_to = "reviewer", values_to = "critic")
condensed_scores

## # A tibble: 300,000 x 9
##       id artist_id album_title genre year_of_pub num_of_tracks num_of_sales
##    <dbl>     <dbl> <chr>       <chr>       <dbl>         <dbl>        <dbl>
##  1     1      1767 Call me Ca~ Folk         2006            11       905193
##  2     1      1767 Call me Ca~ Folk         2006            11       905193
##  3     1      1767 Call me Ca~ Folk         2006            11       905193
##  4     2     23548 Down Mare   Metal        2014             7       969122
##  5     2     23548 Down Mare   Metal        2014             7       969122
##  6     2     23548 Down Mare   Metal        2014             7       969122
##  7     3     17822 Embarrasse~ Lati~        2000            11       522095
##  8     3     17822 Embarrasse~ Lati~        2000            11       522095
##  9     3     17822 Embarrasse~ Lati~        2000            11       522095
## 10     4     19565 Standard I~ Pop          2017             4       610116
## # ... with 299,990 more rows, and 2 more variables: reviewer <chr>,
## #   critic <dbl>
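To make the reshape concrete, here is a minimal sketch on a hypothetical two-album tibble (`toy_albums`, with invented scores) that has the same three critic columns. It also assumes that selecting those columns by name with `ends_with("_critic")` is an acceptable substitute for the positional `cols = 8:10`; in albums.csv both pick the same three columns.

```r
library(tibble)
library(tidyr)

# Hypothetical two-album tibble mirroring the critic columns of albums.csv;
# the scores are made up purely to show the reshape
toy_albums <- tibble(
  id = c(1, 2),
  genre = c("Folk", "Metal"),
  rolling_stone_critic = c(3.5, 4.0),
  mtv_critic = c(2.5, 3.0),
  music_maniac_critic = c(4.5, 1.5)
)

# Select the critic columns by name instead of by position (8:10)
toy_long <- pivot_longer(
  toy_albums,
  cols = ends_with("_critic"),
  names_to = "reviewer",
  values_to = "critic"
)
toy_long
```

Each album contributes one row per critic, which matches the output above: 300,000 rows after the pivot means albums.csv holds 100,000 albums.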

This pivots the data wider so that each genre becomes its own variable holding the critic scores.

scores_by_genre <- pivot_wider(condensed_scores, names_from = "genre", values_from = "critic")
scores_by_genre

## # A tibble: 300,000 x 45
##       id artist_id album_title year_of_pub num_of_tracks num_of_sales reviewer
##    <dbl>     <dbl> <chr>             <dbl>         <dbl>        <dbl> <chr>   
##  1     1      1767 Call me Ca~        2006            11       905193 rolling~
##  2     1      1767 Call me Ca~        2006            11       905193 mtv_cri~
##  3     1      1767 Call me Ca~        2006            11       905193 music_m~
##  4     2     23548 Down Mare          2014             7       969122 rolling~
##  5     2     23548 Down Mare          2014             7       969122 mtv_cri~
##  6     2     23548 Down Mare          2014             7       969122 music_m~
##  7     3     17822 Embarrasse~        2000            11       522095 rolling~
##  8     3     17822 Embarrasse~        2000            11       522095 mtv_cri~
##  9     3     17822 Embarrasse~        2000            11       522095 music_m~
## 10     4     19565 Standard I~        2017             4       610116 rolling~
## # ... with 299,990 more rows, and 38 more variables: Folk <dbl>, Metal <dbl>,
## #   Latino <dbl>, Pop <dbl>, `Black Metal` <dbl>, Progressive <dbl>,
## #   `Pop-Rock` <dbl>, Retro <dbl>, Western <dbl>, `K-Pop` <dbl>, Indie <dbl>,
## #   Lounge <dbl>, `J-Rock` <dbl>, `Hard Rock` <dbl>, Unplugged <dbl>,
## #   Jazz <dbl>, Trap <dbl>, Ambient <dbl>, Rap <dbl>, `Heavy Metal` <dbl>,
## #   Dance <dbl>, Alternative <dbl>, `Death Metal` <dbl>, Live <dbl>,
## #   Blues <dbl>, Compilation <dbl>, Gospel <dbl>, Country <dbl>, `Deep
## #   House` <dbl>, `Brit-Pop` <dbl>, Parody <dbl>, Techno <dbl>, Rock <dbl>,
## #   Punk <dbl>, `Boy Band` <dbl>, Indietronica <dbl>, `Holy Metal` <dbl>,
## #   `Electro-Pop` <dbl>
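Because each album has a single genre, widening by genre leaves most of the new columns empty. A minimal sketch with hypothetical names and invented scores (`toy_long`, `toy_wide`) shows the pattern:

```r
library(tibble)
library(tidyr)

# Hypothetical long-format scores: one row per album and reviewer
toy_long <- tibble(
  id = c(1, 1, 2, 2),
  genre = c("Folk", "Folk", "Metal", "Metal"),
  reviewer = c("rolling_stone_critic", "mtv_critic", "rolling_stone_critic", "mtv_critic"),
  critic = c(3.5, 2.5, 4.0, 3.0)
)

# Each genre becomes a column; a score lands only in its album's own genre column
toy_wide <- pivot_wider(toy_long, names_from = genre, values_from = critic)
toy_wide
```

Only the album's own genre column receives a score, and every other genre column is NA for that row, which is why the genre columns in the 45-column table above are mostly empty.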

This first selects only the six genres that I believe are the most popular and diverse, along with the year, sales, and track-count variables. It then pivots the data longer to gather the selected genres back into a single genre variable.

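# Keep six genre columns plus year, sales and track count, then gather the
# six genre columns (the first six after select) into genre/score pairs
filtered <- scores_by_genre %>%
  select('Metal', 'Pop', 'Rock', 'Jazz', 'Rap', 'Country', 'year_of_pub', 'num_of_sales', 'num_of_tracks') %>%
  pivot_longer(cols = 1:6, names_to = "genre", values_to = "score")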
filtered

## # A tibble: 1,800,000 x 5
##    year_of_pub num_of_sales num_of_tracks genre   score
##          <dbl>        <dbl>         <dbl> <chr>   <dbl>
##  1        2006       905193            11 Metal      NA
##  2        2006       905193            11 Pop        NA
##  3        2006       905193            11 Rock       NA
##  4        2006       905193            11 Jazz       NA
##  5        2006       905193            11 Rap        NA
##  6        2006       905193            11 Country    NA
##  7        2006       905193            11 Metal      NA
##  8        2006       905193            11 Pop        NA
##  9        2006       905193            11 Rock       NA
## 10        2006       905193            11 Jazz       NA
## # ... with 1,799,990 more rows
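Most of the 1,800,000 rows above are NA, again because each album has only one genre. The sketch below, on hypothetical toy data, compares the widen/select/lengthen route with an assumed shortcut that keeps the six genres directly from the long data using `filter()`. `values_drop_na = TRUE` is added to the round trip here so the two results can be compared; `toy_long`, `keep_genres`, `round_trip`, and `direct` are illustrative names, and the sales and track-count columns are left out for brevity.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical long-format data standing in for condensed_scores
toy_long <- tibble(
  id = c(1, 1, 2, 2, 3, 3),
  genre = c("Folk", "Folk", "Metal", "Metal", "Jazz", "Jazz"),
  year_of_pub = c(2006, 2006, 2014, 2014, 2000, 2000),
  reviewer = rep(c("rolling_stone_critic", "mtv_critic"), 3),
  critic = c(3.5, 2.5, 4.0, 3.0, 1.0, 2.0)
)
keep_genres <- c("Metal", "Pop", "Rock", "Jazz", "Rap", "Country")

# Route used in the document: widen, keep genre columns, lengthen again
# (any_of() because the toy data lacks some of the six genres)
round_trip <- toy_long %>%
  pivot_wider(names_from = genre, values_from = critic) %>%
  select(any_of(keep_genres), year_of_pub) %>%
  pivot_longer(cols = -year_of_pub, names_to = "genre", values_to = "score",
               values_drop_na = TRUE)

# Assumed shortcut: keep the wanted genres straight from the long data
direct <- toy_long %>%
  filter(genre %in% keep_genres) %>%
  select(year_of_pub, genre, score = critic)
```

On this toy data both routes give the same four rows. Dropping the NA rows in the real pipeline should leave the box plots unchanged, since the missing scores are discarded when the boxes are drawn anyway.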

This chunk first creates a simple box-and-whisker plot comparing critic scores across genres, with custom colors and a title. It then uses gganimate to transition the boxes from one publication year to the next. Finally, the animation is rendered with the magick renderer.

# Box plot of critic score by genre, with one fixed colour per genre
animation <- ggplot(filtered) +
  geom_boxplot(mapping = aes(y = genre, x = score, fill = genre), na.rm = TRUE) +
  ggtitle("music genre critic rating") +
  scale_fill_manual(name = "genre", labels = c("Country", "Jazz", "Metal", "Pop", "Rap", "Rock"), values = c("green3", "gold1", "grey47", "pink1", "slateblue", "tan4")) +
  # Animate: one state per publication year
  transition_states(
    year_of_pub,
    transition_length = 2,
    state_length = 1) +
  enter_fade() +
  exit_shrink() +
  ease_aes('sine-in-out') +
  # Show the year of the current state in the caption
  labs(caption = '{closest_state}')
# Render 15 seconds at 10 frames per second with the magick renderer
animate(animation, duration = 15, fps = 10, renderer = magick_renderer())
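The colour scale above pairs `labels` and `values` by position, which works because the genres sort alphabetically into the same order as the colour vector. A hedged alternative is to key the colours by genre name. The sketch below (hypothetical `genre_colors` and `toy_scores`, with invented scores) builds a small static plot and inspects the fills ggplot2 assigns:

```r
library(tibble)
library(ggplot2)

# Same colours as the chunk above, keyed by genre name instead of position
genre_colors <- c(Country = "green3", Jazz = "gold1", Metal = "grey47",
                  Pop = "pink1", Rap = "slateblue", Rock = "tan4")

# Hypothetical scores used only to build a small static plot
toy_scores <- tibble(
  genre = c("Rock", "Rock", "Jazz", "Jazz"),
  score = c(2, 4, 1, 3)
)

p <- ggplot(toy_scores) +
  geom_boxplot(aes(y = genre, x = score, fill = genre)) +
  scale_fill_manual(name = "genre", values = genre_colors)

# Fill colours ggplot2 actually assigned to each box
box_fills <- ggplot_build(p)$data[[1]]$fill
box_fills
```

With named values, dropping or reordering a genre cannot shift a colour onto the wrong box.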

Conclusion

To address a potential problem upfront: this dataset appears to have been artificially generated, so it may not represent any real albums or events. The dataset comes from Kaggle and is titled “Music Label Dataset” by Revil Rosa. The main variables I used are “genre”, the genre of the album; “year_of_pub”, the year the album was published; and “rolling_stone_critic”, “mtv_critic”, and “music_maniac_critic”, the scores that three different music critics gave the album. There are other variables, but I did not use them in my project. I knew roughly what I wanted to do with this dataset before I started working on it, so most of the tidying was done in service of that goal; I believe the dataset was already tidy for general use.

To clean the data into the form I needed, I first pivoted the data frame longer to compile the critic scores from the three sources into one variable. Then I pivoted it wider, making each genre its own variable. Next I selected only the variables I needed and the six genres that I believe are the most popular and diverse in the dataset, since comparing too many genres would make the chart visually cluttered; I left out the other variables because I did not need them. This let me pivot longer once more, returning the genres to observations, but now with all of the critic scores condensed into one variable. Once the data were in the tidy form I needed, I made a basic box plot comparing the genres to the scores. I then used gganimate to make each box change to show how the scores shifted as time passed.
Now that I know the data are artificially created and random, I find it interesting that the median of every genre changes from year to year, while only some of the first quartiles change and none of the third quartiles do. This initially seemed strange to me, since I would expect randomly generated data to stay consistent, but after considering it further I realized that for the boxes to stay exactly the same would require a level of consistency that is extremely unlikely for random numbers. There were other avenues I wanted to explore, such as whether score correlates with sales or with the number of songs, and whether genres differ in their number of songs. However, because the dataset is random, those charts all came out identical and completely random, so I left them out, as I felt they would not add anything to the project.
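The yearly changes in medians and quartiles described above can also be checked numerically rather than by eye. The sketch below assumes input with the `genre`, `year_of_pub`, and `score` columns of `filtered`; `year_quartiles` and `toy_scores` are hypothetical names, and the toy scores are invented only to exercise the function. It uses `quantile()`'s default method, which should match the hinges drawn by `geom_boxplot()`.

```r
library(tibble)
library(dplyr)

# Per-genre, per-year quartiles of the critic scores, i.e. the numbers
# each frame of the animated box plot is drawn from
year_quartiles <- function(scores) {
  scores %>%
    filter(!is.na(score)) %>%
    group_by(genre, year_of_pub) %>%
    summarise(
      q1 = quantile(score, 0.25, names = FALSE),
      median = median(score),
      q3 = quantile(score, 0.75, names = FALSE),
      .groups = "drop"
    )
}

# Hypothetical input with the same column names as `filtered`
toy_scores <- tibble(
  genre = rep("Rock", 8),
  year_of_pub = rep(c(2000, 2001), each = 4),
  score = c(1, 2, 3, 4, 2, 3, 4, 5)
)
year_quartiles(toy_scores)
```

Running `year_quartiles(filtered)` would give one row per genre and year, so the shifts in median, first quartile, and third quartile can be compared directly.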

Noah Zimmer

3/6/2021
