Case Study 1: Music Over Time

Author

Team Names!

Run this code chunk first!

library(tidyverse)
music_data <- read_csv("songs_cleaned_1960-now.csv") |>
  mutate(decade = as.factor(decade))

Background Information

Spotify provides data about the characteristics of songs on its platform. Among the ones you’ll consider are:

  • Valence, a measure of how happy or positive a song sounds;

  • Danceability, how easy the song is to dance to based on its tempo and rhythm;

  • Song duration, how long the song is;

  • Instrumentalness and Speechiness, how heavy the song is on instrumentals or lyrics/speech, respectively.

You’ll look at some of these measures (or related ones) across decades.

Original vinyl records could only have 4-5 minutes of music per side, but in the late 1950s this was expanded to 9-12 minutes per side. Cassettes and then CDs had even more storage capacity. Since then, music distribution has switched largely to streaming, often focused on individual songs as opposed to albums. Most recently, more music is emerging on (or at least is optimized for) shorter form platforms like TikTok.

The dataset contains information from the 1950s through the 2020s.

Your Analysis

First, fill in the blanks below to choose three decades from the dataset to compare. The | operator means “or” here, so fill each blank with one of the decades you want to compare.

music_data_decades <- filter(music_data,
                             decade == 2010 | decade == 2000 | decade == 1990)
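As a side note on the filter above, here is a minimal sketch of an equivalent way to pick several decades with %in%. The toy_music table and its values are hypothetical stand-ins so the example runs on its own; they are not taken from the real file.

```r
library(dplyr)  # part of the tidyverse loaded at the top

# Hypothetical toy data standing in for music_data (values are made up)
toy_music <- tibble(
  decade = as.factor(c(1980, 1990, 2000, 2010, 2020)),
  track_duration_s = c(240, 255, 230, 215, 190)
)

# Same style as the filter above: keep a row if ANY of the conditions is true
with_or <- filter(toy_music,
                  decade == 2010 | decade == 2000 | decade == 1990)

# Equivalent form: keep a row if its decade is in the listed set
with_in <- filter(toy_music, decade %in% c("1990", "2000", "2010"))

# Optional: drop the unused factor levels (1980, 2020) so they do not
# show up as empty groups in later tables or plots
with_in <- droplevels(with_in)
```

Both versions keep the same three rows; %in% is mostly a convenience when the list of decades gets longer.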

Continuing the discussion of song duration from class, you are going to make a table and a graph or set of graphs to compare how long songs were across the three decades. The variable for song duration is called track_duration_s.

First, make a table. Include any statistics you think are relevant for comparing the song duration data across decades. Below the table, point out any major similarities or differences and describe what they tell you about song duration over time.

summarize(music_data_decades,
         track_duration_s)
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
  always returns an ungrouped data frame and adjust accordingly.
# A tibble: 7,304 × 1
   track_duration_s
              <dbl>
 1              149
 2              212
 3              213
 4              267
 5              182
 6              193
 7              300
 8              659
 9              255
10              169
# ℹ 7,294 more rows
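The output above lists every song because summarize() was given a bare column rather than a statistic, so it returned the column unchanged (hence the warning). One possible way to get one row per decade is sketched below, assuming the statistics of interest are the count, mean, median, standard deviation, minimum, and maximum. The toy_decades values are hypothetical so the sketch runs on its own; on the real data, music_data_decades would take its place.

```r
library(dplyr)  # part of the tidyverse loaded at the top

# Hypothetical toy data standing in for music_data_decades (values are made up)
toy_decades <- tibble(
  decade = factor(c(1990, 1990, 1990, 2000, 2000, 2000, 2010, 2010, 2010)),
  track_duration_s = c(250, 270, 230, 240, 220, 260, 200, 210, 190)
)

# Group by decade first, then compute named statistics for each group
duration_summary <- toy_decades |>
  group_by(decade) |>
  summarize(
    n_songs  = n(),                       # number of songs in the decade
    mean_s   = mean(track_duration_s),    # average duration in seconds
    median_s = median(track_duration_s),  # middle duration in seconds
    sd_s     = sd(track_duration_s),      # spread of durations
    min_s    = min(track_duration_s),
    max_s    = max(track_duration_s)
  )

duration_summary
```

Each statistic is named inside summarize(), so the result has one row per decade and one column per statistic, which makes the decades easy to compare side by side.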

Your interpretation:

Now, make a graph or graphs to more fully show the different song durations in the different decades. You get to choose the kind of graph here! Below the graph, explain what the graphs show and provide some interpretation of what they tell us about song duration over time.

ggplot(music_data_decades)+
  aes(x = track_duration_s) +
  geom_bar()+
  facet_wrap(~decade)
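geom_bar() counts each distinct value of x, so for a numeric variable like track_duration_s it draws one thin bar per unique duration. A histogram, which groups durations into bins, may be easier to read. The sketch below is one option, not the only reasonable graph: the 15-second bin width is an arbitrary assumption, and toy_decades is simulated, hypothetical data so the code runs on its own.

```r
library(ggplot2)  # part of the tidyverse loaded at the top

# Simulated toy data standing in for music_data_decades (values are made up);
# the decades deliberately have different numbers of songs
set.seed(1)
toy_decades <- data.frame(
  decade = factor(rep(c(1990, 2000, 2010), times = c(50, 100, 200))),
  track_duration_s = c(rnorm(50, 250, 40),
                       rnorm(100, 240, 40),
                       rnorm(200, 220, 40))
)

# Histogram on a density scale: each panel's bars have total area 1, so the
# shapes can be compared even when the decades have different song counts
duration_plot <- ggplot(toy_decades) +
  aes(x = track_duration_s, y = after_stat(density)) +
  geom_histogram(binwidth = 15) +
  facet_wrap(~decade) +
  labs(x = "Song duration (seconds)", y = "Density")

duration_plot
```

Plotting density instead of raw counts separates two questions that the count-based graph mixes together: how many songs each decade has, and how long those songs tend to be.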

Your interpretation:

The songs from the 1990s have a slightly larger mean duration. The largest difference, though, is that the number of songs grows with each later decade. The three distributions have essentially the same shape. The main thing these graphs show is that the amount of music being produced, at least as represented in this dataset, seems to increase greatly with each passing decade. Even so, they tell us that the duration of songs has stayed roughly the same across the decades.

Finally, you’ll also make a table and a graph or set of graphs for one of the other musical features. Choose one of the other variables (all listed and explained at the top). Similarly to what you did with song duration, make and interpret a table of summary statistics, and then make and interpret a graph or graphs.

summarize(music_data_decades,
         Danceability)
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
  always returns an ungrouped data frame and adjust accordingly.
# A tibble: 7,304 × 1
   Danceability
          <dbl>
 1        0.26 
 2        0.648
 3        0.66 
 4        0.683
 5        0.769
 6        0.653
 7        0.557
 8        0.497
 9        0.739
10        0.524
# ℹ 7,294 more rows

Your interpretation: A table like this is hard to read: it shows only a few rows per page, and with 7,304 rows it runs to more than 100 pages. It also doesn’t show which decade each song comes from, but I will do my best. The largest values I can see are around 0.9, and all the values appear to fall between 0 and 0.999. This suggests that a value near 0.9 indicates well-above-average danceability, while a value such as 0.2 would be considered almost undanceable.

ggplot(music_data_decades)+
  aes(x = Danceability) +
  geom_bar()+
  facet_wrap(~decade)
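As with song duration, geom_bar() draws one bar per distinct Danceability value, which can make the panels look spiky. Side-by-side boxplots are one alternative for comparing the decades directly. This is a minimal sketch; toy_decades is simulated, hypothetical data so the code runs on its own.

```r
library(ggplot2)  # part of the tidyverse loaded at the top

# Simulated toy data standing in for music_data_decades (values are made up);
# rbeta() is used only because it produces values between 0 and 1
set.seed(2)
toy_decades <- data.frame(
  decade = factor(rep(c(1990, 2000, 2010), each = 100)),
  Danceability = c(rbeta(100, 6, 4), rbeta(100, 6, 4), rbeta(100, 7, 4))
)

# One box per decade on a shared y-axis, so medians and spreads line up
dance_plot <- ggplot(toy_decades) +
  aes(x = decade, y = Danceability) +
  geom_boxplot() +
  labs(x = "Decade", y = "Danceability (0 to 1)")

dance_plot
```

Because all three boxes share one axis, a shift in the median or the quartiles between decades is visible at a glance, regardless of how many songs each decade contains.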

Your interpretation: Now that we have a clearer depiction of the data, I can say that songs have stayed roughly the same in their danceability over the decades. The main feature that distinguishes the graphs is that the number of songs increases with each later decade. Their means are all about the same, around the 0.625 mark. I could tentatively suggest that music from the 2010s was more danceable than that of previous decades, since that graph has more peaks to the right of the 0.625 mark. This is all I can think of.
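The suggestion that 2010s songs are more danceable could be checked with numbers rather than by eye. The sketch below compares the mean, the median, and the share of songs above a cutoff for each decade. The cutoff of 0.625 (on the 0-to-1 scale shown in the table) is an assumption taken from the interpretation above, and the toy_decades values are hypothetical so the code runs on its own.

```r
library(dplyr)  # part of the tidyverse loaded at the top

# Hypothetical toy data standing in for music_data_decades (values are made up)
toy_decades <- tibble(
  decade = factor(rep(c(1990, 2000, 2010), each = 4)),
  Danceability = c(0.50, 0.60, 0.65, 0.70,
                   0.55, 0.60, 0.66, 0.72,
                   0.58, 0.64, 0.70, 0.80)
)

dance_check <- toy_decades |>
  group_by(decade) |>
  summarize(
    mean_dance   = mean(Danceability),
    median_dance = median(Danceability),
    # The mean of TRUE/FALSE values is the proportion that are TRUE
    share_above  = mean(Danceability > 0.625)
  )

dance_check
```

If the 2010s row shows a clearly higher mean, median, or share above the cutoff on the real data, that would support the suggestion; if the three rows are nearly identical, the extra peaks in the graph are more likely a side effect of that decade having more songs.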