Data Description

  • CSV file containing Taylor Swift song data from the Spotify Web API as of November 6, 2021
  • Deluxe and Taylor’s Version albums are used in place of the originals, so non-deluxe albums and Fearless (2008) are not included; Red (Taylor’s Version) (2021) is also absent because it was released after the data was collected
  • File was downloaded from Kaggle.com
  • The CSV file was adjusted because apostrophes were not displaying correctly
  • The data frame includes the following variables: Song Name, Album, Release Date, Song Length, Popularity, Danceability, Energy, Valence, Tempo, and more.
  • An additional column for release year was added to the data frame in R using the given release date
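A minimal sketch of how a release-year column could be derived with dplyr. The toy data frame and the column name release_date are assumptions for illustration; the actual names in the Kaggle file may differ.

```r
library(dplyr)

# Toy stand-in for the real data; `release_date` is an assumed column name
# and the song names are placeholders.
date_demo <- data.frame(
  name = c("song_a", "song_b"),
  release_date = c("2006-10-24", "2021-04-09"),
  stringsAsFactors = FALSE
)

# Parse the date string and keep only the four-digit year as an integer
date_demo <- date_demo %>%
  mutate(release_year = as.integer(format(as.Date(release_date), "%Y")))

date_demo
```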

Album Lengths

The data set includes nine of Taylor Swift’s albums, each with a different number of songs. The album Taylor Swift is the shortest in total song length, at 53 minutes, possibly because it was her first album. Fearless (Taylor’s Version) is the longest, at 107 minutes, possibly because it includes additional songs that were not on the original Fearless album. Two albums have a total song length of less than 60 minutes, while three are above 90 minutes.
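Totals like these can be computed by grouping songs by album and summing their lengths. The sketch below uses made-up rows, and the column name length_ms and its unit (milliseconds) are assumptions rather than confirmed details of the data set.

```r
library(dplyr)

# Made-up rows; `length_ms` (milliseconds) is an assumed column name and unit.
length_demo <- data.frame(
  album = c("Album A", "Album A", "Album B"),
  length_ms = c(180000, 240000, 300000),
  stringsAsFactors = FALSE
)

# Count the songs on each album and convert the summed length to minutes
album_lengths <- length_demo %>%
  group_by(album) %>%
  summarize(n_songs = n(),
            total_minutes = sum(length_ms) / 60000) %>%
  arrange(total_minutes) # shortest album first

album_lengths
```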

Number of Songs per Year

Taylor Swift has been releasing music since 2006, when she released 15 songs. The year with the most songs is 2021, with 43, because two albums in the data set were released that year: evermore (deluxe version) and Fearless (Taylor’s Version). According to the pie chart below, these two albums contain around 25% of Taylor Swift’s songs in the data set.
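The per-year counts and their share of the whole data set could be obtained along these lines. This is a sketch on made-up values, assuming a release_year column like the one described in the data description.

```r
library(dplyr)

# Made-up release years, one row per song
year_demo <- data.frame(
  release_year = c(2006, 2006, 2008, 2008, 2021, 2021, 2021, 2021)
)

# Count songs per year, then express each count as a share of all songs
songs_per_year <- year_demo %>%
  count(release_year, name = "n_songs") %>%
  mutate(share = n_songs / sum(n_songs)) %>%
  arrange(desc(n_songs)) # year with the most songs first

songs_per_year
```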

Oldest vs Newest Albums

To compare the popularity of the oldest album, Taylor Swift, with the newest album, Fearless (Taylor’s Version), a new data frame was created that contains only the songs from these two albums and their individual popularity. This data frame, called oldvsnew, was used to compute the five-number summary of popularity for both albums. The dplyr package was used to filter the data, add new columns, and summarize the song data for each album.

library(dplyr)

# data frame with just the old and new album, their songs, and popularity
oldvsnew <- music %>%
  filter(album %in% c("Taylor Swift", "Fearless (Taylor's Version)")) %>%
  select(name, album, popularity) %>%
  # plotting color for each album
  mutate(colors = ifelse(album == "Taylor Swift", "lightblue", "goldenrod3"))

# five-number summary and mean for both albums
stats <- oldvsnew %>%
  group_by(album) %>%
  summarize(min = min(popularity),
            q1 = quantile(popularity, 0.25),
            median = median(popularity),
            mean = mean(popularity),
            q3 = quantile(popularity, 0.75),
            max = max(popularity)) %>%
  arrange(min) # order albums by minimum popularity, lowest first

Oldest vs Newest Albums (continued)

The stats data frame from the previous slide shows that Fearless (Taylor’s Version) is more popular overall than the debut album Taylor Swift. Every element of the five-number summary, as well as the mean popularity, is greater for the newest album. Song popularity on Spotify is rated on a scale from 0 to 100, with 100 being the most popular.

## # A tibble: 2 x 7
##   album                         min    q1 median  mean    q3   max
##   <chr>                       <int> <dbl>  <dbl> <dbl> <dbl> <int>
## 1 Taylor Swift                   46  48       49  50.1  51.5    59
## 2 Fearless (Taylor's Version)    60  62.2     64  65.6  66.8    76

Song Elements

  • There are other elements of Taylor Swift songs that can be compared.
  • For example, the 3D scatter plot on the next slide plots danceability, energy, and tempo for each song.
  • Danceability and energy are rated on a scale from 0.0 to 1.0 with 1.0 being the highest.
  • Tempo is in beats per minute.
  • The 3D plot on the next slide shows no clear relationship among these variables.
  • The colors in the plot show popularity grouped into four categories from lowest to highest (0 to 25, 25 to 50, 50 to 75, and 75 to 100).
  • The lower bound for each range is included in that category while the upper bound is not.
  • Most songs appear to be in the 50 to 75 popularity range.
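The four popularity categories described above can be built with base R's cut function. This is a sketch on made-up popularity values. Setting right = FALSE makes each range include its lower bound but not its upper bound. Keeping a score of exactly 100 in the top category (via include.lowest = TRUE) is an assumption, since the slides do not say how that edge case was handled.

```r
# Made-up popularity scores that sit on or near the category boundaries
popularity_demo <- c(0, 24, 25, 50, 74, 75, 100)

popularity_group <- cut(
  popularity_demo,
  breaks = c(0, 25, 50, 75, 100),
  right = FALSE,         # each range is [lower, upper)
  include.lowest = TRUE  # with right = FALSE this closes the top range, so 100 is kept
)

# Number of songs in each category
table(popularity_group)
```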

Song Elements (continued)

Danceability vs Valence

Focusing on just two variables makes it possible to check whether there is a linear relationship between them. For example, the valence and danceability of each song can be compared. Valence measures how positive or negative a track sounds, with 0.0 being the most negative and 1.0 being the most positive. Danceability measures how suitable a song is for dancing, also on a scale from 0.0 to 1.0.

To fit a linear model between these two variables, the lm function was applied to all of the Taylor Swift songs in the data set. As seen below, the R² value is around 0.144, while the p-value is 3.0 × 10⁻⁷. The low p-value indicates a statistically significant linear association between valence and danceability. However, the low R² value means that valence explains only about 14% of the variation in danceability, so it is unlikely to be the only variable at play. Thus, a multiple linear regression may be a better choice for seeing which variables affect danceability.

# danceability vs valence
model <- lm(danceability ~ valence, data = music)
model_summary <- summary(model)
r2 <- model_summary$r.squared
# row 2 is the valence coefficient, column 4 is its p-value
p <- model_summary$coefficients[2, 4]
data.frame("R2" = r2, "p_value" = p)
##          R2      p_value
## 1 0.1442375 3.004083e-07
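The multiple linear regression suggested above might look something like the following sketch. It runs on simulated data, not the Spotify data, and the choice of energy and tempo as additional predictors is an assumption made only for illustration, so the numbers it produces say nothing about Taylor Swift's songs.

```r
# Simulated stand-in for the song data; values are random, not real
set.seed(1)
n <- 100
sim <- data.frame(
  valence = runif(n),
  energy = runif(n),
  tempo = runif(n, min = 70, max = 200)
)
sim$danceability <- 0.4 + 0.3 * sim$valence + 0.1 * sim$energy +
  rnorm(n, sd = 0.05)

# danceability modeled on several predictors at once
multi_model <- lm(danceability ~ valence + energy + tempo, data = sim)
multi_summary <- summary(multi_model)

# adjusted R2 penalizes extra predictors, so it suits model comparison
multi_summary$adj.r.squared
# estimate and p-value for each coefficient
multi_summary$coefficients[, c("Estimate", "Pr(>|t|)")]
```

On the real data, the same formula could be passed with data = music, provided the data frame has columns with these names.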

Danceability vs Valence (continued)

The relationship between valence and danceability appears to be positive: the more positive a song sounds, the higher its danceability tends to be. However, as mentioned on the previous slide, the low R² value suggests that other variables are also at play.

Final Thoughts

Taylor Swift has made a wide variety of music over the years, so her songs vary in positivity, energy, danceability, length, and popularity. There is therefore much more analysis that can be done on her music. For example, linear models could be fit for each album instead of her entire discography. The popularity of the original albums could be compared with that of the Taylor’s Versions that have been released more recently. The effect of song length on other variables could also be analyzed. Overall, Taylor Swift has a large presence in the music industry, so analyzing her work through Spotify data can reveal many details.