1 Introduction

The Top Spotify Tracks of 2017 dataset contains the 100 most popular songs. We will analyze the data to see whether we can find the secret ingredients (tempo, key, name) that make a song popular.
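
The chunks below assume the packages and the Kaggle dataset are already loaded. A minimal setup sketch (the file name featuresdf.csv is an assumption):

# packages used throughout the report
library(dplyr)      # filter(), group_by(), summarise(), count()
library(ggplot2)    # bar charts and heatmaps
library(fmsb)       # radarchart()
library(treemap)    # treemap()
library(MASS)       # loglm()
library(vcd)        # mosaic()

# Top Spotify Tracks of 2017 audio features
music <- read.csv("featuresdf.csv", stringsAsFactors = FALSE)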

1.1 Data preparation

We will rescale some variables, such as danceability, energy, speechiness, liveness, valence and acousticness, for visualization purposes. In addition, we will combine two variables (key and mode) and categorize tempo so that they make sense musically.

1.1.1 Data rescaling

We will rescale danceability, energy, speechiness, acousticness, instrumentalness, liveness and valence by multiplying them by 100.

  • Danceability describes how suitable a track is for dancing; 0.0 is least danceable and 1.0 is most danceable.

  • Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity.

  • Speechiness detects the presence of spoken words in a track. Values above 0.66 describe tracks that are probably made entirely of spoken words.

  • Acousticness: a confidence measure from 0.0 to 1.0 of whether the track is acoustic.

  • Liveness: higher liveness values represent an increased probability that the track was performed live.

  • Valence: a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric).

# rescale the 0-1 audio features to a 0-100 scale for the charts below
music$danceability<- music$danceability*100
music$energy<- music$energy*100
music$speechiness<- music$speechiness*100
music$acousticness<- music$acousticness*100
music$instrumentalness<- music$instrumentalness*100
music$liveness<- music$liveness*100
music$valence<- music$valence*100
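
An equivalent one-step version of the rescaling above, written with dplyr (a sketch; requires dplyr 1.0 or later for across(), and should be run instead of, not in addition to, the chunk above):

# rescale all seven 0-1 features in a single mutate() call
music <- music %>%
  mutate(across(c(danceability, energy, speechiness, acousticness,
                  instrumentalness, liveness, valence), ~ .x * 100))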

1.1.2 Key and tempo categorization

We will categorize the tempo using standard classical tempo markings. In addition, we will derive the tonality of each song by combining the key and mode variables, then group the tonalities of the songs by key signature.

  • Key: the key the track is in. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C sharp/D flat, 2 = D, and so on.
  • Mode: major is represented by 1 and minor by 0.

Key characteristics and mood:

  • C major: Innocently Happy
  • C minor: Innocently Sad, Love-Sick
  • C sharp minor: Despair, Wailing, Weeping
  • C sharp major: Grief, Depressive
  • D major: Triumphant, Victorious War-Cries
  • D minor: Serious, Pious, Ruminating
  • D sharp minor: Deep Distress, Existential Angst
  • D sharp major: Cruel, Hard, Yet Full of Devotion
  • E major: Quarrelsome, Boisterous, Incomplete Pleasure
  • E minor: Effeminate, Amorous, Restless
  • F major: Furious, Quick-Tempered, Passing Regret
  • E sharp minor: Obscure, Plaintive, Funereal
  • F sharp major: Conquering Difficulties, Sighs of Relief
  • F sharp minor: Gloomy, Passionate Resentment
  • G major: Serious, Magnificent, Fantasy
  • G minor: Discontent, Uneasiness
  • G sharp major: Death, Eternity, Judgement
  • G sharp minor: Grumbling, Moaning, Wailing
  • A major: Joyful, Pastoral, Declaration of Love
  • A minor: Tender, Plaintive, Pious
  • A sharp major: Joyful, Quaint, Cheerful
  • B flat minor: Terrible, the Night, Mocking
  • B major: Harsh, Strong, Wild, Rage
  • B minor: Solitary, Melancholic, Patience

# mode: 1 = major, 0 = minor
music$tone <- ifelse(music$mode==0, "minor", "major")

# map the integer pitch-class codes to the note labels used in the rest of the report
scale_labels <- c("C", "C#", "D", "D#", "E", "E#", "F", "F#", "G", "G#", "A", "A#")
music$scale  <- scale_labels[music$key + 1]

# tonality = note label plus major/minor
music$keys <- paste(music$scale, music$tone, sep = " ")

# key signature (sharps or flats) implied by each tonality
music$keysign <- dplyr::case_when(
  music$keys %in% c("C major", "A minor")              ~ "Original",
  music$keys %in% c("G major", "E minor")              ~ "F sharp",
  music$keys %in% c("D major", "B minor")              ~ "F,C Sharp",
  music$keys %in% c("A major", "F# minor")             ~ "F,C,G Sharp",
  music$keys %in% c("E major", "C# minor")             ~ "F,C,G,D Sharp",
  music$keys %in% c("B major", "G# minor")             ~ "F,C,G,D,A Sharp",
  music$keys %in% c("F# major", "D# minor")            ~ "F,C,G,D,A,E Sharp",
  music$keys %in% c("C# major", "A# minor")            ~ "F,C,G,D,A,E,B Sharp",
  music$keys %in% c("F major", "D minor", "E# major")  ~ "B Flat",
  music$keys %in% c("G minor", "A# major")             ~ "B,E Flat",
  music$keys %in% c("C minor", "D# major")             ~ "B,E,A Flat",
  music$keys %in% c("F minor", "G# major", "E# minor") ~ "B,E,A,D Flat",
  TRUE                                                 ~ "Unknown"
)


# descriptive mood label for each tonality
music$keylabel <- dplyr::case_when(
  music$keys == "C major"                  ~ "C major: Innocently Happy",
  music$keys == "C minor"                  ~ "C minor: Innocently Sad, Love-Sick",
  music$keys == "C# minor"                 ~ "C sharp minor: Despair, Wailing, Weeping",
  music$keys == "C# major"                 ~ "C sharp major: Fullness, Sonorousness, Euphony",
  music$keys == "D major"                  ~ "D major: Triumphant, Victorious War-Cries",
  music$keys == "D minor"                  ~ "D minor: Serious, Pious, Ruminating",
  music$keys == "D# minor"                 ~ "D sharp minor: Deep Distress, Existential Angst",
  music$keys == "D# major"                 ~ "D sharp major: Cruel, Hard, Yet Full of Devotion",
  music$keys == "E major"                  ~ "E major: Quarrelsome, Boisterous, Incomplete Pleasure",
  music$keys == "E minor"                  ~ "E minor: Effeminate, Amorous, Restless",
  music$keys %in% c("E# major", "F major") ~ "F major: Furious, Quick-Tempered, Passing Regret",
  music$keys %in% c("F minor", "E# minor") ~ "F minor: Complaisance and Calm",
  music$keys == "F# major"                 ~ "F sharp major: Conquering Difficulties, Sighs of Relief",
  music$keys == "F# minor"                 ~ "F sharp minor: Gloomy, Passionate Resentment",
  music$keys == "G major"                  ~ "G major: Serious, Magnificent, Fantasy",
  music$keys == "G minor"                  ~ "G minor: Discontent, Uneasiness",
  music$keys == "G# major"                 ~ "G sharp major: Death, Eternity, Judgement",
  music$keys == "G# minor"                 ~ "G sharp minor: Grumbling, Moaning, Wailing",
  music$keys == "A major"                  ~ "A major: Joyful, Pastoral, Declaration of Love",
  music$keys == "A minor"                  ~ "A minor: Tender, Plaintive, Pious",
  music$keys == "A# major"                 ~ "A sharp major: Joyful, Quaint, Cheerful",
  music$keys == "A# minor"                 ~ "A sharp minor: Terrible, the Night, Mocking",
  TRUE                                     ~ "Unknown"
)



# tempo classification
music$tempoc[music$tempo >= 66 & music$tempo < 76]   <- "Adagio"
music$tempoc[music$tempo >= 76 & music$tempo < 108]  <- "Andante"
music$tempoc[music$tempo >= 108 & music$tempo < 120] <- "Moderato"
music$tempoc[music$tempo >= 120 & music$tempo < 156] <- "Allegro"
music$tempoc[music$tempo >= 156 & music$tempo < 176] <- "Vivace"
music$tempoc[music$tempo >= 176]                     <- "Presto"


# matching bpm-range labels for plotting
music$tlabel[music$tempo >= 66 & music$tempo < 76]   <- "66-76"
music$tlabel[music$tempo >= 76 & music$tempo < 108]  <- "76-108"
music$tlabel[music$tempo >= 108 & music$tempo < 120] <- "108-120"
music$tlabel[music$tempo >= 120 & music$tempo < 156] <- "120-156"
music$tlabel[music$tempo >= 156 & music$tempo < 176] <- "156-176"
music$tlabel[music$tempo >= 176]                     <- "> 176"
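
As a quick sanity check (a sketch, not part of the original analysis), we can tabulate the tempo classes; any track slower than 66 bpm would show up as NA:

# number of tracks per tempo class; NA would flag tempi below 66 bpm
table(music$tempoc, useNA = "ifany")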

2 Analysis of the Top 100 Tracks

2.2 Individual artist analysis

We will look at radar charts for the artists who have more than two songs in the top 100.
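
To identify those artists, we can count tracks per artist (a quick sketch; artist_counts is just an illustrative name):

# artists with more than two tracks among the top 100
artist_counts <- music %>%
  count(artists, sort = TRUE) %>%
  filter(n > 2)
artist_counts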

2.2.1 Ed Sheeran

Ed Sheeran's songs show substantial diversity across the audio features. In Shape of You especially, several features are strongly present at once.

tsong1 <- filter(music, artists %in% c("Ed Sheeran"))
# keep the track name plus six of the rescaled audio features for the radar chart
tsong2 <- tsong1[, c(2,4,5,9,10,12,13)]


# radar chart
rownames(tsong2)=tsong2$name
tsong3 <- tsong2[, c(2,3,4,5,6,7)]
data=rbind(rep(100,6) , rep(0,6) , tsong3)


colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9), rgb(0.5,0.4,0.8,0.9) )
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4) , rgb(0.5,0.4,0.8,0.4))
radarchart( data  , axistype=1 , 
    #custom polygon
    pcol=colors_border , pfcol=colors_in , plwd=4 , plty=1,
    #custom the grid
    cglcol="grey", cglty=1, axislabcol="grey", caxislabels=seq(0,100,20), cglwd=0.5,
    #custom labels
    vlcex=1 , title="Ed Sheeran Top Songs"
    )
legend(x=1.3, y=1.0, legend = rownames(data[-c(1,2),]), bty = "n", pch=20 , col=colors_in , text.col = "black", cex=0.7, pt.cex=1.5)
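
The same radar-chart steps are repeated for each artist below, so a small helper function could remove the duplication (a sketch; plot_artist_radar is a hypothetical helper, not part of the original code):

# hypothetical helper: radar chart of the six rescaled features for one artist
plot_artist_radar <- function(music, artist, borders, fills) {
  tracks <- filter(music, artists == artist)
  feats  <- tracks[, c("danceability", "energy", "speechiness",
                       "acousticness", "liveness", "valence")]
  rownames(feats) <- tracks$name
  dat <- rbind(rep(100, ncol(feats)), rep(0, ncol(feats)), feats)
  radarchart(dat, axistype = 1,
             pcol = borders, pfcol = fills, plwd = 4, plty = 1,
             cglcol = "grey", cglty = 1, axislabcol = "grey",
             caxislabels = seq(0, 100, 20), cglwd = 0.5,
             vlcex = 1, title = paste(artist, "Top Songs"))
  legend(x = 1.3, y = 1.0, legend = rownames(dat)[-c(1, 2)], bty = "n",
         pch = 20, col = fills, text.col = "black", cex = 0.7, pt.cex = 1.5)
}

# e.g. plot_artist_radar(music, "The Chainsmokers", colors_border, colors_in)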

2.2.2 The Chainsmokers

The Chainsmokers' songs emphasize energy and danceability.

tsong11 <- filter(music, artists %in% c("The Chainsmokers"))
tsong21<-  tsong11[, c(2,4,5,9,10,12,13)] 


# radar chart
rownames(tsong21)=tsong21$name
tsong31 <- tsong21[, c(2,3,4,5,6,7)]
data1=rbind(rep(100,6) , rep(0,6) , tsong31)


colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9), rgb(0.5,0.4,0.8,0.9) )
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4) , rgb(0.5,0.4,0.8,0.4))
radarchart( data1  , axistype=1 , 
    #custom polygon
    pcol=colors_border , pfcol=colors_in , plwd=4 , plty=1,
    #custom the grid
    cglcol="grey", cglty=1, axislabcol="grey", caxislabels=seq(0,100,20), cglwd=0.5,
    #custom labels
    vlcex=1 , title="The Chainsmokers Top Songs"
    )
legend(x=1.3, y=1.0, legend = rownames(data1[-c(1,2),]), bty = "n", pch=10 , col=colors_in , text.col = "black", cex=0.6, pt.cex=1.5)

2.2.3 Drake

Drake's work clearly focuses on danceability.

tsong13 <- filter(music, artists %in% c("Drake"))
tsong23<-  tsong13[, c(2,4,5,9,10,12,13)] 


# radar chart
rownames(tsong23)=tsong23$name
tsong33 <- tsong23[, c(2,3,4,5,6,7)]
data3=rbind(rep(100,6) , rep(0,6) , tsong33)


colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9))
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4))
 radarchart( data3  , axistype=1 , 
    #custom polygon
    pcol=colors_border , pfcol=colors_in , plwd=4 , plty=1,
    #custom the grid
    cglcol="grey", cglty=1, axislabcol="grey", caxislabels=seq(0,100,20), cglwd=0.5,
    #custom labels
    vlcex=1 , title="Drake Top Songs"
    )
legend(x=1.3, y=1.0, legend = rownames(data3[-c(1,2),]), bty = "n", pch=10 , col=colors_in , text.col = "black", cex=0.6, pt.cex=1.5)

2.2.4 Martin Garrix

Martin Garrix's tracks are very interesting: his songs show an almost identical distribution across the features, and their liveness is quite strong.

tsong14 <- filter(music, artists %in% c("Martin Garrix"))
tsong24<-  tsong14[, c(2,4,5,9,10,12,13)] 


# radar chart
rownames(tsong24)=tsong24$name
tsong34 <- tsong24[, c(2,3,4,5,6,7)]
data4=rbind(rep(100,6) , rep(0,6) , tsong34)


colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9))
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4))
radarchart( data4  , axistype=1 , 
    #custom polygon
    pcol=colors_border , pfcol=colors_in , plwd=4 , plty=1,
    #custom the grid
    cglcol="grey", cglty=1, axislabcol="grey", caxislabels=seq(0,100,20), cglwd=0.5,
    #custom labels
    vlcex=1 , title="Martin Garrix Top Songs"
    )
legend(x=1.3, y=1.0, legend = rownames(data4[-c(1,2),]), bty = "n", pch=10 , col=colors_in , text.col = "black", cex=0.6, pt.cex=1.5)

3 Music Theory 101

For our work to make sense musically, we need to quickly go over two basic components of music: key and rhythm.

3.1 Tonality vs. Key Signature

Key signature refers to which notes are sharp or flat. I think of tonality as the color of a piece of music: two pieces can have the same key signature but totally different tonality. For example, the first movement of Mozart's G major violin concerto is happy and bubbly, whereas the first movement of Mendelssohn's E minor violin concerto has a mysterious, sorrowful color. Yet both share the same key signature of a single sharp (F sharp).
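
In the data, this distinction is exactly what separates the keys and keysign columns built in Section 1.1.2: different tonalities can map to the same key signature. A quick look (a sketch):

# tracks whose key signature is a single sharp: G major and its relative E minor
subset(music, keysign == "F sharp", select = c(name, keys, keysign))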

3.2 Tempo Classification

Tempo markings and their approximate bpm ranges (a compact code version follows the list):

  • Adagio - slowly, with great expression (66-76 bpm)

  • Andante - at a walking pace (76-108 bpm)

  • Moderato - at a moderate speed (108-120 bpm)

  • Allegro - fast, quickly, and bright (120-156 bpm)

  • Vivace - lively and fast (156-176 bpm)

  • Presto - very, very fast (168-200 bpm)
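
For reference, the classification chunk in Section 1.1.2 can be written more compactly with cut() (a sketch; tempoc2 is an illustrative name, and the code bins Presto as 176 bpm and above):

# same bins as Section 1.1.2; tempi below 66 bpm fall outside the breaks and stay NA
tempo_breaks <- c(66, 76, 108, 120, 156, 176, Inf)
tempo_labels <- c("Adagio", "Andante", "Moderato", "Allegro", "Vivace", "Presto")
music$tempoc2 <- cut(music$tempo, breaks = tempo_breaks,
                     labels = tempo_labels, right = FALSE)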

4 Basic Music Theory Analysis

We will take a look at which tonalities the top 100 songs use most frequently, and at the most common key signatures and tempos.

4.1 Tonality of Top 100 songs

In general, major chords evoke a happy feeling, whereas minor chords suggest sadness. I am very surprised that C sharp major is used so often. In addition, the top three categories all share grief-laden, sad emotions.

tone1 <- group_by(music, keylabel )
tone2 <- dplyr::summarise(tone1,  count=n())
tone2 <- arrange(tone2, desc(count))


# Tonality treemap
treemap(tone2, index="keylabel", vSize="count", type="index", 
        palette="Pastel2", title="Top 100 Songs Key Characteristics and Emotion", fontsize.title=12)

4.2 Major vs Minor

We can see that more songs indeed use major keys.

ctone1 <- group_by(music, keys )
ctone2 <- dplyr::summarise(ctone1,  count=n())
ctone2 <- arrange(ctone2, desc(count))


# Tonality treemap
treemap(ctone2, index="keys", vSize="count", type="index", 
        palette="Pastel2", title="Top 100 Songs Key Characteristics", fontsize.title=12)

# major vs Minor
major <- group_by(music, tone )
major2 <- dplyr::summarise(major,  count=n())

# Major treemap
treemap(major2, index="tone", vSize="count", type="index", 
        palette="Pastel1", title="Top 100 Songs: Major vs Minor", fontsize.title=12)

5 Key signature analysis

5.1 Emotion with Key signature

From the graph, we can see that more than 15 songs used C sharp major or A sharp minor, which corresponds to a key signature of seven sharps (F, C, G, D, A, E, B). I'm quite surprised that popular songs use such complicated key signatures.

keystone1 <- group_by(music, keysign, keys )
keystone2 <- dplyr::summarise(keystone1,  count=n())
keystone2 <- arrange(keystone2, desc(count))
keystone2$pos <- keystone2$count- (0.5*keystone2$count)

keysp1 <- ggplot(data=keystone2, aes(x=reorder(keysign,count), y=count)) +
  geom_bar(aes(fill=keys), stat="identity", alpha=0.8) +
  ylab("Count") + xlab("Key signature") + coord_flip() +
  ggtitle("Key signature and Emotion")
keysp1

5.2 Which key signature is used the most

# Key signature count
keys1 <- group_by(music, keysign )
keys2 <- dplyr::summarise(keys1,  count=n())
keys2 <- arrange(keys2, desc(count))

# key signiature treemap
treemap(keys2, index="keysign", vSize="count", type="index", 
        palette="Pastel2", title="Top 100 Key Signature", fontsize.title=12)

6 Rhythm analysis

6.1 Tempo Classification

Now we take a look at the most popular tempo type among the top 100 songs, using the categorized tempo variable. About half of the songs have a tempo between 76 and 108 bpm, so Andante is the most used tempo and Adagio is the least used. I guess slow songs are not that popular, then.

tempo1 <- group_by(music, tempoc ,tlabel )
tempo2 <- dplyr::summarise(tempo1,  count=n())
tempo2 <- arrange(tempo2, desc(count))

tempop1<- ggplot(data=tempo2, aes(x=reorder(tempoc,count), y=count))+geom_bar(aes(y=count),stat="identity", alpha=0.8,fill="skyblue" )+ ylab("Count")+ xlab("Tempo Type")+ggtitle("What is the most popular Tempo type? ")+
  geom_text(aes(label=tlabel), vjust=1, color="maroon", size=3.5)+ theme_minimal()
tempop1

6.2 General tempo range

music$id1=seq(1, nrow(music))

plot1 <- ggplot(music, aes(x=reorder(id1,tempo),y=tempo)) +geom_bar(stat = "identity", col = "pink", fill = "pink")+theme_minimal()
plot1

qqnorm(music$tempo)
qqline(music$tempo, col="red")
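
Alongside the Q-Q plot, a Shapiro-Wilk test (a sketch, not part of the original analysis) gives a formal check of how close the tempo distribution is to normal:

# Shapiro-Wilk normality test on tempo (valid for samples of 3 to 5000 observations)
shapiro.test(music$tempo)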

7 Categorical analysis on Key and Tempo

7.1 Heatmap of Valence with respect to Key and Tempo

The songs show the highest valence with a Moderato tempo in C sharp major.

vorig <- group_by(music, tempoc , keys)
vorig1 <- summarise(vorig,  count=n() ,rate= mean(valence))

 ggplot(vorig1, aes(x=tempoc, y=keys, fill = rate)) + 
    geom_tile(colour = "white")  +
    scale_fill_gradient(low="skyblue", high="Pink") +
    labs(x="Tempo", y=NULL, title="Heatmap of Valence" ,fill="Valence")

7.2 Heatmap of Danceability with respect to Key and Tempo

We can see that Adagio combined with G minor gives the highest danceability. We could try dancing to Bach's Sonata No. 1 in G minor, BWV 1001 (Adagio) at home.

orig <- group_by(music, tempoc , keys)
orig1 <- summarise(orig,  count=n() ,rate= mean(danceability))

 ggplot(orig1, aes(x=tempoc, y=keys, fill = rate)) + 
    geom_tile(colour = "white")  +
    scale_fill_gradient(low="lightgreen", high="violetred") +
    labs(x="Tempo", y=NULL, title="Heatmap of Danceability" ,fill="Danceability")

7.3 Heatmap of Energy with respect to Key and Tempo

We can see that Vivace and A sharp major show high energy; I would expect a faster tempo to mean higher energy. However, with Vivace and C sharp major the energy is quite low. I think that is because C sharp major carries a grieving, depressive feeling.

eorig <- group_by(music, tempoc , keys)
eorig1 <- summarise(eorig,  count=n() ,rate= mean(energy))

 ggplot(eorig1, aes(x=tempoc, y=keys, fill = rate)) + 
    geom_tile(colour = "white")  +
    scale_fill_gradient(low="yellow", high="red") +
    labs(x="Tempo", y=NULL, title="Heatmap of Energy" ,fill="Energy")

7.4 Heatmap of Speechiness with respect to Key and Tempo

sorig <- group_by(music, tempoc , keys)
sorig1 <- summarise(sorig,  count=n() ,rate= mean(speechiness))

 ggplot(sorig1, aes(x=tempoc, y=keys, fill = rate)) + 
    geom_tile(colour = "white")  +
    scale_fill_gradient(low="yellow", high="green") +
    labs(x="Tempo", y=NULL, title="Heatmap of Speechiness" ,fill="Speechiness")

7.4.1 Mosaic Plot

Now we want to investigate whether key (major vs. minor) is related to rhythm. From the mosaic plot, there is no evidence that major or minor keys are associated with the tempo class.

ktstat <- group_by(music, tone, tempoc)
ktstat3 <- summarise(ktstat,  count=n())

v.lm <- loglm(count ~  tempoc+ tone, data=ktstat3)
v.m1<-mosaic(v.lm , clip=FALSE, gp_args = list(interpolate = c(1, 1.8)))
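
As a complementary check (a sketch, not part of the original analysis), a chi-squared test of independence can be run on the same two-way table; with only 100 tracks some expected counts are small, so a simulated p-value is used:

# test independence of mode (major/minor) and tempo class
tab <- table(music$tone, music$tempoc)
chisq.test(tab, simulate.p.value = TRUE, B = 10000)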

8 Does the name of a song matter?