Introduction

Music is an art form that has been with humanity for ages. It keeps evolving, with new genres appearing every now and then, but I would argue that some elements of popular music stay the same. Long ago, musicians discovered sounds that go well with each other; these sounds were combined into chords, which then formed chord progressions. Many popular songs from different eras share exactly the same chord progressions. Since a song has many different layers, the casual listener may not have noticed this, but in music theory terms it is quite boring. That is probably why genres like jazz exist: to push what is considered musically harmonious into unknown territory. In this project, I used association rule mining to discover chords that frequently appear together in compositions of the famous jazz pianist Bill Evans. The results can inspire any musician interested in broadening their songwriting repertoire.

Dataset

The data comes from https://www.e-chords.com, which I scraped with BeautifulSoup and Selenium. For each song I saved the name, the key and the chords (separated by semicolons) into a .csv file. The loaded dataset is shown below.

library(tidyverse)  # readr, dplyr, tidyr and stringr are all used below
billEvans <- read_csv('bill_evans_chords.csv')

name	key	chords
Autumn Leaves	Dm	Gm;Gm9;Gm7;C7;Am;Dm7;Am7;E7;Dsus4;Dm
My Foolish Heart	A	D;D7M;D6;Fdim;F#m;F#m7+;F#m7;B7/9-;A;D9;E7;A7M;A7;Bm5-/7;A7/13;Dm6;F#7;B7;A7M/9
Lucy In The Sky With Diamonds	E	E;D;C;A;F;G
Like Someone In Love	G	G;G/F#;Em7;D9;D7;Cdim;Bm7;G7;C;Am7;Cm7;Edim;Am;Am7+;Am7/G;A7;F
A Time For Love	D	A7;D7M;D;D9;Gdim;D7M/9;Bm7;F#m;Fdim;A;A9;Em7;A7/9;F#7;Cdim;Bm;B7;A7M;Bm7/E;E;C#7;Edim
Emily	D	D7M;G6/9;Em7;A7;D6;D9;D7/9;G;Gdim;A;B;G#7;C#m7;F#7;Bm7;Bm7/E;E7;E7/13;A7/13-;D;D7;Bm;F#m;Fdim;B7;Cdim;G/F#;G/B

The planned preprocessing of the chords requires that the third column is separated into individual rows.

billEvans <- billEvans %>% 
  select(name, key, chords) %>%
  separate_rows(chords, sep=';') %>%
  filter(grepl("^[A-Z][#b]*", chords))

name	key	chords
Autumn Leaves	Dm	Gm
Autumn Leaves	Dm	Gm9
Autumn Leaves	Dm	Gm7
Autumn Leaves	Dm	C7
Autumn Leaves	Dm	Am
Autumn Leaves	Dm	Dm7
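
For readers without tidyr at hand, the same reshaping can be sketched in base R. The two songs below are made up for illustration and are not taken from the dataset.

```r
# Hypothetical two-song table in the same shape as the scraped data
songs <- data.frame(
  name = c("Song A", "Song B"),
  chords = c("C;Am;F", "Dm;G7"),
  stringsAsFactors = FALSE
)

# Split each chord string on ";" and repeat the song name once per chord
parts <- strsplit(songs$chords, ";", fixed = TRUE)
long <- data.frame(
  name = rep(songs$name, lengths(parts)),
  chords = unlist(parts),
  stringsAsFactors = FALSE
)
print(long)
```

Each song now occupies as many rows as it has chords, which is the shape the later steps rely on.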

Preprocessing

Transposing all chords from the original key to the key of C major or C minor

It will be much easier to understand the results of association rules mining when all songs are in the same key. For anyone not familiar with music theory, changing a key means that the song gets a higher or lower pitch than before. Its structure is not affected, because all chords are relative to a key. First, I defined the positions of each note in the chromatic scale relative to the C note.

keys <- list("C"=1, "C#"=2, "D"=3, "Eb"=4, "E"=5, "F"=6, "F#"=7, "G"=8, "G#"=9, "A"=10, "Bb"=11, "B"=12)

I need to take into account that depending on the key, some chords may have two different names. The function below changes these chords into a common name to simplify the transposition.

replaceSimilar <- function (chord){
  if (grepl("^D#", chord)){
    return(str_replace_all(chord, "^D#", "Eb"))
  }
  else if (grepl("^Db", chord)){
    return(str_replace_all(chord, "^Db", "C#"))
  }
  else if (grepl("^Gb", chord)){
    return(str_replace_all(chord, "^Gb", "F#"))
  }
  else if (grepl("^Ab", chord)){
    return(str_replace_all(chord, "^Ab", "G#"))
  }
  else if (grepl("^A#", chord)){
    return(str_replace_all(chord, "^A#", "Bb"))
  }
  else {
    return(chord)
  }
}
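
The same renaming can also be written as a lookup table, which is easier to extend than a chain of if/else branches. This is only a sketch in base R: the names `enharmonic` and `normalizeRoot` are made up here, and it assumes the root note sits in the first one or two characters of the chord.

```r
# Enharmonic spellings mapped to the names used in `keys` (same pairs as above)
enharmonic <- c("D#" = "Eb", "Db" = "C#", "Gb" = "F#", "Ab" = "G#", "A#" = "Bb")

# Hypothetical helper: swap the two-character root if it has an alias
normalizeRoot <- function(chord) {
  root <- substr(chord, 1, 2)
  if (root %in% names(enharmonic)) {
    paste0(enharmonic[[root]], substr(chord, 3, nchar(chord)))
  } else {
    chord
  }
}

sapply(c("Dbm7", "A#", "G7"), normalizeRoot)
```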

Now in order to change the keys of the songs into C major or C minor, each chord in a song needs to be moved by the difference of that song key’s root note to the C note. I wrote functions that achieve this result with regular expressions included in the stringr package from tidyverse. I extracted only the chord’s root note (so a capital letter plus a “b” sign for a “flat” note or a “#” for a “sharp” note), because the other characteristics of the chord (whether it is major, minor, diminished or a 7 chord etc.) will stay constant.

transposeKey <- function (key, chord){
  idxBase <- keys[["C"]]
  idxKey <- keys[[str_extract(key, "^[A-Z][#b]?")]]
  diffKey <- idxKey-idxBase
  idxNew <- (12+keys[[str_extract(chord, "^[A-Z][#b]?")]]-diffKey)%%12
  if (idxNew != 0){
    return(names(keys)[idxNew])
  }
  else {
    return(names(keys)[12])
  }
}

changeChord <- function (key, chord){
  return(str_replace_all(chord, "^[A-Z][#b]?", transposeKey(key, chord)))
}
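
To make the arithmetic concrete, here is a small standalone sketch of the same idea in base R. The helper `transposeRoot` is hypothetical and works on root notes only; it counts positions from zero, so the modulo needs no special case for the last note.

```r
# Chromatic scale in the same spelling and order as the `keys` list
notes <- c("C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B")

# Hypothetical helper: move a root note down by the distance between the key and C.
# Positions are zero-based here, so the modulo needs no special case for B.
transposeRoot <- function(keyRoot, chordRoot) {
  shift <- match(keyRoot, notes) - 1
  notes[(match(chordRoot, notes) - 1 - shift) %% 12 + 1]
}

transposeRoot("D", "G")  # G in the key of D becomes F in the key of C
transposeRoot("D", "C")  # wraps around the octave to Bb
```

For example, the Gm chord of Autumn Leaves (key of Dm) ends up as Fm once the song is moved to C minor.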

Some chords are so-called “slash” chords, meaning that the note after the slash is played in the bass. The function below includes these bass notes in the transposition.

slashChord <- function(key, chord) {
  if (grepl("\\/[A-Z]", chord)){
    # Capture the bass note together with its accidental (e.g. "F#" or "Bb")
    afterSlash <- str_extract(chord, "(?<=\\/)[A-Z][#b]?")
    afterSlash <- replaceSimilar(afterSlash)
    return(str_replace_all(chord, "(?<=\\/).*", changeChord(key, afterSlash)))
  }
  else {
    return(chord)
  }
}
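
The bass-note handling can be checked in isolation with a base R sketch. The helper `transposeBass` is hypothetical; it takes the shift in semitones directly and assumes the bass note is already spelled as one of the twelve names used in `keys`.

```r
# Chromatic scale in the same spelling and order as the `keys` list
notes <- c("C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B")

# Hypothetical helper: shift only the bass note after the slash down by `semitones`
transposeBass <- function(chord, semitones) {
  # Look for a note name (with optional accidental) directly after a slash
  m <- regexpr("(?<=/)[A-G][#b]?", chord, perl = TRUE)
  if (m[1] == -1) return(chord)  # no note after a slash, e.g. "B7/9-"
  bass <- regmatches(chord, m)
  regmatches(chord, m) <- notes[(match(bass, notes) - 1 - semitones) %% 12 + 1]
  chord
}

transposeBass("C/F#", 7)   # bass F# from the key of G becomes B in the key of C
transposeBass("B7/9-", 7)  # the slash introduces an extension, so nothing changes
```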

Finally, the functions can be applied to the data.

billEvans$key <- mapply(replaceSimilar, chord=billEvans$key)
billEvans$chords <- mapply(replaceSimilar, chord=billEvans$chords)
billEvans$newKey <- mapply(changeChord, key=billEvans$key, chord=billEvans$key)
billEvans$newChord <- mapply(changeChord, key=billEvans$key, chord=billEvans$chords)
billEvans$newChord <- mapply(slashChord, key=billEvans$key, chord=billEvans$newChord)

Representing the chords as roman numerals

Since chord progressions are independent of key, they are often written as Roman numerals. Each Roman numeral represents one of the seven chords characteristic of a specific key. I decided to use this notation because it makes it much clearer which chords in a song are out of key.

cMajor <- list("C"="I", "C7M"='I', "Dm"="ii", "Dm7"="ii", "Em"="iii", "Em7"="iii", "F"="IV", "F7M"="IV", "G"="V", "G7"="V", "Am"="vi", "Am7"="vi", "Bdim"="vii", "Bdim7"="vii")
cMinor <- list("Cm"="i", "Cm7"="i", "Ddim"="ii", "Ddim7"="ii", "Eb"="III", "Eb7M"="III", "Fm"="iv", "Fm7"="iv", "Gm"="v", "Gm7"="v", "G#"="VI", "G#7M"="VI", "Bb"="VII", "Bb7"="VII")

newNotation <- function(key, chord){
  # Chord stem: root note, optional accidental and quality (m, dim, 7, 7M)
  stem <- str_extract(chord, "^[A-Z][#b]?(m)?(dim)?(7)?(M)?")
  # After transposition the key is either "C" (major) or "Cm" (minor)
  numerals <- if (key=="C") cMajor else cMinor
  if (!is.null(numerals[[stem]])){
    return(numerals[[stem]])
  }
  else {
    # Chords outside the key keep their original name
    return(chord)
  }
}

billEvans$newChord <- mapply(newNotation, key=billEvans$newKey, chord=billEvans$newChord)

Transforming data frame into transactions usable by the association rules mining algorithms

write.table(billEvans[,c(1,5)], file="transactions_bill_evans.csv", sep=";", row.names=FALSE)

chordsEvans <- read.transactions("transactions_bill_evans.csv", sep=";", format="single", 
                                       header=TRUE, cols=c(1:2))
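
It may help to picture what a transaction is here: the set of distinct chords of one song, with repetitions and order discarded. The base R sketch below shows that idea on made-up rows; it does not use arules and only mimics the grouping.

```r
# Hypothetical rows in the same long format as billEvans[, c("name", "newChord")]
rows <- data.frame(
  name = c("Song A", "Song A", "Song A", "Song B", "Song B"),
  newChord = c("I", "ii", "I", "V", "I"),
  stringsAsFactors = FALSE
)

# One basket per song: group the chords by song name and drop duplicates
baskets <- lapply(split(rows$newChord, rows$name), unique)
print(baskets)
```

Because the order of chords is lost at this point, the association rules further down describe which chords share a song, not which chord follows which.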

Association rules mining

Chords frequency

Before mining the association rules, I examine the chord frequencies on a bar plot. Some frequent chords are not represented by a Roman numeral and therefore do not belong to the key of C major or C minor. Combined with the insight from the association rules, this will let me tell when these “unusual” chords appear.

chordsFreq <- itemFrequency(chordsEvans, type="absolute")
chordsFreq <- head(sort(chordsFreq, decreasing=TRUE), 20)  # top 20 chords (R indexing starts at 1)
barplot(chordsFreq, las=2)

Eclat algorithm rules

To discover the association rules, I apply the Eclat algorithm to find frequent itemsets and then induce rules from them. The alternative is the Apriori algorithm, which is usually slower and uses more memory than Eclat. I set the minimum support to 0.2 (the arules default is 0.1) and, because I want larger chord combinations, require itemsets of at least four chords.

chordsEclat <- eclat(chordsEvans, parameter=list(supp=0.2, minlen=4))
eclatRules <- ruleInduction(chordsEclat, chordsEvans, confidence=0.6)

##      lhs                   rhs  support   confidence lift     itemset
## [1]  {Fm7, ii, V}       => {I}  0.2358491 1.0000000  1.261905 1      
## [2]  {Fm7, I, V}        => {ii} 0.2358491 1.0000000  1.432432 1      
## [3]  {Fm7, I, ii}       => {V}  0.2358491 1.0000000  1.277108 1      
## [4]  {Ebdim, ii, IV, V} => {I}  0.2075472 1.0000000  1.261905 2      
## [5]  {Ebdim, I, IV, V}  => {ii} 0.2075472 1.0000000  1.432432 2      
## [6]  {Ebdim, I, ii, V}  => {IV} 0.2075472 0.9166667  1.408213 2      
## [7]  {Ebdim, I, ii, IV} => {V}  0.2075472 1.0000000  1.277108 2      
## [8]  {Ebdim, IV, V}     => {ii} 0.2075472 1.0000000  1.432432 3      
## [9]  {Ebdim, ii, V}     => {IV} 0.2075472 0.9166667  1.408213 3      
## [10] {Ebdim, ii, IV}    => {V}  0.2075472 1.0000000  1.277108 3
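
Eclat counts support by storing, for every item, the list of transactions that contain it and intersecting those lists. The base R sketch below illustrates that idea together with the three measures shown in the table. The four songs are made up, and the code is a toy illustration rather than the arules implementation.

```r
# Made-up songs, each reduced to its set of chords
songs <- list(
  s1 = c("I", "ii", "V"),
  s2 = c("I", "IV", "V"),
  s3 = c("I", "ii", "IV", "V"),
  s4 = c("ii", "V")
)

# Vertical layout: for every chord, the ids of the songs that contain it
items <- sort(unique(unlist(songs)))
tidLists <- lapply(setNames(items, items), function(item) {
  which(vapply(songs, function(s) item %in% s, logical(1)))
})

# Support of an itemset = size of the intersection of its tid-lists / number of songs
support <- function(itemset) {
  length(Reduce(intersect, tidLists[itemset])) / length(songs)
}

suppLhs <- support(c("ii", "V"))                    # 3 of 4 songs
confidence <- support(c("I", "ii", "V")) / suppLhs  # rule {ii, V} => {I}
lift <- confidence / support("I")
c(support = suppLhs, confidence = confidence, lift = lift)
```

A lift below 1, as in this toy rule, would mean the rhs is less likely in songs containing the lhs than in songs overall.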

Rules sorted by support

Sorting by support highlights only the most frequent combinations. The results are to be expected, as these chord combinations are jazz staples. I-IV-ii-V, known as the Montgomery-Ward bridge, is often used as the bridge of a jazz standard, according to Wikipedia (https://en.wikipedia.org/wiki/Montgomery-Ward_bridge). The other progression, I-vi-ii-V, is a very popular jazz “turnaround”, which ends a particular section of a song. Keep in mind that association rules ignore the order of the chords, so the rules only show that these chords occur in the same songs. The confidence of 1 in rules 1, 5 and 9 also shows that the tonic (the “I” chord, here C major or C major 7) is present in every song that contains the left-hand-side chords.

supportSortedRules <- sort(eclatRules, by="support", decreasing=TRUE) 
inspect(head(supportSortedRules, 10))

##      lhs            rhs  support   confidence lift     itemset
## [1]  {ii, IV, V} => {I}  0.5754717 1.0000000  1.261905 314    
## [2]  {I, IV, V}  => {ii} 0.5754717 0.9104478  1.304155 314    
## [3]  {I, ii, V}  => {IV} 0.5754717 0.8356164  1.283701 314    
## [4]  {I, ii, IV} => {V}  0.5754717 0.9838710  1.256510 314    
## [5]  {ii, V, vi} => {I}  0.5377358 1.0000000  1.261905 313    
## [6]  {I, V, vi}  => {ii} 0.5377358 0.9047619  1.296010 313    
## [7]  {I, ii, vi} => {V}  0.5377358 1.0000000  1.277108 313    
## [8]  {I, ii, V}  => {vi} 0.5377358 0.7808219  1.313764 313    
## [9]  {IV, V, vi} => {I}  0.5000000 1.0000000  1.261905 312    
## [10] {I, V, vi}  => {IV} 0.5000000 0.8412698  1.292386 312

Rules sorted by confidence

The highest possible confidence is 1, which means that every song containing the lhs also contains the rhs. There are many such rules, as shown below. Although their support is lower, the appearance of the Fm7 and Ebdim chords is curious. Both are out of key: the former is a minor 7 chord where the key of C major has a major 7 chord (F7M), and the latter does not belong to the key of C major at all.

confidenceSortedRules <- sort(eclatRules, by="confidence", decreasing=TRUE) 
inspect(head(confidenceSortedRules, 10))

##      lhs                   rhs  support   confidence lift     itemset
## [1]  {Fm7, ii, V}       => {I}  0.2358491 1          1.261905 1      
## [2]  {Fm7, I, V}        => {ii} 0.2358491 1          1.432432 1      
## [3]  {Fm7, I, ii}       => {V}  0.2358491 1          1.277108 1      
## [4]  {Ebdim, ii, IV, V} => {I}  0.2075472 1          1.261905 2      
## [5]  {Ebdim, I, IV, V}  => {ii} 0.2075472 1          1.432432 2      
## [6]  {Ebdim, I, ii, IV} => {V}  0.2075472 1          1.277108 2      
## [7]  {Ebdim, IV, V}     => {ii} 0.2075472 1          1.432432 3      
## [8]  {Ebdim, ii, IV}    => {V}  0.2075472 1          1.277108 3      
## [9]  {Ebdim, ii, IV}    => {I}  0.2075472 1          1.261905 4      
## [10] {Ebdim, I, IV}     => {ii} 0.2075472 1          1.432432 4

Rules sorted by lift

Lift is another useful measure for inspecting association rules. It tells the researcher how much more likely the rhs is to appear together with the lhs than it would be if the two were unrelated. Here, the rules are quite unusual. Some of them do not include the tonic, and they contain chords that are out of key. In the key of C major, D7, E7 and A7 would normally be minor 7 chords, so the switch from a minor 7 to a 7 chord on the ii, iii and vi is an interesting device worth trying when composing. What also caught my eye is that, with a confidence of 1, A7 appears in every song that contains E7, iii (Em or Em7) and IV (F or F7M).

liftSortedRules <- sort(eclatRules, by="lift", decreasing=TRUE) 
inspect(head(liftSortedRules, 10))

##      lhs                        rhs  support   confidence lift     itemset
## [1]  {A7, D7, V}             => {E7} 0.2547170 0.8181818  1.971074 103    
## [2]  {E7, I, ii, iii, IV, V} => {A7} 0.2169811 1.0000000  1.962963  56    
## [3]  {E7, ii, iii, IV, V}    => {A7} 0.2169811 1.0000000  1.962963  57    
## [4]  {E7, I, ii, iii, IV}    => {A7} 0.2169811 1.0000000  1.962963  58    
## [5]  {E7, I, iii, IV, V}     => {A7} 0.2169811 1.0000000  1.962963  59    
## [6]  {E7, iii, IV, V}        => {A7} 0.2169811 1.0000000  1.962963  60    
## [7]  {E7, I, iii, IV}        => {A7} 0.2169811 1.0000000  1.962963  61    
## [8]  {E7, ii, iii, IV}       => {A7} 0.2169811 1.0000000  1.962963  62    
## [9]  {E7, iii, IV}           => {A7} 0.2169811 1.0000000  1.962963  70    
## [10] {A7, D7, I, V, vi}      => {E7} 0.2075472 0.8148148  1.962963  93
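
Since lift is the confidence of a rule divided by the support of its rhs, the table can be used to work backwards to how common a chord is overall. The short calculation below does this for A7, using the values printed for rule [2]; nothing else is assumed.

```r
# Values copied from rule [2] above: {E7, I, ii, iii, IV, V} => {A7}
confidence <- 1.0000000
lift <- 1.962963

# lift = confidence / support(rhs), so support(rhs) = confidence / lift
supportA7 <- confidence / lift
round(supportA7, 4)
```

In other words, A7 appears in roughly half of the songs, and a lift close to 2 means it is almost twice as likely in songs that contain the lhs chords.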

Cleaned rules

Cleaning association rules means removing redundant, insignificant and non-maximal rules. What remains are rules that are more general and statistically significant according to Fisher’s exact test. This makes the rules more comprehensible and easier to visualize.

rulesClean<-eclatRules[!is.redundant(eclatRules)]
rulesClean<-rulesClean[is.significant(rulesClean, chordsEvans)]
rulesClean<-rulesClean[is.maximal(rulesClean)]

##      lhs                 rhs   support   confidence lift     itemset
## [1]  {A7, E7, ii, vi} => {iii} 0.2358491 0.8620690  1.692209  55    
## [2]  {E7, iii, V}     => {A7}  0.2547170 0.9310345  1.827586  67    
## [3]  {A7, iii, V}     => {E7}  0.2547170 0.7714286  1.858442  67    
## [4]  {E7, I, iii}     => {A7}  0.2547170 0.9310345  1.827586  68    
## [5]  {A7, I, iii}     => {E7}  0.2547170 0.7714286  1.858442  68    
## [6]  {E7, iii, IV}    => {A7}  0.2169811 1.0000000  1.962963  70    
## [7]  {A7, iii, IV}    => {E7}  0.2169811 0.7666667  1.846970  70    
## [8]  {E7, iii, V}     => {vi}  0.2547170 0.9310345  1.566502  79    
## [9]  {E7, I, iii}     => {vi}  0.2547170 0.9310345  1.566502  80    
## [10] {D7, E7, V}      => {A7}  0.2547170 0.8709677  1.709677 103
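
To see what the significance check measures, Fisher’s exact test can be run by hand on a single rule with base R. The sketch below uses rule [6], {E7, iii, IV} => {A7}. The counts are reconstructed under assumptions: the supports in the tables are consistent with 106 songs (0.2169811 × 106 = 23), and the number of songs with A7 is taken as confidence / lift × 106 = 54. The one-sided alternative is also a choice made here and may differ from what is.significant does internally.

```r
# Counts inferred from the table, assuming 106 songs (an assumption, see the text above)
n <- 106
both <- 23      # songs with {E7, iii, IV} and A7: 0.2169811 * 106
lhsOnly <- 0    # confidence is 1, so the lhs never occurs without A7
rhsOnly <- 31   # 54 songs contain A7 (1 / 1.962963 * 106), minus the 23 above
neither <- n - both - lhsOnly - rhsOnly

# 2x2 contingency table: rows = A7 present or absent, columns = lhs present or absent
tab <- matrix(c(both, lhsOnly, rhsOnly, neither), nrow = 2,
              dimnames = list(A7 = c("yes", "no"), lhs = c("yes", "no")))

# One-sided test: is A7 more frequent when the lhs is present?
pValue <- fisher.test(tab, alternative = "greater")$p.value
pValue
```

A small p-value indicates that the co-occurrence of the lhs and A7 is unlikely to be a coincidence.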

Cleaned rules sorted by support

supportSortedClean <- sort(rulesClean, by="support", decreasing=TRUE) 
inspect(head(supportSortedClean, 10))

##      lhs            rhs  support   confidence lift     itemset
## [1]  {ii, IV, V} => {I}  0.5754717 1.0000000  1.261905 314    
## [2]  {I, IV, V}  => {ii} 0.5754717 0.9104478  1.304155 314    
## [3]  {I, ii, V}  => {IV} 0.5754717 0.8356164  1.283701 314    
## [4]  {I, ii, IV} => {V}  0.5754717 0.9838710  1.256510 314    
## [5]  {ii, V, vi} => {I}  0.5377358 1.0000000  1.261905 313    
## [6]  {I, V, vi}  => {ii} 0.5377358 0.9047619  1.296010 313    
## [7]  {I, ii, vi} => {V}  0.5377358 1.0000000  1.277108 313    
## [8]  {I, ii, V}  => {vi} 0.5377358 0.7808219  1.313764 313    
## [9]  {IV, V, vi} => {I}  0.5000000 1.0000000  1.261905 312    
## [10] {I, V, vi}  => {IV} 0.5000000 0.8412698  1.292386 312

Cleaned rules sorted by confidence

confidenceSortedClean <- sort(rulesClean, by="confidence", decreasing=TRUE) 
inspect(head(confidenceSortedClean, 10))

##      lhs              rhs  support   confidence lift     itemset
## [1]  {E7, iii, IV} => {A7} 0.2169811 1          1.962963  70    
## [2]  {A7, E7, I}   => {V}  0.3396226 1          1.277108 133    
## [3]  {E7, I, ii}   => {V}  0.3490566 1          1.277108 167    
## [4]  {A7, iii, IV} => {ii} 0.2830189 1          1.432432 204    
## [5]  {A7, iii, V}  => {ii} 0.3301887 1          1.432432 206    
## [6]  {A7, I, iii}  => {ii} 0.3301887 1          1.432432 207    
## [7]  {D7, iii, IV} => {ii} 0.2641509 1          1.432432 230    
## [8]  {iii, IV, vi} => {V}  0.3773585 1          1.277108 239    
## [9]  {iii, IV, vi} => {I}  0.3773585 1          1.261905 240    
## [10] {ii, iii, vi} => {V}  0.4339623 1          1.277108 243

Cleaned rules sorted by lift

After cleaning the association rules, the role of E7 in the chord combinations becomes clearer. It turns out that A7 and D7 work well with this chord.

liftSortedClean <- sort(rulesClean, by="lift", decreasing=TRUE) 
inspect(head(liftSortedClean, 10))

##      lhs              rhs  support   confidence lift     itemset
## [1]  {A7, D7, V}   => {E7} 0.2547170 0.8181818  1.971074 103    
## [2]  {E7, iii, IV} => {A7} 0.2169811 1.0000000  1.962963  70    
## [3]  {A7, D7, IV}  => {E7} 0.2075472 0.8148148  1.962963 106    
## [4]  {A7, D7, vi}  => {E7} 0.2075472 0.8148148  1.962963 107    
## [5]  {A7, D7, I}   => {E7} 0.2452830 0.8125000  1.957386 104    
## [6]  {A7, D7, ii}  => {E7} 0.2264151 0.8000000  1.927273 105    
## [7]  {A7, iii, V}  => {E7} 0.2547170 0.7714286  1.858442  67    
## [8]  {A7, I, iii}  => {E7} 0.2547170 0.7714286  1.858442  68    
## [9]  {A7, iii, IV} => {E7} 0.2169811 0.7666667  1.846970  70    
## [10] {E7, ii, IV}  => {A7} 0.2641509 0.9333333  1.832099 129

Visualization of the cleaned association rules sorted by lift

plot(rulesClean, method="grouped")

plot(rulesClean, method="graph", limit=10, engine="htmlwidget")

Sequential rules

Sequential rule mining is a close relative of association rule mining. It differs from the Eclat and Apriori algorithms in that it preserves the order in which items appear, so it shows how past items influence a future item. This is perfect for analyzing chord progressions, because the discovered combinations of chords are ready to be used in the printed order. In R, the “arulesSequences” package implements the cSPADE algorithm, which is based on SPADE (Sequential Pattern Discovery using Equivalence classes) and mines sequential patterns. It works by first counting the 1-element and 2-element sequences. After that, each n-element sequence is formed by joining (n-1)-element sequences based on their id-lists. An id-list is a list of objects in which the sequence occurs. This explanation comes from https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Sequence_Mining/SPADE. The original paper is available at https://link.springer.com/article/10.1023/A:1007652502315.
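
The id-list idea can be illustrated with a toy example in base R. The event table and the helpers `idList` and `seqSupport` are made up for this sketch, which only covers the join of two 1-element sequences and is not how arulesSequences is implemented.

```r
# Toy event table: one row per chord, ordered within each song (hypothetical data)
events <- data.frame(
  sid = c(1, 1, 1, 2, 2, 2, 3, 3),
  eid = c(1, 2, 3, 1, 2, 3, 1, 2),
  item = c("I", "ii", "V", "ii", "V", "I", "I", "V"),
  stringsAsFactors = FALSE
)

# id-list of an item: the (song, position) pairs where it occurs
idList <- function(item) events[events$item == item, c("sid", "eid")]

# Temporal join: share of songs in which `a` occurs somewhere before `b`
seqSupport <- function(a, b) {
  pairs <- merge(idList(a), idList(b), by = "sid", suffixes = c(".a", ".b"))
  hits <- unique(pairs$sid[pairs$eid.a < pairs$eid.b])
  length(hits) / length(unique(events$sid))
}

seqSupport("ii", "V")  # ii precedes V in songs 1 and 2
seqSupport("V", "ii")  # V never precedes ii in this toy data
```

Unlike the itemset support used earlier, swapping the two chords changes the result, which is exactly the property that makes sequential rules suitable for chord progressions.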

billEvansSeq <- billEvans %>%
  group_by(name) %>%
  mutate(eventID = row_number())

billEvansSeq <- billEvansSeq %>%
  relocate(eventID, .after=name)

billEvansSeq <- billEvansSeq[,c(1,2,6)]

names(billEvansSeq) <- c("sequenceID", "eventID", "items")

billEvansSeq <- billEvansSeq[order(billEvansSeq$sequenceID, billEvansSeq$eventID),]

write.table(billEvansSeq, file="seq_transactions_bill_evans.csv", sep=";", row.names=FALSE, col.names=FALSE)

seqEvans <- read_baskets("seq_transactions_bill_evans.csv", sep=";", info = c("sequenceID", "eventID"))

cspadeSeq <- cspade(seqEvans, parameter = list(support = 0.1), control = list(verbose = TRUE))

cspadeRules <- ruleInduction(cspadeSeq, confidence = 0.5, control = list(verbose = TRUE))

inspect(head(cspadeRules, 10))

##     lhs          rhs         support confidence      lift 
##   1 <{"i"}>   => <{"VII"}> 0.1132075  0.6000000 4.8923077 
##   2 <{"iv"}>  => <{"VII"}> 0.1132075  0.6315789 5.1497976 
##   3 <{"i"}>   => <{"VI"}>  0.1132075  0.6000000 5.3000000 
##   4 <{"I"}>   => <{"vi"}>  0.5566038  0.7023810 1.1817838 
##   5 <{"ii"}>  => <{"vi"}>  0.3773585  0.5405405 0.9094809 
##   6 <{"iii"}> => <{"vi"}>  0.2641509  0.5185185 0.8724280 
##   7 <{"IV"}>  => <{"vi"}>  0.3584906  0.5507246 0.9266161 
##   8 <{"V"}>   => <{"vi"}>  0.4811321  0.6144578 1.0338497 
##   9 <{"vi"}>  => <{"vi"}>  0.3018868  0.5079365 0.8546233 
##  10 <{"V"},                   
##      {"vi"}>  => <{"vi"}>  0.2452830  0.5098039 0.8577653 
##

Sequential rules sorted by support

Unfortunately, I cannot set a minimum length for the rules, and sorting by support returns fairly obvious pairs of chords from the same key.

supportSortedSeq <- sort(cspadeRules, by="support", decreasing=TRUE)

inspect(head(supportSortedSeq, 10))

##     lhs          rhs         support confidence      lift 
##   1 <{"I"}>   => <{"V"}>   0.7075472  0.8928571 1.1402754 
##   2 <{"I"}>   => <{"ii"}>  0.6509434  0.8214286 1.1766409 
##   3 <{"I"}>   => <{"IV"}>  0.6226415  0.7857143 1.2070393 
##   4 <{"I"}>   => <{"I"}>   0.6226415  0.7857143 0.9914966 
##   5 <{"V"}>   => <{"V"}>   0.6037736  0.7710843 0.9847583 
##   6 <{"ii"}>  => <{"V"}>   0.5754717  0.8243243 1.0527515 
##   7 <{"V"}>   => <{"I"}>   0.5660377  0.7228916 0.9122203 
##   8 <{"I"}>   => <{"vi"}>  0.5566038  0.7023810 1.1817838 
##   9 <{"V"}>   => <{"IV"}>  0.5094340  0.6506024 0.9994762 
##  10 <{"I"}>   => <{"iii"}> 0.5094340  0.6428571 1.2619048 
##

Sequential rules sorted by confidence

Sorting by confidence definitely paid off. The rules are longer and include the unusual chords discovered earlier, E7 and D7. It is worth mentioning that when a chord reappears in the same rule, like V in the first one, it could be a variation of the V chord and not necessarily the same one. This suggests that it might be a good idea to run cspade without the Roman numerals, but for simplicity’s sake I will keep them.

confidenceSortedSeq <- sort(cspadeRules, by="confidence", decreasing=TRUE)

inspect(head(confidenceSortedSeq, 10))

##     lhs         rhs        support confidence     lift 
##   1 <{"I"},                 
##      {"V"},                 
##      {"E7"},                
##      {"D7"}> => <{"V"}>  0.1037736  0.9166667 1.170683 
##   2 <{"I"}>  => <{"V"}>  0.7075472  0.8928571 1.140275 
##   3 <{"ii"},                
##      {"E7"},                
##      {"D7"}> => <{"V"}>  0.1037736  0.8461538 1.080630 
##   4 <{"ii"}> => <{"V"}>  0.5754717  0.8243243 1.052752 
##   5 <{"I"}>  => <{"ii"}> 0.6509434  0.8214286 1.176641 
##   6 <{"i"}>  => <{"iv"}> 0.1509434  0.8000000 4.463158 
##   7 <{"i"}>  => <{"i"}>  0.1509434  0.8000000 4.240000 
##   8 <{"V"},                 
##      {"E7"},                
##      {"vi"}> => <{"V"}>  0.1037736  0.7857143 1.003442 
##   9 <{"V"},                 
##      {"V"},                 
##      {"C7"}> => <{"IV"}> 0.1037736  0.7857143 1.207039 
##  10 <{"I"}>  => <{"IV"}> 0.6226415  0.7857143 1.207039 
##

Sequential rules sorted by lift

Here the results resemble those obtained when sorting by support. Not much of interest happens in these rules, because all the chord progressions are short and come from the same key.

liftSortedSeq <- sort(cspadeRules, by="lift", decreasing=TRUE)

inspect(head(liftSortedSeq, 10))

##     lhs          rhs         support confidence     lift 
##   1 <{"i"}>   => <{"VI"}>  0.1132075  0.6000000 5.300000 
##   2 <{"i"}>   => <{"III"}> 0.1320755  0.7000000 5.300000 
##   3 <{"i"},                   
##      {"iv"}>  => <{"III"}> 0.1037736  0.6875000 5.205357 
##   4 <{"iv"}>  => <{"III"}> 0.1226415  0.6842105 5.180451 
##   5 <{"iv"}>  => <{"VII"}> 0.1132075  0.6315789 5.149798 
##   6 <{"i"}>   => <{"VII"}> 0.1132075  0.6000000 4.892308 
##   7 <{"i"}>   => <{"v"}>   0.1037736  0.5500000 4.858333 
##   8 <{"i"}>   => <{"iv"}>  0.1509434  0.8000000 4.463158 
##   9 <{"i"}>   => <{"G7"}>  0.1132075  0.6000000 4.240000 
##  10 <{"i"}>   => <{"i"}>   0.1509434  0.8000000 4.240000 
##

Conclusion

Association rule mining has shown that some chord progressions are frequent in jazz music, such as I-IV-ii-V and I-vi-ii-V. Moreover, I discovered that in his compositions Bill Evans switched the ii, iii and vi minor 7 chords to 7 chords, with the switched iii often appearing with the switched ii and vi. By applying a sequential rule mining algorithm, I also extracted some of the chord progressions containing these switched chords. This is valuable insight for anyone interested in writing jazz music. As a guitar player, I have always been fascinated by the complexity of jazz, and this project has confirmed to me that musical harmony does not need to follow simple structures. I also got to practice using regular expressions, a skill crucial in text mining.

Association rules in jazz

Maciej Lorens

2023-02-04