Music is an art form that has been with humanity for ages. It is constantly evolving, with new genres appearing every now and then, but I would argue that some elements of popular music stay the same. Long ago, musicians discovered sounds that go well together; these sounds were combined into chords, which in turn formed chord progressions. Many popular songs from different eras share exactly the same chord progressions. Because a song has many other layers, the casual listener may not have noticed this, but in music theory terms it is quite boring. That is probably why genres like jazz exist: to push what is considered musically harmonious into unknown territory. In this project, I used association rule mining to discover chords that frequently appear together in compositions of the famous jazz pianist Bill Evans. The results can inspire any musician interested in broadening their songwriting repertoire.
The data comes from https://www.e-chords.com. I scraped it with BeautifulSoup and Selenium and saved song names, song keys and semicolon-separated chords into a .csv file. The loaded dataset is shown below.
library(tidyverse)  # provides read_csv, the dplyr/tidyr verbs and stringr
billEvans <- read_csv('bill_evans_chords.csv')
| name | key | chords |
|---|---|---|
| Autumn Leaves | Dm | Gm;Gm9;Gm7;C7;Am;Dm7;Am7;E7;Dsus4;Dm |
| My Foolish Heart | A | D;D7M;D6;Fdim;F#m;F#m7+;F#m7;B7/9-;A;D9;E7;A7M;A7;Bm5-/7;A7/13;Dm6;F#7;B7;A7M/9 |
| Lucy In The Sky With Diamonds | E | E;D;C;A;F;G |
| Like Someone In Love | G | G;G/F#;Em7;D9;D7;Cdim;Bm7;G7;C;Am7;Cm7;Edim;Am;Am7+;Am7/G;A7;F |
| A Time For Love | D | A7;D7M;D;D9;Gdim;D7M/9;Bm7;F#m;Fdim;A;A9;Em7;A7/9;F#7;Cdim;Bm;B7;A7M;Bm7/E;E;C#7;Edim |
| Emily | D | D7M;G6/9;Em7;A7;D6;D9;D7/9;G;Gdim;A;B;G#7;C#m7;F#7;Bm7;Bm7/E;E7;E7/13;A7/13-;D;D7;Bm;F#m;Fdim;B7;Cdim;G/F#;G/B |
The planned preprocessing requires the third column to be split so that each chord sits in its own row.
billEvans <- billEvans %>%
  select(name, key, chords) %>%
  separate_rows(chords, sep=';') %>%
  filter(grepl("^[A-Z][#b]*", chords))
| name | key | chords |
|---|---|---|
| Autumn Leaves | Dm | Gm |
| Autumn Leaves | Dm | Gm9 |
| Autumn Leaves | Dm | Gm7 |
| Autumn Leaves | Dm | C7 |
| Autumn Leaves | Dm | Am |
| Autumn Leaves | Dm | Dm7 |
The results of association rule mining are much easier to understand when all songs are in the same key. For anyone not familiar with music theory, changing the key means that the whole song is shifted to a higher or lower pitch. Its structure is not affected, because all chords are relative to the key. First, I defined the position of each note in the chromatic scale relative to the C note.
keys <- list("C"=1, "C#"=2, "D"=3, "Eb"=4, "E"=5, "F"=6, "F#"=7, "G"=8, "G#"=9, "A"=10, "Bb"=11, "B"=12)
I need to take into account that some notes have two different names (for example D# and Eb), and which one is used depends on the key. The function below changes these chords to a common name to simplify the transposition.
replaceSimilar <- function(chord) {
  if (grepl("^D#", chord)) {
    return(str_replace_all(chord, "^D#", "Eb"))
  } else if (grepl("^Db", chord)) {
    return(str_replace_all(chord, "^Db", "C#"))
  } else if (grepl("^Gb", chord)) {
    return(str_replace_all(chord, "^Gb", "F#"))
  } else if (grepl("^Ab", chord)) {
    return(str_replace_all(chord, "^Ab", "G#"))
  } else if (grepl("^A#", chord)) {
    return(str_replace_all(chord, "^A#", "Bb"))
  } else {
    return(chord)
  }
}
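The same renaming can also be expressed as a lookup table. The sketch below is a vectorised alternative in base R, not the function used in this analysis; the names `enharmonic` and `normalizeRoot` are hypothetical, and it assumes that only the five spellings handled above need to be changed.

```r
# Hypothetical lookup table with the same five spellings as replaceSimilar
enharmonic <- c("D#" = "Eb", "Db" = "C#", "Gb" = "F#", "Ab" = "G#", "A#" = "Bb")

normalizeRoot <- function(chords) {
  # Root = leading note letter plus optional accidental, e.g. "Abm7" -> "Ab".
  # If a string does not start with a note, the whole string is kept as root.
  root <- sub("^([A-G][#b]?).*$", "\\1", chords)
  # Everything after the root, e.g. "Abm7" -> "m7"
  rest <- substring(chords, nchar(root) + 1)
  # Swap only the roots that appear in the lookup table
  swap <- root %in% names(enharmonic)
  root[swap] <- enharmonic[root[swap]]
  unname(paste0(root, rest))
}

normalizeRoot(c("Abm7", "D#dim", "C7M", "Gb"))
```

Because it works on whole vectors, a function like this could be called on a column directly instead of through mapply.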
To change the key of each song to C major or C minor, every chord in the song needs to be moved by the distance between the root note of the song’s key and the C note. I wrote functions that achieve this with the regular expression helpers from the stringr package, which is part of the tidyverse. I extract only the chord’s root note (a capital letter, optionally followed by “b” for a flat note or “#” for a sharp note), because the other characteristics of the chord (whether it is major, minor, diminished, a 7 chord and so on) stay constant.
transposeKey <- function(key, chord) {
  idxBase <- keys[["C"]]
  idxKey <- keys[[str_extract(key, "^[A-Z][#b]?")]]
  # Number of semitones between the song key and C
  diffKey <- idxKey - idxBase
  # Shift the chord root down by that amount, wrapping around the 12 notes
  idxNew <- (12 + keys[[str_extract(chord, "^[A-Z][#b]?")]] - diffKey) %% 12
  # A remainder of 0 corresponds to the 12th note, B
  if (idxNew != 0) {
    return(names(keys)[idxNew])
  } else {
    return(names(keys)[12])
  }
}

# Replace the root of the chord with its transposed counterpart
changeChord <- function(key, chord) {
  return(str_replace_all(chord, "^[A-Z][#b]?", transposeKey(key, chord)))
}
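The arithmetic behind the transposition can be checked on a small example. The sketch below is self-contained base R and only mirrors the idea; `noteNames` and `transposeRoot` are hypothetical names, and the note spellings are assumed to match the keys list above.

```r
# Chromatic scale, using the same spellings as the keys list in the text
noteNames <- c("C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B")

# Move chord roots down by the distance between the song key and C
transposeRoot <- function(keyRoot, chordRoot) {
  shift <- match(keyRoot, noteNames) - 1               # semitones from C up to the key
  idx <- (match(chordRoot, noteNames) - 1 - shift) %% 12  # wrap around the octave
  noteNames[idx + 1]
}

# The I, IV and V chords of D major land on the I, IV and V of C major
transposeRoot("D", c("D", "G", "A"))
```

Working with zero-based offsets keeps the modulo result in the range 0 to 11, so no special case is needed for the last note of the scale.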
Some chords are so-called “slash” chords: the note after the slash names the bass note to be played under the chord. The function below includes these bass notes in the transposition.
slashChord <- function(key, chord) {
  if (grepl("/[A-Z]", chord)) {
    # Capture the bass note together with its accidental, e.g. the "F#" in "G/F#"
    afterSlash <- str_extract(chord, "(?<=/)[A-Z][#b]?")
    afterSlash <- replaceSimilar(afterSlash)
    # Replace everything after the slash with the transposed bass note
    return(str_replace_all(chord, "(?<=/).*", changeChord(key, afterSlash)))
  } else {
    return(chord)
  }
}
Finally, the functions can be applied to the data.
billEvans$key <- mapply(replaceSimilar, chord=billEvans$key)
billEvans$chords <- mapply(replaceSimilar, chord=billEvans$chords)
billEvans$newKey <- mapply(changeChord, key=billEvans$key, chord=billEvans$key)
billEvans$newChord <- mapply(changeChord, key=billEvans$key, chord=billEvans$chords)
billEvans$newChord <- mapply(slashChord, key=billEvans$key, chord=billEvans$newChord)
Since chord progressions are independent of key, they are often written as roman numerals. Each roman numeral represents one of the seven chords characteristic of a given key. I decided to use this notation because it makes it much clearer which chords in a song are out of key.
cMajor <- list("C"="I", "C7M"='I', "Dm"="ii", "Dm7"="ii", "Em"="iii", "Em7"="iii", "F"="IV", "F7M"="IV", "G"="V", "G7"="V", "Am"="vi", "Am7"="vi", "Bdim"="vii", "Bdim7"="vii")
cMinor <- list("Cm"="i", "Cm7"="i", "Ddim"="ii", "Ddim7"="ii", "Eb"="III", "Eb7M"="III", "Fm"="iv", "Fm7"="iv", "Gm"="v", "Gm7"="v", "G#"="VI", "G#7M"="VI", "Bb"="VII", "Bb7"="VII")
newNotation <- function(key, chord) {
  # Reduce the chord to its root plus basic quality, e.g. "Am7/G" -> "Am7"
  base <- str_extract(chord, "^[A-Z][#b]?(m)?(dim)?(7)?(M)?")
  # Songs transposed to C use the major table, all others the minor table
  lookup <- if (key == "C") cMajor else cMinor
  numeral <- lookup[[base]]
  # Chords that are not in the table are out of key and keep their name
  if (is.null(numeral)) {
    return(chord)
  } else {
    return(numeral)
  }
}
billEvans$newChord <- mapply(newNotation, key=billEvans$newKey, chord=billEvans$newChord)
library(arules)  # provides read.transactions, eclat and ruleInduction

write.table(billEvans[, c(1, 5)], file="transactions_bill_evans.csv", sep=";", row.names=FALSE)
chordsEvans <- read.transactions("transactions_bill_evans.csv", sep=";", format="single",
                                 header=TRUE, cols=c(1:2))
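As a cross-check on the roman numeral tables used above, the seven triads of a key can be derived by stacking thirds on each scale degree. The sketch below is illustrative base R with hypothetical names (`noteNames`, `diatonicTriads`); it covers plain triads only, so the 7 and 7M variants in cMajor and cMinor are outside its scope.

```r
noteNames <- c("C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B")

# Semitone offsets of the scale degrees, counted from C
majorScale <- c(0, 2, 4, 5, 7, 9, 11)
minorScale <- c(0, 2, 3, 5, 7, 8, 10)  # natural minor

diatonicTriads <- function(scale) {
  vapply(seq_along(scale), function(i) {
    root  <- scale[i]
    third <- scale[(i + 1) %% 7 + 1]  # two scale degrees above the root
    fifth <- scale[(i + 3) %% 7 + 1]  # four scale degrees above the root
    t <- (third - root) %% 12         # size of the third in semitones
    f <- (fifth - root) %% 12         # size of the fifth in semitones
    suffix <- if (t == 4 && f == 7) {
      ""      # major triad
    } else if (t == 3 && f == 7) {
      "m"     # minor triad
    } else if (t == 3 && f == 6) {
      "dim"   # diminished triad
    } else {
      "?"     # not expected for these two scales
    }
    paste0(noteNames[root + 1], suffix)
  }, character(1))
}

diatonicTriads(majorScale)
diatonicTriads(minorScale)
```

The two results line up with the triad names that cMajor and cMinor map to I through vii and i through VII, which is why any chord left without a numeral can be read as out of key.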
Before mining the association rules, I examine the frequency of chords on a bar plot. Some frequent chords are not represented by a roman numeral and therefore do not belong to the key of C major or C minor. Combined with the insight from the association rules, this will let me tell when these “unusual” chords appear.
chordsFreq <- itemFrequency(chordsEvans, type="absolute")
chordsFreq <- head(sort(chordsFreq, decreasing=TRUE), 20)  # R indexing starts at 1, so take the top 20 with head()
barplot(chordsFreq, las=2)
To discover the association rules, I applied the Eclat algorithm. The alternative is the Apriori algorithm, but it usually needs more memory and runs slower than Eclat. I set the minimum support to 0.2 and, because I want lengthier chord progressions, I set the minimum itemset length to four chords.
chordsEclat <- eclat(chordsEvans, parameter=list(supp=0.2, minlen=4))
eclatRules <- ruleInduction(chordsEclat, chordsEvans, confidence=0.6)
inspect(head(eclatRules, 10))
## lhs rhs support confidence lift itemset
## [1] {Fm7, ii, V} => {I} 0.2358491 1.0000000 1.261905 1
## [2] {Fm7, I, V} => {ii} 0.2358491 1.0000000 1.432432 1
## [3] {Fm7, I, ii} => {V} 0.2358491 1.0000000 1.277108 1
## [4] {Ebdim, ii, IV, V} => {I} 0.2075472 1.0000000 1.261905 2
## [5] {Ebdim, I, IV, V} => {ii} 0.2075472 1.0000000 1.432432 2
## [6] {Ebdim, I, ii, V} => {IV} 0.2075472 0.9166667 1.408213 2
## [7] {Ebdim, I, ii, IV} => {V} 0.2075472 1.0000000 1.277108 2
## [8] {Ebdim, IV, V} => {ii} 0.2075472 1.0000000 1.432432 3
## [9] {Ebdim, ii, V} => {IV} 0.2075472 0.9166667 1.408213 3
## [10] {Ebdim, ii, IV} => {V} 0.2075472 1.0000000 1.277108 3
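To make the support, confidence and lift columns in the output above concrete, the three measures can be computed by hand. The sketch below uses a made-up set of five songs (hypothetical data, not taken from the scraped file) and base R only.

```r
# Hypothetical toy data: each element is the set of chords used in one song
songs <- list(
  c("I", "ii", "V"),
  c("I", "ii", "V", "vi"),
  c("I", "IV", "V"),
  c("I", "ii", "IV", "V"),
  c("ii", "V", "A7")
)

# Fraction of songs that contain every chord in `items`
support <- function(items) {
  mean(vapply(songs, function(s) all(items %in% s), logical(1)))
}

lhs <- c("ii", "V")
rhs <- "I"

suppRule   <- support(c(lhs, rhs))       # share of songs with lhs and rhs together
confidence <- suppRule / support(lhs)    # share of lhs songs that also contain rhs
lift       <- confidence / support(rhs)  # confidence relative to how common rhs is

c(support = suppRule, confidence = confidence, lift = lift)
```

The same relationship can be read off the real output: a rule with confidence 1 and lift 1.261905 implies that the I chord appears in about 1 / 1.261905 ≈ 0.79 of the songs.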
Sorting by support highlights only the most frequent combinations. The results are to be expected, as these chord progressions are jazz staples. According to Wikipedia (https://en.wikipedia.org/wiki/Montgomery-Ward_bridge), I-IV-ii-V, known as the Montgomery-Ward bridge, is often used as the bridge of a jazz standard. The other progression, I-vi-ii-V, is a very popular jazz “turnaround”, which ends a particular section of a song. Note that the itemsets themselves are unordered, so the order of the chords here comes from music theory rather than from the rules. Based on the confidence of 1 in rules 1, 5 and 9, it is also clear that the tonic (the “I” chord, in this case C major or C major 7) always accompanies the other chords from the key.
supportSortedRules <- sort(eclatRules, by="support", decreasing=TRUE)
inspect(head(supportSortedRules, 10))
## lhs rhs support confidence lift itemset
## [1] {ii, IV, V} => {I} 0.5754717 1.0000000 1.261905 314
## [2] {I, IV, V} => {ii} 0.5754717 0.9104478 1.304155 314
## [3] {I, ii, V} => {IV} 0.5754717 0.8356164 1.283701 314
## [4] {I, ii, IV} => {V} 0.5754717 0.9838710 1.256510 314
## [5] {ii, V, vi} => {I} 0.5377358 1.0000000 1.261905 313
## [6] {I, V, vi} => {ii} 0.5377358 0.9047619 1.296010 313
## [7] {I, ii, vi} => {V} 0.5377358 1.0000000 1.277108 313
## [8] {I, ii, V} => {vi} 0.5377358 0.7808219 1.313764 313
## [9] {IV, V, vi} => {I} 0.5000000 1.0000000 1.261905 312
## [10] {I, V, vi} => {IV} 0.5000000 0.8412698 1.292386 312
The highest confidence value is 1, which means that whenever the lhs chords appear in a song, the rhs chord appears as well. There are many such rules, as the examples below show. Although their support is lower, the appearance of the Fm7 and Ebdim chords is curious. Both are out of key: the former is a minor 7 chord where the key calls for a major 7 chord, and the latter does not belong in the key of C major at all.
confidenceSortedRules <- sort(eclatRules, by="confidence", decreasing=TRUE)
inspect(head(confidenceSortedRules, 10))
## lhs rhs support confidence lift itemset
## [1] {Fm7, ii, V} => {I} 0.2358491 1 1.261905 1
## [2] {Fm7, I, V} => {ii} 0.2358491 1 1.432432 1
## [3] {Fm7, I, ii} => {V} 0.2358491 1 1.277108 1
## [4] {Ebdim, ii, IV, V} => {I} 0.2075472 1 1.261905 2
## [5] {Ebdim, I, IV, V} => {ii} 0.2075472 1 1.432432 2
## [6] {Ebdim, I, ii, IV} => {V} 0.2075472 1 1.277108 2
## [7] {Ebdim, IV, V} => {ii} 0.2075472 1 1.432432 3
## [8] {Ebdim, ii, IV} => {V} 0.2075472 1 1.277108 3
## [9] {Ebdim, ii, IV} => {I} 0.2075472 1 1.261905 4
## [10] {Ebdim, I, IV} => {ii} 0.2075472 1 1.432432 4
Lift is another useful measure for inspecting association rules. It tells the researcher how much more likely rhs is to appear together with lhs than it would be if the two were unrelated. Here, the top rules are quite bizarre. Some of them do not include the tonic, and they consist of chords that are out of key. D7, E7 and A7 should be minor 7 chords in the key of C major, so switching the ii, iii and vi from minor 7 to 7 chords is an interesting device worth trying when composing. What also caught my eye is that the appearance of A7 is completely dictated by E7, iii (Em or Em7) and IV (F or F7M).
liftSortedRules <- sort(eclatRules, by="lift", decreasing=TRUE)
inspect(head(liftSortedRules, 10))
## lhs rhs support confidence lift itemset
## [1] {A7, D7, V} => {E7} 0.2547170 0.8181818 1.971074 103
## [2] {E7, I, ii, iii, IV, V} => {A7} 0.2169811 1.0000000 1.962963 56
## [3] {E7, ii, iii, IV, V} => {A7} 0.2169811 1.0000000 1.962963 57
## [4] {E7, I, ii, iii, IV} => {A7} 0.2169811 1.0000000 1.962963 58
## [5] {E7, I, iii, IV, V} => {A7} 0.2169811 1.0000000 1.962963 59
## [6] {E7, iii, IV, V} => {A7} 0.2169811 1.0000000 1.962963 60
## [7] {E7, I, iii, IV} => {A7} 0.2169811 1.0000000 1.962963 61
## [8] {E7, ii, iii, IV} => {A7} 0.2169811 1.0000000 1.962963 62
## [9] {E7, iii, IV} => {A7} 0.2169811 1.0000000 1.962963 70
## [10] {A7, D7, I, V, vi} => {E7} 0.2075472 0.8148148 1.962963 93
Cleaning association rules involves removing redundant, insignificant and non-maximal rules. What remains are rules that are more general and statistically significant according to Fisher’s exact test. This makes the rules more comprehensible and easier to visualize.
rulesClean <- eclatRules[!is.redundant(eclatRules)]
rulesClean <- rulesClean[is.significant(rulesClean, chordsEvans)]
rulesClean <- rulesClean[is.maximal(rulesClean)]
inspect(head(rulesClean, 10))
## lhs rhs support confidence lift itemset
## [1] {A7, E7, ii, vi} => {iii} 0.2358491 0.8620690 1.692209 55
## [2] {E7, iii, V} => {A7} 0.2547170 0.9310345 1.827586 67
## [3] {A7, iii, V} => {E7} 0.2547170 0.7714286 1.858442 67
## [4] {E7, I, iii} => {A7} 0.2547170 0.9310345 1.827586 68
## [5] {A7, I, iii} => {E7} 0.2547170 0.7714286 1.858442 68
## [6] {E7, iii, IV} => {A7} 0.2169811 1.0000000 1.962963 70
## [7] {A7, iii, IV} => {E7} 0.2169811 0.7666667 1.846970 70
## [8] {E7, iii, V} => {vi} 0.2547170 0.9310345 1.566502 79
## [9] {E7, I, iii} => {vi} 0.2547170 0.9310345 1.566502 80
## [10] {D7, E7, V} => {A7} 0.2547170 0.8709677 1.709677 103
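To illustrate what a significance check with Fisher’s exact test looks like for a single rule, the sketch below rebuilds the 2x2 table for {E7, iii, IV} => {A7} from the printed measures. It assumes 106 songs, a number inferred from the printed supports (0.2169811 × 106 ≈ 23) rather than stated in the text, and it is not meant to reproduce what is.significant does internally.

```r
n        <- 106  # assumed number of songs, inferred from the printed supports
both     <- 23   # songs with lhs and rhs: support 0.2169811 * 106
lhsTotal <- 23   # confidence 1: every song with the lhs also has A7
rhsTotal <- 54   # songs with A7: confidence / lift = 1 / 1.962963 = 0.509434 of 106

# Contingency table: rows = lhs present or absent, columns = rhs present or absent
tab <- matrix(
  c(both,            lhsTotal - both,
    rhsTotal - both, n - lhsTotal - rhsTotal + both),
  nrow = 2, byrow = TRUE,
  dimnames = list(lhs = c("yes", "no"), rhs = c("yes", "no"))
)

# One-sided test: do lhs and rhs co-occur more often than independence predicts?
pValue <- fisher.test(tab, alternative = "greater")$p.value
pValue
```

A very small p-value here means the co-occurrence of A7 with E7, iii and IV is unlikely to be a coincidence in a dataset of this size.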
supportSortedClean <- sort(rulesClean, by="support", decreasing=TRUE)
inspect(head(supportSortedClean, 10))
## lhs rhs support confidence lift itemset
## [1] {ii, IV, V} => {I} 0.5754717 1.0000000 1.261905 314
## [2] {I, IV, V} => {ii} 0.5754717 0.9104478 1.304155 314
## [3] {I, ii, V} => {IV} 0.5754717 0.8356164 1.283701 314
## [4] {I, ii, IV} => {V} 0.5754717 0.9838710 1.256510 314
## [5] {ii, V, vi} => {I} 0.5377358 1.0000000 1.261905 313
## [6] {I, V, vi} => {ii} 0.5377358 0.9047619 1.296010 313
## [7] {I, ii, vi} => {V} 0.5377358 1.0000000 1.277108 313
## [8] {I, ii, V} => {vi} 0.5377358 0.7808219 1.313764 313
## [9] {IV, V, vi} => {I} 0.5000000 1.0000000 1.261905 312
## [10] {I, V, vi} => {IV} 0.5000000 0.8412698 1.292386 312
confidenceSortedClean <- sort(rulesClean, by="confidence", decreasing=TRUE)
inspect(head(confidenceSortedClean, 10))
## lhs rhs support confidence lift itemset
## [1] {E7, iii, IV} => {A7} 0.2169811 1 1.962963 70
## [2] {A7, E7, I} => {V} 0.3396226 1 1.277108 133
## [3] {E7, I, ii} => {V} 0.3490566 1 1.277108 167
## [4] {A7, iii, IV} => {ii} 0.2830189 1 1.432432 204
## [5] {A7, iii, V} => {ii} 0.3301887 1 1.432432 206
## [6] {A7, I, iii} => {ii} 0.3301887 1 1.432432 207
## [7] {D7, iii, IV} => {ii} 0.2641509 1 1.432432 230
## [8] {iii, IV, vi} => {V} 0.3773585 1 1.277108 239
## [9] {iii, IV, vi} => {I} 0.3773585 1 1.261905 240
## [10] {ii, iii, vi} => {V} 0.4339623 1 1.277108 243
After cleaning the association rules, the role of E7 in the chord combinations becomes clearer. It turns out that A7 and D7 work well with this chord.
liftSortedClean <- sort(rulesClean, by="lift", decreasing=TRUE)
inspect(head(liftSortedClean, 10))
## lhs rhs support confidence lift itemset
## [1] {A7, D7, V} => {E7} 0.2547170 0.8181818 1.971074 103
## [2] {E7, iii, IV} => {A7} 0.2169811 1.0000000 1.962963 70
## [3] {A7, D7, IV} => {E7} 0.2075472 0.8148148 1.962963 106
## [4] {A7, D7, vi} => {E7} 0.2075472 0.8148148 1.962963 107
## [5] {A7, D7, I} => {E7} 0.2452830 0.8125000 1.957386 104
## [6] {A7, D7, ii} => {E7} 0.2264151 0.8000000 1.927273 105
## [7] {A7, iii, V} => {E7} 0.2547170 0.7714286 1.858442 67
## [8] {A7, I, iii} => {E7} 0.2547170 0.7714286 1.858442 68
## [9] {A7, iii, IV} => {E7} 0.2169811 0.7666667 1.846970 70
## [10] {E7, ii, IV} => {A7} 0.2641509 0.9333333 1.832099 129
library(arulesViz)  # provides the plot methods for rules

plot(rulesClean, method="grouped")
plot(rulesClean, method="graph", limit=10, engine="htmlwidget")
Sequential rule mining is a branch of association rule mining. It differs from the Eclat and Apriori algorithms in that it preserves the order in which items appear, so it shows how past items influence a future item. This is perfect for analyzing chord progressions, because the discovered combinations of chords are ready to be used in the printed order. In R, the “arulesSequences” package implements the cSPADE algorithm (SPADE stands for Sequential Pattern Discovery using Equivalence classes), which enables mining for sequential rules. It works by first counting the 1-element and 2-element sequences. After that, each n-element sequence is formed by joining (n-1)-element sequences based on their id-lists. An id-list records where a sequence occurs: in which input sequences and at which events. This explanation comes from https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Sequence_Mining/SPADE. The original paper is available at https://link.springer.com/article/10.1023/A:1007652502315.
billEvansSeq <- billEvans %>%
  group_by(name) %>%
  mutate(eventID = row_number())
billEvansSeq <- billEvansSeq %>%
  relocate(eventID, .after=name)
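The id-list join described above can be sketched in a few lines of base R. This is a simplified illustration on three made-up songs, not the cSPADE implementation from arulesSequences; all names (`songs`, `idList`, `temporalJoin`, `seqSupport`) are hypothetical.

```r
# Hypothetical toy input: three songs, each an ordered vector of chords
songs <- list(
  s1 = c("I", "vi", "ii", "V"),
  s2 = c("ii", "V", "I"),
  s3 = c("I", "ii", "vi", "V")
)

# Id-list of a single chord: one (sid, eid) row per occurrence,
# where sid is the song index and eid the position within the song
idList <- function(item) {
  do.call(rbind, lapply(seq_along(songs), function(sid) {
    eid <- which(songs[[sid]] == item)
    data.frame(sid = rep(sid, length(eid)), eid = eid)
  }))
}

# Temporal join: keep occurrences of `second` that come after some
# occurrence of `first` in the same song; the result is the id-list
# of the 2-element sequence <first, second>
temporalJoin <- function(first, second) {
  pairs <- merge(first, second, by = "sid", suffixes = c(".first", ".second"))
  pairs <- pairs[pairs$eid.second > pairs$eid.first, ]
  unique(data.frame(sid = pairs$sid, eid = pairs$eid.second))
}

# Support of a sequence = share of songs that appear in its id-list
seqSupport <- function(ids) length(unique(ids$sid)) / length(songs)

iiThenV <- temporalJoin(idList("ii"), idList("V"))
vThenI  <- temporalJoin(idList("V"), idList("I"))

seqSupport(iiThenV)  # ii comes before V in all three songs
seqSupport(vThenI)   # V comes before I only in the second song
```

Longer sequences are built the same way, by joining the id-list of a shorter sequence with the id-list of the chord that extends it, so the original data never has to be rescanned.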
billEvansSeq <- billEvansSeq[,c(1,2,6)]
names(billEvansSeq) <- c("sequenceID", "eventID", "items")
billEvansSeq <- billEvansSeq[order(billEvansSeq$sequenceID, billEvansSeq$eventID),]
write.table(billEvansSeq, file="seq_transactions_bill_evans.csv", sep=";", row.names=FALSE, col.names=FALSE)
library(arulesSequences)  # provides read_baskets and cspade
seqEvans <- read_baskets("seq_transactions_bill_evans.csv", sep=";", info = c("sequenceID", "eventID"))
cspadeSeq <- cspade(seqEvans, parameter = list(support = 0.1), control = list(verbose = TRUE))
cspadeRules <- ruleInduction(cspadeSeq, confidence = 0.5, control = list(verbose = TRUE))
inspect(head(cspadeRules, 10))
## lhs rhs support confidence lift
## 1 <{"i"}> => <{"VII"}> 0.1132075 0.6000000 4.8923077
## 2 <{"iv"}> => <{"VII"}> 0.1132075 0.6315789 5.1497976
## 3 <{"i"}> => <{"VI"}> 0.1132075 0.6000000 5.3000000
## 4 <{"I"}> => <{"vi"}> 0.5566038 0.7023810 1.1817838
## 5 <{"ii"}> => <{"vi"}> 0.3773585 0.5405405 0.9094809
## 6 <{"iii"}> => <{"vi"}> 0.2641509 0.5185185 0.8724280
## 7 <{"IV"}> => <{"vi"}> 0.3584906 0.5507246 0.9266161
## 8 <{"V"}> => <{"vi"}> 0.4811321 0.6144578 1.0338497
## 9 <{"vi"}> => <{"vi"}> 0.3018868 0.5079365 0.8546233
## 10 <{"V"},
## {"vi"}> => <{"vi"}> 0.2452830 0.5098039 0.8577653
##
Unfortunately, I cannot force the rules to be longer, and sorting by support returned fairly obvious pairs of chords from the same key.
supportSortedSeq <- sort(cspadeRules, by="support", decreasing=TRUE)
inspect(head(supportSortedSeq, 10))
## lhs rhs support confidence lift
## 1 <{"I"}> => <{"V"}> 0.7075472 0.8928571 1.1402754
## 2 <{"I"}> => <{"ii"}> 0.6509434 0.8214286 1.1766409
## 3 <{"I"}> => <{"IV"}> 0.6226415 0.7857143 1.2070393
## 4 <{"I"}> => <{"I"}> 0.6226415 0.7857143 0.9914966
## 5 <{"V"}> => <{"V"}> 0.6037736 0.7710843 0.9847583
## 6 <{"ii"}> => <{"V"}> 0.5754717 0.8243243 1.0527515
## 7 <{"V"}> => <{"I"}> 0.5660377 0.7228916 0.9122203
## 8 <{"I"}> => <{"vi"}> 0.5566038 0.7023810 1.1817838
## 9 <{"V"}> => <{"IV"}> 0.5094340 0.6506024 0.9994762
## 10 <{"I"}> => <{"iii"}> 0.5094340 0.6428571 1.2619048
##
Sorting by confidence definitely paid off. The rules are longer and include the unusual chords discovered earlier, E7 and D7. It is worth mentioning that when a chord reappears in the same rule, like V in the first one, it may be a variation of the V chord and not necessarily the same chord, because each roman numeral stands for more than one chord name. This suggests that it might be a good idea to run cspade without the roman numerals, but for simplicity’s sake I will keep them.
confidenceSortedSeq <- sort(cspadeRules, by="confidence", decreasing=TRUE)
inspect(head(confidenceSortedSeq, 10))
## lhs rhs support confidence lift
## 1 <{"I"},
## {"V"},
## {"E7"},
## {"D7"}> => <{"V"}> 0.1037736 0.9166667 1.170683
## 2 <{"I"}> => <{"V"}> 0.7075472 0.8928571 1.140275
## 3 <{"ii"},
## {"E7"},
## {"D7"}> => <{"V"}> 0.1037736 0.8461538 1.080630
## 4 <{"ii"}> => <{"V"}> 0.5754717 0.8243243 1.052752
## 5 <{"I"}> => <{"ii"}> 0.6509434 0.8214286 1.176641
## 6 <{"i"}> => <{"iv"}> 0.1509434 0.8000000 4.463158
## 7 <{"i"}> => <{"i"}> 0.1509434 0.8000000 4.240000
## 8 <{"V"},
## {"E7"},
## {"vi"}> => <{"V"}> 0.1037736 0.7857143 1.003442
## 9 <{"V"},
## {"V"},
## {"C7"}> => <{"IV"}> 0.1037736 0.7857143 1.207039
## 10 <{"I"}> => <{"IV"}> 0.6226415 0.7857143 1.207039
##
Sorting by lift gives results similar to sorting by support. These rules are not very interesting, because the chord progressions are short and all their chords come from the same key.
liftSortedSeq <- sort(cspadeRules, by="lift", decreasing=TRUE)
inspect(head(liftSortedSeq, 10))
## lhs rhs support confidence lift
## 1 <{"i"}> => <{"VI"}> 0.1132075 0.6000000 5.300000
## 2 <{"i"}> => <{"III"}> 0.1320755 0.7000000 5.300000
## 3 <{"i"},
## {"iv"}> => <{"III"}> 0.1037736 0.6875000 5.205357
## 4 <{"iv"}> => <{"III"}> 0.1226415 0.6842105 5.180451
## 5 <{"iv"}> => <{"VII"}> 0.1132075 0.6315789 5.149798
## 6 <{"i"}> => <{"VII"}> 0.1132075 0.6000000 4.892308
## 7 <{"i"}> => <{"v"}> 0.1037736 0.5500000 4.858333
## 8 <{"i"}> => <{"iv"}> 0.1509434 0.8000000 4.463158
## 9 <{"i"}> => <{"G7"}> 0.1132075 0.6000000 4.240000
## 10 <{"i"}> => <{"i"}> 0.1509434 0.8000000 4.240000
##
Association rule mining has shown that some chord progressions are used frequently in jazz music, like I-IV-ii-V and I-vi-ii-V. Moreover, I discovered that in his compositions Bill Evans switched the ii, iii and vi minor 7 chords into 7 chords, with the switched iii often appearing together with the switched ii and vi. By applying the sequential rule mining algorithm, I also extracted some of the chord progressions that contain these switched chords. This is valuable insight for anyone interested in writing jazz music. As a guitar player, I have always been fascinated by the complexity of jazz, and this project has confirmed to me that musical harmony does not need to follow simple structures. I also got to practice using regular expressions, a crucial skill in text mining.