In this R markdown we will program the above image, among other
things.
See Figure 5 for an update of this image.
“…he was so singular and self-determined in his musical mission that he created a world almost completely unrelated to anything else in music. Sure, you could name some influences - mainly R&B music of the ’50s and atonal composers like Stravinsky and Varèse - but, Zappa music is its own entity, completely removed from its surroundings. To be a fan of Zappa is to throw away all preconceived notions about what music is. And, once you’ve done so, you’ve entered his world and it is very hard to get out.” 2
Shortly after purchasing an Akai tape deck in 1976, a then classmate
introduced me to Zappa’s music when he gave me a BASF tape containing
five records, with an explicit warning written on it: “extremely
forbidden to whipe out”. The tape has not stood the test of time, the
Akai device has been placed in the electronic waste bin, but to this day
I still enjoy the music.3
I keep on looking forward to new album
releases.
This writing and the underlying R script are split into five parts.
First, it describes how a list of Frank Zappa’s official albums, including all those credited to the Mothers of Invention, can be compiled from the internet.
The resulting list is a source for all kinds of statistics and graphics. We summarize the data in diagrams: a Sankey overview, a pie chart, and a few histograms. These involve a concise classification of albums and the year of release.
Next we will link the list of albums to the home digital collection that is in FLAC format. Album data can be easily extracted from properly coded FLAC files.
The result of the comparison between the official list and home
collection also provides an opportunity to gain some information. Which
albums are your favourites? Which tracks? Of course you have an idea
about this, but what do the statistics reveal?
In addition to album
rankings based on the tracks selected for playlists, we programmed a kind of
punch card with the year of release on the first dimension and, on the
second, whether one or more live or studio performances of a song
were released in that year. The punch card can be read as a graphic
translation of the principle of “conceptual continuity”, an
overarching Zappa concept that is often mentioned when his
oeuvre is discussed.
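To make the punch-card idea concrete, here is a minimal sketch in base R. The data frame `performances` and the helper `punch_card()` are hypothetical and only illustrate the layout (years on the first dimension, live/studio on the second); they are not the script used later in this writing.

```r
# Toy data (hypothetical): one row per released performance of a song.
performances <- data.frame(
  song = c("King Kong", "King Kong", "King Kong", "Inca Roads", "Inca Roads"),
  release_year = c(1969, 1969, 1992, 1975, 1988),
  type = c("studio", "live", "live", "studio", "live"),
  stringsAsFactors = FALSE
)

# Punch card for one song: rows are release years, columns are performance types.
# A cell is TRUE ("punched") when at least one performance of that type
# was released in that year.
punch_card <- function(data, song_title) {
  rows <- data[data$song == song_title, ]
  counts <- table(
    year = factor(rows$release_year, levels = sort(unique(data$release_year))),
    type = factor(rows$type, levels = c("live", "studio"))
  )
  unclass(counts) > 0
}

king_kong <- punch_card(performances, "King Kong")
print(king_kong)
```

Using factors with fixed levels keeps every year and both performance types in the matrix, even when a song has no release in a given cell.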
Finally, we take a trip to ZappaBASE 2.0 and compare its list of the songs that Zappa played live most often with my own list of favorites.
It is important to mention here that in addition to Frank
Zappa, this writing also concerns albums by The Mothers of Invention,
also known as The Mothers for convenience. In the past I have chosen to
tag all three as FZ.
Next to the patented mustache4, I would like to
interpret “Zappa” as an ever-changing ensemble of great musicians led by
an exceptional conductor, guitarist and all-round frontman. An
impressive list of musicians who have performed on officially released
Frank Zappa recordings can be found on Wikipedia 5 or on the Román García
Albertos site.6
Zappa was by no means a lazy artist, releasing about 70 audio and
video albums in his 30-year career. Since 1994, the Zappa Family Trust,
with the help of “Vaultmeister” Joe Travers, has released about 65
posthumous album sets as well. Also marked as official are the “Beat
the Boots!” series box sets, released in 1991, 1992 and 2009
respectively. The trio contains legal reissues of 21 bootleg recordings
that were originally distributed illegally.
Finally, all-time compilations (approximately 30) and download albums
are also included, provided they are official.7
The album list is
generated by an R script that is printed in full below. The initial
thought was that programming such a list in R should be an easy job: simple
and unambiguous, and a way to get up-to-date information from the internet
instead of relying on some static CSV or Excel file on a local drive.
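To give an idea of what such programming involves, here is a minimal sketch of parsing a single album entry in base R. The entry text and the helper `parse_entry()` are hypothetical; the assumed layout, "Title (format, label, Month day, year)", is only modelled on the patterns that the full script below matches, and the real page may deviate from it.

```r
# Hypothetical entry; the layout of the real page may differ.
entry <- "Some Album (2 LP, Some Label 123, June 27, 1966)"

parse_entry <- function(x) {
  # Title: everything before the first opening parenthesis.
  title <- trimws(sub("\\(.*$", "", x))
  # Contents of the last parenthesised group, split on commas.
  inside <- sub("^.*\\(([^()]*)\\)[[:space:]]*$", "\\1", x)
  parts <- trimws(strsplit(inside, ",", fixed = TRUE)[[1]])
  n <- length(parts)
  # The date occupies the last two comma-separated parts: "June 27" and "1966".
  month_day <- strsplit(parts[n - 1], " +")[[1]]
  month <- match(month_day[1], month.name)  # locale-independent month lookup
  release_date <- as.Date(sprintf("%s-%02d-%02d", parts[n], month,
                                  as.integer(month_day[2])))
  list(title = title, disk = parts[1], label = parts[2],
       release_date = release_date)
}

parsed <- parse_entry(entry)
str(parsed)
```

Looking the month up in `month.name` avoids depending on the locale of the machine, which `as.Date()` with a `%B` format would do.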
Four possible sources have been examined:
The Official Frank Zappa website: https://www.zappa.com/music/official-discography/#/
Wikipedia: https://en.wikipedia.org/wiki/Frank_Zappa_discography
Zappa Wiki Jawaka: https://wiki.killuglyradio.com/wiki/Main_Page
A site maintained by Román García Albertos, https://www.donlope.net/fz/lyrics/
The latter site caught my special attention. An invaluable resource!
Here’s my R script to extract some basic data from it:
version[['version.string']]
## [1] "R version 4.4.2 (2024-10-31 ucrt)"
library(readr)
library(data.table)
library(anytime)
library(rvest)
library(tidyverse)
url <- 'https://www.donlope.net/fz/lyrics/index.html'
months <- c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
dod <- as.Date("1993-12-04")
umg <- as.Date("2022-06-30")
css_selector <- "#OriginalAlbums > ol"
original_albums_part1 <- read_html(url) %>%
html_element(css = css_selector) %>%
html_text() %>%
strsplit(split = "\n") %>%
data.frame(V1 = unlist(.)) %>%
select(V1)
css_selector <- "#OriginalAlbums > ul"
original_albums_part2 <- read_html(url) %>%
html_element(css = css_selector) %>%
html_text() %>%
strsplit(split = "\n") %>%
data.frame(V1 = unlist(.)) %>%
select(V1)
original_albums <- rbind(original_albums_part1, original_albums_part2) %>%
mutate(V1 = str_remove(V1, "\\#[^#]*$")) %>% # remove only characters after the last '#'
mutate(album_title = trimws(gsub("\\(.*", "", V1)),
album_title = ifelse(album_title == "Apostrophe", "Apostrophe (')", album_title)) %>%
mutate(release_date = anydate(sub(".*\\,([^,]+\\,[^,]+$)", "\\1", V1)),
release_year = year(release_date)) %>%
mutate(status = ifelse(release_date < dod, "Official album", "Official album (posthumous)")) %>%
mutate(disk = sapply(str_extract_all(V1, "(?<=\\()[^)(]+(?=\\))"), paste0, collapse =","),
disk = str_remove_all(disk, "(fazedooh),|('),|s"),
disk = ifelse(album_title == "Zappa—Original Motion Picture Soundtrack", "3 CD", trimws(disk)),
disk = sub("\\,.*", "", disk),
disk = gsub("([1-9])([C])", "\\1 \\2", disk),
disk = recode(disk, "USB Cotume Box Set" = "USB")) %>%
mutate(status = ifelse(disk == "DVD", "Official video (posthumous)", status)) %>%
mutate(label = sapply(str_extract_all(V1, "(?<=\\()[^)(]+(?=\\))"), paste0, collapse =","),
label = str_remove_all(label, "(fazedooh),|('),"),
label = str_extract(label, '(?<=,)[^,]+(?=,)'),
label = ifelse(album_title == "Zappa—Original Motion Picture Soundtrack", "Zappa Records/UME ZR20035", trimws(label)),
status = ifelse(substring(album_title,1,15) == "The Old Masters", "The Old Masters", status)) %>%
filter (release_date != as.Date("2012-12-21")) %>% # double in source
mutate(album_title = ifelse(album_title == "Apostrophe (')" & release_year == 2024, "Apostrophe (') 50th Anniversary Edition", album_title)) %>%
select(-V1)
css_selector <- "#BTB"
beat_the_boots <- read_html(url) %>%
html_element(css = css_selector) %>%
html_text() %>%
strsplit(split = "\n") %>%
data.frame(V1 = unlist(.)) %>%
select(V1) %>%
filter(V1 != 'Beat The Boots Series' & trimws(V1, whitespace = "[\\h\\v]") != '') %>%
mutate (status=V1)
n = nrow(beat_the_boots)
for(i in 1:n){
if (substring(beat_the_boots$V1[i],1,4) == 'Beat'){
beat_the_boots$status[i] = beat_the_boots$V1[i]
j = 1
}
else {
beat_the_boots$status[i] = beat_the_boots$V1[i-j]
j = j+1
}
}
add_labels <- data.table (album_title = c("As An Am","The Ark","Freaks & Motherfu*#@%!","Unmitigated Audacity","Anyway The Wind Blows","'Tis The Season To Be Jelly","Saarbrucken 1978",
"Piquantique","Disconnected Synapses","Tengo Na Minchia Tanta","Electric Aunt Jemima","At The Circus","Swiss Cheese/Fire!","Our Man In Nirvana","Conceptual Continuity","BTB III - Disc One",
"BTB III - Disc Two","BTB III - Disc Three","BTB III - Disc Four","BTB III - Disc Five","BTB III - Disc Six"),
label = c("Foo-eee R2 70537","Foo-eee R2 70538","Foo-eee R2 70539","Foo-eee R2 70540","Foo-eee R2 70541","Foo-eee R2 70542","Foo-eee R2 70543","Foo-eee R2 70544","Foo-eee R2 71017","Foo-eee R2 71018","Foo-eee R2 71019",
"Foo-eee R2 71020","Foo-eee R2 71021","Foo-eee R2 71022","Foo-eee R2 71023","Zappa Records","Zappa Records","Zappa Records","Zappa Records","Zappa Records","Zappa Records"))
beat_the_boots <- beat_the_boots %>%
filter(substring(beat_the_boots$V1,1,4) != 'Beat') %>%
mutate(release_year = as.integer(str_extract(status, "[0-9]+"))) %>%
rename(album_title = V1) %>%
mutate(album_title = ifelse(substring(album_title,1,5) == 'Disc ', paste0("BTB III - ",album_title), album_title)) %>%
mutate(disk = ifelse(album_title == "Swiss Cheese/Fire!", '2 CD', ifelse(release_year == 2009, 'digital download','CD'))) %>%
mutate(release_date = ifelse(status == "Beat The Boots I July 1991", "1991-07-01",
ifelse(status == "Beat The Boots II June 1992", "1992-06-16",
ifelse(album_title %in% c("BTB III - Disc One", "BTB III - Disc Two"), "2009-01-25",
ifelse(album_title %in% c("BTB III - Disc Three", "BTB III - Disc Four"), "2009-01-31",
ifelse(album_title == "BTB III - Disc Five", "2009-02-01",
ifelse(album_title == "BTB III - Disc Six", "2009-02-02",NA))))))) %>%
mutate(release_date = as.Date(release_date)) %>%
left_join(add_labels, by = "album_title")
add_dates_labels <- data.table (release_date = c("1971-11-10","1979-12-21","1982-10-31","1985-01-15","1987-01-15","1988-09-15","1989-01-31","1989-05-01","1982-01-15","1976-01-01","2015-10-30"),
label = c("Warner Home Video – PEV 99498","Frank Zappa Self-released","Frank Zappa Self-released","Buelax Video – VMP 217","Honker Home Video – BSV-P3121",
"Honker Home Video – BSV-P3122","MPI Home Video #4003","MPI Home Entertainment #4004","Boggs/Baker Productions","Frank Zappa Self-released","Eagle Rock 5034504114777"),
album_title = c("200 Motels", "Baby Snakes","The Dub Room Special!","Does Humor Belong In Music?","Video From Hell","Uncle Meat","True Story Of 200 Motels","The Amazing Mr. Bickford","The Torture Never Stops","A Token Of His Extreme","Roxy—The Movie")) %>%
mutate(release_date = as.Date(release_date))
css_selector <- "#films > ul:nth-child(2)"
videos <- read_html(url) %>%
html_element(css = css_selector) %>%
html_text() %>%
strsplit(split = "\n") %>%
data.frame(V1 = unlist(.)) %>%
select(V1) %>%
mutate(release_year = as.integer(str_extract(substring(V1,16,99), "[0-9]+")),
release_year = ifelse(release_year == 0, 1989, release_year)) %>%
mutate(album_title = gsub("\\b(?:\\d{7,8}|(?!200\\b)\\d{4})\\b", "", V1, perl=TRUE), # remove all numbers except 200 (200 Motels)
album_title = str_replace_all(album_title, setNames(rep("", length(months)), months)), # remove month
album_title = gsub("/", "", album_title),
album_title = trimws(album_title)) %>%
mutate(disk = ifelse(release_year <= 1990, 'VHS', 'DVD')) %>%
mutate(status = ifelse(release_year <= year(dod), "Official video", "Official video (posthumous)")) %>%
select(album_title, release_year, status, disk) %>%
left_join(add_dates_labels, by = "album_title")
css_selector <- "#Compilations"
compilations <- read_html(url) %>%
html_element(css = css_selector) %>%
html_text() %>%
strsplit(split = "\n") %>%
data.frame(V1 = unlist(.)) %>%
select(V1) %>%
filter(trimws(V1, whitespace = "[\\h\\v]") != '') %>%
filter (V1 != 'Samplers, Compilations, Digital Downloads, Etc.') %>%
mutate(pick_heading = !str_detect(V1, "[0-9]")) %>%
mutate (status=V1)
n = nrow(compilations)
for(i in 1:n){
if (compilations$pick_heading[i]){
compilations$status[i] = compilations$V1[i]
j = 1
}
else {
compilations$status[i] = compilations$V1[i-j]
j = j+1
}
}
compilations <- compilations %>%
filter(V1 != status ) %>%
select(-pick_heading) %>%
mutate(album_title = trimws(gsub("\\(.*", "", V1))) %>%
mutate(release_date = anydate(sub(".*\\,([^,]+\\,[^,]+$)", "\\1", V1)),
release_year = year(release_date)) %>%
mutate(disk = sapply(str_extract_all(V1, "(?<=\\()[^)(]+(?=\\))"), paste0, collapse =","),
disk = sub("\\,.*", "", disk),
disk = gsub("([1-9])([C])", "\\1 \\2", disk),
disk = ifelse(disk == "Transparency", "LP", disk),
disk = ifelse(disk == "Crossfire Publications", "CD", trimws(disk)),
disk = ifelse(disk != "cassette", str_remove_all(disk, "s"), disk)) %>%
mutate(label = sapply(str_extract_all(V1, "(?<=\\()[^)(]+(?=\\))"), paste0, collapse =","),
label = str_remove_all(label, "(Transparency),|Japan,"),
label = str_extract(label, '(?<=,)[^,]+(?=,)'),
label = ifelse(album_title == "Paul Buff Presents The Pal And Original Sound Studio Archives: The Collection", "Crossfire Publications", trimws(label))) %>%
select (-V1)
# not_mentioned_in_source = data.table(
# album_title = "Apostrophe/Overnite Sensation",
# disk = "DVD",
# release_date = as.Date("2007-05-01"),
# release_year = 2007,
# status = 'Official video (posthumous)',
# label = "Eagle Rock EREDV625"
# )
zappa_discography_official <- rbind(setDT(original_albums), setDT(compilations), setDT(beat_the_boots), setDT(videos), fill=TRUE) %>%
filter (label != "Zappa Records ZR 20005") %>% # "The MOFO Project/Object (fazedooh)" is for me the same release as "The MOFO Project/Object"
mutate( across(.cols = everything(), ~str_replace_all( ., "—", " - " ))) %>%
mutate(status = ifelse(grepl("Highlights", album_title) & substr(album_title,1,4) != "Paul", "Halloween concerts highlights", status)) %>% # compilations
mutate(release_date = ymd(release_date)) %>%
filter(album_title != "Return Of The Son Of Shut Up 'N Play Yer Guitar" & album_title != "Shut Up 'N Play Yer Guitar Some More") %>% # classified here as one 3 LP box
mutate(disk = ifelse(album_title == "Shut Up 'N Play Yer Guitar", "3 LP", disk)) %>%
arrange(release_date)
# add album number of tracks.
fz_official_album_n_of_tracks <- read_fwf(
file="data/Frank Zappa officiële discografie.txt", locale = locale(encoding = "ISO-8859-1"), trim_ws = TRUE,
skip=0, fwf_widths(c(3,50,7,41,11,40,3,2,3,100))) %>%
rename(seqnr = X1, album_tag = X2, medium = X3, publisher = X4, release_date = X5, section = X6, number_of_tracks = X7, filter_var = X8, hires = X9, album_title = X10 ) %>%
mutate(release_date = dmy(release_date)) %>% select(-hires)
add_info_and_number_of_tracks <- fz_official_album_n_of_tracks %>% select(-c(section, publisher, medium))
zappa_discography_official_complete <- zappa_discography_official %>%
full_join(add_info_and_number_of_tracks, by = c("release_date", "album_title")) %>% mutate(disk = gsub("digital", "", disk), disk = trimws(disk)) %>%
mutate(status = recode(status, "ZFT digital download items" = "ZFT download items"))
source_url <- "https://www.donlope.net/fz/lyrics/index.html"
library(DT) # generate a table
zappa_discography_official_table <- zappa_discography_official_complete %>%
select(-c("label", "seqnr", "filter_var", "album_tag")) %>%
mutate_if(is.numeric, ~replace(., . == 0, NA))
zappa_discography_official_table <- DT::datatable(zappa_discography_official_table, filter = "top", caption = paste("Frank Zappa discography. Number of albums: ", nrow(zappa_discography_official_complete), ". Source: ", source_url, " with minor modifications"), extensions = c("Buttons","FixedColumns"), rownames= TRUE,
options = list(autoWidth = FALSE, solidHeader = TRUE, pageLength = 10))
nrow(zappa_discography_official_complete) # number of albums
## [1] 192
The complete list of albums, exactly as generated by the R script
above, can be seen below. As already mentioned, its source is https://www.donlope.net/fz/lyrics/index.html, with some
minor modifications that can be found in the script.
To increase the
number of album lines on a page, please make a selection in the “Show
entries” box. You may sort and filter.
Table 1. Frank Zappa official discography. Sorted by date of
release.
Number of albums listed: 192. Source: https://www.donlope.net/fz/lyrics/index.html
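The script above uses two `for` loops to copy each section heading (for instance a Beat The Boots series name) down to the album lines that follow it. As a side note, the same carry-forward can be sketched without a loop. The vector `scraped` below is toy input, and the sketch assumes that the first line is always a heading.

```r
# Toy scraped lines: headings followed by the items that belong to them.
scraped <- c("Beat The Boots I July 1991", "As An Am", "The Ark",
             "Beat The Boots II June 1992", "Disconnected Synapses")
is_heading <- startsWith(scraped, "Beat")

# cumsum() over the heading flags numbers each block; indexing the headings
# with that block number carries every heading forward to its items.
# Assumption: the first line is a heading, so block numbers start at 1.
block <- cumsum(is_heading)
status <- scraped[is_heading][block]

albums <- data.frame(album_title = scraped[!is_heading],
                     status = status[!is_heading],
                     stringsAsFactors = FALSE)
print(albums)
```

Whether this reads better than the explicit loop is a matter of taste; the result should be the same as long as the heading test is reliable.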
A Sankey diagram is a graphic illustration of flows. Several entities (nodes) are represented by rectangles or boxes. Their links are represented by arrows or arcs whose width is proportional to the importance of the flow. The script and graph appear below.
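Before the full script, a minimal sketch of the data layout such a diagram needs: a table of nodes and a table of links that refer to the nodes by position. The toy `flows` data and its numbers are hypothetical. networkD3 counts node positions from zero, which is why the sketch subtracts 1; the full script reaches the same result by putting an empty name in front of the node list.

```r
# Toy flows (hypothetical numbers): groups that split into categories.
flows <- data.frame(
  group = c("Official", "Official", "Compilations"),
  category = c("Albums", "Videos", "Samplers"),
  value = c(6, 1, 3),
  stringsAsFactors = FALSE
)

# Every distinct name becomes a node: groups first, then categories.
nodes <- data.frame(name = c(unique(flows$group), flows$category),
                    stringsAsFactors = FALSE)

# Links point at nodes by zero-based row index, hence the "- 1".
links <- data.frame(
  source = match(flows$group, nodes$name) - 1,
  target = match(flows$category, nodes$name) - 1,
  value = flows$value
)
print(links)

# With networkD3 installed, these two tables could be rendered with:
# networkD3::sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
#                          Target = "target", Value = "value", NodeID = "name")
```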
library(networkD3)
status <- zappa_discography_official_complete %>%
select(status) %>%
mutate(status = recode(status, "Various Artists Compilations" = "Others", "Zappa Records/Universal compilations" = "Others", "Special items compiled by FZ" = "Official album")) %>%
mutate(main_group =ifelse(substring(status,1,8) == 'Official', "Official",
ifelse(substring(status,1,3)=="ZFT" | status == "Beat The Boots III January-February 2009", "Downloads",
ifelse(substring(status,1,4) == 'Beat', "Beat the Boots I and II", "Compilations")))) %>%
mutate(status = factor(status, levels=c("Official album","Official album (posthumous)","Official video","Official video (posthumous)","Beat The Boots I July 1991",
"Beat The Boots II June 1992","Beat The Boots III January-February 2009","Cucamonga era compilations",
"The Old Masters","Verve/MGM compilations","Rykodisc compilations", "Halloween concerts highlights","Others", "ZFT download items"))) %>%
mutate(main_group = factor(main_group, levels=c("Official","Beat the Boots I and II","Compilations","Downloads"))) %>%
group_by(status, main_group) %>%
summarise(number_of_albums = n(),
perc = (number_of_albums / nrow(zappa_discography_official_complete)*100)) %>%
arrange(main_group, status) %>%
ungroup() %>%
mutate(b_link = as.numeric(main_group),
c_link = max(b_link) + row_number())
sum(status$number_of_albums) # check
## [1] 192
sum(status$perc) # must be 100%
## [1] 100
# raw data
knitr::kable(status %>% select(main_group, status, number_of_albums, perc, b_link, c_link) %>% mutate(perc = round(perc,1)), format = "html") %>% kableExtra::kable_styling()
| main_group | status | number_of_albums | perc | b_link | c_link |
|---|---|---|---|---|---|
| Official | Official album | 59 | 30.7 | 1 | 5 |
| Official | Official album (posthumous) | 63 | 32.8 | 1 | 6 |
| Official | Official video | 10 | 5.2 | 1 | 7 |
| Official | Official video (posthumous) | 1 | 0.5 | 1 | 8 |
| Beat the Boots I and II | Beat The Boots I July 1991 | 8 | 4.2 | 2 | 9 |
| Beat the Boots I and II | Beat The Boots II June 1992 | 7 | 3.6 | 2 | 10 |
| Compilations | Cucamonga era compilations | 6 | 3.1 | 3 | 11 |
| Compilations | The Old Masters | 3 | 1.6 | 3 | 12 |
| Compilations | Verve/MGM compilations | 4 | 2.1 | 3 | 13 |
| Compilations | Rykodisc compilations | 9 | 4.7 | 3 | 14 |
| Compilations | Halloween concerts highlights | 3 | 1.6 | 3 | 15 |
| Compilations | Others | 4 | 2.1 | 3 | 16 |
| Downloads | Beat The Boots III January-February 2009 | 6 | 3.1 | 4 | 17 |
| Downloads | ZFT download items | 9 | 4.7 | 4 | 18 |
links <- select(status, b_link, c_link, perc)
nodes <- data.frame(name = c("", unique(as.character(status$main_group)), as.character(status$status)))
sankeyNetwork(Links = links, Nodes = nodes, Source = "b_link",
Target = "c_link", Value = "perc", NodeID = "name", colourScale = JS(
'd3.scaleOrdinal() .range(["orange","yellow","lightgreen","blue",
"lightgreen","lightgreen","lightgreen","lightgreen","lightgreen","lightgreen","blue"])'),
fontSize = 12, nodeWidth = 30, iterations = 1)
Figure 1. Zappa’s official discography put in a Sankey diagram (192
albums)
Based on https://www.donlope.net/fz/lyrics/index.html Reference date: 2024-12-19
Next, an alternative to the Sankey diagram in the form of a pie chart.
library(webr) # creating the pie-donut chart
status %>% mutate(status = recode(status, "Beat The Boots III January-February 2009" = "Beat The Boots III 2009", "Halloween concerts highlights" = "Halloween highlights")) %>%
PieDonut(., aes(main_group, status, count=number_of_albums), showRatioThreshold = 0.001, showPieName = FALSE, pieLabelSize = 5, donutLabelSize = 4,
title = paste0("Frank Zappa official discography (", nrow(zappa_discography_official_complete), " albums)"), explode=1,start=pi/1.4, labelposition=2, r0 = 0,
ratioByGroup = FALSE, maxx=1.55)
Figure 2. Zappa’s official discography in a pie chart (192 albums)
Based on https://www.donlope.net/fz/lyrics/index.html Reference date: 2024-12-19
We see the highest numbers of albums in the years in which the
first two Beat the Boots series were released; see Figure 3-a.
On June
30, 2022, the Universal Music Group (UMG) announced an agreement with
the Zappa Trust to acquire Frank Zappa’s estate, including his expansive
recordings, publishing catalog of songs, film archive, and the complete
contents of The Vault.8 It will be interesting to see how the
release of new albums proceeds.
albums_per_year <- zappa_discography_official %>%
select(release_year, status) %>%
mutate (release_year = as.numeric(release_year)) %>%
mutate(main_group =ifelse(substring(status,1,8) == 'Official' | status == "Special items compiled by FZ", "Official new album",
ifelse(substring(status,1,4) == 'Beat', "Beat the Boots",
ifelse(substring(status,1,3)=="ZFT", "Downloads", "Compilations")))) %>%
mutate(main_group = factor(main_group, levels=c("Downloads","Compilations","Beat the Boots","Official new album"))) %>%
group_by(release_year, main_group) %>%
summarise(number_of_albums = n()) %>%
mutate(pattern_mix = ifelse(release_year == 2009 & main_group == 'Beat the Boots', 'x', ' ' )) # BTB 2009 is both BTB and download as well; ' ' matches the key used in scale_pattern_manual()
vline_breaks <- seq(1965, year(today()), 1)
hline_breaks <- seq(0, 10, 1)
library(ggpattern) # add stripe-pattern to a barplot in ggplot
ggplot(albums_per_year, aes(x = release_year, y = number_of_albums, fill=main_group, pattern=pattern_mix)) +
geom_vline(xintercept = vline_breaks, color ="white") +
geom_hline(yintercept = hline_breaks, color ="white") +
geom_vline(xintercept = year(dod) + 0.5, color ="red") +
geom_vline(xintercept = year(umg) + 0.5, color ="black") +
geom_segment(aes(x=1995, xend=2000, y=9.5, yend=9.5),
arrow = arrow(length = unit(0.5, "cm")), color = 'red') +
geom_segment(aes(x=2019, xend=2022, y=11.5, yend=11.5),
arrow = arrow(length = unit(0.5, "cm")), color = 'black') +
labs(title = paste0("Frank Zappa official discography (", sum(albums_per_year$number_of_albums), " albums, audio and video)"),
subtitle = paste0("Source data: ", url, " (with minor modifications)"),
caption = paste0("reference date: ", today())) +
theme(plot.title = element_text(color = "black")) +
theme(plot.subtitle = element_text(color = "darkgreen")) +
annotate("text", x = 2006, y = 9.5, label = "after Dec. 4, 1993", color = 'red') +
annotate("text", x = 2012.5, y = 11.5, label = "June 2022 UMG deal", color = 'black') +
geom_bar_pattern(position="stack", stat="identity",
color = NA,
pattern_fill = "blue",
pattern_angle = 45,
pattern_density = 0.4,
pattern_spacing = 0.025,
pattern_key_scale_factor = 0.6) +
scale_fill_manual(values=c("blue","lightgreen","yellow","orange")) +
scale_pattern_manual(values = c('x' = "stripe", ' ' = "none")) +
guides(pattern = guide_legend(override.aes = list(fill = "white")),
fill = guide_legend(override.aes = list(pattern = "none"))) +
guides(pattern = "none") + # hide the pattern legend (FALSE is deprecated in recent ggplot2)
theme(legend.position = "inside",
legend.justification = c(0.05, 0.9),
legend.box.margin = margin(5, l = 5, unit = "mm")) +
scale_y_continuous(name = "number of albums", limits = c(0, 13),
breaks = c(0:13), position = 'left',
labels = as.character(c(0:13))) +
scale_x_continuous(name = "album year of release", limits = c(1965, year(today())+1),
breaks = c(seq(1965, 2020, by = 5)),
labels = as.character(c(seq(1965, 2020, by = 5)))) +
theme(panel.grid.minor = element_blank()) +
labs(fill = "status")
Figure 3-a. Breakdown of Zappa’s official discography by album year of release (192 albums)
Based on https://www.donlope.net/fz/lyrics/index.html
Before Frank Zappa died in 1993, he had spent much of his
remaining time and energy completing a number of projects, including
releasing the first Beat The Boots series in 1991 and 1992 (shown in
yellow in the chart above).
The highest numbers of newly released songs can be found in
the years 2019-2022, see Figure 3-b.9 We note a small peak in the numbers of
tracks released in the period before the date of death, and a large peak
in the years before the UMG agreement. We’ll have to wait and see
whether the high number of track releases we’ve seen in recent years
will last.
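The aggregation behind such a chart can be sketched on toy data in base R: sum the tracks per year and group, then look for the year with the highest total. The data frame `toy_albums` and its numbers are made up for illustration and do not come from the discography.

```r
# Toy data (made-up numbers): one row per album.
toy_albums <- data.frame(
  release_year = c(1991, 1991, 1992, 2019, 2019),
  main_group = c("Beat the Boots", "Official new track", "Beat the Boots",
                 "Official new track", "Official new track"),
  number_of_tracks = c(12, 20, 15, 70, 55),
  stringsAsFactors = FALSE
)

# Tracks per year and group: one stacked bar segment per row.
toy_tracks_per_year <- aggregate(number_of_tracks ~ release_year + main_group,
                                 data = toy_albums, FUN = sum)
toy_tracks_per_year <- toy_tracks_per_year[order(toy_tracks_per_year$release_year), ]
print(toy_tracks_per_year)

# Total per year, and the year with the most tracks.
totals <- tapply(toy_albums$number_of_tracks, toy_albums$release_year, sum)
peak_year <- names(which.max(totals))
print(peak_year)
```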
tracks_per_year <- zappa_discography_official_complete %>%
select(album_title, release_date, status, number_of_tracks) %>%
mutate (release_year = as.numeric(year(release_date))) %>%
mutate(main_group =ifelse(substring(status,1,8) == 'Official' | status == "Special items compiled by FZ", "Official new track",
ifelse(substring(status,1,4) == 'Beat', "Beat the Boots",
ifelse(substring(status,1,3)=="ZFT", "Downloads", "Compilations")))) %>%
mutate(main_group = factor(main_group, levels=c("Downloads","Compilations","Beat the Boots","Official new track"))) %>%
group_by(release_year, main_group) %>%
summarise(number_of_tracks = sum(number_of_tracks)) %>%
filter(number_of_tracks > 0) %>%
mutate(pattern_mix = ifelse(release_year == 2009 & main_group == 'Beat the Boots', 'x', ' ' )) # BTB 2009 is both BTB and download; ' ' matches the key used in scale_pattern_manual()
sum(tracks_per_year$number_of_tracks)
## [1] 4012
remove_reissues <- 7 # 'Buff' sets and identical album reissues excluded (Old Masters, Threesome)
vline_breaks <- seq(1965, year(today()), 1)
hline_breaks <- seq(0, 320, 20)
ggplot(tracks_per_year, aes(x = release_year, y = number_of_tracks, fill = main_group, pattern=pattern_mix)) +
geom_vline(xintercept = vline_breaks, color ="white") +
geom_hline(yintercept = hline_breaks, color ="white") +
geom_vline(xintercept = year(dod) + 0.5, color ="red") +
geom_vline(xintercept = year(umg) + 0.5, color ="black") +
geom_segment(aes(x=1995, xend=2000, y=280, yend=280),
arrow = arrow(length = unit(0.5, "cm")), color = 'red') +
geom_segment(aes(x=2019, xend=2022, y=320, yend=320),
arrow = arrow(length = unit(0.5, "cm")), color = 'black') +
labs(title = paste0("Frank Zappa official discography (", sum(tracks_per_year$number_of_tracks), " tracks on ",nrow(zappa_discography_official_complete) - remove_reissues, " albums (*), audio and video)"),
subtitle = paste0("Source data: ", url, " (with minor modifications)"),
caption = paste0("(*) in this chart the 'Buff' sets and identical album reissues are excluded (Old Masters, Threesome). Reference date: ", today())) +
theme(plot.title = element_text(color = "black")) +
theme(plot.subtitle = element_text(color = "darkgreen")) +
annotate("text", x = 2006, y = 280, label = "after Dec. 4, 1993", color = 'red') +
annotate("text", x = 2012.5, y = 320, label = "June 2022 UMG deal", color = 'black') +
geom_bar_pattern(position="stack", stat="identity",
color = NA,
pattern_fill = "blue",
pattern_angle = 45,
pattern_density = 0.4,
pattern_spacing = 0.025,
pattern_key_scale_factor = 0.6) +
scale_fill_manual(values=c("blue","lightgreen","yellow","orange")) +
scale_pattern_manual(values = c('x' = "stripe", ' ' = "none")) +
guides(pattern = guide_legend(override.aes = list(fill = "white")),
fill = guide_legend(override.aes = list(pattern = "none"))) +
guides(pattern = "none") + # hide the pattern legend (FALSE is deprecated in recent ggplot2)
theme(legend.position = "inside",
legend.justification = c(0.05, 0.9),
legend.box.margin = margin(5, l = 5, unit = "mm")) +
scale_y_continuous(name = "number of tracks", limits = c(0, 320),
breaks = c(seq(0,320, by = 20)), position = 'left',
labels = as.character(c(seq(0,320, by = 20)))) +
scale_x_continuous(name = "track released", limits = c(1965, year(today())+1),
breaks = c(seq(1965, 2020, by = 5)),
labels = as.character(c(seq(1965, 2020, by = 5)))) +
theme(panel.grid.minor = element_blank()) +
labs(fill = "status")
Figure 3-b. Breakdown of Zappa’s official discography by track year of release (4012 songs on 185 albums)
This chapter is about reading tags from the FZ home FLAC
collection.10 The desktop app Mp3tag lets you scrape
album data from the FLAC files. Mp3tag is a powerful and easy-to-use
tool to edit and extract metadata of audio and video files, written by
Florian Heidenreich, an indie software developer living in Dresden,
Germany (https://www.mp3tag.de/en/).
The script below reads
the tags.
# Steps in Mp3tag: 1. select the FLAC folder (the tags of all audio files in the directory and its subdirectories are extracted)
# 2. select all
# 3. save the csv file in the R project data directory
# (be sure to remove the old csv first; the default is to append)
My_own_FZ_tracks <- read_delim("data/mp3tag.csv", delim = ";",
escape_double = FALSE, locale = locale(encoding = "UTF-16")) %>%
select(-last_col())
colnames(My_own_FZ_tracks)
## [1] "Title" "Artist" "Album" "Track"
## [5] "Year" "Length" "Size" "Last Modified"
## [9] "Path" "Filename"
My_own_FZ_tracks <- My_own_FZ_tracks %>%
mutate(Filetype = str_to_upper(tools::file_ext(Filename))) %>%
mutate(v1 = (gregexpr('\\\\', Path))) %>%
mutate(v2 = map(v1,last)) %>%
mutate(Map = substring(Path, pluck(v1,1,2), as.numeric(v2))) %>%
select(c(Artist, Title, Album, Year, Track)) %>%
mutate(Track = as.numeric(Track), Year = as.numeric(Year)) %>%
filter(Artist == "Frank Zappa")
add_number_of_tracks <- add_info_and_number_of_tracks %>% select(-c(seqnr, release_date, filter_var))
My_own_FZ_albums <- My_own_FZ_tracks %>%
group_by(Album, Year) %>%
summarise(Number_of_tracks_on_my_playlist = n()) %>%
left_join(add_number_of_tracks, by = c("Album" = "album_tag" )) %>%
select(-c("number_of_tracks", "album_title"))
Official_new_FZ_albums_My_own <- zappa_discography_official_complete %>%
filter((substring(status,1,4) == "Beat" | substring(status,1,8) == "Official" | filter_var == "+" | status == "Special items compiled by FZ") & substring(status,1,14) != "Official video") %>% # parentheses needed: & binds tighter than |, otherwise the video exclusion has no effect
left_join(My_own_FZ_albums, by = c("album_tag" = "Album" )) %>%
select(-c(label,disk,Year)) %>%
mutate(release_year = year(release_date)) %>%
filter(!is.na(Number_of_tracks_on_my_playlist))
My_official_tracks <- My_own_FZ_tracks %>%
select(-c("Artist","Year", "Track")) %>%
left_join(Official_new_FZ_albums_My_own, by = c("Album" = "album_tag")) %>%
filter(!is.na(album_title)) %>%
rename(song_title = Title) %>%
select (-c("Album","release_date")) %>%
arrange(song_title, album_title) %>%
mutate(song_title = trimws(song_title)) %>%
mutate(temp_nr = regexpr("\\(", song_title)) %>%
mutate(song_title_sec = ifelse (temp_nr > 1, substring(song_title, 1, temp_nr - 1), song_title)) %>%
select(-c(temp_nr)) %>%
mutate(album_title = trimws(album_title)) %>%
mutate(song_title_sec = trimws(song_title_sec))
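The last steps of the script above derive `song_title_sec` by cutting a title at its first opening parenthesis, so that different versions of a song group together. A minimal stand-alone sketch of that step follows; the helper `base_title()` and the sample titles are hypothetical.

```r
# Drop a parenthesised suffix such as "(live)" so that versions group together.
# Titles that start with "(" are kept whole, as in the script above.
base_title <- function(song_title) {
  song_title <- trimws(song_title)
  # as.integer() strips the extra attributes that regexpr() attaches.
  paren_pos <- as.integer(regexpr("(", song_title, fixed = TRUE))
  trimws(ifelse(paren_pos > 1,
                substring(song_title, 1, paren_pos - 1),
                song_title))
}

titles <- c("King Kong (live)", "King Kong", "(Untitled) Piece ", "Inca Roads (edit)")
normalised <- base_title(titles)
print(normalised)
```

Note that this rule is deliberately crude: any title with a parenthesis after its first character is cut there, whatever the parenthesis contains.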
The composition ‘King Kong’, made famous by the 1969
Mothers of Invention project “Uncle Meat”, tops the rankings (Figure 4).
But the song could already be heard the year before on “Lumpy Gravy”, a
1968 solo album by Frank Zappa.
Now a first side note,
using the key piece “Inca Roads” as an example. “Shut Up ’n Play Yer
Guitar”, “Shut Up ’n Play Yer Guitar Some More”, “The Return of the Son
of Shut Up ’n Play Yer Guitar” (all three from the album “Shut Up ’n
Play Yer Guitar”), “Systems of Edges” (from “Guitar”), and now “A Cold
Dark Matter” (from “Trance-Fusion”) are all lifted from solo sections of
“Inca Roads”. Each version has its own charm, without ever being
redundant. Other key compositions such as “King Kong”, “City of Tiny
Lites”, “The Torture Never Stops”, “Easy Meat”, “The Black Page” also
have such spin-offs. “Uncle Rebus” on the album “Finer Moments” is
another example of a hidden tune, this time featuring solos from “King
Kong” recorded at the Ark in Boston in July 1969 (previously released as
part of Beat the Boots!, Vol. I, titled as “Uncle Meat/King Kong”
medley). Many examples can be found on the song list https://www.donlope.net/fz/songs/index.html.
So,
occasionally Zappa excerpted the solo portion of a track and retitled
it. In the present exercise, however, the title of the song as printed
on the album takes precedence over the source tune.
A second comment
concerns the occasional re-release of exact copies of previous songs on
new albums. Various examples can be found on the track list just
mentioned. Although we ignore some obvious compilation albums in this
chapter - and thus by definition exclude track duplications - there are
undoubtedly some exact song copies on our list.
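The ranking itself comes down to counting how often each normalised title occurs and noting its earliest release year. A minimal base R sketch on toy data follows; the data frame `toy_tracks` and its years are illustrative only, and the threshold of two versions is chosen just to keep the example small.

```r
# Toy data (illustrative): one row per track in the collection.
toy_tracks <- data.frame(
  song = c("King Kong", "Inca Roads", "King Kong", "King Kong",
           "Inca Roads", "Easy Meat"),
  release_year = c(1969, 1975, 1992, 1968, 1988, 1981),
  stringsAsFactors = FALSE
)

# Number of versions and first release year per song title.
n_versions <- tapply(toy_tracks$release_year, toy_tracks$song, length)
first_release <- tapply(toy_tracks$release_year, toy_tracks$song, min)

ranking <- data.frame(song = names(n_versions),
                      number_of_versions = as.integer(n_versions),
                      first_release = as.numeric(first_release),
                      stringsAsFactors = FALSE)

# Most versions first; keep only songs with at least two versions.
ranking <- ranking[order(-ranking$number_of_versions, ranking$song), ]
ranking <- ranking[ranking$number_of_versions >= 2, ]
print(ranking)
```

Because exact re-releases are counted like any other version, a high count in such a table does not necessarily mean that many different performances.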
FZ_songs_with_most_version_in_own_collection <- My_official_tracks %>%
arrange(song_title_sec, release_year) %>%
group_by(song_title_sec) %>%
summarise(number_of_versions = n(), first_release = first(release_year)) %>%
arrange(desc(number_of_versions)) %>%
filter(number_of_versions >= 3) %>%
mutate(song_title_sec = paste0(song_title_sec, " (", first_release, ")"),
song_title_sec = factor(song_title_sec, levels = song_title_sec))
bar_number <- FZ_songs_with_most_version_in_own_collection$number_of_versions
ggplot(FZ_songs_with_most_version_in_own_collection, aes(x=song_title_sec, y=number_of_versions)) +
geom_bar(stat="identity", width=.5, fill="tomato3") +
geom_text(aes(label=bar_number), position=position_dodge(width=1), vjust = 0.1, hjust= -0.2, size = 3) +
labs(title=paste0(nrow(FZ_songs_with_most_version_in_own_collection), " Zappa tracks with the most versions in my collection"),
subtitle = paste0("together representing ", sum(abs(FZ_songs_with_most_version_in_own_collection$number_of_versions)), " releases/reissues/etc. (*); behind title: first year of release in collection"),
caption = paste0("(*) on average that is ", round(sum(sum(abs(FZ_songs_with_most_version_in_own_collection$number_of_versions))/n_distinct(FZ_songs_with_most_version_in_own_collection$song_title_sec)),1) , " releases/reissues per track. Reference date: ", today())) +
scale_x_discrete(limits = rev) +
labs(x = "song title", y = "number of official versions in collection") +
coord_flip()
Figure 4. In home collection: Zappa tracks with the most versions
Conceptual continuity, hu…? Well, all kinds of things have been written and analyzed about it.
Let’s hear it from Fido, first.
“…Well then Fido got up off the floor an’ he rolled over
An’ he looked me straight in the eye
An’ you know what he said?
Once upon a time
Somebody say to me
(This is a dog talkin’ now)
What is your Conceptual Continuity?
Well, I told him right then
(Fido said)
It should be easy to see
The crux of the biscuit
Is the Apostrophe(’)…” 11
And next from a former member of the band. To put Zappa’s work
in a broader context of conceptual continuity, Mark Volman
explains:
“Frank had a philosophy, that he related to me at one
time which was, an artist’s career should not be judged on any singular
project, no single record, film or any other individual piece of work.
Frank felt that a person’s art could only be judged as part of the whole
of their career. Each individual creation was a part of that whole. No
critique of any single work could change the overall end result, which
was what should be seen as an artist’s entire body of work. Only in that
end result can it be judged and critiqued.” 12
You can also find network elaborations of the concept on the web, such as
Cameron Piko’s “Interactive Conceptual Continuity Map”. 13
Based on my personal collection of official Zappa tracks, I present my own
interpretation and graph of “conceptual continuity”. The idea behind it
is as follows. Over the years, a fair number of variants of each key
track have been released in the form of studio recording takes and live
performances, often with a varying group of musicians. The result is a
time series of tracks. Many of those give me the feeling that late
rapper/songwriter Craig Mack described and performed:
“… Here comes the brand new flava in ya ear (flava in ya ear)
Time for new flava in ya ear (flava)
I’m kickin’ new flava in ya ear” 14
############ A graphical representation of Frank Zappa's 'conceptual continuity' in the punch card style
top_n_tracks_alt <- My_official_tracks %>%
group_by(song_title_sec) %>%
summarise(number_of_versions = n()) %>%
arrange(desc(number_of_versions)) %>%
top_n(60, number_of_versions) %>%
left_join(My_official_tracks, by = "song_title_sec") %>%
select(song_title_sec, release_year) %>%
rename(song_title = song_title_sec, year_on_album = release_year) %>%
group_by(song_title, year_on_album) %>%
summarise(number_count = n()) %>%
group_by() %>%
arrange(desc(song_title), year_on_album) %>%
mutate(title_2_number = match(song_title, unique(song_title)))
unique(top_n_tracks_alt$song_title)
## [1] "Zoot Allures" "Zomby Woof"
## [3] "You Are What You Is" "Yo Mama"
## [5] "Wonderful Wino" "Willie The Pimp"
## [7] "Whippin Post" "Watermelon In Easter Hay"
## [9] "Tryin' To Grow A Chin" "Trouble Every Day"
## [11] "Transylvania Boogie" "The Torture Never Stops"
## [13] "The Purple Lagoon" "The Orange County Lumber Truck"
## [15] "The Mud Shark" "The Illinois Enema Bandit"
## [17] "The Black Page #2" "Strictly Genteel"
## [19] "Stink-Foot" "Stevie's Spanking"
## [21] "Son Of Mr. Green Genes" "Sharleena"
## [23] "Rollo" "RDNZL"
## [25] "Pygmy Twylyte" "Punky's Whips"
## [27] "Plastic People" "Penguin In Bondage"
## [29] "Peaches En Regalia" "Outside Now"
## [31] "Oh No" "My Guitar Wants To Kill Your Mama"
## [33] "Muffin Man" "More Trouble Every Day"
## [35] "Montana" "Magic Fingers"
## [37] "Little House I Used To Live In" "King Kong"
## [39] "It Just Might Be A One Shot Deal" "Inca Roads"
## [41] "I'm The Slime" "Hungry Freaks Daddy"
## [43] "Flakes" "Eat That Question"
## [45] "Easy Meat" "Dupree's Paradise"
## [47] "Disco Boy" "Dirty Love"
## [49] "Dinah-Moe Humm" "Dickie's Such An Asshole"
## [51] "Dancing Fool" "Cosmik Debris"
## [53] "Conehead" "City Of Tiny Lites"
## [55] "Chunga's Revenge" "Carolina Hardcore Ecstasy"
## [57] "Call Any Vegetable" "Bobby Brown Goes Down"
## [59] "Black Napkins" "Bamboozled By Love"
## [61] "Apostrophe" "Andy"
## [63] "Advance Romance" "A Pound For A Brown"
Y_axis_titles <- as_vector(top_n_tracks_alt %>%
group_by(song_title) %>%
summarise() %>%
arrange(desc(song_title)))
length(Y_axis_titles)
## [1] 64
library(ggrepel)
ggplot(top_n_tracks_alt, aes(xmin = year_on_album - 0.5, xmax = year_on_album + 0.5, y = title_2_number, colour = song_title)) +
geom_vline(xintercept = vline_breaks, color ="white") +
geom_hline(yintercept = hline_breaks, color ="white") +
geom_vline(xintercept = year(dod) + 0.5, color ="red") +
geom_segment(aes(x=1995, xend=1999, y=23.5, yend=23.5),
arrow = arrow(length = unit(0.3, "cm")), color = 'red') +
labs(title = paste0("A graphical representation of Frank Zappa's 'conceptual continuity' on a punch card"),
subtitle = paste0(n_distinct(top_n_tracks_alt$song_title), " most common tracks (y-axis) in my home collection, together representing ", sum(abs(top_n_tracks_alt$number_count)), " releases/reissues/takes/live performances (*)"),
caption = paste0("(*) on average that is ", round(sum(abs(top_n_tracks_alt$number_count))/n_distinct(top_n_tracks_alt$song_title),1) , " releases/reissues/takes/live performances per track. Reference date: ", today())) +
annotate("text", x = 2006, y = 23.5, label = "after Dec. 4, 1993", color = 'red') +
geom_linerange(linewidth = 8, position = position_dodge(width = 0.5),show.legend = FALSE, stat = "identity") +
geom_text_repel(data=top_n_tracks_alt ,
aes(x= year_on_album, y = title_2_number - 0.4, label=number_count),
direction = c("y"), box.padding = 0.00, show.legend = FALSE, vjust = -0.5, colour = "black") +
scale_color_manual(values = c("darkorange", "darkolivegreen1","antiquewhite", "blueviolet", "red", "lightgreen", "blue", "orange", "lightblue", "yellow", "purple", "pink", "grey", "deeppink1", "green", "brown3", "white", "antiquewhite", "darkkhaki", "magenta", "yellowgreen", "gold", "rosybrown1", "ivory", "darkorange", "darkolivegreen1", "turquoise1", "darkcyan", "chartreuse", "tomato", "white", "tan", "yellow2", "indianred2", "bisque", "dodgerblue", "azure2","orange", "lightblue", "yellow", "pink", "grey", "turquoise1", "darkcyan", "chartreuse", "tomato", "deepskyblue", "brown", "lightyellow", "purple", "darkgreen", "yellow3", "lightgreen", "purple", "white", "grey", "orange", "yellow4","magenta", "ivory", "darkorange", "darkolivegreen1","gold","darkcyan")) +
scale_y_continuous(limits = c(1,length(Y_axis_titles)), oob = scales::oob_keep,
name = NULL, breaks = (seq(1, length(Y_axis_titles), by = 1)),
labels = Y_axis_titles) +
scale_x_continuous(name = "album release", limits = c(1965, year(today())+1),
breaks = c(seq(1965, 2020, by = 5)),
labels = as.character(c(seq(1965, 2020, by = 5)))) +
theme(panel.grid.minor = element_blank())
Figure 5. Frank Zappa’s ‘conceptual continuity’ on a punch card
Figure 6-a shows that “The Hot Rats Sessions” (2019) and “Halloween 77” (2017) top the list with 29 selected tracks each. Posthumous albums dominate this list.
#####################################################################
####### Ranking albums with most selected songs in collection.
#####################################################################
top_n_albums <- Official_new_FZ_albums_My_own %>%
mutate(two_periods = ifelse( release_date > dod, "official posthumous", "official")) %>%
mutate(perc = round(Number_of_tracks_on_my_playlist / number_of_tracks * 100,0)) %>%
arrange(desc(Number_of_tracks_on_my_playlist), number_of_tracks) %>%
top_n(40, Number_of_tracks_on_my_playlist) %>%
select(two_periods,album_title, release_year, Number_of_tracks_on_my_playlist, number_of_tracks, perc) %>%
mutate(album_title = paste0(album_title, " (", release_year, ")"),
album_title = factor(album_title, levels = album_title))
label_bar_abs <- paste0("(", top_n_albums$Number_of_tracks_on_my_playlist, " / ",
top_n_albums$number_of_tracks, " = ", top_n_albums$perc," %)")
tot_number <- nrow(Official_new_FZ_albums_My_own)
ggplot(top_n_albums, aes(x = album_title, y = Number_of_tracks_on_my_playlist, fill=two_periods)) +
geom_bar(stat="identity", width=.5) +
geom_text(aes(label=label_bar_abs), position=position_dodge(width=1), vjust = 0.1, hjust= -0.2, size = 3) +
labs(title=paste0("Ranking albums with most selected songs (",nrow(top_n_albums), " albums listed in this diagram)"),
subtitle = paste0( "all together ", tot_number, " official albums in collection (excl. video albums) "),
caption = paste0("reference date: ", today())) +
theme(axis.text.x = element_text(angle=0, vjust=0.6)) +
theme(
legend.position = "inside",
legend.justification = c(0.8,0.2),
legend.box.margin = margin(5, l = 5, unit = "mm")) +
scale_y_continuous(name = "number of tracks picked from album (number picked/album track total)",
limits = c(0,33),
breaks = (seq(0, 33, by = 5)),
labels = as.character(seq(0, 33, by = 5))) +
scale_x_discrete(name = "", limits = rev) +
coord_flip()+
guides(fill=guide_legend("period"))
Figure 6-a. Ranking albums with most selected songs in home
collection
Figure 6-b shows that two albums score 100%, namely “Fillmore East - June 1971” and “FZ Plays The Music Of FZ”. This means that I selected all tracks from both albums for the FLAC collection and playlists at home. Posthumous albums are less prominent in this graph.
#####################################################################
####### Relative ranking albums with most selected songs in collection.
#####################################################################
top_perc_albums <- Official_new_FZ_albums_My_own %>%
mutate(two_periods = ifelse( release_date > dod, "official posthumous", "official")) %>%
mutate(perc = round(Number_of_tracks_on_my_playlist / number_of_tracks * 100,0)) %>%
select(two_periods,album_title, release_year, Number_of_tracks_on_my_playlist, number_of_tracks, perc) %>%
filter(album_title != "Penguin In Bondage/The Little Known History Of The Mothers Of Invention") %>%
arrange(desc(perc), desc(Number_of_tracks_on_my_playlist)) %>%
top_n(40, perc) %>%
mutate(album_title = ifelse(substring(album_title,1,42) == "Frank Zappa Plays The Music Of Frank Zappa",
"FZ Plays The Music Of FZ - A Memorial Tribute ",
ifelse(album_title == "The Frank Zappa AAAFNRAA Birthday Bundle 21.12.2014",
"The FZ AAAFNRAA Birthday Bundle 21.12.2014", album_title)),
album_title = paste0(album_title, " (", release_year, ")"),
album_title = factor(album_title, levels = album_title))
label_bar_rel <- paste0("(", top_perc_albums$Number_of_tracks_on_my_playlist, " / ",
top_perc_albums$number_of_tracks, " = ", top_perc_albums$perc," %)")
ggplot(top_perc_albums, aes(x = album_title, y = perc, fill=two_periods)) +
geom_bar(stat="identity", width=.5) +
geom_text(aes(label=label_bar_rel), position=position_dodge(width=1), vjust = 0.1, hjust= -0.2, size = 3) +
labs(title=paste0("Ranking albums with most selected songs as a percentage (",nrow(top_perc_albums), " albums listed below)"),
subtitle = paste0( "all together ", tot_number, " official albums in collection (excl. video albums) "),
caption = paste0("reference date: ", today())) +
theme(axis.text.x = element_text(angle=0, vjust=0.6)) +
theme(
legend.position = "inside",
legend.justification = c(0.8,0.2),
legend.box.margin = margin(5, l = 5, unit = "mm")) +
scale_y_continuous(name = "% of tracks picked from album (in brackets: number picked/album track total)") +
scale_x_discrete(name = "", limits = rev) +
coord_flip()+
guides(fill=guide_legend("period"))
Figure 6-b. Ranking albums with highest percentage selected
songs in home collection
ZappaBASE 2.0 includes the venue, state, date, lineup and
setlist of all available recorded shows. Altogether this amounts to
946 gigs. The source material for dates and setlists comes largely from
https://www.zappateers.com/fzshows/.
The now very well-known https://www.donlope.net/fz/ was used extensively by the
supervisor for validation.
In our final chapter we create a graph of
the songs Zappa played live most often, based on the ZappaBASE 2.0 page
https://www.zappateers.com/zappabase/stats.php. “City Of
Tiny Lites” tops the list and was played at 326 recorded shows. See
Figure 7 below, after the R code.
library(ggtext) # markdown in ggplot
library(xml2)
url <- "https://www.zappateers.com/zappabase/stats.php"
data <- rvest::read_html(url)
css_selector <- "#table_songs > tbody"
my_lower_limit <- 3
live_lower_limit <- 78
# Reuse the page parsed above instead of downloading it a second time
number_of_known_live_recordings <- data %>%
html_element(css = css_selector) %>%
html_text() %>%
strsplit(split = "\n") %>%
data.frame(V1 = unlist(.)) %>%
select(V1) %>% filter(!is.na(as.numeric(V1))) %>% mutate(V1 = as.numeric(V1)) %>% rename(number_of_known_live_recordings = V1)
words_df <- data.frame(target = c("Greasey", "Lights", "Dancin'", "Hard-Core", "Yo'", "Whipping", " Medley", " Intro", "Stinkfoot", "Sofa","Po-jama People","Pick Me I'm Clean", "Pedro's Dowry/Naval Aviation In Art",
"Help I'm A Rock", " And Goes Home", "Baby Take Your Teeth Out", "Big Leg Emma", "Bobby Brown", "Bolero (a capella)", "Caravan / When The Saints Go Marching In", "Dog Breath Variations", "Farther O'Blivion", "Gee",
"My Boyfriend's Back/I'm Gonna Bust His Head", "Porn War Jam", "Pound For A Brown", "Orange County Lumber Truck", "Twenty-One", "Uncle Meat Variations", "Who Needs The Peace Corps/Duke Of Prunes", "Wind Up Workin' In A Gas Station"),
replacement = c("Greasy", "Lites", "Dancing", "Hardcore", "Yo", "Whippin", "", "", "Stink-Foot", "Sofa #1","Po-Jama People","Pick Me, I'm Clean", "Naval Aviation In Art",
"Help, I'm A Rock", "", "Baby, Take Your Teeth Out", "Big Legs", "Bobby Brown Goes Down", "Bolero", "Caravan", "Dog Breath", "Father O'Blivion", "Gee, I Like Your Pants",
"My Boyfriend's Back", "Porn Wars Deluxe", "A Pound For A Brown", "The Orange County Lumber Truck", "Twenty One", "Uncle Meat", "Who Needs The Peace Corps", "Wind Up Working In A Gas Station"))
replacements <- c(words_df$replacement)
names(replacements) <- c(words_df$target)
recs <- xml_find_all(data, "//input")
song_labels <- as.data.frame(trimws(xml_attr(recs, "value"))[1:nrow(number_of_known_live_recordings)])
number_of_known_FZ_live_recordings_of_a_song <- cbind(song_labels,number_of_known_live_recordings) %>% rename("song_title" = !!names(.[1])) %>%
filter(!song_title %in% c("Improvisations", "Intro", "Outro", "drum check", "drum duet", "Drum Solo", "bass check", "guitar check", "Jam", "preamble", "audience participation",
"keyboard check", "sax/keyboard check", "Interlude")) %>%
mutate(song_title = str_remove(song_title, "\\?"),
song_title = str_replace_all(song_title, pattern = replacements)) %>%
group_by(song_title) %>%
summarise(number_of_known_live_recordings = sum(number_of_known_live_recordings))
top_number_of_known_FZ_live_recordings_of_a_song <- number_of_known_FZ_live_recordings_of_a_song %>%
filter (number_of_known_live_recordings >= live_lower_limit) %>%
arrange(desc(number_of_known_live_recordings)) %>% mutate(song_title = factor(song_title, levels = song_title))
bar_number_song <- top_number_of_known_FZ_live_recordings_of_a_song$number_of_known_live_recordings
ggplot(top_number_of_known_FZ_live_recordings_of_a_song, aes(x=song_title, y=number_of_known_live_recordings)) +
geom_bar(stat="identity", width=.5, fill="orange") +
geom_text(aes(label=bar_number_song), position=position_dodge(width=1), vjust = 0.1, hjust= -0.2, size = 3) +
labs(title=paste0("The ", nrow(top_number_of_known_FZ_live_recordings_of_a_song), " songs that Zappa has played live most often (and of which a live recording is known)"),
subtitle = paste0("Source: ZappaBASE 2.0 ", url),
caption = paste0("Reference date: ", today())) +
scale_x_discrete(limits = rev) +
labs(x = "song title", y = "number of known recorded live performances") +
coord_flip()
Figure 7. Songs Zappa has played live most often (based on
known recordings)
Source: ZappaBASE 2.0 https://www.zappateers.com/zappabase/stats.php
Now let’s cross-reference the
live song ranking (Figure 7, shown directly above) with my private list of
most frequently tagged tunes (Figure 4) in a two-dimensional
scatterplot. To make the two variables (dimensions) more comparable, we
first apply a z-score standardization and then a log transformation (log1p) to limit the
influence of outliers in the plot. The table below the code prints the first
ten rows of the results. “Black Napkins”, for example, scores 2.0 on the
x-axis and 1.54 on the y-axis. See Figure 8 for the final result.
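A small worked example may help here. The base R sketch below applies the two steps to a toy vector and then checks the two Black Napkins values quoted above against the z-scores 6.41 and 3.67 that the results table lists for that song. The helper name `z_score()` and the toy counts are invented for illustration; the real script computes the z-scores over the full joined song list.

```r
# Illustrative helper: classic z-score, rounded to two decimals as in the script
z_score <- function(x) round((x - mean(x)) / sd(x), 2)

toy_counts <- c(0, 1, 1, 2, 3, 17)   # invented version counts, one clear outlier
z <- z_score(toy_counts)

# log1p(z) = log(1 + z): compresses the outlier, only defined for z > -1
toy_result <- log1p(z)

# Black Napkins: z-scores taken from the results table
black_napkins_x <- round(log1p(6.41), 2)   # x-axis value
black_napkins_y <- round(log1p(3.67), 2)   # y-axis value
c(x = black_napkins_x, y = black_napkins_y)
```

Because `log1p()` is undefined for values of -1 or lower, the approach relies on no z-score dropping that far. In the results table a song with zero versions in the collection sits at z = -0.45, so the condition holds for that column.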
number_of_my_own_official_FZ_songs_alt <- My_official_tracks %>% group_by(song_title_sec) %>%
summarise(my_number_of_versions = n()) %>%
arrange(desc(my_number_of_versions))
match_songs <- number_of_my_own_official_FZ_songs_alt %>% full_join(number_of_known_FZ_live_recordings_of_a_song, by = c("song_title_sec" = "song_title")) %>%
arrange(song_title_sec) %>% mutate(across(where(is.numeric), ~replace(., is.na(.), 0))) %>%
mutate(z_my_number_of_versions = round((my_number_of_versions - mean(my_number_of_versions))/sd(my_number_of_versions), 2),
z_number_of_known_live_recordings = round((number_of_known_live_recordings - mean(number_of_known_live_recordings))/sd(number_of_known_live_recordings), 2),
log1p_z_my_number_of_versions= log1p(z_my_number_of_versions),
log1p_z_number_of_known_live_recordings = log1p(z_number_of_known_live_recordings))
# log1p returns log(1 + number), computed in a way that is accurate even when the value of number is close to zero.
top_ranking <- match_songs %>% filter(my_number_of_versions >= my_lower_limit | number_of_known_live_recordings >= live_lower_limit)
# Print only the first ten rows of the ranking
knitr::kable(top_ranking %>% filter(row_number() <= 10)) %>%
kableExtra::kable_styling(full_width = TRUE) %>%
kableExtra::kable_styling(font_size = 9) %>%
kableExtra::row_spec(1:10, font_size = 10)
| song_title_sec | my_number_of_versions | number_of_known_live_recordings | z_my_number_of_versions | z_number_of_known_live_recordings | log1p_z_my_number_of_versions | log1p_z_number_of_known_live_recordings |
|---|---|---|---|---|---|---|
| A Pound For A Brown | 8 | 254 | 2.78 | 4.68 | 1.3297240 | 1.7369512 |
| Advance Romance | 6 | 156 | 1.97 | 2.67 | 1.0885620 | 1.3001917 |
| Andy | 3 | 89 | 0.76 | 1.30 | 0.5653138 | 0.8329091 |
| Apostrophe | 5 | 20 | 1.57 | -0.12 | 0.9439059 | -0.1278334 |
| Bamboozled By Love | 4 | 178 | 1.16 | 3.12 | 0.7701082 | 1.4158532 |
| Black Napkins | 17 | 205 | 6.41 | 3.67 | 2.0028304 | 1.5411591 |
| Bobby Brown Goes Down | 3 | 244 | 0.76 | 4.47 | 0.5653138 | 1.6992786 |
| Broken Hearts Are For Assholes | 1 | 140 | -0.05 | 2.34 | -0.0512933 | 1.2059708 |
| Call Any Vegetable | 5 | 39 | 1.57 | 0.27 | 0.9439059 | 0.2390169 |
| Camarillo Brillo | 0 | 184 | -0.45 | 3.24 | -0.5978370 | 1.4445633 |
my_own_top_songs <- as.numeric(nrow(top_ranking %>% filter(my_number_of_versions >= my_lower_limit)))
top_live_songs <- as.numeric(nrow(top_ranking %>% filter(number_of_known_live_recordings >= live_lower_limit)))
line_my_min <- as.numeric(filter(match_songs, my_number_of_versions == my_lower_limit)[1,6])
line_live_min <- as.numeric(filter(match_songs, number_of_known_live_recordings == live_lower_limit)[1,7])
ggplot(top_ranking, aes(log1p_z_my_number_of_versions, log1p_z_number_of_known_live_recordings)) +
geom_point(color = "red") +
geom_text_repel(label = top_ranking$song_title_sec, max.time = 4, max.overlaps = 100, color = "darkgreen") +
xlab("log1p z-score number of official song releases tagged in my collection (live and studio recordings)") +
ylab("log1p z-score number of known live recordings") +
theme(axis.text = element_text(size = 12), axis.title = element_text(size = 12)) +
labs(title = "Songs that Zappa played live most often (*) versus the top ranking of my private collection (**)",
subtitle = paste0("(*) y-axis stats are from known live recordings. Number of songs live recorded at least ", live_lower_limit, " times: ", top_live_songs, ". Source: ZappaBASE 2.0",
"\n(**) x-axis based on tagged studio/live performances for my playlists. Number of songs tagged at least ", my_lower_limit, " times: ", my_own_top_songs),
caption = paste0("Reference date: ", today())) +
theme(plot.title = element_markdown(hjust = 0.5, face="bold", size = 17, color = "darkgreen")) +
geom_segment(aes(x=-0.5, xend=2.5, y=-0.7, yend=-0.7), color = 'red', linetype = "dashed") +
annotate("text", x = -0.35, y = -0.75, label = "no known live recording with this title", color = 'red') +
geom_segment(aes(x=-0.55, xend=-0.55, y=-0.4, yend=2), color = 'red', linetype = "dashed") +
annotate("text", x = -0.6, y = -0.21, label = "missing in my collection", color = 'red', angle = 90) +
geom_segment(aes(x=0.09, xend=0.09, y=-0.7, yend=-0.8),
arrow = arrow(length = unit(0.5, "cm")), color = 'red') +
geom_segment(aes(x=-0.55, xend=-0.65, y=0.15, yend=0.15),
arrow = arrow(length = unit(0.5, "cm")), color = 'red') +
geom_segment(aes(x=line_my_min, xend=line_my_min, y=-Inf, yend=2), color = 'orange', linetype = "dashed") +
geom_segment(aes(x=-Inf, xend=2.5, y=line_live_min, yend=line_live_min), color = 'orange', linetype = "dashed") +
geom_segment(aes(x=line_my_min, xend=0.72, y=-0.4, yend=-0.4),
arrow = arrow(length = unit(0.5, "cm")), color = 'orange') +
annotate("text", x = 1.45, y = -0.4, label = paste0(my_lower_limit, " or more versions in my collection (live and studio recordings)"), color = 'orange') +
geom_segment(aes(x=1.8, xend=1.8, y=line_live_min, yend=0.85),
arrow = arrow(length = unit(0.5, "cm")), color = 'orange') +
annotate("text", x = 2.2, y = 0.8, label = paste0("played live at least ", live_lower_limit, " times"), color = 'orange')
Figure 8. Cross section of FZ’s top live played songs
(Figure 7, source: ZappaBASE 2.0) and my private ranking (Figure 4) in a
scatter plot
The selection criteria are indicated in the figure by
orange dashed lines: at least 78 live performances or at least 3
versions in my own music collection. This means that the bottom left
quadrant is empty by definition. The top right quadrant contains tracks
that score high both in number of live performances and on the
list of personal favorites.
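Why the bottom left quadrant stays empty follows directly from the OR condition in the filter. Here is a minimal base R sketch, using the two thresholds from the script and four invented songs (the titles “A” to “D” and their counts are placeholders, not real data):

```r
my_lower_limit   <- 3    # minimum number of versions in the collection
live_lower_limit <- 78   # minimum number of known live recordings

# Placeholder songs with invented counts
songs <- data.frame(
  song_title      = c("A", "B", "C", "D"),
  my_versions     = c(5, 1, 4, 1),
  live_recordings = c(200, 150, 10, 20),
  stringsAsFactors = FALSE
)

# Keep a song when it passes at least one of the two thresholds
keep <- songs$my_versions >= my_lower_limit |
  songs$live_recordings >= live_lower_limit
selected <- songs[keep, ]

# A song in the bottom left quadrant would have to fail both thresholds
bottom_left <- selected$my_versions < my_lower_limit &
  selected$live_recordings < live_lower_limit
any(bottom_left)
```

Song “D” fails both tests and is dropped before plotting, so nothing can end up below both orange lines.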
If a song is plotted beyond one of the
red dashed lines, that song is missing from one of the two lists. A few examples
of frequently played tracks that are conspicuous by their absence from
my playlists: Tinsel Town Rebellion, The Meek Shall Inherit Nothing,
Teenage Wind, Joe’s Garage. Apparently not my most appreciated tunes.
Three titles seem never to have been played live - The Son Of Mr. Green
Genes, More Trouble Every Day, It Just Might Be A One Shot Deal - but
they do appear on set lists under a different name: Mr. Green Genes,
Trouble Every Day, Mystery Song no. 2.
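One way to close such gaps would be an explicit alias table that maps a collection title to the name used on the set lists before the two sources are joined. The sketch below is an assumption about how that could look, not part of the original script. The vector name `setlist_alias` and the helper `to_setlist_title()` are hypothetical; the three pairs are the ones named above, with the first title spelled as in the track list printed earlier.

```r
# Hypothetical alias table: collection title -> title used on the set lists
setlist_alias <- c(
  "Son Of Mr. Green Genes"           = "Mr. Green Genes",
  "More Trouble Every Day"           = "Trouble Every Day",
  "It Just Might Be A One Shot Deal" = "Mystery Song no. 2"
)

# Return the alias when one exists, otherwise the title itself
to_setlist_title <- function(title) {
  hit <- title %in% names(setlist_alias)
  out <- title
  out[hit] <- unname(setlist_alias[title[hit]])
  out
}

mapped <- to_setlist_title(c("More Trouble Every Day", "Inca Roads"))
mapped
```

Whether such titles should really be merged is a matter of choice, since “Trouble Every Day” and “More Trouble Every Day” also occur as separate titles in the collection.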
The first conclusion is that importing a discography found on the
Internet, as well as scraping a list that looks simple on the
screen, is time-consuming in terms of R programming. Next time I
will simply copy and paste the list and then keep it up to date in Excel.
Second, enjoy your Zappa (Hi-Res) playlists from streaming
services like Qobuz. And forget about tagging FLAC files.
But luckily for the data analyst… extracting the FLAC tags with Mp3tag works
great, provided they are properly encoded of course. Carelessly tagged
track and album titles distort matches with other databases.
Using Mp3tag, it takes little effort to change and export tags repeatedly.
Once in the desired format, you can enjoy the graphical output
of the data.
For example:
1. Zappa’s official discography summarized in a Sankey diagram (Chapter 3.1, Figure 1),
2. Zappa’s ‘conceptual continuity’ on a punch card (Chapter 5.2, Figure 5).
If we ignore video albums, compilations including Buff and Halloween
highlights, the Old Masters and the Threesome, the following
Zappa album pick rate appears: the number of official
albums in my own collection (140) divided by the total number (150)
gives 0.93 or 93%. Ten albums are missing from the collection: the Beat
The Boots series III (six albums) and some other download items.
My Zappa track pick rate is the number of official tracks
in my FLAC playlist collection (705) divided by 3,358 (the
total number of tracks on my official albums), giving 0.21 or 21%.
Roughly one in five songs gets chosen for my playlists. The
track pick percentage of posthumous albums is considerably lower than
that of the others: 17 percent compared to 29 percent.
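The two pick rates are plain ratios, and the arithmetic can be retraced in a few lines of R. The variable names below are illustrative and do not come from the script; the four counts are the ones quoted above.

```r
# Counts quoted in the text (variable names are illustrative)
albums_owned    <- 140    # official albums in the home collection
albums_official <- 150    # official albums after the exclusions listed above
tracks_picked   <- 705    # official tracks selected for the FLAC playlists
tracks_total    <- 3358   # all tracks on the official albums in the collection

album_pick_rate <- albums_owned / albums_official
track_pick_rate <- tracks_picked / tracks_total

round(album_pick_rate, 2)   # share of official albums in the collection
round(track_pick_rate, 2)   # share of tracks chosen for the playlists
```

Rounded to two decimals this reproduces the 0.93 and 0.21 mentioned in the text.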
Figure 1. Zappa’s official discography put in a Sankey diagram
Figure 2. Zappa’s official discography in a pie chart
Figure 3-a. Breakdown of Zappa’s official discography by album year of release
Figure 3-b. Breakdown of Zappa’s official discography by track year of release
Figure 4. In my home collection: Zappa tracks with the most versions
Figure 5. Zappa’s ‘conceptual continuity’ on a punch card
Figure 6-a. Ranking albums with most selected songs in my home collection
Figure 6-b. Ranking albums with the highest percentage of selected songs in home collection
Figure 7. ZappaBASE 2.0: Songs Zappa has played live most often (based on known recordings)
Figure 8. Cross section of FZ’s top live played songs and my private ranking in a scatter plot
Charles Ulrich “The Big Note: A Guide to the Recordings of Frank Zappa”, New Star Books, Canada, 2018
The Official Frank Zappa website https://www.zappa.com/music/official-discography/#/
Wikipedia https://en.wikipedia.org/wiki/Frank_Zappa_discography
Román García Albertos https://www.donlope.net/fz/index.html
Florian Heidenreich “Mp3tag” https://www.mp3tag.de/en/
FZShows, version 7.1 https://www.zappateers.com/fzshows/
FZShows was
created in 1996 by Jon Naurin. It’s currently maintained by Oscar
Bianco.
ZappaBASE 2.0 https://www.zappateers.com/zappabase/stats.php
Footnotes
Geographer, data analyst and cyclist↩︎
RYM Rough Guide for Frank Zappa↩︎
About 25 years ago I acquired a beautiful second-hand
Akai GX-270D - also built in 1976 - from a former fellow student and
recently had it overhauled. Special thanks to Philip van der Matten for
fixing the poor tape speed, replacing the transistors on the audio
boards, recapping and re-adjusting!
https://www.reeltoreel.nl/
New tapes are still for
sale. That gave me the idea to record an anthology of Frank Zappa’s
oeuvre on tape, based on my own music collection. The result: A young
person’s guide to the music of Frank Zappa, from reel to reel! https://rpubs.com/JSchakenraad/1197190↩︎
Don’t forget: Your Zappa beard needs care! https://www.gillette.co.uk/blog/facial-hair-styles/zappa-beard/↩︎
https://en.wikipedia.org/wiki/List_of_performers_on_Frank_Zappa_records↩︎
In addition, some remarks. First, Halloween
Highlights’ three albums are classified as ‘compilations’. Second, a
note about The Mofo Project/Object and The MOFO
Project/Object (fazedooh): we consider both as one album. The
release history of “The Two MOFOs” is explained here: https://www.donlope.net/fz/notes/The_MOFO_Project_Object.html.
Third, the download album AAAFNRAA—Baby Snakes—The Compleat
Soundtrack is deliberately mentioned twice in the source. Here it
counts as one. Fourth, “Shut Up ’n Play Yer Guitar” is considered here
as one 3-LP box. As a result, Shut Up ’n Play Yer Guitar Some
More and Return Of The Son Of Shut Up ’n Play Yer Guitar
are excluded.
Joe’s Garage was originally released as two
separate albums on Zappa Records (‘Act I’ and ‘Acts II and III’); the
project was later remastered and reissued as a triple album box set in
1987. Here we only mention both original records.↩︎
Source: https://www.universalmusic.com/universal-music-group-becomes-the-permanent-home-of-frank-zappa-estate/#↩︎
A note on Figure 3-b. The Old Masters
is a box set series, released in three volumes from April 1985 to
December 1987, consisting of studio and live albums by Zappa and The
Mothers of Invention originally released from 1966 to 1976, as well as
two “Mystery Discs” which contained previously unreleased material. All
in all 24 LPs…
We’ll leave out The Old Masters in
the track bar chart Figure 3-b - it would simply result in too many
identical, duplicate tracks. Both mentioned ‘Mystery discs’ were
released together on CD in 1998 and the tracks on this CD obviously
count in the graph.
During 2011-2012, Crossfire rolled out volumes
from Paul Buff’s archives of Pal and Original Sound recordings. This
concerns a number of CDs with songs by various artists from Paul Buff’s
archive. Sometimes with FZ as a writer, producer or engineer. There is
also a recording of Frank Zappa performing on the Steve Allen show,
‘Cyclophony’. Conclusion: interesting stuff, but too few connections
here. We are omitting these Buff sets.
Threesome No. 1 and Threesome No. 2
are CD reissues of 5 LPs. Let’s also leave them out.↩︎
Please note: only a limited number of tracks are tagged per album - simply a matter of choice - and added to playlists.↩︎
Stink-Foot (extract), on Apostrophe (’) (1974)↩︎
Interview with Mark Volman of the Turtles https://wiki.killuglyradio.com/wiki/Conceptual_Continuity↩︎
“Flava in Ya Ear” by Craig Mack. Album “Project: Funk Da World”, Bad Boy 1994↩︎