Megadeth is an American heavy metal band formed in 1983 by Dave Mustaine and David Ellefson. As pioneers of thrash metal and members of the “Big Four,” they have released numerous platinum-selling albums and earned multiple Grammy nominations along with a Grammy Award. The classic lineup consists of Mustaine (vocals, guitar), Ellefson (bass), Marty Friedman (lead guitar), and Nick Menza (drums).
All raw data sources: megadeth.com and the official Megadeth YouTube channel.
Next, we outline the pipeline for obtaining the data, applying preprocessing steps, and deriving the relevant features.
The dataset comprises 173 songs from 16 Megadeth studio albums (1985–2022). Audio is taken from official releases, metadata (year, band, album, track, song, duration, bpm) from public discographies, and lyrics from official sources.
Each track is decoded to WAV via FFmpeg, down-mixed to mono, and amplitude-normalized. Frames use a Hann window (\(46\) ms) with a \(23\) ms hop. For frame \(i\): samples \(x_i[m]\), DFT \(X_i[k]\), magnitudes \(M_i[k]=|X_i[k]|\) for bins \(k=1,\dots,K\), frequencies \(f_k\), and a small \(\varepsilon>0\) for numerical stability.
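For concreteness, a minimal framing sketch in Python/NumPy, assuming the FFmpeg-decoded WAV has already been loaded as a mono, amplitude-normalized array `x`; the 44.1 kHz sample-rate default and the function name are illustrative, while the 46 ms window and 23 ms hop follow the text:

```python
import numpy as np

def frame_magnitudes(x, sr=44100, win_ms=46.0, hop_ms=23.0):
    """Split a mono, amplitude-normalized signal x (assumed at least one
    window long) into Hann-windowed frames and return the frames, their
    one-sided magnitude spectra M[i, k], and the bin frequencies f_k."""
    win = int(round(sr * win_ms / 1000.0))
    hop = int(round(sr * hop_ms / 1000.0))
    window = np.hanning(win)

    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop: i * hop + win] * window
                       for i in range(n_frames)])

    M = np.abs(np.fft.rfft(frames, axis=1))   # magnitudes |X_i[k]|
    f = np.fft.rfftfreq(win, d=1.0 / sr)      # frequencies f_k
    return frames, M, f
```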
Per song we compute four per-frame metrics, smooth each with a cubic spline, and optionally \(z\)-score:
Loudness: \[ \mathrm{RMS}_i=\sqrt{\tfrac{1}{L}\sum_{m=0}^{L-1}x_i[m]^2}, \qquad \log\mathrm{RMS}_i=\log(\mathrm{RMS}_i+\varepsilon). \]
Higher values indicate louder passages with more acoustic energy; lower values correspond to quieter sections or silence.
Spectral brightness: \[ \displaystyle \mathrm{logSC}_i=\frac{\sum_{k=1}^K \log(f_k)\,M_i[k]}{\sum_{k=1}^K M_i[k]+\varepsilon}. \]
Higher values reflect brighter, treble-rich timbres (e.g., cymbals, distorted guitars), while lower values indicate darker or bass-heavy sounds.
Spectral flatness: \[ \displaystyle \mathrm{SFM}_i=\frac{\exp\!\big(\tfrac{1}{K}\sum_{k=1}^K \log(M_i[k]+\varepsilon)\big)}{\tfrac{1}{K}\sum_{k=1}^K M_i[k]+\varepsilon},\qquad \mathrm{logitSFM}_i=\log\!\left(\tfrac{\mathrm{SFM}_i}{1-\mathrm{SFM}_i}\right). \]
Higher values suggest noise-like or texture-like content with flatter spectra; lower values point to tonal, pitched, or harmonic sounds.
Onset/rhythmic activity: \[ \displaystyle \mathrm{Flux}_i=\sum_{k=1}^K \max\!\big(M_i[k]-M_{i-1}[k],0\big),\qquad \log\mathrm{Flux}_i=\log(\mathrm{Flux}_i+\varepsilon). \]
Higher values capture frequent or strong spectral changes, often aligned with beats or accents; lower values correspond to sustained tones, smoother textures, or silence.
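The four per-frame metrics can then be computed from the windowed frames and magnitude spectra produced by the sketch above; `eps` stands in for \(\varepsilon\) and its value, like the helper name `frame_metrics`, is an illustrative choice:

```python
import numpy as np

def frame_metrics(frames, M, f, eps=1e-10):
    """Per-frame loudness, brightness, flatness, and onset activity,
    following the definitions above. `frames` are the windowed time-domain
    frames, `M` the magnitude spectra, `f` the bin frequencies."""
    # Loudness: log RMS energy of each frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    log_rms = np.log(rms + eps)

    # Brightness: magnitude-weighted mean of log frequency
    # (the DC bin, whose frequency is 0, is skipped).
    Mk, fk = M[:, 1:], f[1:]
    log_sc = (Mk * np.log(fk)).sum(axis=1) / (Mk.sum(axis=1) + eps)

    # Flatness: geometric mean over arithmetic mean, then a logit transform.
    sfm = np.exp(np.mean(np.log(Mk + eps), axis=1)) / (np.mean(Mk, axis=1) + eps)
    logit_sfm = np.log(sfm / (1.0 - sfm + eps))

    # Onset/rhythmic activity: positive spectral flux between consecutive
    # frames (one value fewer than the other metrics).
    flux = np.maximum(np.diff(M, axis=0), 0.0).sum(axis=1)
    log_flux = np.log(flux + eps)

    return {"log_rms": log_rms, "log_sc": log_sc,
            "logit_sfm": logit_sfm, "log_flux": log_flux}
```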
For each metric \(y_s(t)\) we retain three representations: the raw series, a smoothed version \(\tilde y_s(t)\) obtained using cubic smoothing splines with smoothing parameter \(0.5\), and a standardized series \(z_s(t)=(\tilde y_s(t)-\text{mean}_s)/\text{sd}_s\). All frames are indexed on a normalized time scale \(t\in[0,1]\).
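A sketch of this step, using SciPy's `UnivariateSpline` as a stand-in for a cubic smoothing spline; note that its `s` argument is not on the same scale as the smoothing parameter \(0.5\) quoted above, so the value passed here is purely illustrative:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def smooth_and_standardize(y, s=0.5):
    """Return the raw series, a cubic-spline-smoothed version, and its
    z-scored counterpart, indexed on normalized time t in [0, 1]."""
    t = np.linspace(0.0, 1.0, len(y))
    spline = UnivariateSpline(t, y, k=3, s=s)  # k=3 -> cubic; s is illustrative
    y_smooth = spline(t)
    z = (y_smooth - y_smooth.mean()) / y_smooth.std(ddof=1)
    return t, y, y_smooth, z
```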
Lyrics are tokenized into words, lowercased, stripped of stopwords, and accent-folded to ASCII. From the resulting tokens we compute: Bing coverage and negative proportion; NRC emotion proportions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust); a sentiment arc (mean, SD, slope, range); lexical richness (type–token ratio, hapax proportion); and NRC VAD means (valence, arousal, dominance) with coverage.
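A hedged sketch of the tokenization and two of the lexicon-based features; the stopword list and the `nrc` word-to-emotions mapping are placeholders, since the actual lexicon files are not reproduced here:

```python
import re
import unicodedata
from collections import Counter

def tokenize(lyrics, stopwords):
    """Lowercase, fold accents to ASCII, split into word tokens, drop stopwords."""
    text = unicodedata.normalize("NFKD", lyrics.lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in stopwords]

def nrc_emotion_proportions(tokens, nrc):
    """Proportion of tokens tagged with each NRC emotion.
    `nrc` is a placeholder dict mapping a word to its set of emotion labels."""
    counts = Counter(e for t in tokens for e in nrc.get(t, ()))
    n = max(len(tokens), 1)
    return {emotion: c / n for emotion, c in counts.items()}

def lexical_richness(tokens):
    """Type-token ratio and hapax proportion."""
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    ttr = len(counts) / n
    hapax = sum(1 for c in counts.values() if c == 1) / n
    return ttr, hapax
```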
We retain year, band, album, track, song, duration, and bpm per track (duration in seconds; bpm is the predominant tempo).
Each song curve \(y_i(t)\), with \(i\) now indexing songs rather than frames, is
interpolated onto a uniform grid
\[
t = \{t_1,\dots,t_M\}, \quad t_m \in [0,1],
\]
so that all songs have aligned time points.
For each song \(i\), the resampled
curve is standardized:
\[
\tilde y_i(t_m) = \frac{y_i(t_m)-\mu_i}{\sigma_i},
\qquad
\mu_i = \frac{1}{M}\sum_{m=1}^M y_i(t_m), \quad
\sigma_i^2 = \frac{1}{M-1}\sum_{m=1}^M \big(y_i(t_m)-\mu_i\big)^2.
\]
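A sketch of the alignment and per-song standardization; the grid size \(M=100\) is an arbitrary illustrative choice:

```python
import numpy as np

def resample_and_standardize(curves, M=100):
    """Interpolate each song's curve onto a common grid t_1..t_M in [0, 1]
    and z-score it, so songs are compared by shape rather than by level."""
    grid = np.linspace(0.0, 1.0, M)
    out = np.empty((len(curves), M))
    for i, y in enumerate(curves):
        t = np.linspace(0.0, 1.0, len(y))
        y_m = np.interp(grid, t, y)                      # resample to the grid
        out[i] = (y_m - y_m.mean()) / y_m.std(ddof=1)    # per-song z-score
    return out  # shape: (n_songs, M)
```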
For two songs \(i\) and \(j\), their dissimilarity is computed
as
\[
D_{i,j} = \sqrt{\sum_{m=1}^M \Big(\tilde y_i(t_m)-\tilde
y_j(t_m)\Big)^2}.
\]
Distances are transformed into similarities using a Gaussian (RBF)
kernel:
\[
W_{i,j} = \exp\!\left(-\frac{D_{i,j}^2}{2\sigma^2}\right),
\qquad W_{i,i}=0,
\]
where \(\sigma\) is chosen as the
median of nonzero distances.
For each song \(i\), only the \(k\) strongest similarities \(W_{i,j}\) are kept; the rest are set to zero.
The adjacency matrix is then symmetrized:
\[
A_{i,j} = \max\!\big(W_{i,j}, W_{j,i}\big).
\]