Megadeth is an American heavy metal band formed in 1983 by Dave Mustaine and David Ellefson. As pioneers of thrash metal and members of the “Big Four,” they have released numerous platinum-selling albums and earned multiple Grammy nominations along with a Grammy Award. The classic lineup consists of Mustaine (vocals, guitar), Ellefson (bass), Marty Friedman (lead guitar), and Nick Menza (drums).
All raw data sources: megadeth.com and the official Megadeth YouTube channel.
Next, we outline the pipeline for obtaining the data, applying preprocessing steps, and deriving the relevant features.
The dataset comprises 173 songs from 16 Megadeth studio albums (1985–2022). Audio is taken from official releases, metadata (year, band, album, track, song, duration, bpm) from public discographies, and lyrics from official sources.
Each track is decoded to WAV via FFmpeg, down-mixed to mono, and amplitude-normalized. Frames use a Hann window (\(46\) ms) with a \(23\) ms hop. For frame \(i\): samples \(x_i[m]\), DFT \(X_i[k]\), magnitudes \(M_i[k]=|X_i[k]|\) for bins \(k=1,\dots,K\), frequencies \(f_k\), and a small \(\varepsilon>0\) for numerical stability.
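For concreteness, a minimal framing sketch in Python/NumPy, assuming the FFmpeg-decoded WAV has already been loaded as a mono, amplitude-normalized array `x`; the 44.1 kHz sample-rate default and the function name are illustrative, while the 46 ms window and 23 ms hop follow the text:

```python
import numpy as np

def frame_magnitudes(x, sr=44100, win_ms=46.0, hop_ms=23.0):
    """Split a mono, amplitude-normalized signal x (assumed at least one
    window long) into Hann-windowed frames and return the frames, their
    one-sided magnitude spectra M[i, k], and the bin frequencies f_k."""
    win = int(round(sr * win_ms / 1000.0))
    hop = int(round(sr * hop_ms / 1000.0))
    window = np.hanning(win)

    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop: i * hop + win] * window
                       for i in range(n_frames)])

    M = np.abs(np.fft.rfft(frames, axis=1))   # magnitudes |X_i[k]|
    f = np.fft.rfftfreq(win, d=1.0 / sr)      # frequencies f_k
    return frames, M, f
```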
Per song we compute four per-frame metrics, smooth each with a cubic spline, and optionally \(z\)-score:
Loudness: \[ \mathrm{RMS}_i=\sqrt{\tfrac{1}{L}\sum_{m=0}^{L-1}x_i[m]^2}, \qquad \log\mathrm{RMS}_i=\log(\mathrm{RMS}_i+\varepsilon). \]
Higher values indicate louder passages with more acoustic energy; lower values correspond to quieter sections or silence.
Spectral brightness: \[ \displaystyle \mathrm{logSC}_i=\frac{\sum_{k=1}^K \log(f_k)\,M_i[k]}{\sum_{k=1}^K M_i[k]+\varepsilon}. \]
Higher values reflect brighter, treble-rich timbres (e.g., cymbals, distorted guitars), while lower values indicate darker or bass-heavy sounds.
Spectral flatness: \[ \displaystyle \mathrm{SFM}_i=\frac{\exp\!\big(\tfrac{1}{K}\sum_{k=1}^K \log(M_i[k]+\varepsilon)\big)}{\tfrac{1}{K}\sum_{k=1}^K M_i[k]+\varepsilon},\qquad \mathrm{logitSFM}_i=\log\!\left(\tfrac{\mathrm{SFM}_i}{1-\mathrm{SFM}_i}\right). \]
Higher values suggest noise-like or texture-like content with flatter spectra; lower values point to tonal, pitched, or harmonic sounds.
Onset/rhythmic activity: \[ \displaystyle \mathrm{Flux}_i=\sum_{k=1}^K \max\!\big(M_i[k]-M_{i-1}[k],0\big),\qquad \log\mathrm{Flux}_i=\log(\mathrm{Flux}_i+\varepsilon). \]
Higher values capture frequent or strong spectral changes, often aligned with beats or accents; lower values correspond to sustained tones, smoother textures, or silence.
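The four per-frame metrics can then be computed from the windowed frames and magnitude spectra produced by the sketch above; `eps` stands in for \(\varepsilon\) and its value, like the helper name `frame_metrics`, is an illustrative choice:

```python
import numpy as np

def frame_metrics(frames, M, f, eps=1e-10):
    """Per-frame loudness, brightness, flatness, and onset activity,
    following the definitions above. `frames` are the windowed time-domain
    frames, `M` the magnitude spectra, `f` the bin frequencies."""
    # Loudness: log RMS energy of each frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    log_rms = np.log(rms + eps)

    # Brightness: magnitude-weighted mean of log frequency
    # (the DC bin, whose frequency is 0, is skipped).
    Mk, fk = M[:, 1:], f[1:]
    log_sc = (Mk * np.log(fk)).sum(axis=1) / (Mk.sum(axis=1) + eps)

    # Flatness: geometric mean over arithmetic mean, then a logit transform.
    sfm = np.exp(np.mean(np.log(Mk + eps), axis=1)) / (np.mean(Mk, axis=1) + eps)
    logit_sfm = np.log(sfm / (1.0 - sfm + eps))

    # Onset/rhythmic activity: positive spectral flux between consecutive
    # frames (one value fewer than the other metrics).
    flux = np.maximum(np.diff(M, axis=0), 0.0).sum(axis=1)
    log_flux = np.log(flux + eps)

    return {"log_rms": log_rms, "log_sc": log_sc,
            "logit_sfm": logit_sfm, "log_flux": log_flux}
```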
For each metric \(y_s(t)\) we retain three representations: the raw series, a smoothed version \(\tilde y_s(t)\) obtained using cubic smoothing splines with smoothing parameter \(0.5\), and a standardized series \(z_s(t)=(\tilde y_s(t)-\text{mean}_s)/\text{sd}_s\). All frames are indexed on a normalized time scale \(t\in[0,1]\).
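A sketch of this step, using SciPy's `UnivariateSpline` as a stand-in for a cubic smoothing spline; note that its `s` argument is not on the same scale as the smoothing parameter \(0.5\) quoted above, so the value passed here is purely illustrative:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def smooth_and_standardize(y, s=0.5):
    """Return the raw series, a cubic-spline-smoothed version, and its
    z-scored counterpart, indexed on normalized time t in [0, 1]."""
    t = np.linspace(0.0, 1.0, len(y))
    spline = UnivariateSpline(t, y, k=3, s=s)  # k=3 -> cubic; s is illustrative
    y_smooth = spline(t)
    z = (y_smooth - y_smooth.mean()) / y_smooth.std(ddof=1)
    return t, y, y_smooth, z
```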
Lyrics are tokenized into words, lowercased, stripped of stopwords, and accent-folded to ASCII. From the resulting tokens we compute: Bing coverage and negative proportion; NRC emotion proportions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust); a sentiment arc (mean, SD, slope, range); lexical richness (type–token ratio, hapax proportion); and NRC VAD means (valence, arousal, dominance) with coverage.
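A hedged sketch of the tokenization and two of the lexicon-based features; the stopword list and the `nrc` word-to-emotions mapping are placeholders, since the actual lexicon files are not reproduced here:

```python
import re
import unicodedata
from collections import Counter

def tokenize(lyrics, stopwords):
    """Lowercase, fold accents to ASCII, split into word tokens, drop stopwords."""
    text = unicodedata.normalize("NFKD", lyrics.lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in stopwords]

def nrc_emotion_proportions(tokens, nrc):
    """Proportion of tokens tagged with each NRC emotion.
    `nrc` is a placeholder dict mapping a word to its set of emotion labels."""
    counts = Counter(e for t in tokens for e in nrc.get(t, ()))
    n = max(len(tokens), 1)
    return {emotion: c / n for emotion, c in counts.items()}

def lexical_richness(tokens):
    """Type-token ratio and hapax proportion."""
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    ttr = len(counts) / n
    hapax = sum(1 for c in counts.values() if c == 1) / n
    return ttr, hapax
```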
We retain year, band, album, track, song, duration, and bpm per track (duration in seconds; bpm is the predominant tempo).
Each song curve \(y_i(t)\), with \(i\) now indexing songs rather than frames, is
interpolated onto a uniform grid
\[
t = \{t_1,\dots,t_M\}, \quad t_m \in [0,1],
\]
so that all songs have aligned time points.
For each song \(i\), the resampled
curve is standardized:
\[
\tilde y_i(t_m) = \frac{y_i(t_m)-\mu_i}{\sigma_i},
\qquad
\mu_i = \frac{1}{M}\sum_{m=1}^M y_i(t_m), \quad
\sigma_i^2 = \frac{1}{M-1}\sum_{m=1}^M \big(y_i(t_m)-\mu_i\big)^2.
\]
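A sketch of the alignment and per-song standardization; the grid size \(M=100\) is an arbitrary illustrative choice:

```python
import numpy as np

def resample_and_standardize(curves, M=100):
    """Interpolate each song's curve onto a common grid t_1..t_M in [0, 1]
    and z-score it, so songs are compared by shape rather than by level."""
    grid = np.linspace(0.0, 1.0, M)
    out = np.empty((len(curves), M))
    for i, y in enumerate(curves):
        t = np.linspace(0.0, 1.0, len(y))
        y_m = np.interp(grid, t, y)                      # resample to the grid
        out[i] = (y_m - y_m.mean()) / y_m.std(ddof=1)    # per-song z-score
    return out  # shape: (n_songs, M)
```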
For two songs \(i\) and \(j\), their dissimilarity is computed
as
\[
D_{i,j} = \sqrt{\sum_{m=1}^M \Big(\tilde y_i(t_m)-\tilde
y_j(t_m)\Big)^2}.
\]
Distances are transformed into similarities using a Gaussian (RBF)
kernel:
\[
W_{i,j} = \exp\!\left(-\frac{D_{i,j}^2}{2\sigma^2}\right),
\qquad W_{i,i}=0,
\]
where \(\sigma\) is chosen as the
median of nonzero distances.
For each song \(i\), only the \(k\) strongest similarities \(W_{i,j}\) are kept; the rest are set to zero.
The adjacency matrix is then symmetrized:
\[
A_{i,j} = \max\!\big(W_{i,j}, W_{j,i}\big).
\]