About Metallica

Metallica is an American heavy metal band formed in 1981 by James Hetfield and Lars Ulrich. As pioneers of thrash and members of the “Big Four,” they have sold tens of millions of records and earned multiple Grammy Awards. The current lineup features Hetfield (vocals, rhythm guitar), Ulrich (drums), Kirk Hammett (lead guitar), and Robert Trujillo (bass).


Lineups by period

  • 1981–1982: James Hetfield (vocals, rhythm guitar), Lars Ulrich (drums), Dave Mustaine (lead guitar), Ron McGovney (bass).
  • 1983–1986: Hetfield, Ulrich, Kirk Hammett (lead guitar), Cliff Burton (bass).
  • 1986–2001: Hetfield, Ulrich, Hammett, Jason Newsted (bass).
  • 2001–2003: Hetfield, Ulrich, Hammett, Bob Rock (bass during the St. Anger sessions, bridging Newsted’s departure and Trujillo’s arrival).
  • 2003–present: Hetfield, Ulrich, Hammett, Robert Trujillo (bass).

Raw data sources: metallica.com and the official Metallica YouTube channel.


Albums

  • 1983 — Kill ’Em All — 10 songs
  • 1984 — Ride the Lightning — 8 songs
  • 1986 — Master of Puppets — 8 songs
  • 1987 — The $5.98 E.P.: Garage Days Re-Revisited — 5 songs
  • 1988 — …And Justice for All — 9 songs
  • 1991 — Metallica — 12 songs
  • 1996 — Load — 14 songs
  • 1997 — Reload — 13 songs
  • 1998 — Garage Inc. — 11 songs
  • 2003 — St. Anger — 11 songs
  • 2008 — Death Magnetic — 10 songs
  • 2016 — Hardwired… to Self-Destruct — 13 songs
  • 2023 — 72 Seasons — 12 songs

Data, preprocessing, and features

Next, we outline the pipeline for obtaining the data, applying preprocessing steps, and deriving the relevant features.

This dataset comprises 136 songs across the 13 releases listed above.


Scope and sources

The dataset contains 136 songs from 13 Metallica releases (1983–2023). Audio comes from official releases; metadata (year, band, album, track, song, duration, bpm) comes from public discographies; lyrics come from official sources.


Audio decoding and framing

Each track is decoded to WAV via FFmpeg, down-mixed to mono, and amplitude-normalized. Frames use a Hann window (\(46\) ms) with a \(23\) ms hop. For frame \(i\): samples \(x_i[m]\), \(m=0,\dots,L-1\), where \(L\) is the frame length in samples; DFT \(X_i[k]\); magnitudes \(M_i[k]=|X_i[k]|\) for bins \(k=1,\dots,K\); bin frequencies \(f_k\); and a small \(\varepsilon>0\) for numerical stability.
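To make the decoding and framing concrete, here is a minimal Python sketch. It assumes FFmpeg is on the PATH and a numpy/scipy stack; the function names (decode_to_mono, frame_spectra) and the 44.1 kHz rate are illustrative assumptions, not the original pipeline code.

```python
# Minimal decoding/framing sketch; names and defaults are illustrative.
import subprocess
import numpy as np
from scipy.io import wavfile

def decode_to_mono(path, out="tmp.wav", sr=44100):
    # Decode any input to mono WAV at a fixed sample rate via FFmpeg.
    subprocess.run(["ffmpeg", "-y", "-i", path, "-ac", "1", "-ar", str(sr), out],
                   check=True, capture_output=True)
    rate, x = wavfile.read(out)
    x = x.astype(np.float64)
    return rate, x / (np.max(np.abs(x)) + 1e-12)   # amplitude-normalize

def frame_spectra(x, sr, win_ms=46, hop_ms=23):
    # Hann-windowed frames -> magnitude spectra M_i[k] and bin frequencies f_k.
    L = int(sr * win_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    win = np.hanning(L)
    n = 1 + (len(x) - L) // hop
    frames = np.stack([x[i*hop : i*hop + L] for i in range(n)])
    M = np.abs(np.fft.rfft(frames * win, axis=1))  # shape (n_frames, K+1)
    f = np.fft.rfftfreq(L, d=1.0/sr)
    return frames, M, f
```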


Audio metrics

Per song we compute four per-frame metrics, smooth each with a cubic spline, and optionally \(z\)-score (a code sketch follows this list):

  • Loudness: \[ \mathrm{RMS}_i=\sqrt{\tfrac{1}{L}\sum_{m=0}^{L-1}x_i[m]^2}, \qquad \log\mathrm{RMS}_i=\log(\mathrm{RMS}_i+\varepsilon). \]

    Higher values indicate louder passages with more acoustic energy; lower values correspond to quieter sections or silence.

  • Spectral brightness: \[ \displaystyle \mathrm{logSC}_i=\frac{\sum_{k=1}^K \log(f_k)\,M_i[k]}{\sum_{k=1}^K M_i[k]+\varepsilon}. \]

    Higher values reflect brighter, treble-rich timbres (e.g., cymbals, distorted guitars), while lower values indicate darker or bass-heavy sounds.

  • Spectral flatness: \[ \displaystyle \mathrm{SFM}_i=\frac{\exp\!\big(\tfrac{1}{K}\sum_{k=1}^K \log(M_i[k]+\varepsilon)\big)}{\tfrac{1}{K}\sum_{k=1}^K M_i[k]+\varepsilon},\qquad \mathrm{logitSFM}_i=\log\!\left(\tfrac{\mathrm{SFM}_i}{1-\mathrm{SFM}_i}\right). \]

    Higher values suggest noise-like or texture-like content with flatter spectra; lower values point to tonal, pitched, or harmonic sounds.

  • Onset/rhythmic activity: \[ \displaystyle \mathrm{Flux}_i=\sum_{k=1}^K \max\!\big(M_i[k]-M_{i-1}[k],0\big),\qquad \log\mathrm{Flux}_i=\log(\mathrm{Flux}_i+\varepsilon). \]

    Higher values capture frequent or strong spectral changes, often aligned with beats or accents; lower values correspond to sustained tones, smoother textures, or silence.
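Following the definitions above, a compact Python sketch of the four metrics. The eps default, the exclusion of the DC bin, and the array layout (frames, M, f as returned by frame_spectra in the earlier sketch) are assumptions.

```python
# Per-frame metrics from the formulas above (eps for numerical stability).
import numpy as np

def audio_metrics(frames, M, f, eps=1e-10):
    # Loudness: log RMS of each frame's (unwindowed) samples.
    rms = np.sqrt(np.mean(frames**2, axis=1))
    log_rms = np.log(rms + eps)

    # Spectral brightness: magnitude-weighted mean of log frequency.
    Mk, fk = M[:, 1:], f[1:]                      # drop the DC bin (k >= 1)
    log_sc = (Mk @ np.log(fk)) / (Mk.sum(axis=1) + eps)

    # Spectral flatness: geometric / arithmetic mean, then logit transform.
    sfm = np.exp(np.mean(np.log(Mk + eps), axis=1)) / (np.mean(Mk, axis=1) + eps)
    logit_sfm = np.log(sfm / (1.0 - sfm))

    # Onset activity: positive spectral flux between consecutive frames
    # (one fewer value than the number of frames, since i starts at 2).
    flux = np.maximum(Mk[1:] - Mk[:-1], 0.0).sum(axis=1)
    log_flux = np.log(flux + eps)
    return log_rms, log_sc, logit_sfm, log_flux
```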


Post-processing

For each metric \(y_s(t)\) we retain three representations: the raw series, a smoothed version \(\tilde y_s(t)\) obtained using cubic smoothing splines with smoothing parameter \(0.5\), and a standardized series \(z_s(t)=(\tilde y_s(t)-\text{mean}_s)/\text{sd}_s\). All frames are indexed on a normalized time scale \(t\in[0,1]\).
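A possible implementation of this step, assuming the third-party csaps package for cubic smoothing splines (its smooth parameter lives in \([0,1]\), matching the \(0.5\) above):

```python
# Post-processing sketch: raw, smoothed, and z-scored series on a
# normalized time axis; `csaps` is an assumed dependency.
import numpy as np
from csaps import csaps

def postprocess(y, smooth=0.5):
    t = np.linspace(0.0, 1.0, len(y))         # normalized time in [0, 1]
    y_smooth = csaps(t, y, t, smooth=smooth)  # cubic smoothing spline
    z = (y_smooth - y_smooth.mean()) / y_smooth.std(ddof=1)
    return t, y, y_smooth, z                  # raw, smoothed, standardized
```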


Lyrics

Lyrics are tokenized into words, lowercased, stripped of stopwords, and accent-folded to ASCII. We compute: Bing coverage and negative proportion; NRC emotion proportions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust); a sentiment arc (mean, SD, slope, range); lexical richness (type–token ratio, hapax proportion); and NRC VAD means (valence, arousal, dominance) with coverage. A sketch of the feature arithmetic follows.
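The Bing/NRC lookups require the lexicon data themselves; the sketch below shows only the tokenization and the feature arithmetic, with illustrative helper names and a per-line sentiment score input as assumptions.

```python
# Lyric-feature sketch; lexicon lookups are assumed to happen elsewhere.
import re
import unicodedata
import numpy as np

def tokenize(text, stopwords):
    # Lowercase, fold accents to ASCII, keep word tokens, drop stopwords.
    text = unicodedata.normalize("NFKD", text.lower())
    text = text.encode("ascii", "ignore").decode()
    return [w for w in re.findall(r"[a-z']+", text) if w not in stopwords]

def lexical_richness(tokens):
    counts = {}
    for w in tokens:
        counts[w] = counts.get(w, 0) + 1
    ttr = len(counts) / len(tokens)                         # type-token ratio
    hapax = sum(c == 1 for c in counts.values()) / len(counts)
    return ttr, hapax

def sentiment_arc(scores):
    # Arc summaries (mean, SD, slope, range) over per-line sentiment scores.
    s = np.asarray(scores, dtype=float)
    t = np.linspace(0.0, 1.0, len(s))
    slope = np.polyfit(t, s, 1)[0]
    return s.mean(), s.std(ddof=1), slope, s.max() - s.min()
```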


Metadata

We retain year, band, album, track, song, duration, and bpm per track; duration is in seconds, and bpm is the predominant tempo.


Building the Song Graph

Each song curve \(y_i(t)\) is interpolated onto a uniform grid
\[ t = \{t_1,\dots,t_M\}, \quad t_m \in [0,1], \]
so that all songs have aligned time points.

For each song \(i\), the resampled curve is standardized:
\[ \tilde y_i(t_m) = \frac{y_i(t_m)-\mu_i}{\sigma_i}, \qquad \mu_i = \frac{1}{M}\sum_{m=1}^M y_i(t_m), \quad \sigma_i^2 = \frac{1}{M-1}\sum_{m=1}^M \big(y_i(t_m)-\mu_i\big)^2. \]

For two songs \(i\) and \(j\), their dissimilarity is computed as
\[ D_{i,j} = \sqrt{\sum_{m=1}^M \Big(\tilde y_i(t_m)-\tilde y_j(t_m)\Big)^2}. \]
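A sketch of the alignment, standardization, and dissimilarity steps in Python (the grid size \(M=200\) is an illustrative choice, not specified above):

```python
# Graph step 1: resample each song curve to a common grid, z-score,
# and form the Euclidean dissimilarity matrix D.
import numpy as np

def dissimilarity_matrix(curves, M=200):
    # curves: list of 1-D arrays, one smoothed metric series per song.
    grid = np.linspace(0.0, 1.0, M)
    Y = np.stack([np.interp(grid, np.linspace(0, 1, len(c)), c) for c in curves])
    Y = (Y - Y.mean(axis=1, keepdims=True)) / Y.std(axis=1, ddof=1, keepdims=True)
    diff = Y[:, None, :] - Y[None, :, :]          # pairwise differences
    return np.sqrt((diff**2).sum(axis=2))         # D[i, j]
```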

Distances are transformed into similarities using a Gaussian (RBF) kernel:
\[ W_{i,j} = \exp\!\left(-\frac{D_{i,j}^2}{2\sigma^2}\right), \qquad W_{i,i}=0, \]
where \(\sigma\) is chosen as the median of nonzero distances.

For each song \(i\), keep only the \(k\) strongest similarities \(W_{i,j}\) and set the rest to zero.

The adjacency matrix is then symmetrized:
\[ A_{i,j} = \max\!\big(W_{i,j}, W_{j,i}\big). \]
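And a sketch of the kernel, \(k\)-nearest-neighbour sparsification, and symmetrization steps (\(k=8\) is an illustrative default, not specified above):

```python
# Graph step 2: RBF similarities, k-NN sparsification, max-symmetrization.
import numpy as np

def knn_adjacency(D, k=8):
    sigma = np.median(D[D > 0])                   # median nonzero distance
    W = np.exp(-D**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)                      # W_ii = 0
    # Keep each row's k strongest similarities, zero out the rest.
    keep = np.argsort(W, axis=1)[:, -k:]
    mask = np.zeros_like(W, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    W = np.where(mask, W, 0.0)
    return np.maximum(W, W.T)                     # A_ij = max(W_ij, W_ji)
```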


Example

Figure: Audio data for Metallica (The Black Album).

Figure: Network data for Metallica (The Black Album).