Metallica is an American heavy metal band formed in 1981 by James Hetfield and Lars Ulrich. As pioneers of thrash and members of the “Big Four,” they have sold more than 100 million records worldwide and earned multiple Grammy Awards. The current lineup features Hetfield (vocals, rhythm guitar), Ulrich (drums), Kirk Hammett (lead guitar), and Robert Trujillo (bass).
All raw data sources: metallica.com and the official Metallica YouTube channel.
Next, we outline the pipeline for obtaining the data, applying preprocessing steps, and deriving the relevant features.
The dataset comprises 136 songs from 13 Metallica studio albums (1983–2025). Audio comes from official releases, metadata (year, band, album, track, song, duration, bpm) from public discographies, and lyrics from official sources.
Each track is decoded to WAV via FFmpeg, down-mixed to mono, and amplitude-normalized. Frames use a Hann window (\(46\) ms) with a \(23\) ms hop. For frame \(i\): samples \(x_i[m]\), \(m=0,\dots,L-1\), with frame length \(L\); DFT \(X_i[k]\); magnitudes \(M_i[k]=|X_i[k]|\) for bins \(k=1,\dots,K\); frequencies \(f_k\); and a small \(\varepsilon>0\) for numerical stability.
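As a concrete illustration of this step, here is a minimal Python sketch of the framing and DFT stage, assuming the track has already been decoded and down-mixed by FFmpeg; the file name, sample rate, and variable names (`frames`, `M`, `freqs`, `eps`) are illustrative, not part of the original pipeline.

```python
import numpy as np
from scipy.io import wavfile

# Assumed input: a mono WAV already produced by FFmpeg, e.g.
#   ffmpeg -i track.flac -ac 1 -ar 44100 track.wav
# (the file name and sample rate are illustrative).
sr, x = wavfile.read("track.wav")
x = x.astype(np.float64)
x /= np.max(np.abs(x)) + 1e-12             # amplitude normalization

frame_len = int(round(0.046 * sr))         # ~46 ms Hann window
hop = int(round(0.023 * sr))               # ~23 ms hop
window = np.hanning(frame_len)

frames, mags = [], []
for start in range(0, len(x) - frame_len + 1, hop):
    xi = x[start:start + frame_len] * window
    Xi = np.fft.rfft(xi)                   # DFT of frame i
    frames.append(xi)
    mags.append(np.abs(Xi))                # magnitudes M_i[k]

frames = np.array(frames)                  # shape (num_frames, L)
M = np.array(mags)                         # shape (num_frames, K+1), bin 0 = DC
freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)   # bin frequencies f_k
eps = 1e-10                                # small epsilon for numerical stability
```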
Per song we compute four per-frame metrics, smooth each with a cubic spline, and optionally \(z\)-score; a code sketch follows the four definitions below:
Loudness: \[ \mathrm{RMS}_i=\sqrt{\tfrac{1}{L}\sum_{m=0}^{L-1}x_i[m]^2}, \qquad \log\mathrm{RMS}_i=\log(\mathrm{RMS}_i+\varepsilon). \]
Higher values indicate louder passages with more acoustic energy; lower values correspond to quieter sections or silence.
Spectral brightness: \[ \displaystyle \mathrm{logSC}_i=\frac{\sum_{k=1}^K \log(f_k)\,M_i[k]}{\sum_{k=1}^K M_i[k]+\varepsilon}. \]
Higher values reflect brighter, treble-rich timbres (e.g., cymbals, distorted guitars), while lower values indicate darker or bass-heavy sounds.
Spectral flatness: \[ \displaystyle \mathrm{SFM}_i=\frac{\exp\!\big(\tfrac{1}{K}\sum_{k=1}^K \log(M_i[k]+\varepsilon)\big)}{\tfrac{1}{K}\sum_{k=1}^K M_i[k]+\varepsilon},\qquad \mathrm{logitSFM}_i=\log\!\left(\tfrac{\mathrm{SFM}_i}{1-\mathrm{SFM}_i}\right). \]
Higher values suggest noise-like or texture-like content with flatter spectra; lower values point to tonal, pitched, or harmonic sounds.
Onset/rhythmic activity: \[ \displaystyle \mathrm{Flux}_i=\sum_{k=1}^K \max\!\big(M_i[k]-M_{i-1}[k],0\big),\qquad \log\mathrm{Flux}_i=\log(\mathrm{Flux}_i+\varepsilon). \]
Higher values capture frequent or strong spectral changes, often aligned with beats or accents; lower values correspond to sustained tones, smoother textures, or silence.
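The four metrics can be computed per frame roughly as follows; this sketch reuses the `frames`, `M`, `freqs`, and `eps` arrays from the framing sketch above, and the small guard added to the logit denominator is a numerical convenience rather than part of the definitions.

```python
import numpy as np  # continues the framing sketch above

# Loudness: RMS of each windowed frame, then log-compressed.
rms = np.sqrt(np.mean(frames ** 2, axis=1))
log_rms = np.log(rms + eps)

# Spectral brightness: log-frequency-weighted centroid (DC bin dropped so log(f_k) is defined).
Mk, fk = M[:, 1:], freqs[1:]
log_sc = (Mk @ np.log(fk)) / (Mk.sum(axis=1) + eps)

# Spectral flatness: geometric over arithmetic mean of the magnitudes, then its logit.
sfm = np.exp(np.mean(np.log(Mk + eps), axis=1)) / (np.mean(Mk, axis=1) + eps)
logit_sfm = np.log(sfm / (1.0 - sfm + eps))

# Onset/rhythmic activity: half-wave-rectified spectral flux between consecutive frames
# (one value fewer than the other metrics, since frame 0 has no predecessor).
flux = np.maximum(np.diff(Mk, axis=0), 0.0).sum(axis=1)
log_flux = np.log(flux + eps)
```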
For each metric \(y_s(t)\) we retain three representations: the raw series, a smoothed version \(\tilde y_s(t)\) obtained using cubic smoothing splines with smoothing parameter \(0.5\), and a standardized series \(z_s(t)=(\tilde y_s(t)-\text{mean}_s)/\text{sd}_s\). All frames are indexed on a normalized time scale \(t\in[0,1]\).
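A minimal sketch of the three representations for one series (here the loudness series from the sketch above); note that scipy's spline smoothing argument is parameterized differently from the smoothing parameter \(0.5\) quoted in the text, so the value used below is only illustrative.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def three_representations(y):
    """Return the raw, spline-smoothed, and z-scored versions of one per-frame series."""
    t = np.linspace(0.0, 1.0, len(y))                 # normalized time in [0, 1]
    # scipy's `s` is not the same scale as the smoothing parameter 0.5 in the text;
    # the value below is an illustrative choice.
    spline = UnivariateSpline(t, y, k=3, s=0.5 * len(y) * np.var(y))
    y_smooth = spline(t)
    z = (y_smooth - y_smooth.mean()) / (y_smooth.std(ddof=1) + 1e-12)
    return y, y_smooth, z

raw, smooth, z = three_representations(log_rms)       # e.g. the loudness series
```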
Lyrics are tokenized into words, lowercased, stripped of stopwords, and accent-folded to ASCII. We compute: Bing coverage and negative proportion; NRC emotion proportions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust); a sentiment arc (mean, SD, slope, range); lexical richness (type–token ratio, hapax proportion); and NRC VAD means (valence, arousal, dominance) with coverage.
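A hedged sketch of a few of these lyric features is given below; the stopword list and Bing lexicon shown are tiny placeholders (the real resources are loaded separately), and the NRC emotion and VAD proportions would follow the same counting pattern with their respective lexicons.

```python
import re
from collections import Counter
from unicodedata import normalize
import numpy as np

# Illustrative placeholders only; the full stopword list and Bing lexicon are loaded elsewhere.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}
BING = {"love": "positive", "free": "positive", "hate": "negative", "kill": "negative"}

def lyric_features(text: str) -> dict:
    ascii_text = normalize("NFKD", text).encode("ascii", "ignore").decode()
    tokens = [w for w in re.findall(r"[a-z']+", ascii_text.lower()) if w not in STOPWORDS]
    counts = Counter(tokens)

    # Sentiment arc: per-token Bing polarity (+1/-1) in song order.
    arc = np.array([1.0 if BING[w] == "positive" else -1.0 for w in tokens if w in BING])
    slope = float(np.polyfit(np.linspace(0, 1, len(arc)), arc, 1)[0]) if len(arc) > 1 else 0.0

    return {
        "bing_coverage": len(arc) / max(len(tokens), 1),
        "bing_negative_prop": float(np.mean(arc < 0)) if len(arc) else 0.0,
        "arc_mean": float(arc.mean()) if len(arc) else 0.0,
        "arc_sd": float(arc.std(ddof=1)) if len(arc) > 1 else 0.0,
        "arc_slope": slope,
        "arc_range": float(arc.max() - arc.min()) if len(arc) else 0.0,
        "type_token_ratio": len(counts) / max(len(tokens), 1),
        "hapax_proportion": sum(c == 1 for c in counts.values()) / max(len(counts), 1),
    }
```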
We retain year, band, album, track, song, duration, and bpm per track (duration is in seconds; bpm is the predominant tempo).
Each song curve \(y_i(t)\) is
interpolated onto a uniform grid
\[
t = \{t_1,\dots,t_M\}, \quad t_m \in [0,1],
\]
so that all songs have aligned time points.
For each song \(i\), the resampled
curve is standardized:
\[
\tilde y_i(t_m) = \frac{y_i(t_m)-\mu_i}{\sigma_i},
\qquad
\mu_i = \frac{1}{M}\sum_{m=1}^M y_i(t_m), \quad
\sigma_i^2 = \frac{1}{M-1}\sum_{m=1}^M \big(y_i(t_m)-\mu_i\big)^2.
\]
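A minimal sketch of the resampling and standardization step; the grid size is an assumption, since the text does not fix \(M\).

```python
import numpy as np

def resample_and_standardize(y, num_points=200):
    """Interpolate one song curve onto a uniform grid on [0, 1], then z-score it.

    num_points (the grid size M) is an illustrative choice, not fixed by the text.
    """
    t_orig = np.linspace(0.0, 1.0, len(y))
    t_grid = np.linspace(0.0, 1.0, num_points)
    y_res = np.interp(t_grid, t_orig, y)
    return (y_res - y_res.mean()) / (y_res.std(ddof=1) + 1e-12)   # sd uses the M-1 denominator
```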
For two songs \(i\) and \(j\), their dissimilarity is computed
as
\[
D_{i,j} = \sqrt{\sum_{m=1}^M \Big(\tilde y_i(t_m)-\tilde
y_j(t_m)\Big)^2}.
\]
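Stacking the standardized curves, all pairwise distances can be computed in one vectorized step; `curves` (one per-frame series per song) and the helper from the previous sketch are assumptions of this illustration.

```python
# `curves` is assumed to hold one per-frame series per song.
Y = np.vstack([resample_and_standardize(c) for c in curves])        # shape (n_songs, M)
D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))    # pairwise Euclidean distances
```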
Distances are transformed into similarities using a Gaussian (RBF)
kernel:
\[
W_{i,j} = \exp\!\left(-\frac{D_{i,j}^2}{2\sigma^2}\right),
\qquad W_{i,i}=0,
\]
where \(\sigma\) is chosen as the
median of nonzero distances.
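A direct translation of this kernel step, continuing from the distance matrix `D` above:

```python
sigma = np.median(D[D > 0])                  # bandwidth: median of the nonzero distances
W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))   # Gaussian (RBF) similarities
np.fill_diagonal(W, 0.0)                     # enforce W_ii = 0
```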
For each song \(i\), keep only the \(k\) strongest similarities \(W_{i,j}\) and set the rest to zero.
The adjacency matrix is then symmetrized:
\[
A_{i,j} = \max\!\big(W_{i,j}, W_{j,i}\big).
\]
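Finally, a sketch of the sparsification and symmetrization, where the neighborhood size \(k\) is an illustrative choice (the text leaves it unspecified):

```python
k = 10                                       # illustrative neighborhood size
W_knn = np.zeros_like(W)
for i in range(W.shape[0]):
    nearest = np.argsort(W[i])[-k:]          # indices of the k strongest similarities in row i
    W_knn[i, nearest] = W[i, nearest]
A = np.maximum(W_knn, W_knn.T)               # symmetrized adjacency matrix
```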