Ayrton Pablo Almada Jimenez
12/01/2023
Portamento
In music, portamento is a pitch sliding from one note to another. The term originated from the Italian expression “portamento della voce” (“carriage of the voice”).
Since the beginning of the 17th century, the term has denoted this technique in vocal performance and its emulation by members of the violin family and certain wind instruments.
Short portamenti can connect notes to make a passage sound more fluid, while long portamenti can draw out a transition with anticipation before finally arriving at the destination.
Portamento comes naturally to instruments that, like the human voice, can vary their pitch continuously.
Certain electronic systems are capable of producing the effect, but they are limited to particular situations (e.g. monophonic glide, offline processing).
The optimal transport problem asks how to move probability mass from one configuration to another in a way that minimizes the amount of work (mass times distance) performed on each infinitesimal piece of mass.
\[\begin{aligned}\pi^*&=\arg\left(\min\limits_{\pi\in\Pi\left(\rho_{v},\rho_{w}\right)}\iint_{\mathbb{R}^2}\|x-y\|_2^2\,d\pi(x,y)\right)\\\text{s.t.}&\begin{cases}\int_{\mathbb{R}}\pi(x,y)\,dy=\rho_v(x)\\\int_{\mathbb{R}}\pi(x,y)\,dx=\rho_w(y)\end{cases}\end{aligned}\tag{1}\]
We use the optimal plan to perform displacement interpolation between two distributions.
Displacement interpolation animates the mass assignment computed in (1) by sliding each particle of mass from its source position to its assigned destination.
On top, the distributions are interpolated linearly.
On the bottom, the same distributions are transformed using displacement interpolation.
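The difference between the two interpolations can be seen in a small sketch. It assumes equal-mass particles on a line, in which case the optimal assignment under the squared-distance cost simply pairs the sorted positions; the function names and example data are hypothetical and chosen only for illustration.

```python
def linear_interpolation(p, q, k):
    """Blend two histograms bin by bin: mass fades out of p and into q."""
    return [(1 - k) * a + k * b for a, b in zip(p, q)]


def displacement_interpolation(xs, ys, k):
    """Slide equal-mass particles from positions xs to positions ys.

    Assumption: all particles carry the same mass, so the 1D optimal
    assignment pairs the sorted positions and no particle crosses another.
    """
    return [(1 - k) * x + k * y for x, y in zip(sorted(xs), sorted(ys))]


# Hypothetical example: one unit of mass moving from bin 2 to bin 8.
p = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # histogram with a peak at bin 2
q = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]  # histogram with a peak at bin 8

halfway_linear = linear_interpolation(p, q, 0.5)
halfway_displaced = displacement_interpolation([2.0], [8.0], 0.5)
```

Halfway through, the linear blend holds two half-height peaks at bins 2 and 8, while the displaced particle sits at a single intermediate position. That sliding behavior is what makes displacement interpolation resemble a portamento.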
Audio transport is an audio effect that interpolates between any two audio streams in a way that sounds like a portamento, automatically and in real time.
The audio transport effect relies on solving a 1-dimensional optimal transport problem. The solution to this problem determines how the pitches in one signal will move to pitches in the other.
It works by performing displacement interpolation on the input audio spectra, so that pitches in one signal slide to pitches in the other as an interpolation parameter is changed.
A sliding short-time Fourier transform (STFT) is applied to both input audio streams, producing complex spectra.
These spectra are interpolated according to the optimal transport map and fed through an inverse STFT to form the output audio stream.
Consider discrete spectra represented by complex vectors \(X,Y\) and corresponding frequency vectors \(\omega^X,\omega^Y\).
Analogously to the continuous optimal transport plan given in Equation (1), we can write the optimal transport plan between these discrete spectra as
\[\begin{aligned}\pi^*&=\arg\left(\min_{\pi \succeq 0}\left\{\sum_{i,j}\left|\omega^X_i-\omega^Y_j\right|^2\pi_{ij}\right\}\right),\\&\text{s.t.}\begin{cases}\sum\limits_j\pi_{ij}=|X_i|,\\\sum\limits_i\pi_{ij}=|Y_j|.\end{cases}\end{aligned}\]
This problem assumes that the two spectra carry equal total mass, \(\sum\limits_i|X_i|=\sum\limits_j|Y_j|\).
Once an optimal plan is computed, the spectra can be interpolated with parameter \(k\in [0,1]\) by placing each mass \(\pi^*_{ij}\) at the displaced frequency
\[(1-k)\,\omega^X_i+k\,\omega^Y_j.\]
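A minimal sketch of this placement step follows. It assumes the plan is stored sparsely as `(i, j, mass)` triples and that the displaced frequency is the linear blend of the two endpoint frequencies; the function name and the example data are hypothetical.

```python
def interpolate_masses(plan, freq_x, freq_y, k):
    """Place each plan entry (i, j, mass) at its displaced frequency.

    plan   : list of (i, j, mass) triples (assumed sparse representation)
    freq_x : frequencies of the bins of the first spectrum
    freq_y : frequencies of the bins of the second spectrum
    k      : interpolation parameter in [0, 1]
    """
    return [((1 - k) * freq_x[i] + k * freq_y[j], mass) for i, j, mass in plan]


# Hypothetical data: a single 440 Hz partial sliding to a 660 Hz partial.
freq_x = [440.0]
freq_y = [660.0]
plan = [(0, 0, 1.0)]

midpoint = interpolate_masses(plan, freq_x, freq_y, 0.5)
```

At `k = 0` every mass sits at its source frequency and at `k = 1` at its destination, so sweeping `k` produces the glide. Mapping the displaced frequencies back onto STFT bins and handling phase are left out of this sketch.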
The algorithm begins with the initial bins of the two spectra.
Since no mass can cross over any other, all of the mass in the smaller bin must be assigned to the larger.
With this assignment done, one can imagine virtually removing the smaller bin and shrinking the mass of the larger by the mass assignment.
The algorithm then continues inductively on the smaller problem.
At every iteration all of the mass in one bin becomes completely assigned.
Therefore, the complexity of the algorithm is \(O(|X|+|Y|)\).
This linear runtime is cheap relative to the super-linear \(O(n\log n)\) runtime of the fast Fourier transform.
\[
\begin{array}{ll}
\hline
\textbf{Algorithm 1: }\text{Computing the optimal transport matrix }\pi^* & \\
\hline
\pi^*_{ij}\leftarrow 0\text{ for all }i,j & \\
i,j\leftarrow 0,0 & \\
\rho_X,\rho_Y\leftarrow|X_0|,|Y_0| & \rho\text{ is the mass left in a bin}\\
\textbf{loop} & \\
\quad\textbf{if }\rho_X<\rho_Y\textbf{ then} & \\
\qquad\pi^*_{ij}\leftarrow\rho_X & \text{Assign as much mass as possible}\\
\qquad\rho_Y\leftarrow\rho_Y-\rho_X & \text{Decrease the capacity of the other}\\
\qquad i\leftarrow i+1 & \\
\qquad\textbf{if }i\ge|X|\textbf{ then break} & \\
\qquad\rho_X\leftarrow|X_i| & \text{Refill the emptied bin}\\
\quad\textbf{else} & \\
\qquad\text{Symmetric to the case above} & \\
\textbf{return }\pi^* & \\
\hline
\end{array}
\]
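The greedy procedure can be sketched in Python as follows. This is an illustrative reading of Algorithm 1, not a reference implementation: it assumes non-empty lists of non-negative magnitudes with equal totals, and it returns the plan sparsely as `(i, j, mass)` triples (a representation chosen here, since at most \(|X|+|Y|\) entries are nonzero). The function name is hypothetical.

```python
def transport_plan(x_mass, y_mass):
    """Greedy 1D optimal transport between two mass vectors.

    Assumptions: both lists are non-empty, entries are non-negative,
    and sum(x_mass) == sum(y_mass). Returns (i, j, mass) triples.
    """
    plan = []
    i, j = 0, 0
    rho_x, rho_y = x_mass[0], y_mass[0]  # mass left in the current bins
    while True:
        if rho_x < rho_y:
            if rho_x > 0:
                plan.append((i, j, rho_x))  # bin i empties into bin j
            rho_y -= rho_x                  # shrink the capacity of bin j
            i += 1
            if i >= len(x_mass):
                break
            rho_x = x_mass[i]               # refill from the next bin
        else:
            if rho_y > 0:
                plan.append((i, j, rho_y))  # bin j is filled from bin i
            rho_x -= rho_y                  # shrink what is left in bin i
            j += 1
            if j >= len(y_mass):
                break
            rho_y = y_mass[j]               # move on to the next bin
    return plan


# Hypothetical example with equal total mass on both sides.
example_plan = transport_plan([0.5, 0.5], [0.25, 0.75])
```

Each pass through the loop advances either `i` or `j`, so the loop runs at most `len(x_mass) + len(y_mass)` times, matching the linear bound stated above. In the example, the first source bin is split between both destination bins and the second source bin goes entirely to the second destination bin.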
One unfortunate effect of using an STFT is the necessary tradeoff between time and frequency resolution. As the time resolution increases, the frequency domain becomes “smeared.”
The relation between a peak frequency and its smeared components is known to be important for perceptual quality. Treating them independently leads to phasing artifacts within a window, known as vertical incoherence.
One method for solving this problem in the phase vocoder literature is to “lock” the regions surrounding a peak frequency so that the relative phase between bins within each region remains unchanged.
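If Algorithm 1 were applied directly to audio spectra, it would introduce vertical incoherence by translating smeared components independently. Instead, the spectrum is first segmented into regions around each peak, and each region is transported as a unit.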
It now remains to determine how exactly to choose the boundaries between smeared spectral regions.
Frequency reassignment uses information in a signal’s phase to enhance its frequency resolution.
Each spectral component with frequency \(\omega_i\) is mapped to the reassigned frequency \(\hat{\omega}_i\) that better reflects the true energy distribution.
Sinusoids that have been smeared across multiple bins become mapped to the same central frequency, which produces the plateaus shown in Figure 2.
With this view, an intuitive way to define sinusoidal regions is by the zero crossings of \(\hat{\omega}_i-\omega_i\).
Falling crossings indicate the center bin of a region, while rising crossings indicate the boundaries.
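The segmentation rule can be sketched as below. The sketch takes the offsets \(\hat{\omega}_i-\omega_i\) as a precomputed list; how ties at exactly zero are treated and which of the two bins around a falling crossing is called the center are assumptions made here for illustration, and the function name is hypothetical.

```python
def sinusoidal_regions(offsets):
    """Find region centers and boundaries from sign changes of the offsets.

    offsets[i] is the reassigned frequency minus the nominal frequency of
    bin i. Returns (centers, boundaries) as lists of bin indices.
    """
    centers, boundaries = [], []
    for i in range(1, len(offsets)):
        prev, cur = offsets[i - 1], offsets[i]
        if prev > 0 and cur <= 0:
            # Falling crossing: pick the bin closer to zero as the center
            # (an assumption of this sketch).
            centers.append(i - 1 if abs(prev) <= abs(cur) else i)
        elif prev <= 0 and cur > 0:
            # Rising crossing: a new region starts at bin i.
            boundaries.append(i)
    return centers, boundaries


# Hypothetical offsets containing two smeared sinusoids.
offsets = [1.0, 0.2, -0.3, -1.0, 0.8, 0.1, -0.6]
centers, boundaries = sinusoidal_regions(offsets)
```

In this example the offsets fall through zero near bins 1 and 5, which become the two centers, and rise through zero at bin 4, which separates the two regions. The function only consumes the offsets \(\hat{\omega}_i-\omega_i\); it does not compute them.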
These can be computed at the cost of an additional STFT with the following formula:
\[\hat{\omega}_i-\omega_i=\Im\left\{\frac{X_i^{\mathcal{Th}}\cdot X_i^*}{|X_i|^2}\right\},\]