INTRO

Portamento
- In music, portamento is a pitch sliding from one note to another. The term originated from the Italian expression “portamento della voce” (“carriage of the voice”).
- Since the beginning of the 17th century, the term has denoted its use in vocal performance and its emulation by members of the violin family and certain wind instruments.
- Short portamenti can connect notes to make a passage sound more fluid, while long portamenti can draw out a transition with anticipation before finally arriving at the destination.
- Portamento is available on instruments that, like the human voice, can vary their pitch continuously.
- Certain electronic systems are capable of producing the effect, but they are limited to particular situations (e.g. monophonic glide, offline processing).

OPTIMAL TRANSPORT OVERVIEW Part I

The optimal transport problem asks how to move probability mass from one configuration to another in a way that minimizes the amount of work (mass times distance) performed on each infinitesimal piece of mass.
\[\begin{aligned}\pi^*&=\arg\left(\min\limits_{\pi\in\Pi\left(\rho_{v},\rho_{w}\right)}\iint_{\mathbb{R}^2}\|x-y\|_2^2\,d\pi(x,y)\right)\qquad(1)\\\text{s.t.}&\begin{cases}\int_{\mathbb{R}}\pi(x,y)\,dy=\rho_v(x)\\\int_{\mathbb{R}}\pi(x,y)\,dx=\rho_w(y)\end{cases}\end{aligned}\]
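As a worked note on Equation (1): in one dimension with a quadratic cost, the optimal plan is known to be the monotone (order-preserving) coupling, so its cost can be written with quantile functions. The symbols \(F_v^{-1},F_w^{-1}\) (quantile functions of \(\rho_v,\rho_w\)), \(F_k^{-1}\) and \(k\) are notation introduced here for illustration only.
\[\iint_{\mathbb{R}^2}|x-y|^2\,d\pi^*(x,y)=\int_0^1\left|F_v^{-1}(t)-F_w^{-1}(t)\right|^2dt,\qquad F_k^{-1}(t)=(1-k)F_v^{-1}(t)+kF_w^{-1}(t),\quad k\in[0,1].\]
Here \(F_k^{-1}\) is the quantile function of the intermediate distribution at parameter \(k\): each quantile of mass moves in a straight line from its source position to its target position.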

OPTIMAL TRANSPORT Part II

We use the optimal plan to perform displacement interpolation between two distributions
- Animates the mass assignment computed in (1) by sliding each particle of mass between its two assignments
- Figure 1: The distribution on the left is transformed into the distribution on the right with two different interpolation methods.
- On top, the distributions are interpolated linearly.
  - As audio spectra, fading one set of pitches out and another set in.
- On the bottom, the same distributions are transformed using displacement interpolation
  - Mass physically slides from one location to another; as audio, this sliding would sound like a portamento.
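The contrast between the two interpolation methods can be sketched on a toy example. This is a minimal illustration in Python, not code from the source; the function names and the five-bin histogram are assumptions made for the example.

```python
def linear_interp(src, dst, k):
    """Cross-fade two histograms bin by bin (mass fades out and in, in place)."""
    return [(1 - k) * a + k * b for a, b in zip(src, dst)]


def displacement_interp(assignments, k):
    """Slide each assigned piece of mass along a straight line.

    assignments: list of (source_position, target_position, mass) triples,
    a hypothetical encoding of a transport plan.
    Returns a list of (position, mass) pairs at parameter k.
    """
    return [((1 - k) * x + k * y, m) for x, y, m in assignments]


# One unit of mass at bin 0 on the left, one unit at bin 4 on the right.
src = [1.0, 0.0, 0.0, 0.0, 0.0]
dst = [0.0, 0.0, 0.0, 0.0, 1.0]

halfway_linear = linear_interp(src, dst, 0.5)
halfway_displaced = displacement_interp([(0, 4, 1.0)], 0.5)

print(halfway_linear)     # [0.5, 0.0, 0.0, 0.0, 0.5]
print(halfway_displaced)  # [(2.0, 1.0)]
```

Halfway through, the linear version still has two half-height peaks at the original positions, while the displaced version has a single full peak at the midpoint.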

Audio transport

An audio effect which interpolates between any two audio streams in a way that sounds like a portamento, automatically and in real-time.
The audio transport effect relies on solving a 1-dimensional optimal transport problem. The solution to this problem determines how the pitches in one signal will move to pitches in the other.
Works by performing displacement interpolation on input audio spectra, so that pitches in one signal slide to pitches in the other as an interpolation parameter is changed.
- A sliding short-time Fourier transform (STFT) is applied to both input audio streams, producing complex spectra.
- These spectra are interpolated according to the optimal transport map and fed through an inverse STFT to form the output audio stream.
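The analysis and resynthesis steps above can be sketched as follows. This is a deliberately small, standard-library-only illustration: the naive DFT, the Hann window, the frame size, the hop and the `interpolate` callback are all assumptions made for the example, and a practical implementation would use an FFT library instead.

```python
import cmath
import math


def hann(size):
    """Periodic Hann window (an assumed choice of analysis window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / size) for t in range(size)]


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n)
                for f in range(n)).real / n
            for t in range(n)]


def stft(signal, size, hop):
    """Sliding windowed DFT: one complex spectrum per frame."""
    window = hann(size)
    return [dft([signal[s + t] * window[t] for t in range(size)])
            for s in range(0, len(signal) - size + 1, hop)]


def istft(spectra, size, hop):
    """Weighted overlap-add resynthesis, normalised by the summed squared window."""
    window = hann(size)
    out = [0.0] * ((len(spectra) - 1) * hop + size)
    norm = [0.0] * len(out)
    for n, spectrum in enumerate(spectra):
        frame = idft(spectrum)
        for t in range(size):
            out[n * hop + t] += frame[t] * window[t]
            norm[n * hop + t] += window[t] ** 2
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, norm)]


def audio_effect(signal_a, signal_b, interpolate, size=8, hop=2):
    """STFT both inputs, combine each pair of spectra, then inverse STFT."""
    spectra = [interpolate(a, b)
               for a, b in zip(stft(signal_a, size, hop), stft(signal_b, size, hop))]
    return istft(spectra, size, hop)


# Sanity check with a pass-through "interpolation" that keeps the first input.
signal = [math.sin(0.3 * t) for t in range(32)]
out = audio_effect(signal, signal, lambda a, b: a)
```

With the pass-through callback the output reproduces the input, except for the very first sample, where the window weight is zero. The transport-based interpolation would take the place of the callback.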

Optimal Transport Between Spectra

Consider discrete spectra represented by complex vectors \(X,Y\) and corresponding frequency vectors \(\omega^X,\omega^Y\)
Analogously to the continuous optimal transport plan given in Equation (1), we can write the optimal transport plan between these discrete spectra as the plan
\[\begin{aligned}\pi^*&=\arg\left(\min_{\pi \succeq 0}\left\{\sum_{i,j}\bigg|\omega^X_i-\omega^Y_j\bigg|^2\pi_{ij}\right\}\right),\\&\text{s.t.}\begin{cases}\sum\limits_j\pi_{ij}=|X_i|,\\\sum\limits_i\pi_{ij}=|Y_j|.\end{cases}\end{aligned}\]
This problem assumes that \(\sum\limits_i|X_i|=\sum\limits_j|Y_j|\).
- To treat spectra with different total magnitudes, the plan can be computed on normalized spectra; the overall scale is then interpolated linearly between the two totals.
Once an optimal plan is computed, the spectra can be interpolated with parameter \(k\in [0,1]\) by placing each mass \(\pi^*_{ij}\) at the displaced frequency:
- \[(1-k)\omega^X_i+k\omega^Y_j.\]
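The interpolation step can be sketched in a few lines of Python. This is an illustrative sketch only: the dictionary encoding of the plan, the function name and the `total_x`/`total_y` parameters are assumptions, and the plan is taken to have been computed on normalized spectra, with the overall scale blended linearly.

```python
def interpolate_spectrum(plan, freq_x, freq_y, k, total_x=1.0, total_y=1.0):
    """Move each planned mass from its source frequency toward its target.

    plan: {(i, j): mass} computed on normalized magnitudes (hypothetical encoding).
    freq_x, freq_y: frequencies of the bins of the two spectra.
    total_x, total_y: total magnitudes of the original, unnormalized spectra.
    Returns a list of (frequency, magnitude) pairs at parameter k in [0, 1].
    """
    scale = (1 - k) * total_x + k * total_y  # linear blend of the totals
    return [((1 - k) * freq_x[i] + k * freq_y[j], scale * mass)
            for (i, j), mass in sorted(plan.items())]


# A single partial gliding from 440 Hz to 880 Hz, with totals 2.0 and 4.0.
halfway = interpolate_spectrum({(0, 0): 1.0}, [440.0], [880.0], 0.5,
                               total_x=2.0, total_y=4.0)
print(halfway)  # [(660.0, 3.0)]
```

The displaced frequencies generally fall between bins, so a full implementation still has to decide how to place them back on the output frequency grid; that choice is left open here.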

Algorithm to determine \(\pi^*\) Part I

The algorithm begins with the initial bins of the two spectra.
- Since no mass can cross over any other, all of the mass in the smaller bin must be assigned to the larger.
- With this assignment done, one can imagine virtually removing the smaller bin and shrinking the mass of the larger by the mass assignment.
The algorithm then continues inductively on the smaller problem.
At every iteration all of the mass in one bin becomes completely assigned.
Therefore, the complexity of the algorithm is \(O(|X|+|Y|)\).
This runtime is efficient relative to the super-linear runtime of the fast Fourier transform.

Algorithm to determine \(\pi^*\) Part II

\[
\begin{array}{ll}
\hline
\textbf{Algorithm 1: }\text{Computing the optimal transport matrix }\pi^* & \\
\hline
\pi^*\leftarrow 0 & \\
i,j\leftarrow 0,0 & \\
\rho_X,\rho_Y\leftarrow|X_0|,|Y_0| & \rho\text{ is the mass left in a bin}\\
\textbf{loop} & \\
\quad\textbf{if }\rho_X<\rho_Y\textbf{ then} & \\
\qquad\pi^*_{ij}\leftarrow\rho_X & \text{Assign as much mass as possible}\\
\qquad i\leftarrow i+1 & \\
\qquad\textbf{if }i\ge|X|\textbf{ then break} & \\
\qquad\rho_Y\leftarrow\rho_Y-\rho_X & \text{Decrease the capacity of the other}\\
\qquad\rho_X\leftarrow|X_i| & \text{Refill the emptied bin}\\
\quad\textbf{else} & \\
\qquad\text{Symmetric to the case above} & \\
\textbf{return }\pi^* & \\
\hline
\end{array}
\]
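A runnable sketch of the greedy procedure, written in Python for illustration. The function name and the sparse dictionary output are assumptions made here; the inputs are taken to be non-empty lists of non-negative magnitudes with equal totals, and the capacity of the larger bin is decreased before the emptied bin is refilled.

```python
def transport_plan(x_mass, y_mass):
    """Greedy 1-D transport between two non-empty mass lists with equal totals.

    Returns a sparse plan {(i, j): mass}. Each loop iteration empties one bin,
    so the loop runs at most len(x_mass) + len(y_mass) - 1 times.
    """
    plan = {}
    i, j = 0, 0
    rho_x, rho_y = x_mass[0], y_mass[0]  # mass left in the current bins
    while True:
        if rho_x < rho_y:
            plan[(i, j)] = rho_x      # assign as much mass as possible
            rho_y -= rho_x            # decrease the capacity of the other bin
            i += 1
            if i >= len(x_mass):
                break
            rho_x = x_mass[i]         # refill from the next source bin
        else:
            plan[(i, j)] = rho_y      # symmetric case
            rho_x -= rho_y
            j += 1
            if j >= len(y_mass):
                break
            rho_y = y_mass[j]
    return plan


plan = transport_plan([0.5, 0.5], [0.25, 0.75])
print(plan)  # {(0, 0): 0.25, (0, 1): 0.25, (1, 1): 0.5}
```

In the example, the row sums of the plan recover the first list and the column sums recover the second. Entries with zero mass can appear when two bins empty at the same time; they are harmless and could be filtered out.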

Resolving Vertical Incoherence: Slicing the Spectrogram

One unfortunate effect of using an STFT is the necessary tradeoff between time and frequency resolution. As the time resolution increases, the frequency domain becomes “smeared.”
The relation between a peak frequency and its smeared components is known to be important for perceptual quality. Treating these components independently leads to phasing artifacts within a window, known as vertical incoherence.
One method to solve this problem in the phase vocoder literature is to “lock” regions surrounding a peak frequency so that the relative phase between bins within these regions remains unchanged.
If Algorithm 1 were applied directly to audio spectra, it would introduce vertical incoherence by translating smeared components independently.
- So, applying the locking strategy, we will treat smeared regions as single units with collective magnitude in the transportation map.
It now remains to determine how exactly to choose the boundaries between smeared spectral regions.
- A common strategy is to use Frequency Reassignment.

Frequency Reassignment Part I

Frequency reassignment uses information in a signal’s phase to enhance its frequency resolution.
- Each spectral component with frequency \(\omega_i\) is mapped to the reassigned frequency \(\hat{\omega_i}\) that better reflects the true energy distribution.
- Sinusoids that have been smeared across multiple bins become mapped to the same central frequency, which produces the plateaus shown in Figure 2.
Figure 2: Dividing the spectrum of a sinusoidal A major chord consisting of the notes A4, C#5, E5 and A5. The spectrum is displayed on top. On the bottom, the reassigned frequency (solid line) is plotted against the frequency (dashed line). The intersections of these lines indicate the boundaries between groups (vertical lines) and their pitch centers (dots).

Frequency Reassignment Part II

With this view, an intuitive way to define sinusoidal regions is by the zero crossings of \(\hat{\omega_i}-\omega_i\).
Falling crossings indicate the center bin of a region, while rising crossings indicate the boundaries.
These can be computed at the cost of an additional STFT with the following formula:
\[\hat{\omega_i}-\omega_i=\mathcal{F}\left\{\frac{X_i^{\mathcal{Th}}\cdot X_i^*}{|X_i|^2}\right\},\]
- where \(X^{\mathcal{Th}}\) is the STFT computed using a time-weighted analysis window.
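The zero-crossing rule can be sketched as follows, assuming the offsets \(\hat{\omega_i}-\omega_i\) have already been computed for every bin. The function names, the tie-breaking rule for the center bin and the toy numbers are assumptions made for this illustration, not details taken from the source.

```python
def segment_regions(offset):
    """Split bins into sinusoidal regions from the sign changes of the offsets.

    offset[i] is the reassigned frequency minus the bin frequency for bin i.
    A falling crossing (positive to non-positive) marks a region center;
    a rising crossing (non-positive to positive) marks the first bin of a new region.
    """
    centers, boundaries = [], []
    for i in range(len(offset) - 1):
        if offset[i] > 0 and offset[i + 1] <= 0:
            # Take whichever neighbor lies closer to the crossing as the center.
            centers.append(i if abs(offset[i]) <= abs(offset[i + 1]) else i + 1)
        elif offset[i] <= 0 and offset[i + 1] > 0:
            boundaries.append(i + 1)
    return centers, boundaries


def region_masses(magnitude, boundaries):
    """Collective magnitude of each region, so a region moves as a single unit."""
    edges = [0] + boundaries + [len(magnitude)]
    return [sum(magnitude[a:b]) for a, b in zip(edges, edges[1:])]


# Two smeared peaks: offsets fall through zero at each peak and rise between them.
offsets = [1.5, 0.4, -0.6, -1.5, 1.0, 0.7, -0.3, -1.2]
magnitudes = [0.25, 1.0, 0.5, 0.25, 0.125, 0.5, 2.0, 0.375]

centers, boundaries = segment_regions(offsets)
masses = region_masses(magnitudes, boundaries)
print(centers, boundaries, masses)  # [1, 6] [4] [2.0, 3.0]
```

The per-region masses, paired with the frequencies at the region centers, are the kind of input a transport step over whole regions would consume in place of individual bins.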

AUDIO TRANSPORT: A GENERALIZED PORTAMENTO VIA OPTIMAL TRANSPORT