A smoothed approximation of AMS

Define the Approximate Median Significance (AMS) as

\[ \text{AMS} = \sqrt{2((s + b + b_r)\log\big(1 + \frac{s}{b + b_r}\big) - s)} \]

where

\[ s = \sum_{i=1}^{n} w_i1\{y_i=s\}1\{p_i=s\} \]

and

\[ b = \sum_{i=1}^{n} w_i1\{y_i=b\}1\{p_i=s\} \]

with

\( y_i \) - the true class (\( s \) or \( b \)) of the \( i^\text{th} \) datapoint
\( p_i \) - the model's predicted class of the \( i^\text{th} \) datapoint
\( w_i \) - the weight assigned to the \( i^\text{th} \) datapoint
\( b_r = 10 \) - a constant regularization term
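As a concrete reference point, here is a minimal Python sketch of the definition above. The function name `ams` and the encoding of classes as the strings `"s"` and `"b"` are assumptions for illustration, not part of the original text.

```python
import math


def ams(y, p, w, b_r=10.0):
    """Approximate Median Significance from hard labels and predictions.

    y, p: true and predicted classes, each "s" (signal) or "b" (background)
    w:    per-datapoint weights
    b_r:  constant regularization term (10 in the text)
    """
    # s: total weight of true signal points that are predicted as signal
    s = sum(wi for yi, pi, wi in zip(y, p, w) if yi == "s" and pi == "s")
    # b: total weight of true background points that are predicted as signal
    b = sum(wi for yi, pi, wi in zip(y, p, w) if yi == "b" and pi == "s")
    radicand = 2 * ((s + b + b_r) * math.log(1 + s / (b + b_r)) - s)
    # The radicand is nonnegative in exact arithmetic; clamp tiny rounding errors
    return math.sqrt(max(radicand, 0.0))


y = ["s", "s", "b", "b", "s"]
p = ["s", "b", "s", "b", "s"]
w = [2.0, 1.0, 3.0, 4.0, 5.0]
print(ams(y, p, w))  # here s = 7 and b = 3
```

Points predicted as background contribute to neither \( s \) nor \( b \), so only the predicted-signal region affects AMS.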

We can rewrite \( \text{AMS} \) in a more concise form

\[ \text{AMS} = \sqrt{2(FL - s)} \]

where

\[ F = s + b + b_r \] \[ L = \log\big(\frac{F}{b + b_r}\big) \]
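The rewrite uses a single algebraic step: combining the argument of the logarithm over a common denominator.

```latex
1 + \frac{s}{b + b_r} = \frac{b + b_r + s}{b + b_r} = \frac{F}{b + b_r}
\;\Longrightarrow\;
(s + b + b_r)\log\Big(1 + \frac{s}{b + b_r}\Big) = F\log\Big(\frac{F}{b + b_r}\Big) = FL
```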

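The indicator \( 1\{p_i=s\} \) is piecewise constant in the model's parameters, so AMS has zero gradient wherever it is differentiable. To allow gradient computations, we define a smooth approximation of AMS by replacing each indicator with the model's predicted probability: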

\[ \text{AMS}_\text{s} = \sqrt{2(F_\text{s}L_\text{s} - s_\text{s})} \]

where

\[ s_\text{s} = \sum_{i=1}^{n} w_i1\{y_i=s\}h_\theta(x_i) \] \[ b_\text{s} = \sum_{i=1}^{n} w_i1\{y_i=b\}h_\theta(x_i) \] \[ F_\text{s} = s_\text{s} + b_\text{s} + b_r \] \[ L_\text{s} = \log\big(\frac{F_\text{s}}{b_\text{s} + b_r}\big) \]

Here, \( h_\theta(x_i) \) is the probability, output by the model with parameters \( \theta \), that point \( x_i \) belongs to class \( s \).
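A minimal Python sketch of \( \text{AMS}_\text{s} \), taking the probabilities \( h_\theta(x_i) \) as an already-computed list `h` so that no particular model is assumed. The function name `smoothed_ams` and the `"s"`/`"b"` label encoding are illustrative assumptions. Feeding in probabilities of exactly 0 or 1 recovers the hard AMS, which makes a useful sanity check.

```python
import math


def smoothed_ams(y, h, w, b_r=10.0):
    """Smoothed AMS: each hard indicator 1{p_i = s} is replaced by h_i = h_theta(x_i).

    y: true classes, each "s" or "b"
    h: predicted probabilities of class "s", each in [0, 1]
    w: per-datapoint weights
    """
    s_s = sum(wi * hi for yi, hi, wi in zip(y, h, w) if yi == "s")
    b_s = sum(wi * hi for yi, hi, wi in zip(y, h, w) if yi == "b")
    F_s = s_s + b_s + b_r
    L_s = math.log(F_s / (b_s + b_r))
    # F_s * L_s - s_s is nonnegative in exact arithmetic; clamp rounding noise
    return math.sqrt(max(2 * (F_s * L_s - s_s), 0.0))


y = ["s", "s", "b", "b", "s"]
w = [2.0, 1.0, 3.0, 4.0, 5.0]
hard = [1.0, 0.0, 1.0, 0.0, 1.0]  # same as hard predictions [s, b, s, b, s]
soft = [0.9, 0.3, 0.6, 0.2, 0.8]
print(smoothed_ams(y, hard, w), smoothed_ams(y, soft, w))
```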

Now we can compute the gradient of \( \text{AMS}_\text{s} \) with respect to \( \theta \).

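I will focus on deriving the gradient of \( 0.5\text{AMS}_\text{s}^2 = F_\text{s}L_\text{s} - s_\text{s} \) (half the expression under the square root). The gradient of \( \text{AMS}_\text{s} \) itself then follows from the chain rule: \( \frac{\delta \text{AMS}_\text{s}}{\delta \theta_j} = \frac{1}{\text{AMS}_\text{s}} \cdot 0.5\frac{\delta \text{AMS}_\text{s}^2}{\delta \theta_j} \).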

\[ 0.5\frac{\delta \text{AMS}_\text{s}^2}{\delta \theta_j} = \frac{\delta F_\text{s}}{\delta \theta_j}L_\text{s} + F_\text{s}\frac{\delta L_\text{s}}{\delta \theta_j} - \frac{\delta s_\text{s}}{\delta \theta_j} \]

with

\[ \frac{\delta F_\text{s}}{\delta \theta_j} = \vec{w}^T\frac{\delta h_\theta(\vec{x})}{\delta \theta_j} \] \[ \frac{\delta L_\text{s}}{\delta \theta_j} = \frac{\frac{\delta F_\text{s}}{\delta \theta_j}}{F_\text{s}} - \frac{\frac{\delta b_\text{s}}{\delta \theta_j}}{b_\text{s} + b_r} \] \[ \frac{\delta s_\text{s}}{\delta \theta_j} = (1\{\vec{y}=s\}\star\vec{w})^T\frac{\delta h_\theta(\vec{x})}{\delta \theta_j} \] \[ \frac{\delta b_\text{s}}{\delta \theta_j} = (1\{\vec{y}=b\}\star\vec{w})^T\frac{\delta h_\theta(\vec{x})}{\delta \theta_j} \]

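where \( \star \) denotes element-wise multiplication, \( \vec{w} \) and \( \vec{y} \) are the vectors of weights and labels, and \( \frac{\delta h_\theta(\vec{x})}{\delta \theta_j} \) is the vector whose \( i^\text{th} \) entry is \( \frac{\delta h_\theta(x_i)}{\delta \theta_j} \). The expression for \( \frac{\delta F_\text{s}}{\delta \theta_j} \) uses the fact that every label is either \( s \) or \( b \), so \( 1\{y_i=s\} + 1\{y_i=b\} = 1 \).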
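The component formulas can be checked numerically without committing to a model. The sketch below assumes we treat the probabilities themselves as the parameters (\( \theta_j = h_\theta(x_j) \)), so \( \frac{\delta h_\theta(\vec{x})}{\delta \theta_j} \) is the \( j^\text{th} \) unit vector and each dot product collapses to a single weight. The function names and data are hypothetical; central finite differences serve as an independent check of the product rule.

```python
import math


def half_ams_sq(h, y, w, b_r=10.0):
    """0.5 * AMS_s^2 = F_s * L_s - s_s as a function of the probabilities h."""
    s_s = sum(wi * hi for yi, hi, wi in zip(y, h, w) if yi == "s")
    b_s = sum(wi * hi for yi, hi, wi in zip(y, h, w) if yi == "b")
    F_s = s_s + b_s + b_r
    return F_s * math.log(F_s / (b_s + b_r)) - s_s


def half_ams_sq_grad_product_rule(h, y, w, b_r=10.0):
    """Gradient with respect to each h_j, assembled from the product-rule terms."""
    s_s = sum(wi * hi for yi, hi, wi in zip(y, h, w) if yi == "s")
    b_s = sum(wi * hi for yi, hi, wi in zip(y, h, w) if yi == "b")
    F_s = s_s + b_s + b_r
    L_s = math.log(F_s / (b_s + b_r))
    grad = []
    for yj, wj in zip(y, w):
        dF = wj                           # w^T e_j
        ds = wj if yj == "s" else 0.0     # (1{y=s} * w)^T e_j
        db = wj if yj == "b" else 0.0     # (1{y=b} * w)^T e_j
        dL = dF / F_s - db / (b_s + b_r)
        grad.append(dF * L_s + F_s * dL - ds)
    return grad


y = ["s", "s", "b", "b", "s"]
w = [2.0, 1.0, 3.0, 4.0, 5.0]
h = [0.9, 0.3, 0.6, 0.2, 0.8]

# Central finite differences as an independent check
eps = 1e-6
numeric = []
for j in range(len(h)):
    up = list(h)
    up[j] += eps
    down = list(h)
    down[j] -= eps
    numeric.append((half_ams_sq(up, y, w) - half_ams_sq(down, y, w)) / (2 * eps))
analytic = half_ams_sq_grad_product_rule(h, y, w)
print(max(abs(a - n) for a, n in zip(analytic, numeric)))
```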

Substituting these into the product rule and using \( \frac{F_\text{s}}{b_\text{s} + b_r} = e^{L_\text{s}} \), the partial derivative of \( 0.5\text{AMS}_\text{s}^2 \) simplifies to:

\[ 0.5\frac{\delta \text{AMS}_\text{s}^2}{\delta \theta_j} = ((1 + L_\text{s} - e^{L_\text{s}}1\{\vec{y}=b\} - 1\{\vec{y}=s\}) \star \vec{w})^T \frac{\delta h_\theta(\vec{x})}{\delta \theta_j} \]
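Because \( 1\{y_i=s\} + 1\{y_i=b\} = 1 \) for two-class labels, the coefficient in this expression is \( w_iL_\text{s} \) for signal points and \( w_i(1 + L_\text{s} - e^{L_\text{s}}) \) for background points. The sketch below applies the simplified formula to a hypothetical logistic model \( h_\theta(x) = \sigma(\theta^Tx) \), for which \( \frac{\delta h_\theta(x_i)}{\delta \theta_j} = h_\theta(x_i)(1 - h_\theta(x_i))x_{ij} \), divides by \( \text{AMS}_\text{s} \) (chain rule through the square root), and compares the result against finite differences. The model, function names, and data are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def smoothed_sums(theta, X, y, w):
    """Probabilities h_i = sigmoid(theta . x_i) (hypothetical model) and the sums s_s, b_s."""
    h = [sigmoid(sum(t * v for t, v in zip(theta, xi))) for xi in X]
    s_s = sum(wi * hi for yi, hi, wi in zip(y, h, w) if yi == "s")
    b_s = sum(wi * hi for yi, hi, wi in zip(y, h, w) if yi == "b")
    return h, s_s, b_s


def smoothed_ams(theta, X, y, w, b_r=10.0):
    _, s_s, b_s = smoothed_sums(theta, X, y, w)
    F_s = s_s + b_s + b_r
    return math.sqrt(max(2 * (F_s * math.log(F_s / (b_s + b_r)) - s_s), 0.0))


def smoothed_ams_grad(theta, X, y, w, b_r=10.0):
    """Gradient of AMS_s from the simplified closed form plus the chain rule."""
    h, s_s, b_s = smoothed_sums(theta, X, y, w)
    L_s = math.log((s_s + b_s + b_r) / (b_s + b_r))
    # Coefficient (1 + L_s - e^{L_s} 1{y_i=b} - 1{y_i=s}) * w_i; labels must be "s" or "b"
    c = [(1 + L_s - (math.exp(L_s) if yi == "b" else 0.0) - (1.0 if yi == "s" else 0.0)) * wi
         for yi, wi in zip(y, w)]
    # Logistic model: d h_i / d theta_j = h_i * (1 - h_i) * x_ij
    half_grad = [sum(ci * hi * (1 - hi) * xi[j] for ci, hi, xi in zip(c, h, X))
                 for j in range(len(theta))]
    # Chain rule: d AMS_s / d theta_j = (d (0.5 AMS_s^2) / d theta_j) / AMS_s
    ams_s = smoothed_ams(theta, X, y, w, b_r)
    return [g / ams_s for g in half_grad]


X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1], [1.0, 1.5]]
y = ["s", "s", "b", "b", "s"]
w = [2.0, 1.0, 3.0, 4.0, 5.0]
theta = [0.2, -0.4]

# Central finite differences as an independent check of the analytic gradient
eps = 1e-6
numeric = []
for j in range(len(theta)):
    up = list(theta)
    up[j] += eps
    down = list(theta)
    down[j] -= eps
    numeric.append((smoothed_ams(up, X, y, w) - smoothed_ams(down, X, y, w)) / (2 * eps))
analytic = smoothed_ams_grad(theta, X, y, w)
print(analytic, numeric)
```

Only the model-specific factor \( \frac{\delta h_\theta(x_i)}{\delta \theta_j} \) would change for a different model; the per-point coefficients depend only on the labels, weights, and \( L_\text{s} \).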