Define the Approximate Median Significance as
\[ \text{AMS} = \sqrt{2((s + b + b_r)\log\big(1 + \frac{s}{b + b_r}\big) - s)} \]
where
\[ s = \sum_{i=1}^{n} w_i1\{y_i=s\}1\{p_i=s\} \]
and
\[ b = \sum_{i=1}^{n} w_i1\{y_i=b\}1\{p_i=s\} \]
with
\( y_i \) - the true outcome of the \( i^\text{th} \) datapoint
\( p_i \) - the model's prediction of the \( i^\text{th} \) datapoint
\( w_i \) - the assigned weight of the \( i^\text{th} \) datapoint
\( b_r = 10 \)
We can rewrite \( \text{AMS} \) in a more concise form:
\[ \text{AMS} = \sqrt{2(FL - s)} \]
where
\[ F = s + b + b_r \]
\[ L = \log\big(\frac{F}{b + b_r}\big) \]
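As a quick sanity check of the concise form, here is a minimal sketch in plain Python. The function name `ams`, the string labels `"s"` and `"b"`, and the toy data are assumptions made for illustration; only the formula itself comes from the definitions above.

```python
import math

B_R = 10.0  # the constant b_r from the definition above


def ams(y_true, y_pred, weights, b_r=B_R):
    """AMS from hard predictions; labels are assumed to be the strings 's' and 'b'."""
    rows = list(zip(y_true, y_pred, weights))
    # s: total weight of true signal points that were predicted as signal
    s = sum(w for y, p, w in rows if y == "s" and p == "s")
    # b: total weight of true background points that were predicted as signal
    b = sum(w for y, p, w in rows if y == "b" and p == "s")
    F = s + b + b_r
    L = math.log(F / (b + b_r))
    return math.sqrt(2.0 * (F * L - s))


# Toy data, made up for illustration only
y_true = ["s", "s", "b", "b", "s"]
y_pred = ["s", "b", "s", "b", "s"]
weights = [2.0, 1.0, 3.0, 1.5, 0.5]

print(ams(y_true, y_pred, weights))
```

On this toy data, \( s = 2.5 \) and \( b = 3 \), so the result can be compared by hand against the original long-form expression.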
We can define a smooth approximation to AMS by replacing the hard prediction indicator \( 1\{p_i=s\} \) with the model's predicted probability, which makes gradient computations possible:
\[ \text{AMS}_\text{s} = \sqrt{2(F_\text{s}L_\text{s} - s_\text{s})} \]
where
\[ s_\text{s} = \sum_{i=1}^{n} w_i1\{y_i=s\}h_\theta(x_i) \]
\[ b_\text{s} = \sum_{i=1}^{n} w_i1\{y_i=b\}h_\theta(x_i) \]
\[ F_\text{s} = s_\text{s} + b_\text{s} + b_r \]
\[ L_\text{s} = \log\big(\frac{F_\text{s}}{b_\text{s} + b_r}\big) \]
Here, \( h_\theta(x_i) \) is the model's estimated probability, parameterized by \( \theta \), that point \( x_i \) belongs to class \( s \).
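The smooth version can be sketched the same way, taking the probabilities \( h_\theta(x_i) \) as a plain list. The name `smooth_ams`, the string labels, and the toy data are assumptions for illustration. When every probability is exactly 0 or 1, the smooth value should coincide with the hard AMS, which gives a simple check.

```python
import math


def smooth_ams(y_true, probs, weights, b_r=10.0):
    """Smooth AMS: hard predictions replaced by signal probabilities in [0, 1]."""
    rows = list(zip(y_true, probs, weights))
    # s_s and b_s weight each point by its predicted signal probability
    s_s = sum(w * h for y, h, w in rows if y == "s")
    b_s = sum(w * h for y, h, w in rows if y == "b")
    F_s = s_s + b_s + b_r
    L_s = math.log(F_s / (b_s + b_r))
    return math.sqrt(2.0 * (F_s * L_s - s_s))


# Toy data, made up for illustration only
y_true = ["s", "s", "b", "b", "s"]
weights = [2.0, 1.0, 3.0, 1.5, 0.5]

soft = smooth_ams(y_true, [0.9, 0.4, 0.6, 0.1, 0.8], weights)
hard = smooth_ams(y_true, [1.0, 0.0, 1.0, 0.0, 1.0], weights)  # degenerate 0/1 case
print(soft, hard)
```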
Now we can compute the gradient of \( \text{AMS}_\text{s} \) with respect to \( \theta \).
I will focus on deriving the gradient of \( 0.5\text{AMS}_\text{s}^2 = F_\text{s}L_\text{s} - s_\text{s} \) (the expression under the square root, without the factor of 2), since the gradient of \( \text{AMS}_\text{s} \) itself then follows from the chain rule: \( \frac{\partial \text{AMS}_\text{s}}{\partial \theta_j} = \frac{1}{\text{AMS}_\text{s}} \cdot 0.5\frac{\partial \text{AMS}_\text{s}^2}{\partial \theta_j} \).
\[ 0.5\frac{\partial \text{AMS}_\text{s}^2}{\partial \theta_j} = \frac{\partial F_\text{s}}{\partial \theta_j}L_\text{s} + F_\text{s}\frac{\partial L_\text{s}}{\partial \theta_j} - \frac{\partial s_\text{s}}{\partial \theta_j} \]
with
\[ \frac{\partial F_\text{s}}{\partial \theta_j} = \vec{w}^T\frac{\partial h_\theta(\vec{x})}{\partial \theta_j} \]
\[ \frac{\partial L_\text{s}}{\partial \theta_j} = \frac{\frac{\partial F_\text{s}}{\partial \theta_j}}{F_\text{s}} - \frac{\frac{\partial b_\text{s}}{\partial \theta_j}}{b_\text{s} + b_r} \]
\[ \frac{\partial s_\text{s}}{\partial \theta_j} = (1\{\vec{y}=s\}\star\vec{w})^T\frac{\partial h_\theta(\vec{x})}{\partial \theta_j} \]
\[ \frac{\partial b_\text{s}}{\partial \theta_j} = (1\{\vec{y}=b\}\star\vec{w})^T\frac{\partial h_\theta(\vec{x})}{\partial \theta_j} \]
where \( \star \) is the element-wise multiplication operation and \( \frac{\partial h_\theta(\vec{x})}{\partial \theta_j} \) is the vector whose \( i^\text{th} \) entry is \( \frac{\partial h_\theta(x_i)}{\partial \theta_j} \). The first identity holds because every point is either \( s \) or \( b \), so \( 1\{\vec{y}=s\} + 1\{\vec{y}=b\} \) is the all-ones vector.
Using \( \frac{F_\text{s}}{b_\text{s} + b_r} = e^{L_\text{s}} \), we can simplify the partial derivative of \( 0.5\text{AMS}_\text{s}^2 \) to:
\[ 0.5\frac{\partial \text{AMS}_\text{s}^2}{\partial \theta_j} = ((1 + L_\text{s} - e^{L_\text{s}}1\{\vec{y}=b\} - 1\{\vec{y}=s\}) \star \vec{w})^T \frac{\partial h_\theta(\vec{x})}{\partial \theta_j} \]
Here the scalars \( 1 + L_\text{s} \) and \( e^{L_\text{s}} \) are applied to every component of the vector.
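To check the final formula numerically, the sketch below compares it against a central finite difference. It assumes a logistic model, \( h_\theta(x) = \sigma(\theta^T x) \), which the derivation above does not require; under that assumption the derivative of \( h_\theta(x_i) \) with respect to \( \theta_j \) is \( h_i(1 - h_i)x_{ij} \). All function names and the toy data are hypothetical.

```python
import math

B_R = 10.0  # the constant b_r


def predict(theta, X):
    """Assumed model (not from the derivation): h_theta(x) = sigmoid(theta . x)."""
    return [1.0 / (1.0 + math.exp(-sum(t * v for t, v in zip(theta, x)))) for x in X]


def smooth_terms(theta, X, y, w):
    """Return (h, s_s, F_s, L_s) for labels given as 's' or 'b'."""
    h = predict(theta, X)
    s_s = sum(wi * hi for yi, hi, wi in zip(y, h, w) if yi == "s")
    b_s = sum(wi * hi for yi, hi, wi in zip(y, h, w) if yi == "b")
    F_s = s_s + b_s + B_R
    return h, s_s, F_s, math.log(F_s / (b_s + B_R))


def half_ams_sq(theta, X, y, w):
    """0.5 * AMS_s^2 = F_s * L_s - s_s."""
    _, s_s, F_s, L_s = smooth_terms(theta, X, y, w)
    return F_s * L_s - s_s


def grad_half_ams_sq(theta, X, y, w):
    """Closed-form gradient of 0.5 * AMS_s^2 from the simplified formula."""
    h, _, _, L_s = smooth_terms(theta, X, y, w)
    # Per-point coefficient (1 + L_s - e^{L_s} 1{y=b} - 1{y=s}) * w_i;
    # any label other than 'b' is treated as 's' here.
    coef = [wi * (1.0 + L_s - (math.exp(L_s) if yi == "b" else 1.0))
            for yi, wi in zip(y, w)]
    # Logistic model: d h_i / d theta_j = h_i * (1 - h_i) * x_ij
    return [sum(c * hi * (1.0 - hi) * x[j] for c, hi, x in zip(coef, h, X))
            for j in range(len(theta))]


def grad_smooth_ams(theta, X, y, w):
    """Chain rule: gradient of AMS_s is the gradient of 0.5 * AMS_s^2 divided by AMS_s."""
    ams_s = math.sqrt(2.0 * half_ams_sq(theta, X, y, w))
    return [g / ams_s for g in grad_half_ams_sq(theta, X, y, w)]


def numeric_grad(f, theta, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at theta."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


# Toy data, made up for illustration; the first feature acts as a bias term
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 0.3], [1.0, 2.0], [1.0, -0.7]]
y = ["s", "b", "s", "b", "s"]
w = [2.0, 3.0, 0.5, 1.5, 1.0]
theta = [0.1, -0.4]

print(grad_half_ams_sq(theta, X, y, w))
print(numeric_grad(lambda t: half_ams_sq(t, X, y, w), theta))
```

The two printed vectors should agree to within the finite-difference error. `grad_smooth_ams` applies the square-root step and is only defined where \( \text{AMS}_\text{s} > 0 \).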