The challenge here is to work out the partial derivatives for a cost function in logistic regression. It is far more complicated than the trivial exercise of partial derivatives for linear regression. For me, anyway.
The cost function for logistic regression is:
\[J(\theta) = -\frac{1}{m} \sum\limits_{i=1}^m \left[ y^{(i)} \log(h_{\theta}(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_{\theta}(x^{(i)})) \right]\]
This is what we know (thanks to Prof. Ng) to be the partial derivative of the cost function. Can we derive it ourselves?
\[\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum\limits_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_j^{(i)}\]
\[h_{\theta}(x^{(i)}) = \frac{1}{1+e^{-\theta^Tx^{(i)}}}\]
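Before any calculus, here is a minimal NumPy sketch of the pieces so far, mostly to make the goal concrete. The names are mine (`X` is the \(m \times n\) matrix of training examples, `y` the labels, `theta` the parameter vector); this is only a sanity-check sketch, not anything from the course.

```python
import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + e^(-theta^T x)), applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    # The result we are trying to derive: (1/m) * sum((h - y) * x_j) for each j
    m = len(y)
    h = sigmoid(X @ theta)
    return (1.0 / m) * (X.T @ (h - y))
```

A few log and derivative facts get used repeatedly below: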
\[\log \left( \frac{x}{y} \right) = \log(x) - \log(y)\]
\[\log(e^a) = a\]
\[\frac{d}{dx} \log(\text{expression}) = \frac{1}{\text{expression}} \cdot \frac{d}{dx} \text{expression}\]
Examples:
\[\frac{d}{dx} \log(x) = \frac{1}{x} \cdot \frac{d}{dx} x = \frac{1}{x} \cdot 1 = \frac{1}{x}\]
\[\frac{d}{dx} \log \left( \frac{1}{2x^2 + 3} \right) = (2x^2 + 3) \cdot \frac{d}{dx} \left( \frac{1}{2x^2 + 3} \right) = (2x^2 + 3) \cdot \frac{-4x}{(2x^2 + 3)^2} = \frac{-4x}{2x^2 + 3}\]
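SymPy (my own addition, if you have it installed) is a cheap way to double-check a chain-rule computation like that second example:

```python
import sympy as sp

x = sp.symbols('x')
# d/dx log(1/(2x^2 + 3)) should be -4x / (2x^2 + 3)
derivative = sp.diff(sp.log(1 / (2 * x**2 + 3)), x)
print(sp.simplify(derivative))   # -> -4*x/(2*x**2 + 3)
```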
Simplifying the cost function before taking the derivative is an almost maddeningly clever technique. It is not the sort of answer one might find casually. It seems clear that much trial and error went into arriving at the neat, compact form at the end of the simplification below.
\[\begin{eqnarray} J(\theta) &=& -\frac{1}{m} \left[ \sum\limits_{i=1}^m \left( y^{(i)} \log(h_{\theta}(x^{(i)})) + (1 - y^{(i)})\log(1 - h_{\theta}(x^{(i)})) \right) \right] \\ \nonumber & & \text{Replace }h_{\theta}(x^{(i)})\text{ with the sigmoid} \\ \nonumber &=& -\frac{1}{m} \left[ \sum\limits_{i=1}^m \left( y^{(i)}\log(\frac{1}{1+e^{-\theta^Tx^{(i)}}}) + (1-y^{(i)}) \log(1 - \frac{1}{1+e^{-\theta^Tx^{(i)}}}) \right) \right] \\ \nonumber & & \text{Convert the right term to a single rational expression} \\ \nonumber &=& -\frac{1}{m} \left[ \sum\limits_{i=1}^m \left( y^{(i)}\log(\frac{1}{1+e^{-\theta^Tx^{(i)}}}) + (1-y^{(i)}) \log(\frac{e^{-\theta^Tx^{(i)}}}{1+e^{-\theta^Tx^{(i)}}}) \right) \right] \\ \nonumber & & \text{Apply }\log(\frac{a}{b})=\log(a) - \log(b) \text{ to the left term} \\ \nonumber &=& -\frac{1}{m} \left[ \sum\limits_{i=1}^m \left( y^{(i)}(\log(1)-\log(1+e^{-\theta^Tx^{(i)}})) + (1-y^{(i)}) \log(\frac{e^{-\theta^Tx^{(i)}}}{1+e^{-\theta^Tx^{(i)}}}) \right) \right] \\ \nonumber &=& -\frac{1}{m} \left[ \sum\limits_{i=1}^m \left( -y^{(i)}\log(1+e^{-\theta^Tx^{(i)}}) + (1-y^{(i)}) \log(\frac{e^{-\theta^Tx^{(i)}}}{1+e^{-\theta^Tx^{(i)}}}) \right) \right] \\ \nonumber & &\text{Apply }\log(\frac{a}{b}) = \log(a) - \log(b) \text{ to the right term}\\ \nonumber &=& -\frac{1}{m} \left[ \sum\limits_{i=1}^m \left( -y^{(i)}\log(1+e^{-\theta^Tx^{(i)}}) + (1-y^{(i)}) \log(e^{-\theta^Tx^{(i)}}) - (1-y^{(i)})\log(1+e^{-\theta^Tx^{(i)}}) \right) \right] \\ \nonumber & & \text{Apply } \log(e^a) = a \text{ to the middle term}\\ \nonumber &=& -\frac{1}{m} \left[ \sum\limits_{i=1}^m \left( -y^{(i)}\log(1+e^{-\theta^Tx^{(i)}}) + (1-y^{(i)})(-\theta^Tx^{(i)}) - (1-y^{(i)})\log(1+e^{-\theta^Tx^{(i)}}) \right) \right] \\ \nonumber & & \text{Move the minus sign inside the }\sum \\ \nonumber &=& \frac{1}{m} \left[ \sum\limits_{i=1}^m \left( y^{(i)}\log(1+e^{-\theta^Tx^{(i)}}) + (1-y^{(i)})(\theta^Tx^{(i)}) + (1-y^{(i)})\log(1+e^{-\theta^Tx^{(i)}}) \right) \right] \\ \nonumber & & \text{Combine the first and third terms} \\ \nonumber &=& \frac{1}{m} \left[ \sum\limits_{i=1}^m \left( \log(1+e^{-\theta^Tx^{(i)}}) + (1-y^{(i)})(\theta^Tx^{(i)}) \right) \right] \\ \nonumber \end{eqnarray}\]
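I did not entirely trust my own algebra here, so a quick numerical spot-check on made-up data (the variable names are mine) that the compact form at the end really is the same cost:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))          # made-up data, just for the check
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=n)

z = X @ theta
h = 1.0 / (1.0 + np.exp(-z))

# Original form: -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
J_original = -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Compact form from the last line above: (1/m) * sum( log(1+e^(-z)) + (1-y)*z )
J_simplified = (1.0 / m) * np.sum(np.log(1 + np.exp(-z)) + (1 - y) * z)

print(np.isclose(J_original, J_simplified))   # True
```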
Now take the partial derivative of this simplified form with respect to \(\theta_j\). (See step 2 in the First Attempt, below, for the derivative of \(1 + e^{-\theta^Tx^{(i)}}\) that the left term needs.)
\[\begin{eqnarray} \frac{\partial}{\partial \theta_j} J(\theta) &=& \frac{1}{m} \left[ \sum\limits_{i=1}^{m} \left( \frac{e^{-\theta^Tx^{(i)}}(-x_j^{(i)})}{1 + e^{-\theta^Tx^{(i)}}} + (1 - y^{(i)})x_j^{(i)}\right) \right] \\ \nonumber & & \text{Now factor out }x_j^{(i)}\\ \nonumber &=& \frac{1}{m} \left[ \sum\limits_{i=1}^{m} \left( \frac{-e^{-\theta^Tx^{(i)}}}{1 + e^{-\theta^Tx^{(i)}}} + 1 - y^{(i)} \right) x_j^{(i)} \right] \\ \nonumber & & \text{Combine first two terms} \\ \nonumber &=& \frac{1}{m} \left[ \sum\limits_{i=1}^{m} \left( \frac{1}{1 + e^{-\theta^Tx^{(i)}}} - y^{(i)} \right) x_j^{(i)} \right] \\ \nonumber & & \text{Substitute }h_{\theta}(x^{(i)}) \text{ for sigmoid function}\\ \nonumber &=& \frac{1}{m} \left[ \sum\limits_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \right] \\ \nonumber \end{eqnarray}\]
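And the same kind of spot-check for the derivative: a central-difference approximation of \(\frac{\partial}{\partial \theta_j} J(\theta)\) should agree with the formula just derived. Again, the data and names are made up purely for the check.

```python
import numpy as np

def cost(theta, X, y):
    # J(theta), as defined at the top of the post
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = rng.integers(0, 2, size=40).astype(float)
theta = rng.normal(size=4)

# Analytic gradient from the last line above: (1/m) * sum((h - y) * x_j)
h = 1.0 / (1.0 + np.exp(-(X @ theta)))
analytic = (X.T @ (h - y)) / len(y)

# Central-difference approximation of each partial derivative
eps = 1e-6
numeric = np.array([(cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(4)])

print(np.allclose(numeric, analytic))   # True
```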
Everything below was a first attempt, made much harder by not first simplifying the cost function.
\[\frac{\partial}{\partial\theta_j} e^{-\theta^Tx} = e^{-\theta^Tx} \cdot \frac{\partial}{\partial\theta_j} (-\theta^T x) = e^{-\theta^Tx} \cdot (-x_j^{(i)})\]
If I’m right here, it means that we can’t get rid of the \(e^{-\theta^Tx}\) factor, because \(\frac{d}{dx} e^{u} = e^{u} \cdot \frac{du}{dx}\): the exponential reappears in its own derivative. Hence, things quickly get messy.
However, we are looking at \(h_{\theta}(x^{(i)})\), not \(h_{\theta}(x)\). So I think it should be understood that every instance of \(e^{-\theta^Tx}\) below is actually:
\[e^{-\theta^Tx^{(i)}}\]
Does this make a difference? It shouldn’t: the superscript only says which training example’s \(x\) is being used; the algebra is the same.
The 1 is just a constant, so the derivative is the same as for \(e^{-\theta^Tx}\):
\[\frac{\partial}{\partial\theta_j} (1 + e^{-\theta^Tx}) = 0 + \frac{\partial}{\partial\theta_j} e^{-\theta^Tx} = e^{-\theta^Tx} \cdot (-x_j^{(i)})\]
\[\frac{\partial}{\partial\theta_j} \frac{1}{1+e^{-\theta^Tx}} = \frac{\partial}{\partial\theta_j} (1+e^{-\theta^Tx})^{-1}=-1(1+e^{-\theta^Tx})^{-2} \cdot \frac{\partial}{\partial\theta_j} (1 + e^{-\theta^Tx})\]
Substitute the result of 2. for the last term. The leading \(-1\) and the \(-x_j^{(i)}\) cancel, and what remains becomes the numerator:
\[\frac{\partial}{\partial\theta_j} \frac{1}{1+e^{-\theta^Tx}} = \frac{e^{-\theta^Tx} \cdot x_j^{(i)}}{(1+e^{-\theta^Tx})^2}\]
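Since a dropped minus sign is exactly the kind of mistake I make, here is a scalar SymPy check (treating \(\theta\) and \(x\) as single numbers; SymPy is my own addition, not part of the course):

```python
import sympy as sp

theta, x = sp.symbols('theta x')
E = sp.exp(-theta * x)               # stands in for e^(-theta^T x) in the scalar case

# d/dtheta (1/(1+E)) should be x*E/(1+E)^2 -- note the positive sign
lhs = sp.diff(1 / (1 + E), theta)
rhs = x * E / (1 + E)**2
print(sp.simplify(lhs - rhs) == 0)   # True
```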
\[\frac{\partial}{\partial\theta_j} \log(\frac{1}{1+e^{-\theta^Tx}}) = (1+e^{-\theta^Tx}) \cdot \frac{\partial}{\partial\theta_j} \frac{1}{1 + e^{-\theta^Tx}}\]
Substitute 3. for the last term.
\[\frac{\partial}{\partial\theta_j} \log(\frac{1}{1+e^{-\theta^Tx}}) = (1+e^{-\theta^Tx}) \cdot \frac{e^{-\theta^Tx} \cdot x_j^{(i)}}{(1+e^{-\theta^Tx})^2} = \frac{e^{-\theta^Tx} \cdot x_j^{(i)}}{1+e^{-\theta^Tx}}\]
Substitute \(h_{\theta}(x)\) for \(\frac{1}{1+e^{-\theta^Tx}}\).
\[\frac{\partial}{\partial\theta_j} \log(\frac{1}{1+e^{-\theta^Tx}}) = (1+e^{-\theta^Tx}) \cdot \frac{e^{-\theta^Tx} \cdot x_j^{(i)}}{(1+e^{-\theta^Tx})^2} = h_{\theta}(x) \cdot e^{-\theta^Tx} \cdot x_j^{(i)}\]
My final expression is so long and ugly that I doubt it is correct. But I can’t find any flaw in the logic.
Find \(\frac{\partial}{\partial\theta_j} \log(1-h_{\theta}(x^{(i)}))\).
First, replace \(h_{\theta}(x^{(i)})\) with the sigmoid function.
\[\frac{\partial}{\partial\theta_j} \log(1-h_{\theta}(x^{(i)})) = \frac{\partial}{\partial\theta_j} \log(1 - \frac{1}{1+e^{-\theta^Tx}})\]
Put into a single rational expression.
\[= \frac{\partial}{\partial\theta_j} \log(\frac{1+e^{-\theta^Tx}}{1+e^{-\theta^Tx}}-\frac{1}{1+e^{-\theta^Tx}}) = \frac{\partial}{\partial\theta_j} \log(\frac{e^{-\theta^Tx}}{1 + e^{-\theta^Tx}})\]
Next, \(\frac{\partial}{\partial x} \log(\text{expr}) = \frac{1}{\text{expr}} \cdot \frac{\partial}{\partial x} \text{expr}\).
\[= \frac{\partial}{\partial\theta_j} \log(\frac{e^{-\theta^Tx}}{1+e^{-\theta^Tx}}) = \frac{1+e^{-\theta^Tx}}{e^{-\theta^Tx}} \cdot \frac{\partial}{\partial\theta_j} \frac{e^{-\theta^Tx}}{1+e^{-\theta^Tx}}\]
Rewrite the quotient as a product, so the pieces already derived above can be reused.
\[= \frac{1+e^{-\theta^Tx}}{e^{-\theta^Tx}} \cdot \frac{\partial}{\partial\theta_j} \left[ e^{-\theta^Tx} \cdot \frac{1}{1 + e^{-\theta^Tx}} \right]\]
Now apply the product rule and substitute in the previous derivatives (2. and 3.).
\[= \frac{1+e^{-\theta^Tx}}{e^{-\theta^Tx}} \cdot \left[(e^{-\theta^Tx}) \cdot \frac{e^{-\theta^Tx} \cdot x_j^{(i)}}{(1+e^{-\theta^Tx})^2} + (e^{-\theta^Tx} \cdot (-x_j^{(i)})) \cdot \frac{1}{1+e^{-\theta^Tx}}\right]\]
Simplify both expressions in the square brackets into rational expressions.
\[= \frac{1+e^{-\theta^Tx}}{e^{-\theta^Tx}} \cdot \left[\frac{(e^{-\theta^Tx})^2 \cdot x_j^{(i)}}{(1+e^{-\theta^Tx})^2} - \frac{e^{-\theta^Tx} \cdot x_j^{(i)}}{1+e^{-\theta^Tx}}\right]\]
Make the denominators in the square brackets equivalent.
\[= \frac{1+e^{-\theta^Tx}}{e^{-\theta^Tx}} \cdot \left[\frac{(e^{-\theta^Tx})^2 \cdot x_j^{(i)}}{(1+e^{-\theta^Tx})^2} - \frac{e^{-\theta^Tx} \cdot x_j^{(i)} \cdot (1+e^{-\theta^Tx})}{(1+e^{-\theta^Tx})^2}\right]\]
Combine the two expressions in the square brackets.
\[= \frac{1+e^{-\theta^Tx}}{e^{-\theta^Tx}} \cdot \left[\frac{(e^{-\theta^Tx})^2 \cdot x_j^{(i)} - e^{-\theta^Tx} \cdot x_j^{(i)} \cdot (1+e^{-\theta^Tx})}{(1+e^{-\theta^Tx})^2}\right]\]
Simplify the whole thing: cancel \(e^{-\theta^Tx}\) between the left denominator and the right numerator, and cancel one factor of \(1 + e^{-\theta^Tx}\) between the left numerator and the right denominator.
\[= \frac{e^{-\theta^Tx}\cdot x_j^{(i)} - x_j^{(i)} \cdot (1+e^{-\theta^Tx})}{1+e^{-\theta^Tx}}\]
Factor out \(x_j^{(i)}\) in the numerator; almost everything cancels.
\[= \frac{x_j^{(i)} \left(e^{-\theta^Tx} - (1+e^{-\theta^Tx})\right)}{1+e^{-\theta^Tx}} = \frac{-x_j^{(i)}}{1+e^{-\theta^Tx}}\]
Substituting \(h_{\theta}(x^{(i)})\) for the sigmoid gives a compact form:
\[= -x_j^{(i)} \cdot h_{\theta}(x^{(i)})\]
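Treating \(\theta\) and \(x\) as scalars again, SymPy agrees with both of the log derivatives from this first attempt:

```python
import sympy as sp

theta, x = sp.symbols('theta x')
h = 1 / (1 + sp.exp(-theta * x))     # scalar stand-in for the sigmoid hypothesis

# d/dtheta log(h)     should be  h * e^(-theta*x) * x
# d/dtheta log(1 - h) should be  -x * h
check_log_h  = sp.simplify(sp.diff(sp.log(h), theta) - h * sp.exp(-theta * x) * x)
check_log_1h = sp.simplify(sp.diff(sp.log(1 - h), theta) + x * h)
print(check_log_h == 0, check_log_1h == 0)   # True True
```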
The \(y^{(i)}\) is just a constant multiplier, so:
\[\frac{\partial}{\partial\theta_j} y^{(i)} \log(h_{\theta}(x^{(i)})) = y^{(i)} \cdot x_j^{(i)} \cdot h_{\theta}(x^{(i)}) \cdot e^{-\theta^Tx}\]
Likewise for the other term, using the result just above:
\[\frac{\partial}{\partial\theta_j} (1 - y^{(i)}) \log(1 - h_{\theta}(x^{(i)})) = -(1 - y^{(i)}) \cdot x_j^{(i)} \cdot h_{\theta}(x^{(i)})\]
Add 6. and 7. together.
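Carrying out that addition, and using the fact that \(e^{-\theta^Tx^{(i)}} h_{\theta}(x^{(i)}) = \frac{e^{-\theta^Tx^{(i)}}}{1+e^{-\theta^Tx^{(i)}}} = 1 - h_{\theta}(x^{(i)})\):
\[\begin{eqnarray} & & y^{(i)} x_j^{(i)} h_{\theta}(x^{(i)}) e^{-\theta^Tx^{(i)}} - (1 - y^{(i)}) x_j^{(i)} h_{\theta}(x^{(i)}) \\ \nonumber &=& y^{(i)} x_j^{(i)} (1 - h_{\theta}(x^{(i)})) - (1 - y^{(i)}) x_j^{(i)} h_{\theta}(x^{(i)}) \\ \nonumber &=& x_j^{(i)} \left( y^{(i)} - h_{\theta}(x^{(i)}) \right) \\ \nonumber \end{eqnarray}\]
With the \(-\frac{1}{m}\sum\) out in front of the cost, this is the same \(\frac{1}{m} \sum\limits_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_j^{(i)}\) as before.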