March 14, 2017
\[\begin{equation}\lambda^{(*)} = \underset{\lambda \in \Sigma}{\operatorname{argmin}} \; \mathbb{E}_{x \sim \mathcal{G}_{x}}\left[\mathcal{L}\left(x ; \mathcal{A}_{\lambda}\left(\mathcal{X}^{(train)}\right)\right)\right]\label{eq:hyper-optimization}\end{equation}\]
\[\begin{eqnarray} \lambda^{(*)} &\approx& \underset{\lambda \in \Sigma}{\operatorname{argmin}} \; \underset{x \in \mathcal{X}^{(valid)}}{\operatorname{mean}} \; \mathcal{L}\left( x ; \mathcal{A}_\lambda \left(\mathcal{X}^{(train)}\right) \right) \\ &=& \underset{\lambda \in \Sigma}{\operatorname{argmin}} \; \Psi(\lambda) \\ &\approx& \underset{\lambda \in \{\lambda^{(1)}, \lambda^{(2)}, \ldots, \lambda^{(S)}\}}{\operatorname{argmin}} \Psi(\lambda) = \hat{\lambda} \end{eqnarray}\]
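In words: the first equation defines the ideal hyperparameters \(\lambda^{(*)}\) as the minimizer of the expected loss under the true data distribution \(\mathcal{G}_x\). Since \(\mathcal{G}_x\) is unknown and \(\Sigma\) is too large to search exhaustively, the expectation is replaced by the mean loss over a validation set, and the minimization over \(\Sigma\) is restricted to a finite set of \(S\) sampled trials \(\{\lambda^{(1)}, \ldots, \lambda^{(S)}\}\), yielding the estimate \(\hat{\lambda}\) (Bergstra and Bengio 2012). A minimal Python sketch of this random-search loop follows; the `train` and `loss` callables and the particular hyperparameters sampled are illustrative assumptions, not part of the formulation above.

```python
import random

def sample_lambda():
    """Draw one trial lambda from the search space Sigma.

    The two hyperparameters here are hypothetical examples.
    """
    return {
        "learning_rate": 10 ** random.uniform(-5, -1),  # log-uniform draw
        "num_hidden": random.randint(32, 512),           # uniform integer draw
    }

def random_search(train, loss, X_train, X_valid, S):
    """Return lambda-hat, the best of S randomly sampled trials.

    `train(lam, X_train)` plays the role of A_lambda(X^(train));
    `loss(x, model)` is L(x; A_lambda(X^(train))).
    """
    best_lam, best_psi = None, float("inf")
    for _ in range(S):
        lam = sample_lambda()
        model = train(lam, X_train)
        # Psi(lambda): mean validation loss of the trained model
        psi = sum(loss(x, model) for x in X_valid) / len(X_valid)
        if psi < best_psi:
            best_lam, best_psi = lam, psi
    return best_lam
```

Because the trials are independent draws, the loop parallelizes trivially and can be stopped at any budget; this is part of why random search compares favorably with grid search on spaces where only a few hyperparameters matter (Bergstra and Bengio 2012).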
Bergstra, James, and Yoshua Bengio. 2012. “Random Search for Hyper-Parameter Optimization.” J. Mach. Learn. Res. 13 (February). JMLR.org: 281–305. http://dl.acm.org/citation.cfm?id=2188385.2188395.
Larochelle, Hugo, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio. 2007. “An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation.” In Proceedings of the 24th International Conference on Machine Learning, 473–80. ICML ’07. New York, NY, USA: ACM. doi:10.1145/1273496.1273556.