Note: This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory gifi.stat.ucla.edu/zangwill has a pdf version, the complete Rmd file with all code chunks, the bib file, and the R source code.

1 Introduction

2 Zangwill Theory

2.1 Algorithms as Point-to-set Maps

An descent algorithm is defined on a space \(\Omega.\) It consists of a triple \((\mathcal{A},\mathcal{P},\psi),\) with \(\mathcal{A}\) a mapping of \(\Omega\) into the power set \(\wp(\Omega)\) of nonempty subsets of \(\Omega,\) with \(\psi\) a real valued function on \(\Omega\), and with \(\mathcal{P}\) a fixed subset of \(\Omega.\) We call \(\mathcal{A}\) the algorithmic map, \(\mathcal{P}\) the desirable points, and \(\psi\) the evaluation function.

The algorithm works as follows.

  1. start at an arbitrary \(\omega^{(0)}\in\Omega,\)
  2. if \(\omega^{(k)}\in\mathcal{P},\) then we stop,
  3. otherwise we construct the successor by the rule \(\omega^{(k+1)}\in\mathcal{A}(\omega^k).\)

In a descent algorithm we assume \(\mathcal{A}\) is strictly monotonic on \(\Omega-\mathcal{P}\), i.e. \(\xi\in\mathcal{A}(\omega)\) implies \(\psi(\xi)\) < \(\psi(\omega)\) if \(\omega\notin\mathcal{P}\).

We study properties of the sequences \(\omega^{(k)}\) generated by algorithm of this type, in particular their convergence.

2.2 Convergence of Function Values

For this method we have our first (rather trivial) convergence theorem.

Theorem 1: [Monotone] If

  • \(\Omega\) is compact,
  • \(\psi\) is continuous on \(\Omega\),

then

  • The sequence \(\{\psi^{(k)}\}\) converges to, say, \(\psi^\infty,\)
  • the sequence \(\{\omega^{(k)}\}\) has at least one convergent subsequence,
  • if \(\omega^\infty\) is an accumulation point of \(\{\omega^{(k)}\},\) then \(\psi(\omega^\infty)=\psi^\infty.\)

Proof: Compactness and continuity imply that \(\psi\) is bounded below on \(\Omega\), and the minimum exists. Monotonicity implies that \(\{\psi^{(k)}\}\) is nonincreasing and bounded below, and thus convergent. Existence of convergent subsequences is guaranteed by Bolzano-Weierstrass, and if we have a subsequence \(\{\omega^{(k)}\}_{k\in\mathcal{K}}\) converging to \(\omega^\infty\) then by continuity \(\{\psi(\omega^{(k)})\}_{k\in\mathcal{K}}\) converges to \(\psi(\omega^\infty).\) But all subsequences of a convergent sequence converge to the same point, and thus \(\psi(\omega^\infty)=\psi^\infty.\) QED

Remark: The assumptions in theorem 1 can be relaxed. Continuity is not necessary. If we define the level sets \[ \Omega_0\mathop{=}\limits^{\Delta}\{\omega\in\Omega\mid\psi(\omega)\leq\psi^{(0)}\}, \] then obviously it is sufficient to assume that \(\Omega_0\) is compact. The same thing is true for the even weaker assumption that the iterates \(\omega^{(k)}\) are in a compact set. We do not assume the sets \(\Omega_s\) are connected, i.e. they could be discrete.

Theorem 1 is very general, but the conclusions are quite weak. We have convergence of the function values, but about the sequence \(\{\omega^{(k)}\}\) we only know that it has one or more accumulation points, and that all accumulation points have the same function value. We have not established other desirable properties of these accumulation points.

In order to prove global convergence (i.e. convergence from any initial point) we use the general theory developed initially by Zangwill (1969) (and later by Polak (1969), R. R. Meyer (1976), G. G. L. Meyer (1975), and others). The best introduction and overview is perhaps the volume edited by Huard ((Ed) (1979)).

2.3 Convergence of Solutions

Theorem 2: [Zangwill]

  • If \(\mathcal{A}\) is uniformly compact on \(\Omega,\) i.e. there is a compact \(\Omega_0\subseteq\Omega\) such that \(\mathcal{A}(\omega)\subseteq\Omega_0\) for all \(\omega\in\Omega,\)
  • If \(\mathcal{A}\) is upper-semicontinuous or closed on \(\Omega-\mathcal{P},\) i.e. if \(\xi_i\in\mathcal{A}(\omega_i)\) and \(\xi_i\rightarrow\xi\) and \(\omega_i\rightarrow\omega\) then \(\xi\in\mathcal{A}(\omega),\)
  • If \(\mathcal{A}\) is strictly monotonic on \(\Omega-\mathcal{P}\), i.e. \(\xi\in\mathcal{A}(\omega)\) implies \(\psi(\xi)\) < \(\psi(\omega)\) if \(\omega\) is not a desirable point. then all accumulation points of the sequence \(\{\omega^{(k)}\}\) generated by the algorithm are desirable points.

Proof: Compactness implies that \(\{\omega^{(k)}\}\) has a convergent subsequence. Suppose its index-set is \(\mathcal{K}=\{k_1,k_2,\cdots\}\) and that it converges to \(\omega_\mathcal{K}.\) Since \(\{\psi(\omega^{(k)})\}\) converges to, say \(\psi_\infty,\) we see that also \[ \{\psi(\omega^{(k_1)}),\psi(\omega^{(k_2)}),\cdots\} \rightarrow\psi_\infty. \] Now consider \(\{\omega^{(k_1+1)},\omega^{(k_2+1)},\cdots\},\) which must again have a convergent subsequence. Suppose its index-set is \(\mathcal{L}=\{\ell_1+1,\ell_2+1,\cdots\}\) and that it converges to \(\omega_\mathcal{L}.\) Then \(\psi(\omega_\mathcal{K})=\psi_(\omega_\mathcal{L})=\psi_\infty.\)

Assume \(\omega_\mathcal{K}\) is not a fixed point. Now \[ \{\omega^{(\ell_1)},\omega^{(\ell_2)},\cdots\} \rightarrow \omega_\mathcal{K} \] and \[ \{\omega^{(\ell_1+1)},\omega^{(\ell_2+1)},\cdots\} \rightarrow\omega_\mathcal{L}, \] with \(\omega^{(\ell_j+1)}\in\mathcal{A}(\omega^{(\ell_j+1)}).\) Thus, by usc, \(\omega_\mathcal{L}\in\mathcal{A}(\omega_\mathcal{K}).\) If \(\omega_\mathcal{K}\) is not a fixed point, then strict monotonicity gives gives \(\psi(\omega_\mathcal{L})\) < \(\psi(\omega_\mathcal{K}),\) which contradicts our earlier \(\psi(\omega_\mathcal{K})=\psi_(\omega_\mathcal{L}).\) QED

The concept of closedness of a map can be illustrated with the following picture, showing a map which is not closed at at least one point.

My Figure

My Figure

It is easy to see that desirable points are generalized fixed points, in the sense that \(\omega\in\mathcal{P}\) is equivalent to that \(\omega\in\mathcal{A}(\omega).\) According to theorem 2 each accumulation point is a generalized fixed point. This, however, does not prove convergence, because there can be more than one accumulation point. If we define strongly desirable points as points such that \(\mathcal{A}(\omega)=\{\omega\},\) then we can strengthen the theorem (R. R. Meyer (1976)).

Theorem 3: [Meyer] Suppose the uniform compactness and closedness conditions of theorem 2 are satisfied and the monotonicity assumption is replaced by

  • \(\xi\in\mathcal{A}(\omega)\) implies \(\psi(\xi)\) < \(\psi(\omega)\) if \(\omega\) is not a strongly desirable point,

then in addition to what we had before \(\{\omega^{(k)}\}\) is asymptotically regular, i.e. \(\|\omega^{(k)}-\omega^{(k+1)}\|\rightarrow 0.\)

Proof: Use the notation in the proof of 2. Suppose \(\|\omega^{(\ell_i+1)}-\omega^{(\ell_i)}\|\) > \(\delta\) > \(0.\) Then \(\|\omega_\mathcal{L}-\omega_\mathcal{K}\|\geq\delta.\) But \(\omega_\mathcal{K}\) is a strongly desirable point and thus \(\omega_\mathcal{L}\in\mathcal{A}(\omega_\mathcal{K})=\{\omega_\mathcal{K}\},\) a contradiction. QED

Theorem 4: [Ostrowski] If the bounded sequence \(\Omega=\{\omega^{(k)}\}\) satisfies \(\|\omega^{(k)}-\omega^{(k+1)}\|\rightarrow 0,\) then the derrived set, i.e. the set of its accumulation points, is either a point or a continuum.

Proof: We follow Ostrowski (1966) (pages 203–204). A continuum is a closed set, which is not the union of two or more disjoint closed sets. Of course the derived set is closed. Suppose it is the union of the disjoint closed sets \(C_1\) and \(C_2,\) which are a distance of at least \(p\) apart. We can choose \(k_0\) such that \(\|\omega^{(k)}-\omega^{(k+1)}\|\leq\frac{p}{3}\) for all \(k\geq k_0.\) QED

Only if the derived set is a single point, we have actual convergence. Thus Meyer’s theorem still does not prove actual convergence, but it is close enough for all practical purposes. Observe boundedness is essential in theorem 4, otherwise the derived set may be empty.

3 Ostrowski Theory

3.1 Rate of Convergence

The basic result we use is due to Ostrowski (1966).

Theorem 5: [Rate]

  • If the iterative algorithm \(x^{(k+1)}=\mathcal{A}(x^{(k)}),\) converges to \(x_\infty,\)
  • and \(\mathcal{A}\) is differentiable at \(x_\infty,\)
  • and \(0<\rho=\|\mathcal{D}\mathcal{A}(\omega_\infty)\|<1,\)

then the algorithm is linearly convergent with rate \(\rho.\)

Proof:

QED

The norm in the theorem is the spectral norm, i.e. the modulus of the maximum eigenvalue. Let us call the derivative of \(A\) the iteration matrix and write it as \(\mathcal{M}.\) In general block relaxation methods have linear convergence, and the linear convergence can be quite slow. In cases where the accumulation points are a continuum we have sublinear rates. The same things can be true if the local minimum is not strict, or if we are converging to a saddle point.

Generalization to non-differentiable maps.

Points of attraction and repulsion.

Superlinear etc

4 Sidi Theory

References

(Ed), P. Huard. 1979. Point-to-set Maps and Mathematical Programming. Edited by P. Huard. Amsterdam, Netherlands: North Holland Publishing Company.

Meyer, G. G. L. 1975. “A Systematic Approach to the Synthesis of Algorithms.” Numerische Mathematik 24: 277–89.

Meyer, R. R. 1976. “Sufficient Conditions for the Convergence of Monotonic Mathematical Programming Algorithms.” Journal of Computer and System Sciences 12: 108–21.

Ostrowski, A. M. 1966. Solution of Equations and Systems of Equations. New York, N.Y.: Academic Press.

Polak, E. 1969. “On the Convergence of Optimization Algorithms.” Revue Francaise d’Automatique, d’Informatique Et De Recherche Opérationelle 3: 17–34.

Zangwill, W. I. 1969. Nonlinear Programming: a Unified Approach. Englewood-Cliffs, N.J.: Prentice-Hall.