
An example of the convergence of Newton's method

As a concrete example, I define $F:{\bf {\rm R}}^2\rightarrow{\bf {\rm R}}^2$ by

\begin{displaymath}F(x)=\left[\begin{array}{c}x_1^2+x_2^2-1\\ x_2-x_1^2\end{array}\right].\end{displaymath}

The solutions are the points of intersection of the circle $x_1^2+x_2^2=1$ and the parabola $x_2=x_1^2$. A simple graph shows that there are two solutions, and some simple algebra (substituting $x_2=x_1^2$ into the equation of the circle and solving the resulting quadratic in $x_2$) shows that the solution lying in the first quadrant is

\begin{displaymath}x^*=\left(\sqrt{\frac{\sqrt{5}-1}{2}},\ \frac{\sqrt{5}-1}{2}\right)\end{displaymath}

(the other solution is the reflection of x* across the $x_2$-axis).
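This solution is easy to check numerically. The following Python snippet is a sketch of my own (the variable names are not from the text); it confirms that the first-quadrant solution satisfies both defining equations to machine precision:

```python
import math

# first-quadrant solution: x2 solves x2^2 + x2 - 1 = 0, and x1 = sqrt(x2)
x2 = (math.sqrt(5.0) - 1.0) / 2.0
x1 = math.sqrt(x2)

# residuals of the two defining equations
circle_residual   = x1**2 + x2**2 - 1.0   # circle:   x1^2 + x2^2 = 1
parabola_residual = x2 - x1**2            # parabola: x2 = x1^2
```

Both residuals come out on the order of round-off error rather than exactly zero, which foreshadows the remarks about floating point arithmetic below.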

Applying Newton's method with x(0)=(0.5,0.5), I obtain the results shown in Table 1. Several comments about these results are in order. First of all, the computations were carried out in IEEE double precision arithmetic, which translates to about 16 decimal digits of precision. Therefore, for example, where the results show that $\Vert x^*-x^{(5)}\Vert=0$ and $\Vert F(x^{(5)})\Vert=1.1102\cdot 10^{-16}$, the apparent discrepancy is due to round-off error. Five iterations of Newton's method were sufficient to compute the solution exactly to the given precision, but when F(x(5)) is computed in floating point arithmetic, round-off error causes the result to differ from zero by a very small amount.

Table: Results of applying Newton's method to a $2\times 2$ nonlinear system.

k   $\Vert x^*-x^{(k)}\Vert$   $\Vert F(x^{(k)})\Vert$   $x_1^{(k)}$           $x_2^{(k)}$
0   $3.0954\cdot 10^{-1}$      $5.5902\cdot 10^{-1}$     0.50000000000000000   0.50000000000000000
1   $8.9121\cdot 10^{-2}$      $2.1021\cdot 10^{-1}$     0.87500000000000000   0.62500000000000000
2   $4.5233\cdot 10^{-3}$      $1.0090\cdot 10^{-2}$     0.79067460317460314   0.61805555555555558
3   $1.2938\cdot 10^{-5}$      $2.8769\cdot 10^{-5}$     0.78616431593458214   0.61803398895790196
4   $1.0646\cdot 10^{-10}$     $2.3673\cdot 10^{-10}$    0.78615137786388734   0.61803398874989490
5   0.0000                     $1.1102\cdot 10^{-16}$    0.78615137775742328   0.61803398874989490
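The iteration itself is short to implement. The following Python sketch (my own code, using NumPy; the function names are mine, not the author's) reproduces the iterates of Table 1, stopping when the Newton step becomes negligibly small:

```python
import numpy as np

def F(x):
    # the 2x2 system: circle x1^2 + x2^2 = 1 and parabola x2 = x1^2
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[1] - x[0]**2])

def J(x):
    # Jacobian of F
    return np.array([[ 2.0*x[0], 2.0*x[1]],
                     [-2.0*x[0], 1.0     ]])

def newton(x0, tol=1e-15, maxit=25):
    x = np.array(x0, dtype=float)
    for k in range(1, maxit + 1):
        s = np.linalg.solve(J(x), -F(x))   # Newton step: J(x) s = -F(x)
        x = x + s
        if np.linalg.norm(s) < tol:        # stop when the step is tiny
            break
    return x, k

x, its = newton([0.5, 0.5])
```

Running this from x(0)=(0.5,0.5) produces the same iterates as the table; the first step, for example, solves a 2-by-2 linear system to move from (0.5,0.5) to (0.875,0.625).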

Second, the convergence of $\Vert x^*-x^{(k)}\Vert$ to zero follows a definite pattern:

\begin{displaymath}\frac{\Vert x^*-x^{(3)}\Vert}{\Vert x^*-x^{(2)}\Vert^2}\doteq 0.6323,\ \ \frac{\Vert x^*-x^{(4)}\Vert}{\Vert x^*-x^{(3)}\Vert^2}\doteq 0.6360,\end{displaymath}

which suggests that the ratio

\begin{displaymath}\frac{\Vert x^*-x^{(k+1)}\Vert}{\Vert x^*-x^{(k)}\Vert^2}\end{displaymath}

is asymptotically constant as $k\rightarrow\infty$. It is difficult to verify this conjecture numerically, since the error so quickly falls below round-off level. For example, I would predict that

\begin{displaymath}\Vert x^*-x^{(5)}\Vert\doteq 0.63\Vert x^*-x^{(4)}\Vert^2\doteq 7.1\cdot 10^{-21},\end{displaymath}

but in fact this error is below the precision of the machine, and all I can verify is that $\Vert x^*-x^{(5)}\Vert$ is less than about $10^{-16}$. However, I will prove below that the conjecture is correct. In this regard, the following definitions are relevant.

Definition 2.1   Suppose $\{x^{(k)}\}$ is a sequence in ${\bf {\rm R}}^n$ that converges to x*.
The sequence is said to converge linearly (or q-linearly) if there exists $c\in (0,1)$ such that

\begin{displaymath}\Vert x^*-x^{(k+1)}\Vert\le c\Vert x^*-x^{(k)}\Vert\ \mbox{for all $k$\space sufficiently large}.\end{displaymath}

The sequence is said to converge superlinearly (or q-superlinearly) if

\begin{displaymath}\frac{\Vert x^*-x^{(k+1)}\Vert}{\Vert x^*-x^{(k)}\Vert}\rightarrow 0\ \mbox{as}\ k\rightarrow\infty.\end{displaymath}

The sequence is said to converge quadratically (or q-quadratically) if there exists $c\in (0,\infty)$ such that

\begin{displaymath}\Vert x^*-x^{(k+1)}\Vert\le c\Vert x^*-x^{(k)}\Vert^2\ \mbox{for all $k$\space sufficiently large}.\end{displaymath}

Cubic convergence, quartic convergence, and so on, are defined analogously (but are rarely used in analysis of optimization algorithms).
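To make these definitions concrete, consider two scalar error sequences (a sketch of my own, not from the text): $e_k=2^{-k}$ converges q-linearly with c=1/2, while $e_k=2^{-2^k}$ converges q-quadratically with c=1.

```python
# scalar error sequences e_k = |x* - x^(k)| illustrating the definitions
lin  = [2.0**(-k)    for k in range(1, 8)]   # q-linear:    e_{k+1} = 0.5 * e_k
quad = [2.0**(-2**k) for k in range(1, 6)]   # q-quadratic: e_{k+1} = e_k^2

lin_ratios  = [lin[k+1] / lin[k]      for k in range(len(lin) - 1)]
quad_ratios = [quad[k+1] / quad[k]**2 for k in range(len(quad) - 1)]
```

Note how quickly the quadratic sequence collapses: after five steps the error is $2^{-64}$, which is why only a handful of Newton iterations were needed in Table 1.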

It is not difficult to show that quadratic convergence implies superlinear convergence, which in turn implies linear convergence.

Below I will prove the following theorem:

Theorem 2.2   Suppose $F:{\bf {\rm R}}^n\rightarrow{\bf {\rm R}}^n$ is continuously differentiable and F(x*)=0. If
1. the Jacobian J(x*) of F at x* is nonsingular, and
2. J is Lipschitz continuous on a neighborhood of x*,
then, for all x(0) sufficiently close to x*, Newton's method produces a sequence $x^{(1)},x^{(2)},\ldots$ that converges quadratically to x*.

Lipschitz continuity is a technical condition that is stronger than the mere continuity of J but weaker than the condition that F be twice continuously differentiable. Lipschitz continuity will be defined carefully below.

Quadratic convergence has two important consequences. First of all, it guarantees that if a point close to the solution can be found, then Newton's method will rapidly home in on the exact solution. Second, it provides a stopping test for Newton's method. Since Newton's method is an iterative algorithm, it is necessary to have a criterion for deciding whether the current approximation x(k) is sufficiently close to the solution x* that the algorithm can be halted. There is a simple stopping test for any superlinearly convergent sequence, which I will now derive.

I assume that it is desired to find x(k) such that $\Vert x^*-x^{(k)}\Vert<\epsilon$, where $\epsilon$ is a given error tolerance, and I also assume that $x^{(k)}\rightarrow x^*$ superlinearly. I will now show that

\begin{displaymath}\frac{\Vert x^{(k)}-x^{(k-1)}\Vert}{\Vert x^*-x^{(k-1)}\Vert}\rightarrow 1\ \mbox{as}\ k\rightarrow\infty.
\end{displaymath} (2)

This implies that $\Vert x^{(k)}-x^{(k-1)}\Vert$ (which is a computable quantity) is a good estimate of $\Vert x^*-x^{(k-1)}\Vert$ when x(k-1) is close to x*, and so it is reasonable to stop the iteration when $\Vert x^{(k)}-x^{(k-1)}\Vert<\epsilon$ is satisfied. This is not guaranteed to produce an estimate with an error less than $\epsilon$, since it cannot be known for sure that x(k-1) is close enough to x* that

\begin{displaymath}\frac{\Vert x^{(k)}-x^{(k-1)}\Vert}{\Vert x^*-x^{(k-1)}\Vert}\doteq 1.\end{displaymath}

However, it works well because Newton's method usually does not take small steps unless it is close to the solution. Moreover, having verified that $\Vert x^{(k)}-x^{(k-1)}\Vert<\epsilon$ holds, so that $\Vert x^*-x^{(k-1)}\Vert<\epsilon$ is expected to hold, the algorithm then returns x(k), which should be much closer to x* than x(k-1) is. For these reasons, the stopping test

\begin{displaymath}\Vert x^{(k)}-x^{(k-1)}\Vert<\epsilon
\end{displaymath} (3)

is quite reliable.
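The reliability of this estimate is visible in the data of Table 1. Using the iterates listed there (a sketch of my own; x* is taken in its closed form), the ratio in (2) rapidly approaches 1:

```python
import numpy as np

# iterates x^(0), ..., x^(3) copied from Table 1, and the exact solution x*
xs = np.array([
    [0.50000000000000000, 0.50000000000000000],
    [0.87500000000000000, 0.62500000000000000],
    [0.79067460317460314, 0.61805555555555558],
    [0.78616431593458214, 0.61803398895790196],
])
xstar = np.array([np.sqrt((np.sqrt(5.0) - 1.0) / 2.0),
                  (np.sqrt(5.0) - 1.0) / 2.0])

# ratio ||x^(k) - x^(k-1)|| / ||x* - x^(k-1)|| for k = 1, 2, 3
ratios = [np.linalg.norm(xs[k] - xs[k-1]) / np.linalg.norm(xstar - xs[k-1])
          for k in range(1, len(xs))]
```

Already at k=3 the computable step length agrees with the true error to within a fraction of a percent.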

To prove (2), I simply use the triangle inequality ($\Vert a+b\Vert\le\Vert a\Vert+\Vert b\Vert$), the reverse triangle inequality ($\Vert a-b\Vert\ge\left\vert\Vert a\Vert-\Vert b\Vert\right\vert$), and the definition of superlinear convergence. First of all,

\begin{eqnarray*}\frac{\Vert x^{(k)}-x^{(k-1)}\Vert}{\Vert x^*-x^{(k-1)}\Vert}&\le&\frac{\Vert x^{(k)}-x^*\Vert+\Vert x^*-x^{(k-1)}\Vert}{\Vert x^*-x^{(k-1)}\Vert}\\
&=&1+\frac{\Vert x^{(k)}-x^*\Vert}{\Vert x^*-x^{(k-1)}\Vert}.\end{eqnarray*}


Similarly, by the reverse triangle inequality,

\begin{eqnarray*}\frac{\Vert x^{(k)}-x^{(k-1)}\Vert}{\Vert x^*-x^{(k-1)}\Vert}&\ge&\frac{\left\vert\Vert x^{(k)}-x^*\Vert-\Vert x^*-x^{(k-1)}\Vert\right\vert}{\Vert x^*-x^{(k-1)}\Vert}\\
&=&\left\vert\frac{\Vert x^{(k)}-x^*\Vert}{\Vert x^*-x^{(k-1)}\Vert}-1\right\vert.\end{eqnarray*}


Combining these two inequalities gives

\begin{displaymath}\left\vert\frac{\Vert x^{(k)}-x^*\Vert}{\Vert x^*-x^{(k-1)}\Vert}-1\right\vert\le\frac{\Vert x^{(k)}-x^{(k-1)}\Vert}{\Vert x^*-x^{(k-1)}\Vert}\le
1+\frac{\Vert x^{(k)}-x^*\Vert}{\Vert x^*-x^{(k-1)}\Vert},\end{displaymath}

and so

\begin{displaymath}\frac{\Vert x^{(k)}-x^*\Vert}{\Vert x^*-x^{(k-1)}\Vert}\rightarrow 0\ \mbox{as}\ k\rightarrow\infty\end{displaymath}

yields (2).

Mark S. Gockenbach