
Proof of quadratic convergence of Newton's method

The proof of Theorem 2.2 requires some background from linear algebra and multivariable calculus, which I will now review.

I need to apply the following result, which can be easily proved from the Fundamental Theorem of Calculus:

Theorem 2.3   Suppose $F:{\bf {\rm R}}^n\rightarrow{\bf {\rm R}}^m$ is continuously differentiable and $a,b\in{\bf {\rm R}}^n$. Then

 \begin{displaymath}
F(b)=F(a)+\int_0^1J(a+\theta (b-a))(b-a)\,d\theta,
\end{displaymath} (4)

where J is the Jacobian of F.

The integral of a vector-valued function, as in (4), is interpreted as the vector whose components are the integrals of the components of the integrand. I also need the triangle inequality for integrals:

Theorem 2.4   If $F:{\bf {\rm R}}\rightarrow{\bf {\rm R}}^n$ is integrable over the interval [a,b], then

 \begin{displaymath}
\left\Vert\int_a^bF(t)\,dt\right\Vert\le\int_a^b\Vert F(t)\Vert\,dt.
\end{displaymath} (5)
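
Both facts are easy to check numerically. The following is a minimal sketch in Python, assuming NumPy is available; the function F, its Jacobian J, and the points a and b are my own illustrative choices, not taken from the text. It approximates the integral in (4) by the midpoint rule and then verifies (5) for the same integrand.

import numpy as np

# Illustrative F: R^2 -> R^2 and its Jacobian (an arbitrary smooth example).
def F(x):
    return np.array([x[0]**2 + x[1], np.sin(x[0]) * x[1]])

def J(x):
    return np.array([[2.0*x[0],          1.0],
                     [np.cos(x[0])*x[1], np.sin(x[0])]])

a = np.array([0.3, -0.7])
b = np.array([1.1,  0.4])

# Midpoint rule for the (componentwise) integral in (4).
n = 2000
thetas = (np.arange(n) + 0.5) / n
vals = [J(a + t*(b - a)) @ (b - a) for t in thetas]
integral = sum(vals) / n

print(np.linalg.norm(F(b) - F(a) - integral))   # ~0, up to quadrature error

# Triangle inequality (5): ||integral|| <= integral of the norms.
print(np.linalg.norm(integral) <= sum(np.linalg.norm(v) for v in vals)/n + 1e-12)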

In order to estimate the errors in Newton's method, I will need to use a matrix norm. The reader should recall the following definition:

Definition 2.5   A norm $\Vert\cdot\Vert$ for a vector space X is a real-valued function defined on X satisfying the following properties:
1.
$\Vert x\Vert\ge 0$ for all $x\in X$, and $\Vert x\Vert=0$ if and only if x=0;
2.
$\Vert\alpha x\Vert=\vert\alpha\vert\Vert x\Vert$ for all $x\in X$ and all scalars $\alpha$;
3.
$\Vert x+y\Vert\le \Vert x\Vert+\Vert y\Vert$ for all $x,y\in X$ (the triangle inequality).

The space ${\bf {\rm R}}^{m\times n}$ of $m\times n$ matrices is a vector space, since such matrices can be added and multiplied by scalars in a fashion analogous to Euclidean vectors. Many norms could be defined on ${\bf {\rm R}}^{m\times n}$, but, as I will show, the following operator norm has significant advantages for analysis:

Definition 2.6   Given any $A\in{\bf {\rm R}}^{m\times n}$, the norm of A is defined by

 \begin{displaymath}
\Vert A\Vert=\max\left\{\frac{\Vert Ax\Vert}{\Vert x\Vert}\,:\,x\in{\bf {\rm R}}^n,x\neq 0\right\}
\end{displaymath} (6)

The vector norms used on the right-hand side of (6) are the Euclidean norms on ${\bf {\rm R}}^m$ and ${\bf {\rm R}}^n$, and the matrix norm is called the operator norm induced by the Euclidean norm.

Theorem 2.7   The norm defined by (6) has the following properties:
1.
It is a norm on the space ${\bf {\rm R}}^{m\times n}$;
2.
$\Vert Ax\Vert\le\Vert A\Vert\Vert x\Vert$ for all $A\in{\bf {\rm R}}^{m\times n},x\in{\bf {\rm R}}^n$;
3.
$\Vert AB\Vert\le\Vert A\Vert\Vert B\Vert$ for all $A\in{\bf {\rm R}}^{m\times n},B\in{\bf {\rm R}}^{n\times p}$.

The second and third properties of the operator norm are key in analyzing errors, particularly in producing upper bounds.
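
For readers who wish to experiment, the norm (6) is computable: for the Euclidean norm, the induced operator norm equals the largest singular value of A, which is what numpy.linalg.norm(A, 2) returns. The sketch below, assuming NumPy and with A, B, and x generated at random purely for illustration, checks the second and third properties numerically.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
x = rng.standard_normal(4)

opnorm = lambda M: np.linalg.norm(M, 2)  # induced 2-norm = largest singular value

# Property 2: ||Ax|| <= ||A|| ||x||
print(np.linalg.norm(A @ x) <= opnorm(A) * np.linalg.norm(x) + 1e-12)
# Property 3: ||AB|| <= ||A|| ||B||
print(opnorm(A @ B) <= opnorm(A) * opnorm(B) + 1e-12)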

The next fact I need involves both linear algebra and analysis.

Theorem 2.8   Suppose $J:{\bf {\rm R}}^m\rightarrow{\bf {\rm R}}^{n\times n}$ is a continuous matrix-valued function. If J(x*) is nonsingular, then there exists $\delta>0$ such that, for all $x\in{\bf {\rm R}}^m$ with $\Vert x-x^*\Vert<\delta$, J(x) is nonsingular and

\begin{displaymath}\left\Vert J(x)^{-1}\right\Vert<2\left\Vert J(x^*)^{-1}\right\Vert.
\end{displaymath}

This theorem implies that the set of nonsingular matrices is an open set. The second part of the theorem follows from the fact that, if $x\mapsto J(x)$ is continuous, then so is $x\mapsto J(x)^{-1}$ wherever this second map is defined.
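
A small experiment makes Theorem 2.8 concrete. The sketch below (assuming NumPy; the smooth matrix-valued map J and the point x* are arbitrary choices of mine) checks that $\Vert J(x)^{-1}\Vert<2\Vert J(x^*)^{-1}\Vert$ for random points x near x*.

import numpy as np

# Illustrative smooth J: R^2 -> R^{2x2}, nonsingular at xstar = (0,0).
def J(x):
    return np.array([[2.0 + x[0],   x[1]],
                     [np.sin(x[1]), 3.0 + x[0]**2]])

xstar = np.array([0.0, 0.0])
bound = 2.0 * np.linalg.norm(np.linalg.inv(J(xstar)), 2)

rng = np.random.default_rng(1)
delta = 0.1
for _ in range(5):
    x = xstar + delta * rng.uniform(-1.0, 1.0, size=2)
    print(np.linalg.norm(np.linalg.inv(J(x)), 2) < bound)  # True for small delta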

Finally, I need to define Lipschitz continuity.

Definition 2.9   Suppose $F:{\bf {\rm R}}^n\rightarrow{\bf {\rm R}}^m$. Then F is said to be Lipschitz continuous on $S\subset{\bf {\rm R}}^n$ if there exists a positive constant L such that

\begin{displaymath}\Vert F(x)-F(y)\Vert\le L\Vert x-y\Vert\quad\mbox{for all } x,y\in S.
\end{displaymath}

The same definition can be applied to a matrix-valued function $J:{\bf {\rm R}}^n\rightarrow{\bf {\rm R}}^{m\times n}$ (like the Jacobian), using a matrix norm to measure the size of J(x)-J(y). The meaning of Lipschitz continuity is clear: the difference F(x)-F(y) is, roughly speaking, at most proportional in size to x-y.
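
As an illustration (a sketch assuming NumPy; the matrix-valued map J below is my own example, standing in for a Jacobian), one can probe the Lipschitz ratio empirically. Since the entries of this J are smooth with bounded derivatives on S, J is Lipschitz continuous there, and the largest observed ratio is a lower bound for L.

import numpy as np

# Illustrative matrix-valued J, Lipschitz continuous on S = [-1,1]^2.
def J(x):
    return np.array([[2.0*x[0],     1.0],
                     [np.cos(x[0]), 2.0*x[1]]])

rng = np.random.default_rng(2)
ratios = []
for _ in range(1000):
    x, y = rng.uniform(-1.0, 1.0, size=(2, 2))   # two random points in S
    ratios.append(np.linalg.norm(J(x) - J(y), 2) / np.linalg.norm(x - y))

print(max(ratios))   # bounded: an empirical lower bound for L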

I can now prove Theorem 2.2. I begin with the definition of the Newton iteration,

\begin{displaymath}
x^{(k+1)}=x^{(k)}-J(x^{(k)})^{-1}F(x^{(k)}),
\end{displaymath}

assuming that $x^{(k)}$ is close enough to $x^*$ that $J(x^{(k)})$ is nonsingular. I then subtract $x^*$ from both sides to obtain

\begin{displaymath}
x^{(k+1)}-x^*=x^{(k)}-x^*-J(x^{(k)})^{-1}F(x^{(k)}).
\end{displaymath}

Since, by assumption, $F(x^*)=0$, I can write this as

\begin{displaymath}
x^{(k+1)}-x^*=x^{(k)}-x^*-J(x^{(k)})^{-1}\left(F(x^{(k)})-F(x^*)\right).
\end{displaymath}

I now use (4) to estimate $F(x^{(k)})-F(x^*)$:

\begin{eqnarray*}
F(x^{(k)})-F(x^*)&=&\int_0^1J(x^*+\theta(x^{(k)}-x^*))(x^{(k)}-x^*)\,d\theta\\
&=&J(x^*)(x^{(k)}-x^*)+\int_0^1\left(J(x^*+\theta(x^{(k)}-x^*))-J(x^*)\right)(x^{(k)}-x^*)\,d\theta.
\end{eqnarray*}


Therefore, by the triangle inequality (5) and the Lipschitz continuity of J,

\begin{eqnarray*}
\left\Vert F(x^{(k)})-F(x^*)-J(x^*)(x^{(k)}-x^*)\right\Vert&=&
\left\Vert\int_0^1\left(J(x^*+\theta(x^{(k)}-x^*))-J(x^*)\right)(x^{(k)}-x^*)\,d\theta\right\Vert\\
&\le&\int_0^1\left\Vert J(x^*+\theta(x^{(k)}-x^*))-J(x^*)\right\Vert\Vert x^{(k)}-x^*\Vert\,d\theta\\
&\le&\int_0^1L\theta\Vert x^{(k)}-x^*\Vert^2\,d\theta\\
&=&\frac{L}{2}\Vert x^{(k)}-x^*\Vert^2.
\end{eqnarray*}


(The reader should notice that, without the Lipschitz continuity of J, I can conclude only that $F(x^{(k)})-F(x^*)-J(x^*)(x^{(k)}-x^*)=o(\Vert x^{(k)}-x^*\Vert)$; the Lipschitz continuity and the above argument are needed to obtain the stronger estimate $F(x^{(k)})-F(x^*)-J(x^*)(x^{(k)}-x^*)=O(\Vert x^{(k)}-x^*\Vert^2)$.)
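
This $O(\Vert x^{(k)}-x^*\Vert^2)$ behavior is easy to observe numerically. The following sketch (assuming NumPy; F, J, $x^*$, and the direction d are illustrative choices of mine) shows the remainder shrinking like $h^2$ as x approaches $x^*$ along a fixed direction:

import numpy as np

def F(x):
    return np.array([x[0]**2 - x[1], np.exp(x[0]) + x[1] - 1.0])

def J(x):
    return np.array([[2.0*x[0],    -1.0],
                     [np.exp(x[0]), 1.0]])

xstar = np.array([0.5, 0.25])
d = np.array([1.0, -2.0])
for h in [1e-1, 1e-2, 1e-3]:
    x = xstar + h*d
    r = np.linalg.norm(F(x) - F(xstar) - J(xstar) @ (x - xstar))
    print(h, r / h**2)   # roughly constant ratio => remainder is O(h^2)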

I now have

\begin{eqnarray*}
x^{(k+1)}-x^*&=&x^{(k)}-x^*-J(x^{(k)})^{-1}(F(x^{(k)})-F(x^*))\\
&=&\left(I-J(x^{(k)})^{-1}J(x^*)\right)(x^{(k)}-x^*)\\
&&-J(x^{(k)})^{-1}\left(F(x^{(k)})-F(x^*)-J(x^*)(x^{(k)}-x^*)\right),
\end{eqnarray*}


and so

\begin{eqnarray*}
\Vert x^{(k+1)}-x^*\Vert&\le&\left\Vert\left(I-J(x^{(k)})^{-1}J(x^*)\right)(x^{(k)}-x^*)\right\Vert
+\left\Vert J(x^{(k)})^{-1}\left(F(x^{(k)})-F(x^*)-J(x^*)(x^{(k)}-x^*)\right)\right\Vert\\
&\le&\left\Vert I-J(x^{(k)})^{-1}J(x^*)\right\Vert\Vert x^{(k)}-x^*\Vert
+\frac{L}{2}\left\Vert J(x^{(k)})^{-1}\right\Vert\Vert x^{(k)}-x^*\Vert^2.
\end{eqnarray*}


I use the Lipschitz continuity of J again, this time to estimate the size of $I-J(x^{(k)})^{-1}J(x^*)$:

\begin{eqnarray*}
\left\Vert I-J(x^{(k)})^{-1}J(x^*)\right\Vert&=&\left\Vert J(x^{(k)})^{-1}\left(J(x^{(k)})-J(x^*)\right)\right\Vert\\
&\le&\left\Vert J(x^{(k)})^{-1}\right\Vert\left\Vert J(x^{(k)})-J(x^*)\right\Vert\\
&\le&L\left\Vert J(x^{(k)})^{-1}\right\Vert\Vert x^{(k)}-x^*\Vert.
\end{eqnarray*}


Combining the last two estimates, I obtain

\begin{displaymath}\Vert x^{(k+1)}-x^*\Vert\le \frac{3L}{2}\left\Vert J(x^{(k)})^{-1}\right\Vert\Vert x^{(k)}-x^*\Vert^2.
\end{displaymath}

The final step is to recognize that, by Theorem 2.8, for all $x^{(k)}$ sufficiently close to $x^*$,

 \begin{displaymath}
\left\Vert J(x^{(k)})^{-1}\right\Vert\le 2M,
\end{displaymath} (7)

where $M=\left\Vert J(x^*)^{-1}\right\Vert$. Then, for $x^{(k)}$ sufficiently close to $x^*$,

 \begin{displaymath}
\Vert x^{(k+1)}-x^*\Vert\le 3LM\Vert x^{(k)}-x^*\Vert^2.
\end{displaymath} (8)

If

 \begin{displaymath}
\Vert x^{(k)}-x^*\Vert<\frac{1}{6LM},
\end{displaymath} (9)

then

 \begin{displaymath}
\Vert x^{(k+1)}-x^*\Vert<\frac{1}{2}\Vert x^{(k)}-x^*\Vert.
\end{displaymath} (10)

I have now proved Theorem 2.2: if $x^{(0)}$ is chosen close enough to $x^*$ that (7) and (9) both hold, then (10) shows that $x^{(k)}\rightarrow x^*$ (in particular, every iterate remains close enough to $x^*$ that (7) and (9) continue to hold), and (8) shows that the convergence is quadratic.
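
The quadratic convergence just proved is easy to see in practice. Here is a minimal sketch, assuming NumPy; the system F(x)=0 below (with root $x^*=(1,1)$) and the starting point are illustrative choices of mine, not the example from the text. The printed errors roughly square from one iteration to the next until rounding error takes over.

import numpy as np

def F(x):
    return np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])

def J(x):
    return np.array([[2.0*x[0], 2.0*x[1]],
                     [1.0,     -1.0]])

xstar = np.array([1.0, 1.0])
x = np.array([1.5, 0.4])                 # close enough to x*
for k in range(6):
    print(k, np.linalg.norm(x - xstar))  # error roughly squares each step
    x = x - np.linalg.solve(J(x), F(x))  # Newton step: x <- x - J(x)^{-1} F(x)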

