Next: Understanding stationary points Up: The spectral theorem and Previous: Introduction

The spectral theorem for symmetric matrices

Symmetric matrices have many special properties, the most important of which are expressed in the following theorem:

Theorem 2.1   Suppose $A\in{\bf {\rm R}}^{n\times n}$ is symmetric. Then
1. every eigenvalue $\lambda$ of $A$ is a real number, and there exists a (real) eigenvector $u\in{\bf {\rm R}}^n$ corresponding to $\lambda$: $Au=\lambda u$;
2. eigenvectors corresponding to distinct eigenvalues are necessarily orthogonal:

\begin{displaymath}Au^{(1)}=\lambda_1u^{(1)},\ Au^{(2)}=\lambda_2u^{(2)},\ \lambda_1\neq \lambda_2\ \Rightarrow\ u^{(1)}\cdot u^{(2)}=0;\end{displaymath}

3. there exist a diagonal matrix $D\in{\bf {\rm R}}^{n\times n}$ and an orthogonal matrix $U\in{\bf {\rm R}}^{n\times n}$ such that $A=UDU^T$. The diagonal entries of $D$ are the eigenvalues of $A$ and the columns of $U$ are the corresponding eigenvectors:

\begin{displaymath}D=\mbox{diag}(\lambda_1,\lambda_2,\ldots,\lambda_n),\ U=[u^{(1)}\vert u^{(2)}\vert\ldots\vert u^{(n)}],\ Au^{(i)}=\lambda_iu^{(i)},\ i=1,2,\ldots,n.\end{displaymath}

An orthogonal matrix $U$ satisfies, by definition, $U^T=U^{-1}$, which means that the columns of $U$ are orthonormal (that is, any two of them are orthogonal and each has norm one). The expression $A=UDU^T$ of a symmetric matrix in terms of its eigenvalues and eigenvectors is referred to as the spectral decomposition of $A$.
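To see the theorem at work numerically, here is a minimal Python sketch that computes a spectral decomposition with the cyclic Jacobi eigenvalue method. This is one standard technique, chosen here only because it needs nothing beyond plain Python; it is not necessarily how any particular library computes eigenvalues (NumPy's `numpy.linalg.eigh`, for instance, delegates to LAPACK). The helper names `matmul`, `transpose`, and `jacobi_eigh` are hypothetical, and the input is assumed to be symmetric.

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def jacobi_eigh(A, tol=1e-12, max_sweeps=50):
    """Return (eigenvalues, U) with A = U diag(eigenvalues) U^T for symmetric A.

    Cyclic Jacobi method: each plane rotation G zeroes one off-diagonal entry
    of the working matrix via a <- G^T a G; the product of all rotations is the
    orthogonal matrix U, whose columns are unit eigenvectors.
    """
    n = len(A)
    a = [[float(v) for v in row] for row in A]                  # working copy of A
    U = [[float(i == j) for j in range(n)] for i in range(n)]   # starts as the identity
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle chosen so that entry (p, q) of G^T a G is zero.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[p][p] - a[q][q])
                c, s = math.cos(theta), math.sin(theta)
                G = [[float(i == j) for j in range(n)] for i in range(n)]
                G[p][p], G[p][q] = c, -s
                G[q][p], G[q][q] = s, c
                a = matmul(transpose(G), matmul(a, G))   # a stays equal to U^T A U
                U = matmul(U, G)
    return [a[i][i] for i in range(n)], U


A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
lam, U = jacobi_eigh(A)
D = [[lam[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
A_rebuilt = matmul(U, matmul(D, transpose(U)))   # equals A up to rounding
UtU = matmul(transpose(U), U)                    # equals the identity up to rounding
```

Each rotation is applied here by full matrix products, costing $O(n^3)$ per rotation; a practical implementation updates only rows and columns $p$ and $q$. Eigenvectors are determined only up to sign (and, for a repeated eigenvalue, up to the choice of an orthonormal basis of its eigenspace), so computed columns of $U$ may differ from hand-computed ones.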

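The spectral theorem implies that there is a change of variables which transforms $A$ into a diagonal matrix. Before explaining this change of variables, I will show why it is important. The reader will recall that every homogeneous quadratic function (one with no linear or constant terms) in the $n$ variables $x_1,x_2,\ldots,x_n$ can be expressed in the form

\begin{displaymath}q(x)=x\cdot Hx=\sum_{i=1}^n\sum_{j=1}^nH_{ij}x_ix_j,\end{displaymath}

where $H\in{\bf {\rm R}}^{n\times n}$ may be taken to be symmetric, since replacing $H$ by $\frac{1}{2}(H+H^T)$ does not change $q$.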

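The formula for $q(x)$ involves $n^2$ terms, and the variables are typically coupled. However, if $H$ happens to be a diagonal matrix, then the formula for $q(x)$ simplifies considerably:

\begin{displaymath}q(x)=\sum_{i=1}^nH_{ii}x_i^2=H_{11}x_1^2+H_{22}x_2^2+\cdots+H_{nn}x_n^2.\end{displaymath}
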
Such a quadratic is easy to understand: In each coordinate direction $x_i$, the graph is a parabola, opening upward if $H_{ii}>0$ and opening downward if $H_{ii}<0$. There is also the degenerate case $H_{ii}=0$, in which case $q$ is constant with respect to $x_i$ and the graph in that direction is a horizontal line.
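A small Python sketch of the two formulas for $q(x)$, assuming $H$ is stored as a list of rows: the general double sum has $n^2$ terms, while for a diagonal $H$ only the $n$ squared terms remain. The function names are illustrative, not taken from any library.

```python
def quadratic_form(H, x):
    """General q(x) = x . Hx = sum over i, j of H[i][j] * x[i] * x[j] (n**2 terms)."""
    n = len(x)
    return sum(H[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def diagonal_quadratic(h, x):
    """q(x) when H = diag(h): only the n decoupled terms h[i] * x[i]**2 remain."""
    return sum(hi * xi ** 2 for hi, xi in zip(h, x))


h = [2, -1, 0]                                            # diagonal entries H_11, H_22, H_33
H = [[h[i] if i == j else 0 for j in range(3)] for i in range(3)]
x = [1, 2, 3]
print(quadratic_form(H, x), diagonal_quadratic(h, x))    # prints: -2 -2

# Moving along the second coordinate direction only gives q = -t**2,
# a parabola opening downward because H_22 < 0.
print([quadratic_form(H, [0, t, 0]) for t in range(-2, 3)])   # prints: [-4, -1, 0, -1, -4]
```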

Therefore, in two variables (the only case that can be visualized), a quadratic function defined by $H=\mbox{diag}{(\lambda_1,\lambda_2)}$ has six possible shapes, corresponding to the following cases:

1. $\lambda_1>0,\lambda_2>0$;
2. $\lambda_1<0,\lambda_2<0$;
3. $\lambda_1>0,\lambda_2<0$ or $\lambda_1<0,\lambda_2>0$;
4. $\lambda_1>0,\lambda_2=0$ or $\lambda_1=0,\lambda_2>0$;
5. $\lambda_1<0,\lambda_2=0$ or $\lambda_1=0,\lambda_2<0$;
6. $\lambda_1=\lambda_2=0$.
Four of the possibilities are graphed in Figure 1.
Figure 1: The graphs of four quadratic functions: two positive eigenvalues (upper left), two negative eigenvalues (upper right), one positive and one negative eigenvalue (lower left), one positive and one zero eigenvalue (lower right).
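The six cases can be encoded directly. The Python sketch below classifies the graph of $q=\lambda_1\alpha_1^2+\lambda_2\alpha_2^2$ by the signs of the two eigenvalues. The labels ("bowl", "saddle", "trough", "flat") are informal names chosen for this illustration, and the tolerance `tol` for treating a computed eigenvalue as zero is an assumption, since floating-point eigenvalues are rarely exactly zero.

```python
def classify_2d(lam1, lam2, tol=0.0):
    """Shape of q = lam1 * a1**2 + lam2 * a2**2 from the signs of the eigenvalues."""
    def sign(v):
        # Values within tol of zero count as zero (the degenerate direction).
        return 0 if abs(v) <= tol else (1 if v > 0 else -1)

    # Sorting makes the answer independent of which direction is listed first.
    key = tuple(sorted((sign(lam1), sign(lam2))))
    shapes = {
        (1, 1): "upward bowl",        # both parabolas open upward
        (-1, -1): "downward bowl",    # both open downward
        (-1, 1): "saddle",            # up in one direction, down in the other
        (0, 1): "upward trough",      # upward parabola, flat in the other direction
        (-1, 0): "downward trough",   # downward parabola, flat in the other direction
        (0, 0): "flat",               # q is identically zero
    }
    return shapes[key]


print(classify_2d(4, -2))                  # prints: saddle
print(classify_2d(1e-15, 3, tol=1e-12))    # prints: upward trough
```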

Now I will explain the change of variables that diagonalizes a symmetric matrix. A vector

\begin{displaymath}x=\left[\begin{array}{c}x_1\\ x_2\\ \vdots\\ x_n\end{array}\right]\end{displaymath}

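is implicitly expressed in terms of the standard basis $e^{(1)},e^{(2)},\ldots,e^{(n)}$:

\begin{displaymath}x=x_1e^{(1)}+x_2e^{(2)}+\cdots+x_ne^{(n)},\end{displaymath}

where

\begin{displaymath}e^{(1)}=\left[\begin{array}{c}1\\ 0\\ 0\\ \vdots\\ 0\end{array}\right],\ e^{(2)}=\left[\begin{array}{c}0\\ 1\\ 0\\ \vdots\\ 0\end{array}\right],\ \ldots.\end{displaymath}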

If $\{u^{(1)},u^{(2)},\ldots,u^{(n)}\}$ is an orthonormal set, then it is an alternate basis: Every $x\in{\bf {\rm R}}^n$ can be expressed as

\begin{displaymath}x=\alpha_1u^{(1)}+\alpha_2u^{(2)}+\cdots+\alpha_nu^{(n)}.\end{displaymath}

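Moreover, the coefficients $\alpha_1,\alpha_2,\ldots,\alpha_n$ are easy to compute: taking the dot product of both sides of this expansion with $u^{(i)}$ and using $u^{(i)}\cdot u^{(i)}=1$ and $u^{(i)}\cdot u^{(j)}=0$ for $j\neq i$ gives

\begin{displaymath}\alpha_i=u^{(i)}\cdot x,\ i=1,2,\ldots,n.\end{displaymath}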

When the orthonormal basis vectors are the columns of a matrix $U=[u^{(1)}\vert u^{(2)}\vert\ldots\vert u^{(n)}]$, the computation of the coefficients $\alpha_1,\alpha_2,\ldots,\alpha_n$ takes the form of a matrix-vector product:

\begin{displaymath}\left[\begin{array}{c}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{array}\right]=\left[\begin{array}{c}u^{(1)}\cdot x\\ u^{(2)}\cdot x\\ \vdots\\ u^{(n)}\cdot x\end{array}\right]=U^Tx.\end{displaymath}

The key point here is that the numbers $\alpha_1,\alpha_2,\ldots,\alpha_n$ can be thought of as new variables representing the vector $x$. Specifically, $x_1,x_2,\ldots,x_n$ represent $x$ in the standard basis $\{e^{(1)},e^{(2)},\ldots,e^{(n)}\}$, while $\alpha_1,\alpha_2,\ldots,\alpha_n$ represent $x$ in the alternate basis $\{u^{(1)},u^{(2)},\ldots,u^{(n)}\}$.
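The change of coordinates $\alpha=U^Tx$ and its inverse $x=U\alpha$ take only a few lines. The Python sketch below uses, as an assumed concrete basis, the two orthonormal vectors $u^{(1)}=(1/\sqrt{2},1/\sqrt{2})$ and $u^{(2)}=(1/\sqrt{2},-1/\sqrt{2})$ that appear in Example 2.2; the function names `coordinates` and `reconstruct` are hypothetical.

```python
import math


def coordinates(U, x):
    """alpha = U^T x, so alpha[i] = u^(i) . x with u^(i) the i-th column of U."""
    n = len(x)
    return [sum(U[k][i] * x[k] for k in range(n)) for i in range(n)]


def reconstruct(U, alpha):
    """x = alpha_1 u^(1) + ... + alpha_n u^(n) = U alpha (valid since U^T = U^{-1})."""
    n = len(alpha)
    return [sum(U[k][i] * alpha[i] for i in range(n)) for k in range(n)]


r = 1 / math.sqrt(2)
U = [[r, r],
     [r, -r]]                     # columns u^(1) = (r, r) and u^(2) = (r, -r)
x = [3.0, 1.0]                    # standard coordinates x_1, x_2
alpha = coordinates(U, x)         # new coordinates [4r, 2r], about [2.828, 1.414]
x_back = reconstruct(U, alpha)    # back to [3.0, 1.0] up to rounding
```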

I now digress to remind the reader of the following fundamental property of matrices, vectors, and the dot product: If $A\in{\bf {\rm R}}^{m\times n}$, then

\begin{displaymath}y\cdot Ax=(A^Ty)\cdot x\ \ \mbox{for all}\ x\in{\bf {\rm R}}^n,\ y\in{\bf {\rm R}}^m.\end{displaymath}

This is really the reason that the transpose of a matrix is important.
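A quick numerical check of $y\cdot Ax=(A^Ty)\cdot x$ for one rectangular matrix, in Python with integer entries so that both sides agree exactly. The matrix, the vectors, and the helper names are arbitrary choices made for this illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Matrix-vector product with A stored as a list of rows."""
    return [dot(row, x) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


A = [[1, 2, 0],
     [-1, 3, 4]]          # a 2 x 3 matrix, so m = 2 and n = 3
x = [2, -1, 5]            # x in R^3
y = [3, 7]                # y in R^2
lhs = dot(y, matvec(A, x))               # y . Ax
rhs = dot(matvec(transpose(A), y), x)    # (A^T y) . x
print(lhs, rhs)                          # prints: 105 105
```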

Assuming $H\in{\bf {\rm R}}^{n\times n}$ is symmetric, it has a spectral decomposition $H=UDU^T$. Therefore,

\begin{displaymath}x\cdot Hx=x\cdot UDU^Tx=(U^Tx)\cdot D(U^Tx)=\sum_{i=1}^n\lambda_i\alpha_i^2,\end{displaymath}

where I have applied the change of variables $\alpha=U^Tx$. Therefore, the quadratic $q(x)=x\cdot Hx$ is a simple decoupled quadratic when expressed in terms of the alternate basis $\{u^{(1)},u^{(2)},\ldots,u^{(n)}\}$. Since every symmetric matrix has a spectral decomposition, this means that every quadratic function $q(x)=x\cdot Hx$ can be expressed as a simple decoupled quadratic, provided the correct coordinate system is chosen. In particular, this shows that the graph of every quadratic in two variables looks like one of the graphs in Figure 1 (or like one of the two other possibilities not illustrated in that figure), possibly rotated from the standard coordinates.
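In two variables the spectral decomposition can be written in closed form as a rotation, which makes "possibly rotated from the standard coordinates" concrete. The Python sketch below assumes $H=\left[\begin{array}{cc}a&b\\ b&d\end{array}\right]$ and uses the standard fact that rotating the axes by $\theta=\frac{1}{2}\,\mbox{atan2}(2b,a-d)$ makes the off-diagonal entry of $U^THU$ vanish; it then checks $x\cdot Hx=\lambda_1\alpha_1^2+\lambda_2\alpha_2^2$ at one arbitrary point. The function name `principal_axes` is hypothetical.

```python
import math


def principal_axes(a, b, d):
    """Spectral decomposition of the symmetric matrix H = [[a, b], [b, d]].

    Returns (lam1, lam2, theta): U = [u1 | u2] with u1 = (cos theta, sin theta)
    and u2 = (-sin theta, cos theta) is a rotation, and H = U diag(lam1, lam2) U^T.
    """
    # This angle makes the off-diagonal entry u1 . H u2 equal to zero.
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    lam1 = a * c * c + 2 * b * s * c + d * s * s   # u1 . H u1
    lam2 = a * s * s - 2 * b * s * c + d * c * c   # u2 . H u2
    return lam1, lam2, theta


a, b, d = 1.0, 3.0, 1.0                        # the matrix H of Example 2.2
lam1, lam2, theta = principal_axes(a, b, d)    # about 4.0, -2.0, pi/4

x1, x2 = 0.7, -2.0                                      # an arbitrary test point
alpha1 = math.cos(theta) * x1 + math.sin(theta) * x2    # u1 . x
alpha2 = -math.sin(theta) * x1 + math.cos(theta) * x2   # u2 . x
q_coupled = a * x1**2 + 2 * b * x1 * x2 + d * x2**2     # x . Hx
q_decoupled = lam1 * alpha1**2 + lam2 * alpha2**2       # sum of lam_i * alpha_i**2
```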

Example 2.2   Define $q:{\bf {\rm R}}^2\rightarrow{\bf {\rm R}}$ by

\begin{displaymath}q(x)=x_1^2+6x_1x_2+x_2^2.\end{displaymath}

Then $q(x)=x\cdot Hx$, where

\begin{displaymath}H=\left[\begin{array}{cc}1&3\\ 3&1\end{array}\right].\end{displaymath}
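To confirm that this $H$ reproduces $q$, the Python sketch below evaluates both expressions at a few integer points. Each off-diagonal entry 3 carries half of the cross-term coefficient 6, which is what keeps $H$ symmetric. The sample points are arbitrary.

```python
H = [[1, 3],
     [3, 1]]


def q_formula(x1, x2):
    """q(x) written out as in the definition of Example 2.2."""
    return x1**2 + 6 * x1 * x2 + x2**2


def q_matrix(x1, x2):
    """q(x) = x . Hx; the two off-diagonal 3's together give the 6 x1 x2 term."""
    x = (x1, x2)
    return sum(H[i][j] * x[i] * x[j] for i in range(2) for j in range(2))


points = [(1, 0), (0, 1), (1, 1), (2, -3)]
print([q_formula(*p) for p in points])   # prints: [1, 1, 8, -23]
print([q_matrix(*p) for p in points])    # prints: [1, 1, 8, -23]
```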

The spectral decomposition of $H$ is $H=UDU^T$, where

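\begin{displaymath}D=\left[\begin{array}{cc}4&0\\ 0&-2\end{array}\right],\ U=\left[\begin{array}{rr}\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}&-\frac{1}{\sqrt{2}}\end{array}\right].\end{displaymath}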

The vectors

\begin{displaymath}u^{(1)}=\left[\begin{array}{r}\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{array}\right],\ u^{(2)}=\left[\begin{array}{r}\frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}}\end{array}\right]\end{displaymath}

define the coordinate system illustrated in Figure 2.
Figure 2: Standard coordinates and a rotated coordinate system.

The graph of $q$, which is shown in Figure 3, is now predictable: It curves up in the direction of $u^{(1)}$, where the eigenvalue is $4>0$, and down in the direction of $u^{(2)}$, where the eigenvalue is $-2<0$.
Figure 3: The function $q(x)=x_1^2+6x_1x_2+x_2^2$.
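A short numerical check of Example 2.2 in Python: it rebuilds $UDU^T$ from the factors given above and evaluates $q$ along the two eigenvector directions, where $q(tu^{(1)})=4t^2$ and $q(tu^{(2)})=-2t^2$ because $q(tu)=t^2\,u\cdot Hu=\lambda t^2$ for a unit eigenvector $u$. The helper functions are illustrative, and the results agree with the exact values only up to floating-point rounding.

```python
import math

r = 1 / math.sqrt(2)
U = [[r, r],
     [r, -r]]                 # columns u^(1) = (r, r) and u^(2) = (r, -r)
D = [[4.0, 0.0],
     [0.0, -2.0]]
H = [[1.0, 3.0],
     [3.0, 1.0]]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def q(x1, x2):
    return x1**2 + 6 * x1 * x2 + x2**2


UDUt = matmul(U, matmul(D, transpose(U)))   # reproduces H up to rounding

# On the line x = t u^(i), q(t u^(i)) = lambda_i * t**2: an upward parabola
# along u^(1) (lambda_1 = 4) and a downward one along u^(2) (lambda_2 = -2).
for t in (0.5, 1.0, 2.0):
    print(t, q(t * r, t * r), q(t * r, -t * r))
```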

Mark S. Gockenbach