next up previous [pdf]

Next: Null space and iterative Up: KRYLOV SUBSPACE ITERATIVE METHODS Previous: Sign convention

Method of random directions and steepest descent

Let us minimize the sum of the squares of the components of the residual vector given by

$\displaystyle {\rm residual}$ $\textstyle =$ $\displaystyle {\rm
\quad \hbox{transform} \quad \quad \hbox{model space}
\quad - \quad
\hbox{data space}
}$ (42)
$\displaystyle \left[
\begin{array}{c}
\\
\\
\\
\bold r \\
\\
\\
\
\end{array}\right]$ $\textstyle =$ $\displaystyle \left[
\begin{array}{ccc}
\quad & \quad & \quad \\
\quad & \quad...
...d
\left[
\begin{array}{c}
\\
\\
\\
\bold d \\
\\
\\
\
\end{array}\right]$ (43)

A contour plot is based on an altitude function of space. The altitude is the dot product $\bold r \cdot \bold r$. By finding the lowest altitude, we are driving the residual vector $\bold r$ as close as we can to zero. If the residual vector $\bold r$ reaches zero, then we have solved the simultaneous equations $\bold d= \bold F \bold x$. In a two-dimensional world the vector $\bold x$ has two components, $(x_1 , x_2 )$. A contour is a curve of constant $\bold r \cdot \bold r$ in $(x_1 , x_2 )$-space. These contours have a statistical interpretation as contours of uncertainty in $(x_1 , x_2 )$, with measurement errors in $\bold d$.

Let us see how a random search-direction can be used to reduce the residual $0\approx \bold r= \bold F \bold x - \bold d$. Let $\Delta \bold x$ be an abstract vector with the same number of components as the solution $\bold x$, and let $\Delta \bold x$ contain arbitrary or random numbers. We add an unknown quantity $\alpha$ of vector $\Delta \bold x$ to the vector $\bold x$, and thereby create $\bold x_{\rm new}$:

\begin{displaymath}
\bold x_{\rm new} \eq \bold x+\alpha \Delta \bold x
\end{displaymath} (44)

This gives a new residual:
$\displaystyle \bold r_{\rm new}$ $\textstyle =$ $\displaystyle \bold F \bold x_{\rm new} -\bold d$ (45)
$\displaystyle \bold r_{\rm new}$ $\textstyle =$ $\displaystyle \bold F( \bold x+\alpha\Delta\bold x)-\bold d$ (46)
$\displaystyle \bold r_{\rm new} \eq
\bold r+\alpha \Delta\bold r$ $\textstyle =$ $\displaystyle (\bold F \bold x-\bold d)
+\alpha\bold F\Delta\bold x$ (47)

which defines $\Delta \bold r = \bold F \Delta \bold x$.

Next we adjust $\alpha$ to minimize the dot product: $ \bold r_{\rm new} \cdot \bold r_{\rm new} $

\begin{displaymath}
(\bold r+\alpha\Delta \bold r)\cdot (\bold r+\alpha\Delta \b...
...ta \bold r)  +\
\alpha^2 \Delta \bold r \cdot \Delta \bold r
\end{displaymath} (48)

Set to zero its derivative with respect to $\alpha$ using the chain rule
\begin{displaymath}
0\eq
(\bold r+\alpha\Delta \bold r)
\cdot
\Delta \bold r
 ...
...d r)
\eq
2
(\bold r+\alpha\Delta \bold r)
\cdot
\Delta \bold r
\end{displaymath} (49)

which says that the new residual $\bold r_{\rm new} = \bold r + \alpha \Delta \bold r$ is perpendicular to the ``fitting function'' $\Delta \bold r$. Solving gives the required value of $\alpha$.
\begin{displaymath}
\alpha \eq -  { (\bold r \cdot \Delta \bold r ) \over
( \Delta \bold r \cdot \Delta \bold r ) }
\end{displaymath} (50)

A ``computation template'' for the method of random directions is


		 $\bold r \quad\longleftarrow\quad \bold F \bold x - \bold d$ 

iterate {
$\Delta \bold x \quad\longleftarrow\quad {\rm random numbers}$
$\Delta \bold r \quad\longleftarrow\quad \bold F  \Delta \bold x$
$\alpha \quad\longleftarrow\quad -( \bold r \cdot \Delta\bold r )/ (\Delta \bold r \cdot \Delta\bold r ) $
$\bold x \quad\longleftarrow\quad \bold x + \alpha\Delta \bold x$
$\bold r \quad\longleftarrow\quad \bold r + \alpha\Delta \bold r$
}
A nice thing about the method of random directions is that you do not need to know the adjoint operator $\bold F'$.

In practice, random directions are rarely used. It is more common to use the gradient direction than a random direction. Notice that a vector of the size of $\Delta \bold x$ is

\begin{displaymath}
\bold g \eq \bold F' \bold r
\end{displaymath} (51)

Notice also that this vector can be found by taking the gradient of the size of the residuals:
\begin{displaymath}
{\partial \over \partial \bold x' }  \bold r \cdot \bold r
...
...)  
( \bold F   \bold x  - \bold d)
\eq
\bold F'  \bold r
\end{displaymath} (52)

Choosing $\Delta \bold x$ to be the gradient vector $\Delta\bold x = \bold g = \bold F' \bold r$ is called ``the method of steepest descent.''

Starting from a model $\bold x = \bold m$ (which may be zero), below is a template of pseudocode for minimizing the residual $\bold 0\approx \bold r = \bold F \bold x - \bold d$ by the steepest-descent method:


		 $\bold r \quad\longleftarrow\quad \bold F \bold x - \bold d$ 

iterate {
$\Delta \bold x \quad\longleftarrow\quad \bold F' \bold r$
$\Delta \bold r \quad\longleftarrow\quad \bold F  \Delta \bold x$
$\alpha \quad\longleftarrow\quad -( \bold r \cdot \Delta\bold r )/ (\Delta \bold r \cdot \Delta\bold r ) $
$\bold x \quad\longleftarrow\quad \bold x + \alpha\Delta \bold x$
$\bold r \quad\longleftarrow\quad \bold r + \alpha\Delta \bold r$
}


next up previous [pdf]

Next: Null space and iterative Up: KRYLOV SUBSPACE ITERATIVE METHODS Previous: Sign convention

2008-11-06