next up previous [pdf]

Next: A FORMAL DEFINITION FOR Up: Preconditioning Previous: THEORY OF UNDERDETERMINED LEAST-SQUARES

SCALING THE ADJOINT

First I remind you of a rarely used little bit of mathematical notation. Given a vector $ \bold m$ with components $ (m_1,m_2,m_3)$ , the notation $ {\bf diag }\bold m$ means

$\displaystyle {\bf diag }\bold m \quad =\quad \left[ \begin{array}{ccc} m_1 & 0 & 0  0 &m_2 & 0  0 & 0 & m_3 \end{array} \right]$ (38)

Given the usual linearized fitting goal between data space and model space, $ \bold d \approx \bold F \bold m$ , the simplest image of the model space results from application of the adjoint operator $ \hat{\bold m} = \bold F' \bold d$ . Unless $ \bold F$ has no physical units, however, the physical units of $ \hat{\bold m}$ do not match those of $ \bold m$ , so we need a scaling factor. The theoretical solution $ \bold m_{\rm theor} = (\bold F'\bold F)^{-1}\bold F'\bold d$ tells us that the scaling units should be those of $ (\bold F'\bold F)^{-1}$ . We are going to approximate $ (\bold F'\bold F)^{-1}$ by a diagonal matrix $ \bold W^2$ with the correct units so $ \hat{\bold m} = \bold W^2 \bold F'\bold d$ .

What we use for $ \bold W$ will be a guess, simply a guess. If it works better than nothing, we'll be happy, and if it doesn't we'll forget about it. Experience shows it is a good idea to try. Common sense tells us to insist that all elements of $ \bold W^2$ are positive. $ \bold W^2$ is a square matrix of size of model space. From any vector $ \tilde {\bold m}$ in model space with all positive components, we could guess that $ \bold W^2$ be $ {\bf diag } \tilde {\bold m}$ to any power. To get the right physical dimensions we choose $ \tilde {\bold m}=\bold 1$ , a vector of all ones and choose

$\displaystyle \bold W^2 \quad =\quad { 1 \over {\bf diag } {\bold F'\bold F\bold 1} }$ (39)

A problem with the choice (5.39) is that some components might be zero or negative. Well, we can take the square root of the squares of components and/or smooth the result.

To go beyond the scaled adjoint we can use $ \bold W$ as a preconditioner. To use $ \bold W$ as a preconditioner we define implicitly a new set of variables $ \bold p$ by the substitution $ \bold m=\bold W\bold p$ . Then $ \bold d \approx \bold F\bold m=\bold F\bold W\bold p$ . To find $ \bold p$ instead of $ \bold m$ , we iterate with the operator $ \bold F\bold W$ instead of with $ \bold F$ . As usual, the first step of the iteration is to use the adjoint of $ \bold d\approx \bold F\bold W\bold p$ to form the image $ \hat{\bold p}=(\bold F\bold W)'\bold d$ . At the end of the iterations, we convert from $ \bold p$ back to $ \bold m$ with $ \bold m=\bold W\bold p$ . The result after the first iteration $ \hat{\bold m}=\bold W\hat{\bold p}=\bold W(\bold F\bold W)'\bold d=\bold W^2\bold F'\bold d$ turns out to be the same as scaling.

By (5.39), $ \bold W$ has physical units inverse to $ \bold F$ . Thus the transformation $ \bold F\bold W$ has no units so the $ \bold p$ variables have physical units of data space. Experimentalists might enjoy seeing the solution $ \bold p$ with its data units more than viewing the solution $ \bold m$ with its more theoretical model units.

The theoretical solution for underdetermined systems $ \bold m =\bold F' (\bold F \bold F')^{-1} \bold d$ suggests an alternate approach using instead $ \hat{\bold m} =\bold F' \bold W_d^2 \bold d$ . This diagonal weighting matrix $ \bold W_d^2$ must be drawn from vectors in data space. Again I chose a vector of all 1's getting the weight

$\displaystyle \bold W_d^2 \quad =\quad { 1 \over {\bf diag } {\bold F\bold F'\bold 1} }$ (40)

My choice of a vector of 1's is quite arbitrary. I might as well have chosen a vector of random numbers. Bill Symes, who suggested this approach to me, suggests using an observed data vector $ \bold d$ for the data space weight, and $ \bold F' \bold d$ for the model space weight. This requires an additional step, dividing out the units of the data $ \bold d$ .

Experience tells me that a broader methodology than all above is needed. Appropriate scaling is required in both data space and model space. We need two other weights $ \bold W_m$ and $ \bold W_d$ where $ \hat{\bold m} = \bold W_m \bold F'\bold W_d \bold d$ .

I have a useful practical example (stacking in $ v(z)$ media) in another of my electronic books (BEI), where I found both $ \bold W_m$ and $ \bold W_d$ by iterative guessing. First assume $ \bold W_d =\bold I$ and estimate $ \bold W_m$ as above. Then assume you have the correct $ \bold W_m$ and estimate $ \bold W_d$ as above. Iterate. (Perhaps some theorist can find a noniterative solution.) I believe this iterative procedure leads us to the best diagonal pre- and post- multipliers for any operator $ \bold F$ . By this I mean that the modified operator $ (\bold W_d \bold F \bold W_m)$ is as close to being unitary as we will be able to obtain with diagonal transformation. Unitary means it is energy conserving and that the inverse is simply the conjugate transpose.

What good is it that $ (\bold W_d \bold F \bold W_m)'
(\bold W_d \bold F \bold W_m) \approx \bold I$ ? It gives us the most rapid convergence of least squares problems of the form

$\displaystyle \bold 0 \quad\approx\quad \bold W_d ( \bold F\bold m - \bold d) \quad =\quad \bold W_d ( \bold F\bold W_m \bold p - \bold d)$ (41)

Thus it defines for us the best diagonal transform to a preconditioning variable $ \bold p=\bold W_m^{-1}\bold m$ to use during iteration, and suggests to us what residual weighting function we need to use if rapid convergence is a high priority. Suppose we are not satisfied with $ \bold W_d$ being the weighting function for residuals. Equation (5.41) could still help us speed iteration. Instead of beginning iteration with $ \bold p=\bold 0$ , we could begin from the solution $ \bold p$ to the regression (5.41).

The PhD thesis of James Rickett experiments extensively with data space and model space weighting functions in the context of seismic velocity estimation.


next up previous [pdf]

Next: A FORMAL DEFINITION FOR Up: Preconditioning Previous: THEORY OF UNDERDETERMINED LEAST-SQUARES

2008-11-06