The meaning of the preconditioning variable $ \bold p$

To accelerate convergence of iterative methods, we often change variables. The model-styling regression $\bold 0 \approx \epsilon \bold A \bold m$ becomes $\bold 0 \approx \epsilon \bold p$. Experience shows, however, that the variable $\bold p$ is often more interesting to look at than the model $\bold m$. Why should a variable introduced for computational convenience turn out to have more interpretive value? Here is a little theory explaining why. Begin from

$\displaystyle \bold 0 \quad\approx\quad \bold W \, (\bold F \bold m - \bold d)$ (22)
$\displaystyle \bold 0 \quad\approx\quad \epsilon \, \bold A \bold m$ (23)

Introduce the preconditioning variable $\bold p = \bold A \bold m$, so that $\bold m = \bold A^{-1} \bold p$.
$\displaystyle \bold 0 \quad\approx\quad \bold W \, (\bold F \bold A^{-1} \bold p - \bold d)$ (24)
$\displaystyle \bold 0 \quad\approx\quad \epsilon \, \bold p$ (25)

Rewriting as a single regression:

$\displaystyle \bold 0 \quad\approx\quad \left[ \begin{array}{c} \bold r_d \\ \bold r_m \end{array} \right] \quad=\quad \left[ \begin{array}{c} \bold W \bold F \bold A^{-1} \\ \epsilon \, \bold I \end{array} \right] \bold p \;-\; \left[ \begin{array}{c} \bold W \bold d \\ \bold 0 \end{array} \right]$ (26)
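For concreteness, here is a minimal numerical sketch (mine, not code from the text) of solving the stacked regression (26). The sizes, the toy operators, and the explicit inverse $\bold A^{-1}$ are illustrative assumptions; in practice $\bold W$, $\bold F$, and $\bold A^{-1}$ are applied as operators, and the solver would be a conjugate-gradient routine rather than SciPy's lsqr.

import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
nd, nm = 12, 8                               # toy data-space and model-space sizes (assumed)
eps = 0.1                                    # regularization weight epsilon (assumed)

F = rng.standard_normal((nd, nm))            # toy modeling operator F
W = np.eye(nd)                               # data weighting W (identity here)
A = np.eye(nm) - 0.5 * np.eye(nm, k=-1)      # toy model-styling operator A
d = rng.standard_normal(nd)                  # observed data
Ainv = np.linalg.inv(A)                      # in practice A^{-1} is applied as an operator, never formed

# Stacked regression (26):  0 ~ [W F A^{-1}; eps I] p - [W d; 0]
G = np.vstack([W @ F @ Ainv, eps * np.eye(nm)])
rhs = np.concatenate([W @ d, np.zeros(nm)])

p = lsqr(G, rhs)[0]                          # preconditioning variable p
m = Ainv @ p                                 # recover the model, m = A^{-1} p

With $\bold A$ invertible, the model $\bold m = \bold A^{-1} \bold p$ recovered this way solves the original regression (22)-(23); what changes is the path of the iterates and the variable we get to inspect along the way.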

The gradient vanishes at the best solution. To get the gradient, we put the residual into the adjoint operator. Thus, we put the residuals (a column vector) in (26) into the transpose of the operator in (26), the row $((\bold W\bold F\bold A^{-1})\T ,\; \epsilon \, \bold I)$. Finally, replace the $\approx$ by $=$ and use $\bold r_m = \epsilon \, \bold p$ from (26). Thus,
$\displaystyle \bold 0 \quad=\quad (\bold W\bold F\bold A^{-1})\T \bold r_d + \epsilon \, \bold r_m$  
$\displaystyle \bold 0 \quad=\quad (\bold W\bold F\bold A^{-1})\T \bold r_d + \epsilon^2 \, \bold p$ (27)

The two terms in Equation (27) are equal in magnitude and opposite in sign. Each is an image in model space. This image represents the fight between the data-space residual and the model-space residual. You really do want to plot this image. It shows the battle between the model wanted by the data and our preconceived statistical model expressed by the model-styling goal. That is why the preconditioned variable $\bold p$ is interesting to inspect and interpret. It is not simply a computational convenience. It is telling you what you have learned from the data (that someone has recorded at great expense!).

This model-space image $\bold p$ tells us where our data contradicts our prior model. Admire it! Make a movie of it evolving with iteration.
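Continuing the toy example above (again my own sketch, not the book's code), one can form the two model-space images in Equation (27) and watch them fight: at the exact solution they cancel, and saving them at each iteration of an iterative solver provides the frames of such a movie.

import numpy as np

rng = np.random.default_rng(0)
nd, nm, eps = 12, 8, 0.1                     # toy sizes and epsilon (assumed)
F = rng.standard_normal((nd, nm))            # toy modeling operator F
W = np.eye(nd)                               # data weighting W
A = np.eye(nm) - 0.5 * np.eye(nm, k=-1)      # toy model-styling operator A
d = rng.standard_normal(nd)                  # observed data
Ainv = np.linalg.inv(A)

# Solve the stacked regression (26) exactly for the preconditioning variable p
G = np.vstack([W @ F @ Ainv, eps * np.eye(nm)])
rhs = np.concatenate([W @ d, np.zeros(nm)])
p = np.linalg.lstsq(G, rhs, rcond=None)[0]

# The two model-space images of Equation (27)
r_d = W @ (F @ (Ainv @ p) - d)               # data residual
image_data  = (W @ F @ Ainv).T @ r_d         # what the data asks of the model
image_prior = eps**2 * p                     # what the prior (styling goal) asks
print(np.linalg.norm(image_data + image_prior) / np.linalg.norm(image_prior))
# ~0 at the solution; at intermediate iterations the two images disagree,
# and plotting them frame by frame gives the movie suggested above.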

If I were young and energetic like you, I would write a new basic tool for optimization. Instead of scanning only the space of the gradient and the previous step, it would also scan over the ``smart'' direction. Using all three directions should offer the benefit of preconditioning the regularization at early iterations while offering more assured data fitting at late iterations. The improved module for cgstep would need to solve a $3\times 3$ system of equations. I would also be looking for ways to assure that all $\Delta\bold m$ directions were scaled to have the prior model spectrum and the prior energy function of space.
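A sketch of such a step (my own illustration; the name step3 and its interface are hypothetical, and this is not the book's cgstep module): given the gradient, the previous step, and the ``smart'' direction, together with their data-space images, it solves a $3\times 3$ system of normal equations for the mixing coefficients.

import numpy as np

def step3(r, dirs_model, dirs_data):
    """One plane-search step over three model-space search directions.

    r          : current data residual, r = d - F m
    dirs_model : three directions in model space, e.g. [gradient, previous step, smart direction]
    dirs_data  : their images in data space, F applied to each direction
    Returns the model update dm and the updated residual.
    """
    D = np.column_stack(dirs_data)                 # data-space images as columns
    # The 3x3 normal equations for the coefficients alpha minimizing |r - D alpha|^2
    # (lstsq guards against nearly dependent directions at early iterations)
    alpha = np.linalg.lstsq(D.T @ D, D.T @ r, rcond=None)[0]
    dm = sum(a * v for a, v in zip(alpha, dirs_model))
    return dm, r - D @ alpha

Scaling each candidate direction to the prior model spectrum, as suggested above, would then amount to filtering the entries of dirs_model before they are passed in.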



2015-05-07