|
|
|
|
Model fitting by least squares |
For use in linear problems, the conjugate-direction method described in this book follows an identical path with the more well-known conjugate-gradient method. We use the conjugate-direction method for convenience in exposition and programming.
The simple form of the conjugate-direction algorithm covered here is a sequence of steps. In each step the minimum is found in the plane given by two vectors: the gradient vector and the vector of the previous step.
Given the linear operator
and a generator of solution steps
(random in the case of random directions or gradient in the case of
steepest descent),
we can construct an optimally convergent iteration process,
which finds the solution in no more than
steps,
where
is the size of the problem.
This result should not be surprising.
If
is represented by a full matrix,
then the cost of direct inversion is proportional to
,
and the cost of matrix multiplication is
.
Each step of an iterative method boils down to a matrix multiplication.
Therefore, we need at least
steps to arrive at the exact solution.
Two circumstances make large-scale optimization practical.
First, for sparse convolution matrices
the cost of matrix multiplication is
instead of
.
Second, we can often find a reasonably good solution
after a limited number of iterations.
If both these conditions are met, the cost of optimization
grows linearly with
,
which is a practical rate even for very large problems.
Fourier-transformed variables are often capitalized.
This convention will be helpful here,
so in this subsection only,
we capitalize vectors transformed by the
matrix.
As everywhere, a matrix such as
is printed in boldface type
but in this subsection,
vectors are not printed in boldface print.
Thus we define the solution, the solution step
(from one iteration to the next),
and the gradient by
| (58) | |||
| (59) | |||
| (60) |
| (61) |
| (62) |
The gradient vector
is a vector with the same number
of components as the solution vector
.
A vector with this number of components is
The minimization (2.48) is now generalized
to scan not only the line with
,
but simultaneously another line with
.
The combination of the two lines is a plane:
| (68) |
| (69) |
This may look complicated.
The steepest descent method requires us to compute
only the two dot products
and
while equation (2.67) contains five dot products,
but the extra trouble is well worth while because the ``conjugate direction''
is such a much better direction than the gradient direction.
The many applications in this book all need to
find
and
with (2.70) and then
update the solution with (2.63) and
update the residual with (2.64).
Thus we package these activities in a subroutine
named cgstep.
To use that subroutine we will have a computation template
like we had for steepest descents, except that we
will have the repetitive work done by subroutine cgstep.
This template (or pseudocode) for minimizing the residual
by the conjugate-direction method is
where the subroutine cgstep() remembers the previous iteration and works out the step size and adds in the proper proportion of the![]()
iterate {![]()
![]()
![]()
}
|
|
|
|
Model fitting by least squares |