A parallel sweeping preconditioner for heterogeneous 3D Helmholtz equations |

One interpretation of radiation conditions is that they allow for the analysis of a finite portion of an infinite domain, as their purpose is to enforce the condition that waves propagate outwards and not reflect at the boundary of the truncated domain. This concept is crucial to understanding the Schur complement approximations that take place within the moving PML sweeping preconditioner which is reintroduced in this paper for the sake of completeness.

For the sake of simplicity, we will describe the preconditioner in the context of a second-order finite-difference discretization over the unit cube, with PML used to approximate a radiation boundary condition on the face and homogeneous Dirichlet boundary conditions applied on all other boundaries (an cross-section is shown in Fig. 1). If the domain is discretized into an grid, then ordering the vertices in the grid such that vertex is assigned index results in a block tridiagonal system of equations, say

(2) |

where propagates sources from the 'th plane into the 'th plane, and the overall linear system is complex symmetric (

plane-with-single-pml
An
cross section of a cube with PML on its
face.
The domain is shaded in a manner which loosely corresponds to its
extension into the complex plane.
Figure 1. |
---|

If we were to ignore the sparsity within each block, then the following naïve factorization and solve algorithms would be appropriate for a direct solver, where the application of implicitly makes use of the factorization of .

While the computational complexities of
Algs. 0.0.1 and 0.0.2 are significantly higher
than multifrontal algorithms
(21,27,12),
which have
factorization complexity and
solve complexity for regular three-dimensional meshes, they are the conceptual
starting points of the sweeping preconditioner.^{}

Suppose that we paused Alg. 0.0.1 after computing the 'th Schur complement, , where the 'th plane is in the interior of the domain (i.e., it is not in a PML region). Due to the ordering imposed on the degrees of freedom of the discretization, the first Schur complements are equivalent to those that would have been produced from applying Alg. 0.0.1 to an auxiliary problem formed by placing a homogeneous Dirichlet boundary condition on the 'th plane and ignoring all of the successive planes (see the left half of Fig. 2). While this observation does not immediately yield any computational savings, it does allow for a qualitative description of the inverse of the 'th Schur complement, : it is the restriction of the half-space Green's function of the exact auxiliary problem onto the 'th plane (recall that PML can be interpreted as approximating the effect of a domain extending to infinity).

The main approximation made in the sweeping preconditioner can now be
succinctly described: since
is a restricted half-space Green's
function which incorporates the velocity field of the first
planes, it is
natural to approximate it with another restricted half-space
Green's function which only takes into account the *local* velocity field,
i.e., by moving the PML next to the
'th plane (see the right half of
Fig. 2).

auxiliary
(Left) A depiction of the portion of the domain involved in the
computation of the Schur complement of an
plane (marked with
the dashed line) with respect to all of the planes to its left during
execution of Alg. 0.0.1. (Middle) An equivalent
auxiliary problem which generates the same Schur complement; the
original domain is truncated just to the right of the marked plane and
a homogeneous Dirichlet boundary condition is placed on the cut.
(Right) A local auxiliary problem for generating an approximation to
the relevant Schur complement; the radiation boundary condition of the
exact auxiliary problem is moved next to the marked plane.
Figure 2. |
---|

If we use
to denote the number of grid points of PML as a
function of the frequency,
, then it is
important to recognize that the depth of the approximate auxiliary problems
in the
direction is
.^{}If the depth of the approximate auxiliary problems was
, then
combining nested dissection with the multifrontal method over the regular
mesh would only require
work and
storage (21).
However, if the PML size is a slowly growing function of frequency, then
applying 2D nested dissection to the *quasi-2D* domain requires
work and
storage,
as the number of degrees of freedom in each front increases by a factor of
and dense factorizations have cubic complexity.

Let us denote the quasi-2D discretization of the local auxiliary problem for the 'th plane as , and its corresponding approximation to the Schur complement as . Since is essentially a dense matrix, it is advantageous to come up with an abstract scheme for applying : Assuming that was ordered in a manner consistent with the global ordering, the degrees of freedom corresponding to the 'th plane come last and we find that

(3) |

where the irrelevant portions of the inverse have been marked with a . Then, trivially,

(4) |

which implies a method for quickly computing given a factorization of :

*From now on, we write
to refer to the application of the
(approximate) inverse of the Schur complement for the
'th plane.*

There are two more points to discuss before defining the full serial algorithm.
The first is that, rather than performing an approximate
factorization
using a discretization of
, the preconditioner is instead built
from a discretization of an *artificially damped* version of the Helmholtz
operator, say

where is responsible for the artificial damping. This is in contrast to shifted Laplacian preconditioners (16,5), where is typically (18), and our motivation is to avoid introducing large long-range dispersion error by damping the long range interactions in the preconditioner. Just as refers to the discretization of the original Helmholtz operator, , we will use to refer to the discretization of the artificially damped operator, .

The second point is much easier to motivate: since the artificial PML in
each approximate auxiliary problem is of depth
, processing
a single plane at a time would require processing
subdomains with
work each. Clearly, processing
planes at once
has the same asymptotic complexity as processing a single plane, and so
combining
planes reduces the setup cost from
to
.
Similarly, the memory usage becomes
instead of
.^{} If we refer to these
sets of
contiguous planes as *panels*, which we label from
0
to
, where
, and correspondingly redefine
,
,
,
, and
,
we have the following algorithm for setting up an approximate
block
factorization of the discrete artificially damped Helmholtz
operator:

Once the preconditioner is set up, it can be applied using a straightforward modification of Alg. 0.0.2 which avoids redundant solves by combining the application of and :

Given that the preconditioner can be set up with
work,
and applied with
work, it provides a near-linear scheme
for solving Helmholtz equations if only
iterations are required for
convergence. The experiments in this paper seem to indicate that, as long as
the velocity model does not include a large cavity, the sweeping preconditioner
indeed only requires
iterations.

Though this paper is focused on the parallel solution of Helmholtz equations, which are the time-harmonic form of acoustic wave equations, Tsuji et al. have shown that the moving PML sweeping preconditioner is equally effective for time-harmonic Maxwell's equations (39,38), and we believe that the same will hold true for time-harmonic linear elasticity. The rest of the paper will be presented in the context of the Helmholtz equation, but we emphasize that the parallelization should carry over to more general wave equations in a conceptually trivial way.

A parallel sweeping preconditioner for heterogeneous 3D Helmholtz equations |

2014-08-20