next up previous [pdf]

Next: Selective inversion Up: Parallel sweeping preconditioner Previous: Parallel sweeping preconditioner


Parallel multifrontal algorithms

While a large number of techniques exist for parallelizing multifrontal factorizations and triangular solves, we focus on parallelizations which combine subtree-to-subteam (20) mappings of processes to the elimination tree (34) that also make use of two-dimensional distributions of the frontal matrices (35).% latex2html id marker 3529
\setcounter{footnote}{9}\fnsymbol{footnote}More specifically, we make use of supernodal (3) elimination trees defined through nested dissection (see Figs. 3 and 4), which have been shown to result in highly scalable factorizations (23,24) and moderately scalable triangular solutions (26).

sep-tree
sep-tree
Figure 3.
A separator-based supernodal elimination tree (right) over a quasi-2D subdomain (left).
[pdf] [png] [tikz]

subteam
subteam
Figure 4.
Overlay of the process ranks (in binary) of the owning subteams of each supernode from the elimination tree in Fig. 3 when the tree is assigned to eight processes using a subtree-to-subteam mapping; a `*' is used to denote both 0 and 1, so that `$ 00*$ ' represents processes 0 and 1, `$ 01*$ ' represents processes 2 and 3, and `$ ***$ ' represents all eight processes.
[pdf] [png] [tikz]

Roughly speaking, the analysis in (26) shows that, if $ p_F$ processes are used in the multifrontal factorization of our quasi-2D subdomain problems, then we must have $ \gamma n=\Omega(p_F^{1/2})$ in order to maintain constant efficiency as $ p_F$ is increased; similarly, if $ p_S$ processes are used in the multifrontal triangular solves for a subdomain, then we must have $ \gamma n\approx \Omega(p_S)$ (where we use $ \approx$ to denote that the equality holds within logarithmic factors). Since we can simultaneously factor the $ m=O(n/\gamma)$ subdomain matrices, we denote the total number of processes as $ p$ and set $ p_S=p$ and $ p_F=O(p/m)$ ; then the subdomain factorizations only require that $ n^3=\Omega(p/\gamma)$ , while the subdomain solves have the much stronger constraint that $ n \approx \Omega(p/\gamma)$ . This last constraint should be considered unacceptable, as we have the conflicting requirement that $ n^3 \approx O(p/\gamma)$ in order to store the factorizations in memory. It is therefore advantageous to consider more scalable alternatives to standard multifrontal triangular solves, even if they require additional computation.


next up previous [pdf]

Next: Selective inversion Up: Parallel sweeping preconditioner Previous: Parallel sweeping preconditioner

2014-08-20