next up previous [pdf]

Next: Numerical examples Up: Theory Previous: Basic formulation

Fast 3D butterfly algorithm

Equation 10 is the discretized form of a 3D oscillatory integral of the type

$\displaystyle u(\mathbf{x})=\int_K e^{ 2\pi i \Phi(\mathbf{x},\mathbf{k})}v(\mathbf{k})  d\mathbf{k},\quad \mathbf{x}\in X,$ (11)

whose fast evaluation can be realized by a butterfly algorithm (Candès et al., 2009).

The overall structure of the 3D butterfly algorithm basically follows its 2D analogue. The idea is to partition the computational domains $ X$ and $ K$ recursively into a pair of octrees, $ T_X$ and $ T_K$ , ending at level $ l=\log N$ (see Figure 1 for an illustration). Here $ N$ is chosen as an integer power of two, which is on the order of the maximum of $ \vert\Phi(\mathbf{x},\mathbf{k})\vert$ for all possible $ \mathbf{x}$ and $ \mathbf{k}$ (so it is mainly determined by the range of variables $ (f,x,y)$ and $ (\tau,p,q)$ ). A crucial property of this structure is that at arbitrary level $ l$ , the side lengths $ w(A)$ of a box $ A$ in $ T_X$ and $ w(B)$ of a box $ B$ in $ T_K$ always satisfy $ w(A)w(B)=1/N$ . Then when $ \mathbf{x}$ , $ \mathbf{k}$ restricted in $ A$ and $ B$ respectively, one can construct a low-rank, separated expansion for the kernel function $ e^{2\pi i\Phi(\mathbf{x},\mathbf{k})}$ (via a Chebyshev interpolation):

$\displaystyle \left \vert e^{2\pi i \Phi(\mathbf{x},\mathbf{k})} -\sum_{r=1}^{r_{\epsilon}}\alpha_r^{AB}(\mathbf{x})\beta_r^{AB}(\mathbf{k})\right\vert<\epsilon.$ (12)

By embedding this approximation in the above tree structure and traversing $ T_X$ from top to bottom, $ T_K$ from bottom to top, we arrive at a fast algorithm running in complexity $ O(N^3\log N)$ (there are $ N^3$ pairs of boxes $ (A,B)$ on every level, and there are $ \log N$ levels in total). Detailed description of the algorithm can be found in Hu et al. (2013), where the difference between 2D and 3D formulations should be clear from the context.

Figure 1.
Butterfly tree structure for the special case of $ N=4$ .
[pdf] [png]

Considering the initial Fourier transform for preparing data in $ (f,x,y)$ domain, the overall complexity of our algorithm is roughly $ O(N_xN_yN_t\log N_t)+ O(C_1(r_{\epsilon})(N_fN_xN_y+N_{\tau}N_pN_q))+O(C_2(r_{\epsilon})N^3\log N)$ ( $ r_{\epsilon}$ terms are due to low-rank approximations, and $ C_2(r_{\epsilon})$ is bigger than $ C_1(r_{\epsilon})$ ).

By comparison, the conventional straightforward velocity scan requires at least $ O(N_{\tau}N_pN_qN_xN_y)$ computations, which may quickly become a bottleneck as the problem size increases. Yet the efficiency of our algorithm is controlled mainly by $ O(N^3\log N)$ with an $ \epsilon$ -dependent constant, where $ N$ is determined by the degree of oscillations in the kernel $ e^{2\pi i\Phi(\mathbf{x},\mathbf{k})}$ . Generally speaking, $ N$ depends on the maximum frequency and offset in the dataset, and the range of parameters in the model space. In practice, $ N$ can often be chosen smaller than the grid size.

The significance of above analysis for the fast algorithm lies in the fact that the input and output data sizes $ N_tN_xN_y$ and $ N_{\tau}N_p N_q$ have little impact on the final computational cost; a dense sampling therefore becomes affordable.

next up previous [pdf]

Next: Numerical examples Up: Theory Previous: Basic formulation