Kernels, the most basic unit of cu$Q$-RTM to accomplish a series of specific tasks such as variable initialization and applying an absorbing boundary condition (ABC), can further be integrated into a fully functional module. Different variables are initialized with distinct kernels cuda_kernel_initialization($\ldots$), cuda_kernel_initialization_images($\ldots$), and cuda_kernel_initialization_Finals($\ldots$) based on their scope in modules. Wavefield variables are updated by cuda_kernel_update($\ldots$).

In the framework of cu$Q$-RTM, the basic forward and backward modules based on viscoacoustic wave equation with DFLs are efficiently simulated by PSM coupled with the CUFFT library. However, CUFFT library functions can only be executed on the device and called from the host, so we split the customized kernel functions into a k-space component and x-space component. The Fourier transform function cufftExecC2C($\ldots$, CUFFT_FORWARD) and inverse Fourier transform function cufftExecC2C($\ldots$, CUFFT_INVERSE) serve as the link between the x-space operator cuda_kernel_visco_PSM_2d_forward_x_space($\ldots$) and the k-space operator cuda_kernel_visco_PSM_2d_forward_k_space($\ldots$). Absorbing boundary conditions for these modeling operators are conducted by the multiple transmitting formula (MTF) cuda_kernel_MTF_2nd($\ldots$) (Liao et al., 1984).

In the cu$Q$-RTM package, we adopt the CATRC scheme to reconstruct source wavefields, which combines the efficiency of reverse propagation and the stability of checkpointing. Kernel functions cuda_kernel_checkpoints_Out($\ldots$) and cuda_kernel_checkpoints_In($\ldots$) are designed to record and fetch forward wavefields at predefined checkpoints. Furthermore, wavefields on the outermost layer boundary of simulation domain at each time step are also recorded in global device variables $d\_borders\_up, d\_borders\_bottom, d\_borders\_left$, and $d\_borders\_right$. The total memory storage for 2D reconstruction can be estimated as

$\displaystyle Storage_{2D} [GB] \approx \frac{2(nx+nz) nt + (N_c+2) (nx \times nz)}{1024^3/4},$ (11)

and for the 3D case,

$\displaystyle Storage_{3D} [GB] \approx \frac{2(nx \times ny + nx \times nz + ny \times nz) nt + (N_c+2) (nx \times ny \times nz)}{1024^3/4},$ (12)

where $nx, ny, nz$ and $nt$ are spatial and temporal grid sizes. Figure 2a shows memory storage against the scale of the model (where we denote size of the simulation as $nx = ny = nz = 0.1 nt = 10 Nc$) for 2D and 3D cases. Figure 2b presents the memory ratio between boundary savings and checkpointing savings for 2D and 3D cases. Such a large amount of memory storage is unacceptable for the 3D case, so we have to output the boundary savings to the disk and then read the borders by memory copying between host and device memory.

Fig2a_v Fig2b_v
Figure 2.
(a) Memory storage and (b) memory ratio of boundary wavefield to checkpointing wavefield (B/C) for both 2D and 3D cases.
[pdf] [pdf] [png] [png] [scons]

I develop an adaptive stabilization method to deal with numerical instability in $Q$-RTM, which exhibits superior properties of time-variance and $Q$-dependence over conventional low-pass filtering. Both the adaptive stabilization scheme cuda_kernel_AdaSta($\ldots$) and low-pass filtering scheme cuda_kernel_filter2d($\ldots$) are provided in this package. Users can choose either of these two stabilizing methods to suppress high-frequency noises as they like.