$Q$-RTM on Marmousi model

The second synthetic example is CUDA-based $Q$-RTM applied to the Marmousi model. Figures 5a and 5b show its velocity and $Q$ models, which contain a high-attenuation zone with a low $Q$ value. The model has 234 nodes in depth with a sampling interval of $dz=10$ m and 663 nodes in the horizontal direction with a sampling interval of $dx=10$ m. In the observation system, 60 sources are distributed laterally with a shot interval of $ds=100$ m, and each shot is recorded by 301 double-sided receivers with a maximum offset of 1500 m. The point source is a Ricker wavelet with a dominant frequency of $f_d=20$ Hz. The synthetic seismic data are modeled by the PSM with a time interval of $dt=0.001$ s, and the records last 2 s.
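The acquisition parameters quoted above can be collected in a short script. The following Python sketch builds a Ricker source wavelet and the receiver offsets from those values. The wavelet delay `t0 = 1/f_d`, the sample count `nt`, and the first source position `x_first` are assumptions made for illustration; the text does not state them.

```python
import numpy as np

# Values quoted in the text
nz, nx = 234, 663            # grid nodes in depth and in the horizontal direction
dz = dx = 10.0               # grid spacing (m)
ns, ds = 60, 100.0           # number of shots and shot interval (m)
nr, max_offset = 301, 1500.0 # receivers per shot and maximum offset (m)
fd = 20.0                    # dominant frequency of the Ricker wavelet (Hz)
dt, t_max = 0.001, 2.0       # time step and record length (s)


def ricker(fd, dt, nt, t0=None):
    """Ricker wavelet with dominant frequency fd, peaking at time t0.

    The delay t0 = 1/fd is an assumption, chosen so the wavelet starts near zero.
    """
    if t0 is None:
        t0 = 1.0 / fd
    t = np.arange(nt) * dt - t0
    a = (np.pi * fd * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


# Assumption: a 2 s record sampled at dt gives 2000 time steps
nt = int(round(t_max / dt))
wavelet = ricker(fd, dt, nt)

# Double-sided spread: 301 receivers from -1500 m to +1500 m around each source
offsets = np.linspace(-max_offset, max_offset, nr)

# Hypothetical first source position; only the 100 m shot interval is given
x_first = 360.0
src_x = x_first + ds * np.arange(ns)

print("receiver spacing (m):", offsets[1] - offsets[0])
print("source range (m):", src_x[0], "to", src_x[-1])
```

With 301 receivers spanning 3000 m, the receiver spacing equals the 10 m grid spacing, so each receiver sits on a grid node when the source does.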

Figure 6 shows the migrated images obtained by conventional RTM from acoustic data (Figure 6a) and from viscoacoustic data without compensation (Figure 6b), and by $Q$-RTM from viscoacoustic data (Figures 6c and 6d). The acoustic imaging result in Figure 6a serves as the reference for comparison. Because of the high-attenuation zone, the structure beneath it, marked by the blue frame in Figure 6b, is imaged with attenuated amplitudes and blurred reflectors. The attenuation also severely degrades the image of the anticlinal structure below the unconformity, marked by the green frame in Figure 6b. Figures 6c and 6d show the compensated images from $Q$-RTM using conventional low-pass filtering and the proposed adaptive stabilization scheme, respectively. Compared with the non-compensated image, the compensated images exhibit a clear anticlinal structure and recovered amplitudes. As a further comparison, Figure 7 shows migrated seismic traces selected arbitrarily at three distances of 1500 m, 3600 m and 5200 m from the imaging results in Figure 6. These traces show that the compensated traces match the reference traces well, which indicates that the developed cu$Q$-RTM package is capable of improving imaging quality.

A strong scaling plot shows how the execution time decreases as the number of computing resources increases. In large-scale imaging, the share of computational time spent simulating wave propagation requires the solver to be efficient and to scale well. To test this, the 60 shots of $Q$-RTM are evenly distributed among the GPU cards, with the number of GPUs (Tesla K10) varying from one to six. We record the scheduling runtime (listed as manipulational runtime in Table 3) and the computational runtime for every test and present them in Table 3. Figure 8 shows the results of the strong scaling test of cu$Q$-RTM on the Marmousi model. With a balanced load on each GPU, the speedup is close to ideal, reaching 5.73 on six GPUs. Thus, the code package exhibits excellent scalability, in part because communications are almost entirely overlapped with calculations.
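The balanced load in this test can be illustrated with a small sketch. The text does not describe how cu$Q$-RTM assigns shots to GPUs, so the round-robin rule and the helper name `assign_shots` below are hypothetical; the sketch only shows that 60 shots split evenly for every GPU count from one to six.

```python
def assign_shots(n_shots, n_gpus):
    """Round-robin assignment (an assumed scheme): shot i goes to GPU i % n_gpus."""
    return [list(range(gpu, n_shots, n_gpus)) for gpu in range(n_gpus)]


n_shots = 60
for n_gpus in range(1, 7):
    shots_per_gpu = [len(shots) for shots in assign_shots(n_shots, n_gpus)]
    print(n_gpus, "GPU(s):", shots_per_gpu)
```

Because 60 is divisible by every integer from one to six, each GPU receives exactly the same number of shots in all six tests, so no card idles while another finishes a leftover shot.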

Figure 5.
(a) Velocity and (b) $Q$ of the Marmousi model.

Figure 6.
Migrated images of the Marmousi model using (a) conventional RTM from acoustic data, (b) conventional RTM from viscoacoustic data without compensation, (c) $Q$-RTM using low-pass filtering and (d) $Q$-RTM using the adaptive stabilization scheme.

Figure 7.
Migrated seismic traces selected at three distances of (a) 1500 m, (b) 3600 m and (c) 5200 m from migration results shown in Figure 6.

Table 3: Runtime of cu$Q$-RTM on multiple Tesla K10 GPUs and the corresponding speedup ratio for each number of GPUs. The model has 234 nodes in depth and 663 nodes in the horizontal direction.
| Number of GPUs | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| Manipulational Runtime (s) | 7.62 | 10.07 | 10.52 | 11.02 | 11.41 | 11.92 |
| Computational Runtime (s) | 2639.29 | 1329.40 | 889.82 | 672.78 | 539.21 | 449.97 |
| Total Runtime (s) | 2646.91 | 1339.50 | 900.34 | 683.80 | 550.62 | 461.89 |
| Speedup Ratio | 1.0000 | 1.9761 | 2.9399 | 3.8709 | 4.8071 | 5.7306 |
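
The speedup ratios in Table 3 follow from the total runtimes as $S_n = T_1 / T_n$, and the parallel efficiency is $E_n = S_n / n$. The Python sketch below recomputes both from the tabulated total runtimes; the function name `strong_scaling` is illustrative and not part of the cu$Q$-RTM package.

```python
# Total runtime (s) from Table 3, keyed by the number of GPUs
total_runtime = {1: 2646.91, 2: 1339.50, 3: 900.34, 4: 683.80, 5: 550.62, 6: 461.89}


def strong_scaling(runtimes):
    """Return {n: (speedup, efficiency)} relative to the single-GPU runtime."""
    t1 = runtimes[1]
    return {n: (t1 / t, t1 / (n * t)) for n, t in sorted(runtimes.items())}


results = strong_scaling(total_runtime)
for n, (speedup, efficiency) in results.items():
    print(f"{n} GPU(s): speedup {speedup:.4f}, efficiency {efficiency:.1%}")
```

The recomputed speedups agree with the tabulated ratios to about three decimal places, and the efficiency on six GPUs is roughly 95.5%, which is consistent with the near-ideal scaling described in the text.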

Figure 8.
Strong scaling for cu$Q$-RTM on the Marmousi model using multiple Tesla K10 GPUs. Speedup ratios are plotted against the number of GPUs. The model has 234 nodes in depth and 663 nodes in the horizontal direction.