
Next: 3D examples Up: Weiss & Shragge: GPU-based Previous: Multiple GPU Implementation

Modeling Examples

In this section we demonstrate the utility of the GPU-based modeling approach by presenting reproducible numerical tests using the 2D/3D elastic FDTD codes for different TI models. The first set of tests involves applying the FDTD code to 2D homogeneous isotropic and VTI media. We define our test isotropic medium by P- and S-wave velocities of $v_p=2.0$ km/s and $v_s=v_p/\sqrt{3}$, and a density of $\rho=2000$ kg/m$^3$. The VTI medium uses the same $v_p$, $v_s$ and $\rho$, but includes three Thomsen parameters (Thomsen, 1986) of $[\epsilon_1,\delta_1,\gamma_1] = [0.2, -0.1, 0.2]$. Figures 5(a)-(b) present the vertical and horizontal components, $u_z$ and $u_x$ respectively, of the 2D isotropic impulse response tests, while Figures 5(c)-(d) show the corresponding components for the VTI model. Both tests generate the expected wavefield responses when compared to the CPU-only code results.
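As a minimal sketch of how the quoted medium parameters map to elastic stiffness coefficients, the snippet below applies the standard Thomsen (1986) definitions of $\epsilon$, $\delta$ and $\gamma$ for a VTI medium. The function name `vti_stiffness` and the dictionary layout are illustrative assumptions and are not taken from the ewefd2d codes; velocities are converted to m/s so that the stiffnesses come out in Pa.

```python
import math

def vti_stiffness(vp, vs, rho, epsilon, delta, gamma):
    """Hypothetical helper: VTI stiffnesses (Pa) from Thomsen parameters.

    vp and vs are the vertical P- and S-wave velocities in m/s,
    rho is the density in kg/m^3.
    """
    c33 = rho * vp ** 2                      # vertical P-wave modulus
    c55 = rho * vs ** 2                      # vertical S-wave modulus
    c11 = c33 * (1.0 + 2.0 * epsilon)        # from epsilon = (c11 - c33) / (2 c33)
    c66 = c55 * (1.0 + 2.0 * gamma)          # from gamma = (c66 - c55) / (2 c55)
    # Invert Thomsen's exact definition of delta for c13 (positive root):
    # delta = ((c13 + c55)^2 - (c33 - c55)^2) / (2 c33 (c33 - c55))
    c13 = math.sqrt((c33 - c55) * (2.0 * delta * c33 + c33 - c55)) - c55
    return {"c11": c11, "c13": c13, "c33": c33, "c55": c55, "c66": c66}

# Parameters quoted in the text: vp = 2.0 km/s, vs = vp / sqrt(3), rho = 2000 kg/m^3.
vp = 2000.0
vs = vp / math.sqrt(3.0)
rho = 2000.0

iso = vti_stiffness(vp, vs, rho, 0.0, 0.0, 0.0)    # isotropic test medium
vti = vti_stiffness(vp, vs, rho, 0.2, -0.1, 0.2)   # VTI test medium

for name in ("c11", "c13", "c33", "c55", "c66"):
    print(f"{name}: iso = {iso[name]:.4e} Pa, vti = {vti[name]:.4e} Pa")
```

With all three Thomsen parameters set to zero the expressions reduce to the isotropic case, where $c_{13}$ equals the Lamé parameter $\lambda = \rho(v_p^2 - 2v_s^2)$, which gives a quick consistency check.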

Figure 5.
2D impulse response tests with the ewefd2d_gpu modeling code. (a) Isotropic model $u_z$ component. (b) Isotropic model $u_x$ component. (c) VTI model $u_z$ component. (d) VTI model $u_x$ component.

Figure 6 presents comparative GPU versus CPU metrics for a number of square ($N^2$) model domains and runs of 1000 time steps. We ran the OpenMP-enabled CPU ewefd2d code on a dedicated workstation with a dual quad-core Intel Xeon chipset, and computed the corresponding GPU benchmarks on a 480-core NVIDIA GTX 480 GPU card. Because we output receiver data at every tenth time step, the reported runtimes include both parallel and serial sections, which hides some of the speedup advantage of the GPU parallelism. Figure 6(a) presents computational runtimes for the CPU (red line) and GPU (blue line) implementations. Each reported runtime is the mean of ten repeat trials conducted for that data point. Figure 6(b) shows the 10$\times$ speedup of the GPU implementation relative to the CPU-only version.
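The two quantities discussed above, the mean-of-trials speedup and the masking effect of serial sections, can be sketched as follows. This is an illustration only: the function names are hypothetical, the timing values are placeholders rather than measured results, and Amdahl's law is used here as a simple stand-in model for how a serial fraction (such as receiver output) limits the overall speedup.

```python
from statistics import mean

def mean_speedup(cpu_times, gpu_times):
    """Speedup defined as mean CPU runtime divided by mean GPU runtime."""
    return mean(cpu_times) / mean(gpu_times)

def amdahl_speedup(parallel_fraction, kernel_speedup):
    """Overall speedup when only `parallel_fraction` of the runtime is accelerated.

    parallel_fraction: share of the original runtime that runs on the GPU (0..1).
    kernel_speedup: speedup achieved on that accelerated share alone.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / kernel_speedup)

# Placeholder timings in seconds (NOT measurements from the paper).
cpu_trials = [10.0, 12.0]
gpu_trials = [1.0, 1.2]
print("mean speedup:", mean_speedup(cpu_trials, gpu_trials))

# A kernel that is 20x faster yields a smaller overall gain if 5% of the
# runtime stays serial.
print("overall speedup:", amdahl_speedup(0.95, 20.0))
```

Under this model, a fully parallel run recovers the kernel speedup exactly, while any serial share pulls the observed figure below it.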

Figure 6.
GPU (blue line) versus CPU (red line) performance metrics showing the mean of ten trials for various square ($N^2$) model domains using the ewefd2d code. (a) Computational run time. (b) Speedup.

Our second example tests the relative accuracy of the two implementations on a heterogeneous isotropic elastic model. We use the publicly available P-wave velocity and density models of the 2004 BP synthetic dataset (Billette and Brandsberg-Dahl, 2005), and assume an S-wave model defined by $v_s=v_p/\sqrt{3}$. We use temporal and spatial sampling intervals of $\Delta t=0.5$ ms and $\Delta x=\Delta y=0.005$ km and inject a 40 Hz Ricker wavelet as a stress source for each wavefield component.
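For reference, a Ricker wavelet sampled at the stated $\Delta t=0.5$ ms can be generated as in the sketch below. It assumes that 40 Hz refers to the peak frequency and uses an arbitrary delay of $1.5/f$ so that the wavelet starts close to zero; neither the delay nor the function name `ricker` comes from the original codes.

```python
import math

def ricker(f_peak, dt, nt, t0=None):
    """Return nt samples of a unit-amplitude Ricker wavelet.

    f_peak: peak frequency in Hz; dt: sampling interval in s;
    t0: time of the central peak in s (assumed default: 1.5 / f_peak).
    """
    if t0 is None:
        t0 = 1.5 / f_peak
    samples = []
    for i in range(nt):
        arg = (math.pi * f_peak * (i * dt - t0)) ** 2
        # w(t) = (1 - 2 pi^2 f^2 (t - t0)^2) exp(-pi^2 f^2 (t - t0)^2)
        samples.append((1.0 - 2.0 * arg) * math.exp(-arg))
    return samples

dt = 0.0005          # 0.5 ms, as in the text
f_peak = 40.0        # Hz, assumed to be the peak frequency
wavelet = ricker(f_peak, dt, nt=151)

peak_index = max(range(len(wavelet)), key=lambda i: wavelet[i])
print("peak sample:", peak_index, "peak time (s):", peak_index * dt)
```

The wavelet is symmetric about its peak, so the delay only shifts the source onset and does not change its spectrum.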

Figure 7 shows a snapshot of the propagating wavefield overlying the P-wave velocity model. Figures 8(a)-(b) present the corresponding data from the CPU and GPU implementations, respectively, while Figure 8(c) shows their difference clipped to the same scale. The $L_2$ energy norm of the difference panel is roughly $2.0\times10^{-5}$ of that of the CPU/GPU versions, indicating that the GPU version agrees with the CPU version to within a modest margin above floating-point precision. This slight discrepancy is expected because GPU and CPU hardware treat math operations differently (Whitehead and Fit-Florea, 2011); however, we assert that this will not create problems for realistic modeling applications.
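One way to compute such a relative measure is sketched below. It assumes that the "energy norm" is the sum of squared samples and that the ratio is taken against the reference (CPU) panel; the text does not spell out either choice, so both are assumptions, as are the function names and the toy data.

```python
def energy(panel):
    """Sum of squared samples over a 2D panel stored as a list of traces."""
    return sum(value * value for trace in panel for value in trace)

def relative_difference_energy(reference, test):
    """Energy of (reference - test) divided by the energy of the reference."""
    difference = [
        [r - t for r, t in zip(ref_trace, test_trace)]
        for ref_trace, test_trace in zip(reference, test)
    ]
    return energy(difference) / energy(reference)

# Toy panels standing in for CPU and GPU shot gathers (NOT data from the paper).
cpu_panel = [[1.0, 2.0], [2.0, 0.0]]
gpu_panel = [[1.0, 2.0], [2.0, 0.003]]
print("relative difference energy:", relative_difference_energy(cpu_panel, gpu_panel))
```

If the norm is instead defined with a square root, the corresponding ratio is the square root of the value returned here.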

Figure 7.
Wavefield snapshot overlying part of the P-wave model of the realistic BP velocity synthetic.

Figure 8.
Data modeled through the BP velocity synthetic model. (a) GPU implementation. (b) CPU implementation. (c) Data difference between GPU and CPU implementations clipped at the same level as (a) and (b).




2013-12-07