
Next: Code organization Up: GPU implementation using CPML Previous: CPML boundary condition

Memory manipulation

Consider the Marmousi model (size 751x2301) and the Sigsbee model (size 1201x3201). Assume $nt=10000$ time steps and a finite difference scheme of order $2N=8$. Conventionally, one has to store 64.4 GB for the Marmousi model and 143.2 GB for the Sigsbee model on disk. Using the method of Dussaud et al. (2008), or effective boundary saving on a regular grid, reduces the storage requirement greatly, to about 0.9 GB and 1.3 GB for the two models. The staggered-grid finite difference scheme is preferable for its higher accuracy; however, its effective boundary requires 1.6 GB and 2.3 GB for the two models, considerably more than on the regular grid. Once the other variables allocated on the device are taken into account, storage may still be a bottleneck on low-end hardware if all boundaries are to be kept on the GPU to avoid saving them on the CPU and exchanging data, even with effective boundary saving.
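The figures quoted above can be checked with a short calculation. The following is a minimal sketch, not the authors' code. It assumes single-precision (4-byte) samples, 1 GB taken as $2^{30}$ bytes, $N$ saved layers per side on a regular grid and $2N-1$ layers per side on a staggered grid, with corner overlaps ignored. The function names are hypothetical.

```c
#include <assert.h>

/* Assumption: wavefield samples are single-precision floats. */
#define BYTES_PER_SAMPLE 4.0
/* Assumption: "GB" in the text means 2^30 bytes. */
#define BYTES_PER_GB (1024.0 * 1024.0 * 1024.0)

/* Storage in GB for saving the full nz-by-nx wavefield at all nt time steps. */
double full_wavefield_gb(int nz, int nx, int nt)
{
    assert(nz > 0 && nx > 0 && nt > 0);
    return (double)nz * nx * nt * BYTES_PER_SAMPLE / BYTES_PER_GB;
}

/* Storage in GB for saving `layers` grid layers along each of the four
 * sides of the model at all nt time steps (corner overlaps ignored). */
double boundary_gb(int nz, int nx, int nt, int layers)
{
    assert(nz > 0 && nx > 0 && nt > 0 && layers > 0);
    return 2.0 * layers * ((double)nz + nx) * nt * BYTES_PER_SAMPLE / BYTES_PER_GB;
}

/* Regular grid: N layers per side (assumption stated in the lead-in). */
double regular_grid_gb(int nz, int nx, int nt, int N)
{
    return boundary_gb(nz, nx, nt, N);
}

/* Staggered grid: 2N-1 layers per side (assumption stated in the lead-in). */
double staggered_grid_gb(int nz, int nx, int nt, int N)
{
    return boundary_gb(nz, nx, nt, 2 * N - 1);
}
```

Under these assumptions, calling the functions with `nz=751, nx=2301, nt=10000, N=4` gives about 64.4, 0.9 and 1.6 GB, and with `nz=1201, nx=3201` about 143.2, 1.3 and 2.3 GB, in agreement with the numbers in the text.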

Fortunately, page-locked (also known as pinned) host memory provides a practical way to mitigate this conflict. Zero-copy system memory has identical coherence and consistency to global memory, and copies between page-locked host memory and device memory can be performed concurrently with kernel execution (Nvidia, 2011). Therefore, we store a certain percentage of the effective boundary in page-locked host memory and the rest on the device. Note that pinned memory should be used sparingly: it cannot be paged out by the operating system, so allocating too much of it may degrade bandwidth and overall system performance.
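The split between pinned host memory and device memory can be sketched as follows. This is an illustrative example under stated assumptions, not the implementation used in this work: the sizes `nt` and `nb`, the percentage `host_percent`, the kernel `save_boundary` and the buffer names are all hypothetical. It uses mapped (zero-copy) pinned memory, so the kernel writes directly to host memory for the first part of the time steps and to a device buffer for the rest.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                         \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Copy one time step of boundary samples into its storage slot.
__global__ void save_boundary(float *slot, const float *bndr, int nb)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nb) slot[i] = bndr[i];
}

int main()
{
    const int nt = 100;           // time steps (hypothetical, small for the demo)
    const int nb = 4096;          // boundary samples per time step (hypothetical)
    const int host_percent = 30;  // share of time steps kept in pinned host memory (hypothetical)
    const int nt_host = nt * host_percent / 100;
    const int nt_dev = nt - nt_host;

    // Allow pinned allocations to be mapped into the device address space.
    CUDA_CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    float *h_store = NULL;  // pinned host storage
    float *h_alias = NULL;  // device-side pointer to the same pinned storage
    float *d_store = NULL;  // device storage for the remaining time steps
    float *d_bndr = NULL;   // boundary samples of the current time step
    CUDA_CHECK(cudaHostAlloc((void **)&h_store,
                             (size_t)nt_host * nb * sizeof(float),
                             cudaHostAllocMapped));
    CUDA_CHECK(cudaHostGetDevicePointer((void **)&h_alias, h_store, 0));
    CUDA_CHECK(cudaMalloc((void **)&d_store, (size_t)nt_dev * nb * sizeof(float)));
    CUDA_CHECK(cudaMalloc((void **)&d_bndr, nb * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_bndr, 0, nb * sizeof(float)));

    const int threads = 256;
    const int blocks = (nb + threads - 1) / threads;
    for (int it = 0; it < nt; it++) {
        // A real forward-modeling loop would update d_bndr here.
        float *slot = (it < nt_host)
                          ? h_alias + (size_t)it * nb               // pinned host memory
                          : d_store + (size_t)(it - nt_host) * nb;  // device memory
        save_boundary<<<blocks, threads>>>(slot, d_bndr, nb);
        CUDA_CHECK(cudaGetLastError());
    }
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaFree(d_bndr));
    CUDA_CHECK(cudaFree(d_store));
    CUDA_CHECK(cudaFreeHost(h_store));
    return 0;
}
```

In this sketch `host_percent` is the tuning knob. A larger value frees device memory but sends more boundary traffic over the host link and pins more system memory, so the value that works best depends on the hardware.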



2021-08-31