Much as the kernel layer insulates the user from the programming details of individual tasks, the module layer insulates the user from the implementation details of the modules of $Q$-RTM: forward extrapolation, wavefield reconstruction, backward extrapolation and imaging. Each module is made up of several kernel functions and streams. A stream is defined by creating a stream object with cudaStreamCreate($\ldots$) and passing it as the stream parameter to a sequence of kernel launches and host-device memory copies. Streams are released by calling cudaStreamDestroy($\ldots$). In early CUDA releases this call waited for all preceding commands in the given stream to complete before destroying the stream and returning control to the host thread; in current releases it returns immediately and the stream's resources are freed once the pending work has finished, so cudaStreamSynchronize($\ldots$) should be called first wherever the host must wait for completion. The forward module cuda_visco_PSM_2d_forward($\ldots$) in cu$Q$-RTM is designed as a stand-alone module and is called by the main function to conduct forward wavefield extrapolation. Because wavefield reconstruction, backward extrapolation and imaging are all conducted during the time-reversal simulation, these three modules are merged into a single module, cuda_visco_PSM_2d_backward($\ldots$). Abbreviated code for both the forward and backward modules is given in Appendix A.
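The stream life cycle described above can be illustrated with a minimal sketch. It is not taken from cu$Q$-RTM: the kernel scale_kernel and the host function run_stream_demo are hypothetical names introduced only for this illustration, and the kernel simply scales an array in place of a real extrapolation step. Only standard CUDA runtime calls are used.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for one kernel function of a module.
__global__ void scale_kernel(float *u, int n, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) u[i] *= a;
}

// Hypothetical host routine: create a stream, queue a copy-in, a kernel
// launch and a copy-out on it, synchronize, then destroy the stream.
// Returns the number of wrong samples (0 on success) or -1 on a CUDA error.
int run_stream_demo(int n)
{
    float *h_u = NULL, *d_u = NULL;
    cudaStream_t stream;
    size_t bytes = (size_t)n * sizeof(float);

    // Pinned host memory lets cudaMemcpyAsync overlap with other work.
    if (cudaMallocHost((void **)&h_u, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_u, bytes) != cudaSuccess) {
        cudaFreeHost(h_u);
        return -1;
    }
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        cudaFree(d_u);
        cudaFreeHost(h_u);
        return -1;
    }

    for (int i = 0; i < n; ++i) h_u[i] = 1.0f;

    // All three commands are issued to the same stream, so they execute
    // in order relative to each other.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    cudaMemcpyAsync(d_u, h_u, bytes, cudaMemcpyHostToDevice, stream);
    scale_kernel<<<blocks, threads, 0, stream>>>(d_u, n, 2.0f);
    cudaMemcpyAsync(h_u, d_u, bytes, cudaMemcpyDeviceToHost, stream);

    // Wait explicitly for the queued work before reading h_u on the host.
    int bad = 0;
    if (cudaStreamSynchronize(stream) != cudaSuccess) {
        bad = -1;
    } else {
        for (int i = 0; i < n; ++i)
            if (h_u[i] != 2.0f) ++bad;
    }

    cudaStreamDestroy(stream);
    cudaFree(d_u);
    cudaFreeHost(h_u);
    return bad;
}
```

Calling run_stream_demo(1 << 16) from a host program compiled with nvcc should return 0 on a working device. The explicit cudaStreamSynchronize($\ldots$) before the host reads the result is a deliberate choice, so that correctness does not depend on the blocking behaviour of cudaStreamDestroy($\ldots$).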