The largest computational savings come from using FFTs for AMO
instead of the slow Fourier integration that is necessary in the absence of
log-stretch. Standard means of minimizing the CPU time and the amount of memory used
to compute the AMO have also been employed. They include computing the AMO
shift for only half of the elements of the cube in the complex domain,
since the Fourier transform of a real function is Hermitian:
$$F(-\omega) = F^{*}(\omega) \qquad (13)$$
(where $F$ is the Fourier transform of a real function, $\omega$ denotes the frequency domain variable, and the star symbol
denotes the complex conjugate). Another way of reducing computational
expense was the use of the RFFTW and FFTW Fourier transform
routines (Frigo and Johnson, 1998), which adapt to the hardware architecture and
take advantage of
the property stated in (13). The code was also
divided into subroutines in such a way that quantities
were not recomputed unnecessarily when AMO was applied to
more than one cube of data. Finally, shared memory parallelization
with the OpenMP standard was applied to all the computationally
intensive do loops in the code.
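
The saving offered by property (13) can be illustrated with a minimal sketch. It is written in Python rather than in the language of the original code, and it rests on several assumptions: a naive discrete Fourier transform stands in for the FFTW/RFFTW routines, a one-dimensional real trace stands in for the data cube, and a simple circular shift by an integer number of samples stands in for the AMO shift. All function names are hypothetical. The point is only that the operator is applied to the non-negative frequencies, and the remaining coefficients are filled in by conjugate symmetry.

```python
import cmath
import math


def half_spectrum(x):
    """DFT coefficients of real x for k = 0 .. n//2 only (naive O(n^2) sum)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def expand_hermitian(half, n):
    """Rebuild the full length-n spectrum using X[n-k] = conj(X[k])."""
    full = list(half) + [0j] * (n - len(half))
    for k in range(len(half), n):
        full[k] = half[n - k].conjugate()
    return full


def inverse_dft(spectrum):
    """Naive inverse DFT, returning complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def shift_via_half_spectrum(x, s):
    """Circularly shift real x by the integer s, touching only half the spectrum.

    The phase factor exp(-2*pi*i*k*s/n) is itself Hermitian in k when s is
    an integer, so the shifted spectrum still satisfies property (13).
    """
    n = len(x)
    half = half_spectrum(x)
    shifted = [
        c * cmath.exp(-2j * math.pi * k * s / n) for k, c in enumerate(half)
    ]
    full = expand_hermitian(shifted, n)
    # The imaginary parts are rounding noise because the spectrum is Hermitian.
    return [v.real for v in inverse_dft(full)]


if __name__ == "__main__":
    trace = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
    print(shift_via_half_spectrum(trace, 3))
```

In this sketch the phase multiplication is carried out for n//2 + 1 coefficients instead of n, which is the source of the roughly twofold reduction in work and storage described above.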
Effective AMO implementation in the log-stretch,
frequency-wavenumber domain