We follow the shaping regularization approach (Fomel, 2007b) to constrain the models. After several iterations of the conjugate gradient algorithm (denoted by $n$), output of which is denoted by $[\mathbf{m}_r^{n+1},\mathbf{m}_d^{n+1}]^T$, shaping operators are applied to the models as follows:

$\displaystyle \begin{bmatrix}\mathbf{m}_{r}^{k} \\ \mathbf{m}_d^k\end{bmatrix}\...
...athbf{m}_{r}^{n+1}] \\ \mathbf{T}_{\lambda}[\mathbf{m}_{d}^{n+1}]\end{bmatrix},$ (13)

where $\mathbf{T}_{\lambda}$ is a thresholding operator with a threshold level $\lambda$ and $\mathbf{S}_{\sigma}$ is smoothing along dominant local slopes with a radius $\sigma$. Smoothing and thresholding correspond to external iterations of the optimization scheme (denoted by $k$), result of which is denoted by $[\mathbf{m}_r^k,\mathbf{m}_d^k]^T$. Shaped models are then input to the next cascade of internal iterations by conjugate gradients as a new starting model. Therefore, internal iterations minimize the misfit term of the equation 6 and regularization in the form of shaping (equation 13) corresponds to external iterations. For wavenumber restoration inversions (equations 10 and 11) shaping operators should be applied separately: thresholding for diffraction restoration, and smoothing along dominant local slopes for reflections.

As soon as PWD is removed from the misfit term in the second inversion cascade (equation 8) the residual gets dominated by reflections. Thus, before disabling PWD we need to make sure that the majority of the diffracted energy has been extracted by the first inversion (equation 6). Because of the high energy of reflections, the sparsity constraints alone can be insufficient to remove reflected component leaked to the diffraction image after disabling PWD. Reflection removal will require higher threshold levels leading to diffraction supression.

To address this issue signal and noise orthogonalization workflow is employed (Chen and Fomel, 2015). We follow the approach and measure the similarity between the update to diffractivity $\mathbf{s}_{d}^{n}$ and reflectivity obtained on the previous external iteration $\mathbf{m}_r^{k-1}$. Events in the diffraction image update $\mathbf{s}_{d}^{n}$ caused by reflection energy leakage have high similarity to reflectivity. Based on the local similarity estimate (Fomel, 2007a) orthogonalization operator is formed $\mathbf{O}_{\mathbf{m}_r^{k-1}}$. It is then applied to the update of the diffractions $\mathbf{O}_{\mathbf{m}_r^{k-1}}(\mathbf{s}_{d}^{n})$ removing reflected energy with high similarity to reflectivity as noise. This procedure can be treated as additional regularization allowing for component separation in the second cascade of the inversions with PWD disabled (equation 8). Orthogonalization can also be applied to the diffractivity model itself rather than to its update. Although both ways are mathematically equivalent to each other in the examples shown below update orthogonalization demonstrates better reflection crosstalk supression. After update orthogonalization, sparsity constraints prove to be efficient to denoise diffractions and suppress the remaining crosstalk, major part of which has been eliminated in the update.