Low-rank approximation of the data matrix

In the similarity-weighted stacking approaches (2)-(4), the spatial arithmetic mean $\hat{s}(t)= \frac{1}{N}\sum_{i=1}^{N} s_i(t)$ is not an appropriate approximation of the zero-offset trace. The mean of the data matrix approximates the true zero-offset trace well only if the random noise is statistically white, all traces are well aligned after NMO correction, and the data matrix contains no abnormal traces. These requirements are seldom met in practice: real seismic data have extremely complicated features and are always contaminated with different types of noise, e.g., erratic noise and colored noise.
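The sketch below illustrates the last requirement on a synthetic, perfectly aligned gather. The 25 Hz Ricker wavelet, noise level, and burst amplitude are illustrative assumptions, not data from this paper. Averaging suppresses the white noise, but a single abnormal trace carrying a high-amplitude burst biases the mean stack substantially.

```python
import numpy as np

def ricker(t, f0=25.0, t0=0.2):
    """Ricker wavelet with peak frequency f0 (Hz) centred at t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

# Hypothetical NMO-corrected gather: nt samples x ntr traces, one trace per column,
# every trace carrying the same zero-offset event plus white Gaussian noise.
dt, nt, ntr = 0.002, 200, 30
t = np.arange(nt) * dt
s0 = ricker(t)                                   # true zero-offset trace
rng = np.random.default_rng(0)
D = np.tile(s0[:, None], (1, ntr)) + 0.1 * rng.standard_normal((nt, ntr))

def mean_stack(D):
    """Spatial arithmetic mean over traces: s_hat(t) = (1/N) * sum_i s_i(t)."""
    return D.mean(axis=1)

err_clean = np.linalg.norm(mean_stack(D) - s0)

# Make trace 7 abnormal by adding a large-amplitude erratic burst.
D_bad = D.copy()
D_bad[90:110, 7] += 20.0 * rng.standard_normal(20)
err_bad = np.linalg.norm(mean_stack(D_bad) - s0)

print(f"stack error, white noise only:    {err_clean:.3f}")
print(f"stack error, one abnormal trace:  {err_bad:.3f}")
```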

A better way to approximate the zero-offset trace is to compute the spatial arithmetic mean of a low-rank approximation of the data matrix obtained with principal component analysis (PCA). PCA is an important tool for multivariate statistical analysis; its idea is to reduce the dimensionality of a data set while preserving as much of the variability in the data as possible (Jolliffe, 2010).
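A minimal sketch of this idea follows. It assumes that PCA is realized as a truncated SVD of the uncentered data matrix (no mean is removed first), which matches the SVD-based formulation used later in this section; the helper name `low_rank_stack` is hypothetical. The example only checks two sanity properties: a noise-free rank-one gather is reproduced exactly with $k=1$, and keeping all singular values falls back to the plain mean stack.

```python
import numpy as np

def low_rank_stack(D, k):
    """Hypothetical helper: mean trace of the rank-k approximation of D.

    D is an (nt x ntr) gather with one trace per column. PCA is assumed to be
    a truncated SVD of the uncentred matrix (no mean removal beforehand).
    """
    U, sigma, Vt = np.linalg.svd(D, full_matrices=False)
    D_k = (U[:, :k] * sigma[:k]) @ Vt[:k, :]   # keep the k largest singular values
    return D_k.mean(axis=1)                    # spatial arithmetic mean over traces

# Example: a noise-free rank-1 gather with trace-to-trace amplitude variation.
rng = np.random.default_rng(0)
wavelet = rng.standard_normal(100)
amplitudes = 1.0 + rng.random(24)
D_clean = np.outer(wavelet, amplitudes)
print(np.allclose(low_rank_stack(D_clean, 1), D_clean.mean(axis=1)))  # True
```

In practice $k$ is a tuning parameter: too small a value can discard signal that varies across the gather, while too large a value lets more noise back into the stack.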

Suppose the data matrix $\mathbf{D}$ is composed of the signal component $\mathbf{S}$, random noise $\mathbf{N}$, erratic noise $\mathbf{E}$, and misaligned data components $\mathbf{M}$:

$\displaystyle \mathbf{D} = \mathbf{S} + \mathbf{N} + \mathbf{E} + \mathbf{M}.$ (5)
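As a hedged illustration of model (5), the sketch below builds each component synthetically; the wavelet, amplitudes, noise levels, spike count, and shift are arbitrary illustrative choices. An aligned event with trace-to-trace amplitude variation gives a rank-one $\mathbf{S}$, while the noise and misalignment terms make $\mathbf{D}$ full rank, which motivates the low-rank constraint on $\mathbf{S}$.

```python
import numpy as np

rng = np.random.default_rng(2)
nt, ntr = 200, 30
t = np.arange(nt) * 0.002
a = (np.pi * 25.0 * (t - 0.2)) ** 2
wavelet = (1.0 - 2.0 * a) * np.exp(-a)            # 25 Hz Ricker wavelet (illustrative)

# Signal S: same wavelet on every trace, mild amplitude variation -> rank 1.
S = np.outer(wavelet, 1.0 + 0.1 * rng.standard_normal(ntr))

# Random noise N: white Gaussian.
N = 0.05 * rng.standard_normal((nt, ntr))

# Erratic noise E: a few isolated high-amplitude spikes at random positions.
E = np.zeros((nt, ntr))
np.put(E, rng.choice(nt * ntr, size=5, replace=False), 5.0 * rng.standard_normal(5))

# Misaligned component M: trace 12 carries the event shifted by 8 samples,
# so M holds the difference between the shifted and the aligned event.
M = np.zeros((nt, ntr))
M[:, 12] = np.roll(S[:, 12], 8) - S[:, 12]

D = S + N + E + M                                  # equation (5)
print(np.linalg.matrix_rank(S), np.linalg.matrix_rank(D))   # 1 30
```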

For seismic stacking in this paper, $\mathbf{D}$ is simply a common midpoint (CMP) gather. If we assume that the error components $\mathbf{N} + \mathbf{E} + \mathbf{M}$ consist of small random perturbations, an optimal estimate of $\mathbf{S}$ can be obtained by solving the following optimization problem:

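\begin{displaymath}\begin{split}
&\min \parallel \mathbf{N} + \mathbf{E} + \mathbf{M} \parallel_F^2 \\
&\text{s.t.} \quad \text{rank}(\mathbf{S}) \le k, \\
&\phantom{\text{s.t.}} \quad \mathbf{D} = \mathbf{S} + \mathbf{N} + \mathbf{E} + \mathbf{M},
\end{split}\end{displaymath} (6)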

where $k$ is the upper bound on the rank of the target signal component $\mathbf{S}$. The problem can be solved efficiently via the singular value decomposition (SVD), which decomposes the observed data matrix $\mathbf{D}$ into a sum of eigen-images, i.e., rank-one matrices $\mathbf{u}_i \mathbf{v}_i^T$ weighted by the singular values $\sigma_i$. The low-rank component $\mathbf{S}$ can be represented by the few eigen-images associated with the largest singular values, whereas the noise terms $\mathbf{N}$, $\mathbf{E}$, and $\mathbf{M}$ spread their energy over all the eigen-images. Retaining only the first $k$ eigen-images therefore preserves most of the signal while rejecting most of the noise energy.
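The following sketch makes the energy argument concrete for white random noise only; it does not show how erratic or misaligned components behave, and the wavelet, amplitudes, and noise level are illustrative assumptions. It measures the share of energy carried by the first eigen-image for a rank-one signal and for pure noise, then compares the rank-one truncation of $\mathbf{D}$ with the raw gather.

```python
import numpy as np

rng = np.random.default_rng(1)
nt, ntr = 200, 30
t = np.arange(nt) * 0.002
a = (np.pi * 25.0 * (t - 0.2)) ** 2
wavelet = (1.0 - 2.0 * a) * np.exp(-a)                         # 25 Hz Ricker wavelet

S = np.outer(wavelet, 1.0 + 0.2 * rng.standard_normal(ntr))    # rank-1 signal
N = 0.3 * rng.standard_normal((nt, ntr))                       # white random noise
D = S + N

def energy_fractions(X):
    """Share of ||X||_F^2 carried by each eigen-image sigma_i * u_i v_i^T."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return sigma**2 / np.sum(sigma**2)

fs, fn = energy_fractions(S), energy_fractions(N)
print(f"signal: {fs[0]:.1%} of its energy in the first eigen-image")
print(f"noise:  {fn[0]:.1%} of its energy in the first eigen-image")

# Keep only the first eigen-image of D (k = 1) and compare with the raw gather.
U, sigma, Vt = np.linalg.svd(D, full_matrices=False)
D1 = sigma[0] * np.outer(U[:, 0], Vt[0])
err_raw = np.linalg.norm(D - S)     # Frobenius norm of the noise in D
err_k1 = np.linalg.norm(D1 - S)     # residual error after rank-1 truncation
print(f"||D - S||_F = {err_raw:.2f},  ||D_1 - S||_F = {err_k1:.2f}")
```

With a high-amplitude erratic burst, the burst itself can dominate the leading eigen-images, so this white-noise picture should not be read as a guarantee for every noise type.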