Damped MSSA

However, $\tilde{\mathbf{M}}$ is actually still mixed with residual noise. In the following, we first analyse why $\tilde{\mathbf{M}}$ still contains a noise component, and then introduce a modified MSSA method that damps the singular values, which we call the damped MSSA.

The singular value decomposition (SVD) of $\mathbf{S}$ can be represented as:

$\displaystyle \mathbf{S} = [\mathbf{U}_1^S\quad \mathbf{U}_2^S]\left[\begin{array}{cc} \Sigma_1^S & \mathbf{0}\\ \mathbf{0} & \Sigma_2^S \end{array}\right]\left[\begin{array}{c} (\mathbf{V}_1^S)^H\\ (\mathbf{V}_2^S)^H \end{array}\right].$ (7)

The corresponding matrices in equations 5 and 7 have the same size.

Because of its deficient rank, the matrix $\mathbf{S}$ can be written as

$\displaystyle \mathbf{S}=\mathbf{U}_1^S\Sigma_1^S(\mathbf{V}_1^S)^H.$ (8)
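As a quick numerical illustration of equations 7 and 8, the sketch below (using numpy; the matrix sizes and the rank are illustrative assumptions, not values from the text) builds a rank-$K$ matrix and verifies that only $K$ of its singular values are nonzero, so the compact $K$-term SVD reproduces it exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, K = 60, 50, 4          # illustrative Hankel-matrix-like dimensions and rank

# Build a rank-deficient "signal" matrix S as a product of two thin factors.
S = rng.standard_normal((m, K)) @ rng.standard_normal((K, n))

U, s, Vh = np.linalg.svd(S, full_matrices=False)
print(np.sum(s > 1e-10 * s[0]))   # -> 4: only K singular values are nonzero

# The compact SVD with the first K components reproduces S exactly (equation 8).
S_rec = U[:, :K] @ np.diag(s[:K]) @ Vh[:K, :]
print(np.allclose(S, S_rec))      # -> True
```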

Combining equations 4, 7, and 8, we can factorize $\mathbf{M}$ as follows:

$\displaystyle \mathbf{M} = [\mathbf{U}_1^S \quad \mathbf{U}_2^S]\left[\begin{array}{cc} \Sigma_1 & \mathbf{0}\\ \mathbf{0} & \Sigma_2 \end{array}\right]\left[\begin{array}{c} \mathbf{V}_1^H\\ \mathbf{V}_2^H \end{array}\right],$ (9)

where $\Sigma_1$ and $\Sigma_2$ denote diagonal, positive definite matrices. Note that $\mathbf{M}$ is constructed such that it is close to a square matrix (Oropeza and Sacchi, 2011), and thus $\Sigma_1$ and $\Sigma_2$ are assumed to be square matrices for convenience of derivation. Appendix A provides the derivation of the factorization of $\mathbf{M}$ into the form of equation 9. The left matrix has orthonormal columns and the middle matrix is diagonal; it can be proven that the right matrix also has orthonormal columns (the proof is given in appendix A). Thus, equation 9 is an SVD of $\mathbf{M}$. Following the TSVD method, we set $\Sigma_2=\mathbf{0}$, and the following equation holds:

\begin{displaymath}\begin{split}
\tilde{\mathbf{M}} &= \mathbf{U}_1^S(\mathbf{U}_1^S)^H\mathbf{M} = \mathbf{U}_1^S(\mathbf{U}_1^S)^H(\mathbf{S}+\mathbf{N}) \\
&= \mathbf{S} + \mathbf{U}_1^S(\mathbf{U}_1^S)^H\mathbf{N}.
\end{split}\end{displaymath} (10)
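Equation 10 can be verified numerically: projecting the noisy matrix onto the column space of the signal returns the signal plus a projected, generally nonzero, noise term. A minimal numpy sketch with illustrative sizes and noise level:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, K = 60, 50, 4

S = rng.standard_normal((m, K)) @ rng.standard_normal((K, n))   # rank-K signal
N = 0.01 * rng.standard_normal((m, n))                          # random noise
M = S + N

# U_1^S: left singular vectors spanning the signal column space.
Us = np.linalg.svd(S, full_matrices=False)[0][:, :K]

# Projection reproduces equation 10 exactly, since U_1^S (U_1^S)^H S = S:
M_tilde = Us @ Us.conj().T @ M
print(np.allclose(M_tilde, S + Us @ Us.conj().T @ N))   # -> True

# The projected noise does not vanish, so M_tilde != S: residual noise survives.
print(np.linalg.norm(M_tilde - S) > 0)                   # -> True
```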

It is clear that $\tilde{\mathbf{M}}\neq\mathbf{S}$. Because the matrices $\mathbf{U}_1^S$ and $\mathbf{N}$ are unknown, we cannot use equation 10 directly to attenuate the residual noise. However, by combining equations 6, 9 and 10, we can derive $\mathbf{S}$ as:

$\displaystyle \mathbf{S}=\mathbf{U}_1^M\left\{\Sigma_1^M(\mathbf{V}_1^M)^H- \left[\Sigma_1(\mathbf{V}_1^M)^H-\Sigma_1^S(\mathbf{V}_1^S)^H\right]\right\}.$ (11)

Appendix B gives a detailed derivation of equation 11 from equations 6, 9, and 10.

For simplification, we assume there exist matrices $\mathbf{A}$ and $\mathbf{B}$ such that $\mathbf{V}_1^S=\mathbf{V}_1^M\mathbf{A}$ and $\Sigma_1= \Sigma_1^M\mathbf{B}$, where $\mathbf{A}$ is a $K\times K$ square matrix and $\mathbf{B}$ is a $K\times K$ diagonal matrix. Then we can simplify $\mathbf{S}$ as:

$\displaystyle \mathbf{S}$ $\displaystyle = \mathbf{U}_1^M\Sigma_1^M\mathbf{T}\left(\mathbf{V}_1^M\right)^H.$ (12)
$\displaystyle \mathbf{T}$ $\displaystyle = \mathbf{I} - \mathbf{B}\left(\mathbf{I}-(\Sigma_1)^{-1}\Sigma_1^S\mathbf{A}^H\right),$ (13)

where $\mathbf{I}$ is the identity matrix, and we name $\mathbf{T}$ the damping operator. In fact, from equation 9, we can also approximate $\mathbf{A}$ and $\mathbf{B}$ as follows:

$\displaystyle \mathbf{A}$ $\displaystyle \approx(\mathbf{I}-\Gamma)\Sigma_1\left(\Sigma_1^S\right)^{-1},$ (14)
$\displaystyle \mathbf{B}$ $\displaystyle =\mathbf{I},$ (15)

where $\Gamma=(\mathbf{V}_1^M)^{o}\mathbf{N}^H\mathbf{U}_1^S(\Sigma_1)^{-1}$ is an unknown matrix, and $(\mathbf{V}_1^M)^{o}$ can be regarded as an approximate inverse of $\mathbf{V}_1^M$, satisfying $\parallel\mathbf{I}-\mathbf{V}_1^M(\mathbf{V}_1^M)^{o} \parallel\rightarrow 0$. Appendix B provides the detailed derivation of equations 14 and 15.

Inserting equations 14 and 15 into equation 13, we can obtain a simplified formula:

\begin{displaymath}\begin{split}
\mathbf{T} &= \mathbf{I} - \mathbf{I}\left(\mathbf{I}-(\Sigma_1)^{-1}\Sigma_1^S\left((\mathbf{I}-\Gamma)\Sigma_1\left(\Sigma_1^S\right)^{-1}\right)^H\right) \\
&= (\Sigma_1)^{-1}\Sigma_1^S\left(\Sigma_1^S\right)^{-1}\Sigma_1\left(\mathbf{I}-\Gamma\right)^H \\
&= \mathbf{I}-\Gamma,
\end{split}\end{displaymath} (16)

where the last two equalities hold because $\Sigma_1$, $\Sigma_1^S$, and $\Gamma$ are real diagonal matrices and therefore commute and equal their own conjugate transposes.

Combining equations 12 and 16, we conclude that the true signal is a damped version of the previous TSVD solution (equation 6), with the damping operator defined by equation 16. One unknown quantity still needs to be specified: $\Gamma$. Although a candidate is $\Gamma=(\mathbf{V}_1^M)^{o}\mathbf{N}^H\mathbf{U}_1^S(\Sigma_1)^{-1}$, as defined during the derivation of $\mathbf{A}$, we cannot calculate it directly because $\mathbf{N}$ and $\mathbf{U}_1^S$ are unknown.

Instead, we seek the form of $\Gamma$ in a different way: we treat $\Gamma$ as a whole rather than examining each component of $\Gamma=(\mathbf{V}_1^M)^{o}\mathbf{N}^H\mathbf{U}_1^S(\Sigma_1)^{-1}$. From the previous derivation, we know that the true signal is a damped version of the TSVD solution, with the damping operator $\mathbf{I}-\Gamma$ acting on the diagonal matrix $\Sigma_1^M$. We can therefore search for an approximation of $\Gamma$ based on the following conditions:

  1. As we know from the previous derivation, the truncation point of TSVD is the rank of the signal matrix $\mathbf{S}$; in other words, the rank of $\tilde{\mathbf{M}}= \mathbf{U}_1^M\Sigma_1^M(\mathbf{V}_1^M)^H$ equals the rank of $\mathbf{S}$. This rank should remain unchanged when we damp $\Sigma_1^M$, so the damping operator should be a diagonal, positive definite matrix.
  2. Each element of the damping operator should be in the interval $(0,1]$.
  3. The lower the SNR is, the stronger the damping should be, because the energy of random noise in the signal-noise space is relatively stronger.
  4. In the case of zero random noise, the damping operator should be a unit matrix.
  5. Since we always hope to preserve the main components of the seismic data, the damping operator should have a weaker effect on the larger singular values.
  6. The strength of the damping operator should be controllable by a single coefficient, such that the damped MSSA can revert to the traditional MSSA.
From the above analysis, we propose to use a $\Gamma$ of the following form:

\begin{displaymath}\Gamma=\left(\begin{array}{cccc}
a_1/b_1 & & & \\
& a_2/b_2 & & \\
& & \ddots & \\
& & & a_K/b_K
\end{array}\right),\end{displaymath} (17)

where $\{a\} $ contains the information of random noise, $\{b\} $ contains the information of signal, $a_1/b_1<a_2/b_2<\cdots<a_K/b_K $ and $0<a_i<b_i $, $i=1,2,\cdots,K $.

We carried out many numerical experiments and found that excellent denoising performance can be obtained when $\Gamma$ is chosen as

$\displaystyle \Gamma \approx \hat{\delta}^N\left(\Sigma_1^M\right)^{-N},$ (18)

where $\hat{\delta}$ denotes the maximum element of $\Sigma_2^M$ and $N$ denotes the damping factor. We use this approximation for three reasons: (1) $\hat{\delta}$ reflects the energy of the random noise, and $\Sigma_1^M$ contains the information of the signal. (2) Because the diagonal elements of $\Sigma^M $ are in descending order, $\hat{\delta}$ is certainly smaller than every diagonal element of $\Sigma_1^M$, and $\hat{\delta}/\delta_1<\hat{\delta}/\delta_2<\cdots<\hat{\delta}/\delta_K $, where $\delta_i$ denotes the $i$th diagonal entry of $\Sigma_1^M$. (3) $\hat{\delta}$ is zero when there is no random noise. In addition, we introduce the parameter $N$ to control the strength of the damping operator: the greater the $N$, the weaker the damping, and the damped MSSA reverts to the basic MSSA when $N\to\infty $.
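The conditions listed above can be checked directly for the choice of $\Gamma$ in equation 18. The following numpy sketch (with illustrative singular values and noise level) evaluates the diagonal of the resulting damping operator $\mathbf{T}=\mathbf{I}-\hat{\delta}^N(\Sigma_1^M)^{-N}$ and verifies conditions 2, 4, and 5, as well as the weakening of the damping as $N$ grows:

```python
import numpy as np

sigma1 = np.array([10.0, 6.0, 3.0, 1.5])   # diagonal of Sigma_1^M, descending (illustrative)
delta_hat = 1.0                            # largest entry of Sigma_2^M (illustrative)

def damping(sigma1, delta_hat, N):
    """Diagonal of T = I - delta_hat^N * Sigma_1^{-N} (equation 18 inserted into I - Gamma)."""
    return 1.0 - (delta_hat / sigma1) ** N

t2, t4 = damping(sigma1, delta_hat, 2), damping(sigma1, delta_hat, 4)

print(np.all((t2 > 0) & (t2 <= 1)))        # condition 2: entries lie in (0, 1]
print(np.all(np.diff(t2) < 0))             # condition 5: larger singular values damped less
print(np.all(t4 > t2))                     # larger N -> weaker damping
print(np.allclose(damping(sigma1, 0.0, 2), 1.0))  # condition 4: no noise -> identity operator
```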

Combining equations 12, 16, and 18, we obtain the final approximation of $\mathbf{S}$:

$\displaystyle \mathbf{S}$ $\displaystyle = \mathbf{U}_1^M \Sigma_1^M\mathbf{T}(\mathbf{V}_1^M)^H,$ (19)
$\displaystyle \mathbf{T}$ $\displaystyle =\mathbf{I}-(\Sigma_1^M)^{-N}\hat{\delta}^N.$ (20)

We call the newly developed algorithm (equations 19 and 20) the damped MSSA algorithm. In the proposed algorithm, we only need to choose two parameters: the rank $K$ and the damping factor $N$. It is worth mentioning that the damping operator $\mathbf{T}$ (equation 20) also behaves as a soft-thresholding operator applied to the singular-value matrix $\Sigma_1^M$. The damping (thresholding) step may thus cause slight damage to the useful signal, but because much more noise is scaled down, the final S/N can still be greatly improved.
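A compact sketch of the core rank-reduction step of the damped MSSA (equations 19 and 20) is given below. Note that this operates on a single generic low-rank matrix; in the actual MSSA workflow the input would be the block-Hankel matrix formed from a frequency slice of the data, which is omitted here. All sizes and the noise level are illustrative assumptions:

```python
import numpy as np

def damped_tsvd(M, K, N):
    """Damped rank reduction (equations 19-20): truncate to rank K, then
    damp the kept singular values with T = I - delta_hat^N * Sigma_1^{-N}."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    delta_hat = s[K]                          # largest singular value of Sigma_2^M
    T = 1.0 - (delta_hat / s[:K]) ** N        # diagonal of the damping operator
    return U[:, :K] @ np.diag(s[:K] * T) @ Vh[:K, :]

rng = np.random.default_rng(3)
m, n, K = 60, 50, 4
S = rng.standard_normal((m, K)) @ rng.standard_normal((K, n))   # rank-K "signal"
M = S + 0.3 * rng.standard_normal((m, n))                       # add random noise

# Plain TSVD (equation 6) for comparison.
U, s, Vh = np.linalg.svd(M, full_matrices=False)
tsvd = U[:, :K] @ np.diag(s[:K]) @ Vh[:K, :]

for name, est in [("noisy", M), ("TSVD", tsvd),
                  ("damped N=2", damped_tsvd(M, K, 2)),
                  ("damped N=4", damped_tsvd(M, K, 4))]:
    print(name, np.linalg.norm(est - S))
```

Both rank-reduced estimates have a much smaller reconstruction error than the raw noisy matrix; comparing the printed errors for different $N$ shows how the damping factor trades noise removal against signal preservation.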

The parameterization of the DMSSA approach is quite convenient. Although the rank in traditional MSSA varies over a broad range depending on the data size and complexity, the damping factor is usually chosen between 2 and 5. When the damping factor is chosen as 1, the damping is very strong and causes some loss of useful energy; when it is chosen as 2, 3, or an even larger value, the compromise between preservation of useful signals and removal of random noise is much improved. The implementation of the DMSSA approach can be built straightforwardly on the MSSA framework; the only difference is the introduction of the damping factor.