
The Newton, Gauss-Newton, and steepest-descent methods

In terms of Eq. (64),

\begin{displaymath}\begin{split}\frac{\partial E(\textbf{m})}{\partial m_i} &=\frac{1}{2}\frac{\partial}{\partial m_i}\left[\Delta \textbf{p}^{\dagger}\Delta \textbf{p}\right]\\ &=\mathtt{Re}\left[\left(\frac{\partial \textbf{p}_{cal}}{\partial m_i}\right)^{\dagger}\Delta \textbf{p}\right], i=1,2,\ldots,M. \end{split}\end{displaymath} (71)

That is to say,

$\displaystyle \nabla E_{\textbf{m}}=\nabla E(\textbf{m})=\frac{\partial E(\textbf{m})}{\partial \textbf{m}} =\mathtt{Re}\left[\left(\frac{\partial \textbf{p}_{cal}}{\partial \textbf{m}}\right)^{\dagger}\Delta \textbf{p}\right] =\mathtt{Re}\left[\textbf{J}^{\dagger}\Delta \textbf{p}\right]$ (72)

where $ \mathtt{Re}$ takes the real part, and $ \textbf{J}=\frac{\partial \textbf{p}_{cal}}{\partial \textbf{m}}=\frac{\partial \textbf{f}(\textbf{m})}{\partial \textbf{m}}$ is the Jacobian matrix, i.e., the sensitivity matrix or the Fréchet derivative matrix.
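As a quick sanity check (not part of the original derivation), the gradient $ \mathtt{Re}[\textbf{J}^{\dagger}\Delta \textbf{p}]$ can be compared with a finite-difference derivative of the misfit, assuming for illustration a linear forward model $ \textbf{p}_{cal}=\textbf{J}\textbf{m}$; all sizes and values below are made up.

```python
import numpy as np

# Sketch: check grad E = Re[J^dagger Delta p] (Eq. 72) by finite differences.
# Assumes a hypothetical linear forward model p_cal = J m; J, m, p_obs are made up.
rng = np.random.default_rng(0)
N, M = 6, 4
J = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))  # Jacobian
m = rng.standard_normal(M)                                          # real-valued model
p_obs = rng.standard_normal(N) + 1j * rng.standard_normal(N)        # "observed" data

def misfit(m):
    dp = J @ m - p_obs                        # residual Delta p
    return 0.5 * np.real(dp.conj() @ dp)      # E(m) = 0.5 dp^dagger dp

grad = np.real(J.conj().T @ (J @ m - p_obs))  # Re[J^dagger Delta p]

# Central finite difference with respect to the first model parameter
eps = 1e-6
e0 = np.zeros(M)
e0[0] = eps
fd = (misfit(m + e0) - misfit(m - e0)) / (2 * eps)
```

The finite-difference value `fd` should agree with `grad[0]` to within discretization error.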

Differentiation of the gradient expression (71) with respect to the model parameters gives the following expression for the Hessian $ \textbf{H}$ :

\begin{displaymath}\begin{split}\textbf{H}_{i,j}&=\frac{\partial^2 E(\textbf{m})}{\partial m_i \partial m_j}\\ &=\mathtt{Re}\left[\left(\frac{\partial^2 \textbf{p}_{cal}}{\partial m_i \partial m_j}\right)^{\dagger}\Delta \textbf{p}\right] +\mathtt{Re}\left[\left(\frac{\partial \textbf{p}_{cal}}{\partial m_i}\right)^{\dagger}\frac{\partial\textbf{p}_{cal}}{\partial m_j}\right] \end{split}\end{displaymath} (73)

In matrix form

$\displaystyle \textbf{H}=\frac{\partial^2 E(\textbf{m})}{\partial \textbf{m}^2} =\mathtt{Re}\left[\textbf{J}^{\dagger}\textbf{J}\right] +\mathtt{Re}\left[\frac{\partial \textbf{J}^{t}}{\partial \textbf{m}^{t}}\left(\Delta \textbf{p}^*, \Delta \textbf{p}^*, \ldots, \Delta \textbf{p}^*\right)\right].$ (74)

In many cases, this second-order term is neglected for nonlinear inverse problems. In the following, the remaining term in the Hessian, i.e., $ \textbf{H}_a=\mathtt{Re}[\textbf{J}^{\dagger}\textbf{J}]$ , is referred to as the approximate Hessian. It is the auto-correlation of the derivative wavefield. Eq. (68) becomes
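To illustrate the approximation (this example is not in the original text), note that for a hypothetical linear forward model $ \textbf{p}_{cal}=\textbf{J}\textbf{m}$ the second-order term of Eq. (73) vanishes, so the exact Hessian reduces to $ \textbf{H}_a=\mathtt{Re}[\textbf{J}^{\dagger}\textbf{J}]$; this can be verified by differencing the gradient. All values are made up.

```python
import numpy as np

# Sketch: for a linear forward model p_cal = J m (an illustrative assumption),
# the second-order term of Eq. (73) is zero, so the exact Hessian equals the
# approximate Hessian H_a = Re[J^dagger J]; check by differencing Eq. (72).
rng = np.random.default_rng(3)
N, M = 7, 3
J = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
p_obs = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def grad(m):
    return np.real(J.conj().T @ (J @ m - p_obs))   # gradient, Eq. (72)

Ha = np.real(J.conj().T @ J)                       # approximate Hessian H_a

# Finite-difference Hessian: column j differences the gradient along e_j
m0, eps = np.zeros(M), 1e-6
H_fd = np.column_stack([(grad(m0 + eps * e) - grad(m0 - eps * e)) / (2 * eps)
                        for e in np.eye(M)])
print(np.allclose(H_fd, Ha, atol=1e-5))            # True
```

For a genuinely nonlinear $ \textbf{f}(\textbf{m})$ the two matrices differ by the neglected second-order term, which is small when the residual $ \Delta \textbf{p}$ is small.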

$\displaystyle \Delta \textbf{m} =-\textbf{H}^{-1}\nabla E_{\textbf{m}} =-\textbf{H}_a^{-1}\mathtt{Re}[\textbf{J}^{\dagger}\Delta \textbf{p}].$ (75)

The method which solves the Newton system with the Hessian (74) approximated by $ \textbf{H}_a$ is referred to as the Gauss-Newton method. To guarantee the stability of the algorithm (avoiding the singularity), we can use $ \textbf{H}=\textbf{H}_a+\eta \textbf{I}$ , leading to

$\displaystyle \Delta \textbf{m} =-\textbf{H}^{-1}\nabla E_{\textbf{m}} =-(\textbf{H}_a+\eta \textbf{I})^{-1}\mathtt{Re}\left[\textbf{J}^{\dagger}\Delta \textbf{p}\right].$ (76)
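A minimal sketch of one damped Gauss-Newton update follows; the Jacobian, residual, and the damping choice for $ \eta$ are illustrative assumptions, not values from the text.

```python
import numpy as np

# Sketch of one damped Gauss-Newton update (Eq. 76).
# J, dp, and the damping eta are made-up illustrative values.
rng = np.random.default_rng(1)
N, M = 8, 5
J = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))  # Jacobian
dp = rng.standard_normal(N) + 1j * rng.standard_normal(N)           # residual Delta p

Ha = np.real(J.conj().T @ J)           # approximate Hessian Re[J^dagger J]
grad = np.real(J.conj().T @ dp)        # gradient Re[J^dagger Delta p]
eta = 1e-3 * np.trace(Ha) / M          # damping scaled to Ha (an assumption)
dm = -np.linalg.solve(Ha + eta * np.eye(M), grad)   # model update, Eq. (76)
```

Solving the damped normal system with `np.linalg.solve` rather than forming $ (\textbf{H}_a+\eta\textbf{I})^{-1}$ explicitly is the usual numerically preferable choice.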

Alternatively, the inverse of the Hessian in Eq. (68) can be replaced by $ \textbf{H}=\textbf{H}_a=\mu \textbf{I}$ , leading to the gradient or steepest-descent method:

$\displaystyle \Delta \textbf{m} =-\mu^{-1}\nabla E_{\textbf{m}} =-\alpha\nabla E_{\textbf{m}} =-\alpha\mathtt{Re}\left[\textbf{J}^{\dagger}\Delta \textbf{p}\right],\quad \alpha=\mu^{-1}.$ (77)

At the $ k$ -th iteration, the misfit function can be approximated by the second-order Taylor-Lagrange expansion

$\displaystyle E(\textbf{m}_{k+1})=E(\textbf{m}_k-\alpha_k \nabla E(\textbf{m}_k)) =E(\textbf{m}_k)-\alpha_k\nabla E(\textbf{m}_k)^{\dagger}\nabla E(\textbf{m}_k) +\frac{1}{2}\alpha_k^2\nabla E(\textbf{m}_k)^{\dagger}\textbf{H}_k\nabla E(\textbf{m}_k).$ (78)

Setting $ \frac{\partial E(\textbf{m}_{k+1})}{\partial \alpha_k}=0$ gives

$\displaystyle \alpha_k=\frac{\nabla E(\textbf{m}_k)^{\dagger}\nabla E(\textbf{m}_k)}{\nabla E(\textbf{m}_k)^{\dagger}\textbf{H}_k\nabla E(\textbf{m}_k)} =\frac{\langle\nabla E(\textbf{m}_k),\nabla E(\textbf{m}_k)\rangle}{\langle\textbf{J}_k\nabla E(\textbf{m}_k),\textbf{J}_k\nabla E(\textbf{m}_k)\rangle}$ (79)
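The steepest-descent iteration with this step length can be sketched on a made-up linear toy problem $ \textbf{p}_{cal}=\textbf{J}\textbf{m}$ (an illustrative stand-in, not an actual wave-equation solver); $ \textbf{J}$, the true model, and all sizes are assumptions.

```python
import numpy as np

# Sketch: steepest descent (Eq. 77) with the optimal step length of Eq. (79),
# applied to an illustrative linear problem p_cal = J m.
rng = np.random.default_rng(2)
N, M = 10, 4
J = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
m_true = rng.standard_normal(M)
p_obs = J @ m_true                       # noise-free synthetic "observed" data

misfit0 = 0.5 * np.linalg.norm(p_obs)**2
m = np.zeros(M)
for k in range(200):
    dp = J @ m - p_obs                   # residual Delta p
    g = np.real(J.conj().T @ dp)         # gradient, Eq. (72)
    Jg = J @ g
    denom = np.real(Jg.conj() @ Jg)      # <J_k grad, J_k grad>, Eq. (79)
    if denom == 0.0:                     # zero gradient: converged
        break
    alpha = (g @ g) / denom              # optimal step length, Eq. (79)
    m = m - alpha * g                    # update, Eq. (77)

misfit = 0.5 * np.linalg.norm(J @ m - p_obs)**2
```

Because the toy problem is linear and consistent, the exact line search drives the misfit toward zero; for real FWI each iteration instead requires new forward and adjoint simulations to evaluate the gradient and $ \textbf{J}_k\nabla E(\textbf{m}_k)$.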



2021-08-31