next up previous [pdf]

Next: Pareto curve Up: Hennenfent et al.: Pareto Previous: Introduction

Problem statement

Consider the following underdetermined system of linear equations

$\displaystyle \mathbf{y} = \mathbf{A}\mathbf{x}_0 + \mathbf{n},$ (1)

where the $n$-vectors $\mathbf{y}$ and $\mathbf{n}$ represent observations and additive noise, respectively. The $n$-by-$N$ matrix $\mathbf{A}$ is the modeling operator that links the model $\mathbf{x}_0$ to the noise-free data given by $\mathbf{y}-\mathbf{n}$. We assume that $N\gg n$ and that $\mathbf{x}_0$ has few nonzero or significant entries. We use the terms ``model'' and ``observations'' in a broad sense, so that many linear geophysical problems can be cast in the form of equation 1. In the case of wavefield reconstruction, for example, $\mathbf{y}$ is the acquired seismic data with missing traces, and $\mathbf{A}$ can be the restriction operator combined with the curvelet synthesis operator, so that $\mathbf{x}_0$ is the curvelet representation of the fully sampled wavefield (Herrmann and Hennenfent, 2008; Hennenfent and Herrmann, 2008).
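To make the structure of equation 1 concrete, the following sketch builds a toy modeling operator of the kind described above: a restriction operator that keeps a subset of traces, composed with a synthesis operator. An orthonormal DFT basis stands in for the curvelet frame here purely for illustration; the sizes, the subsampling pattern, and the sparse model are all made up for this example.

```python
import numpy as np

# Toy version of equation 1 for wavefield reconstruction.
# A DFT basis stands in for the curvelet synthesis operator (illustration only).
N = 128                                      # model length (number of coefficients)
rng = np.random.default_rng(0)
keep = np.sort(rng.choice(N, size=48, replace=False))  # indices of acquired traces

S = np.fft.ifft(np.eye(N), norm="ortho")     # synthesis operator (columns = basis vectors)
R = np.eye(N)[keep]                          # restriction operator: keeps observed rows
A = R @ S                                    # n-by-N modeling operator, with N >> n

x0 = np.zeros(N, dtype=complex)              # sparse model in the transform domain
x0[[5, 20]] = [1.0, 0.5]
y = A @ x0                                   # noise-free observations with missing traces
```

Applying $\mathbf{A}$ to the sparse coefficient vector first synthesizes the full wavefield and then discards the missing traces, which is exactly the underdetermined setting of equation 1.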

Because $\mathbf{x}_0$ is assumed to be (almost) sparse, one can promote sparsity as a prior via one-norm regularization to overcome the singular nature of $\mathbf{A}$ when estimating $\mathbf{x}_0$ from $\mathbf{y}$. A common approach is to solve the convex optimization problem

QP$\displaystyle _\lambda:\quad\min_{\mathbf{x}} \textstyle \frac{1}{2}\Vert\mathbf{y}-\mathbf{A}\mathbf{x}\Vert _2^2 + \lambda\Vert\mathbf{x}\Vert _1,$

which is closely related to quadratic programming (QP); the positive parameter $\lambda$ is the Lagrange multiplier, which balances the tradeoff between the two-norm of the data misfit and the one-norm of the solution. Many algorithms are available for solving QP$_\lambda$, including iteratively reweighted least squares (IRLS), iterative soft thresholding (IST), introduced by Daubechies et al. (2004), and the IST extension that includes cooling (ISTc; Figueiredo and Nowak, 2003), which was tailored to geophysical applications by Herrmann and Hennenfent (2008).
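A minimal sketch of the basic IST iteration for QP$_\lambda$ follows. Each step is a gradient update on the misfit followed by elementwise soft thresholding with threshold $\lambda$; the sketch assumes the operator has been scaled so that its spectral norm is at most one, in which case each iteration does not increase the QP$_\lambda$ objective. The function names are illustrative, not the published implementations.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft thresholding: S_t(z) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ist(A, y, lam, n_iter=1000):
    """Iterative soft thresholding for QP_lambda (sketch).

    Assumes ||A||_2 <= 1 so that the unit-step gradient update
    followed by thresholding decreases the objective.
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (y - A @ x), lam)
    return x
```

Larger values of $\lambda$ zero out more entries per iteration, trading data fit for sparsity, which is the tradeoff described above.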

It is generally not clear, however, how to choose the parameter $\lambda$ such that the solution of QP$_\lambda$ is, in some sense, optimal. A directly related optimization problem, the basis pursuit (BP) denoise problem (Chen et al., 1998), minimizes the one-norm of the solution given a maximum misfit, and is given by

BP$\displaystyle _\sigma:\quad\min_{\mathbf{x}} \Vert\mathbf{x}\Vert _1$   s.t.$\displaystyle \quad\Vert\mathbf{y}-\mathbf{A}\mathbf{x}\Vert _2\leq\sigma.$

This formulation is often preferred when an estimate of the noise level $\sigma\geq 0$ in the data is available. BP$_\sigma$ can be solved using ISTc or the spectral projected-gradient algorithm (SPG$\ell_1$) introduced by van den Berg and Friedlander (2008).
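One plausible reading of the cooling idea behind ISTc can be sketched as follows: start from a threshold large enough that the solution is zero, run a few IST iterations at each threshold level, and geometrically lower the threshold until the data misfit drops below $\sigma$. This is only an illustration of the cooling strategy, not the published ISTc algorithm; the schedule, the inner iteration count, and the stopping rule are assumptions, and the operator is again assumed to satisfy $\Vert\mathbf{A}\Vert_2\leq 1$.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft thresholding: S_t(z) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def istc(A, y, sigma, cool=0.7, inner=50, max_outer=50):
    """IST with cooling (sketch): target BP_sigma by lowering the threshold.

    Starts at lam = max|A^T y| (above which the QP_lambda solution is zero)
    and multiplies lam by `cool` after each batch of `inner` IST iterations,
    stopping once ||y - A x||_2 <= sigma. Assumes ||A||_2 <= 1.
    """
    x = np.zeros(A.shape[1])
    lam = np.max(np.abs(A.T @ y))
    for _ in range(max_outer):
        for _ in range(inner):
            x = soft_threshold(x + A.T @ (y - A @ x), lam)
        if np.linalg.norm(y - A @ x) <= sigma:
            break
        lam *= cool
    return x
```

In effect, the cooling schedule traces out a sequence of QP$_\lambda$ solutions for decreasing $\lambda$ and stops at the first one whose misfit satisfies the BP$_\sigma$ constraint.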

Of interest, a third optimization problem, connected to QP$_\lambda$ and BP$_\sigma$, minimizes the misfit given a maximum one-norm of the solution, and is given by the LASSO (LS) problem (Tibshirani, 1996)

LS$\displaystyle _\tau:\quad\min_{\mathbf{x}} \textstyle \frac{1}{2}\Vert\mathbf{y}-\mathbf{A}\mathbf{x}\Vert _2^2$   s.t.$\displaystyle \quad\Vert\mathbf{x}\Vert _1\leq\tau.$

Because an estimate of the one-norm of the solution $\tau\geq 0$ is typically not available for geophysical problems, this formulation is seldom used directly. It is, however, the key internal problem that SPG$\ell_1$ solves repeatedly in order to solve BP$_\sigma$.
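The workhorse of a projected-gradient approach to LS$_\tau$ is the Euclidean projection onto the one-norm ball, which can be computed exactly with a standard sort-based algorithm. The sketch below pairs that projection with a plain projected-gradient loop (assuming $\Vert\mathbf{A}\Vert_2\leq 1$ and $\tau>0$); SPG$\ell_1$ uses the same projection but with spectral step lengths and a line search, which this illustration omits.

```python
import numpy as np

def project_l1(v, tau):
    """Euclidean projection of v onto the l1 ball {x : ||x||_1 <= tau}, tau > 0.

    Sort-based algorithm: find the soft threshold theta such that
    soft-thresholding v by theta lands exactly on the ball boundary.
    """
    if np.sum(np.abs(v)) <= tau:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                 # magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - tau)[0][-1]
    theta = (css[rho] - tau) / (rho + 1)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def lasso_pg(A, y, tau, n_iter=500):
    """Projected gradient for LS_tau (sketch): gradient step, then project.

    Assumes ||A||_2 <= 1 so a unit step is safe.
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = project_l1(x + A.T @ (y - A @ x), tau)
    return x
```

Every iterate is feasible by construction, so the one-norm budget $\tau$ is respected throughout, which is what makes LS$_\tau$ a convenient subproblem.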

To understand the connections among these formulations and to compare their solvers in different scenarios, we propose to follow Daubechies et al. (2007) and van den Berg and Friedlander (2008) and examine the Pareto curve.



2008-03-27