
PEF whiteness proof in 1-D

The basic idea of least-squares fitting is that the residual is orthogonal to the fitting functions. Applied to the PE filter, this idea means that the output of a PE filter is orthogonal to lagged inputs. The orthogonality applies only for lags in the past, because prediction knows only the past while it aims at the future. What we want to show here is different, namely, that the output is uncorrelated with itself (as opposed to the input) for lags in both directions; hence the output spectrum is white.

In (21) are two separate and independent autoregressions, $ \bold 0\approx\bold Y_a\bold a$ for finding the filter $ \bold a$ , and $ \bold 0\approx\bold Y_b\bold b$ for finding the filter $ \bold b$ . By noticing that the two matrices are really the same (except that the row of zeros at the bottom of $ \bold Y_a$ appears as a row of zeros at the top of $ \bold Y_b$ ), we realize that the two regressions must result in the same filters $ \bold a =\bold b$ , and that the residual $ \bold r_b$ is a shifted version of $ \bold r_a$ . In practice, I visualize the matrix being a thousand components tall (or a million) and a hundred components wide.

$\displaystyle \bold 0 \;\approx\; \bold r_a \;=\; \left[ \begin{array}{ccc} y_1 & 0 & 0 \\ y_2 & y_1 & 0 \\ y_3 & y_2 & y_1 \\ y_4 & y_3 & y_2 \\ y_5 & y_4 & y_3 \\ y_6 & y_5 & y_4 \\ 0 & y_6 & y_5 \\ 0 & 0 & y_6 \\ 0 & 0 & 0 \end{array} \right] \left[ \begin{array}{c} 1 \\ a_1 \\ a_2 \end{array} \right] \;, \qquad \bold 0 \;\approx\; \bold r_b \;=\; \left[ \begin{array}{ccc} 0 & 0 & 0 \\ y_1 & 0 & 0 \\ y_2 & y_1 & 0 \\ y_3 & y_2 & y_1 \\ y_4 & y_3 & y_2 \\ y_5 & y_4 & y_3 \\ y_6 & y_5 & y_4 \\ 0 & y_6 & y_5 \\ 0 & 0 & y_6 \end{array} \right] \left[ \begin{array}{c} 1 \\ b_1 \\ b_2 \end{array} \right]$ (21)
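To see this claim in numbers, here is a small sketch in Python/numpy (the random test signal, the filter length, and the helper names conv_matrix and fit_pef are illustrative choices, not anything from the text). It builds the two shifted matrices of (21), pins the leading filter coefficient to one, fits both filters by least squares, and confirms that the filters agree and that $ \bold r_b$ is a shifted copy of $ \bold r_a$ :

    import numpy as np

    # Build the two shifted (transient-convolution) matrices of equation (21),
    # fit each filter with its leading coefficient pinned to 1, and compare.
    rng = np.random.default_rng(0)
    y = rng.standard_normal(1000)            # "a thousand components tall"
    na = 10                                  # filter length: (1, a_1, ..., a_9)

    def conv_matrix(y, na, shift, rows):
        """Column k holds y delayed by k samples, plus `shift` extra zeros on top."""
        Y = np.zeros((rows, na))
        for k in range(na):
            Y[k + shift : k + shift + len(y), k] = y
        return Y

    rows = len(y) + na                       # common height for Y_a and Y_b
    Ya = conv_matrix(y, na, 0, rows)         # row of zeros at the bottom
    Yb = conv_matrix(y, na, 1, rows)         # row of zeros at the top

    def fit_pef(Y):
        # Minimize |Y[:,0] + Y[:,1:] c|^2 over c; the filter is (1, c).
        c, *_ = np.linalg.lstsq(Y[:, 1:], -Y[:, 0], rcond=None)
        return np.concatenate(([1.0], c))

    a = fit_pef(Ya)
    b = fit_pef(Yb)
    print(np.allclose(a, b))                         # True: same filter
    print(np.allclose((Ya @ a)[:-1], (Yb @ b)[1:]))  # True: r_b is r_a delayed by one

The check works because $ \bold Y_b$ is just $ \bold Y_a$ with its zero row moved from the bottom to the top; permuting the rows leaves the normal equations, and hence the fitted filter, unchanged.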

When the energy $ \bold r\T\bold r$ of a residual has been minimized, the residual $ \bold r$ is orthogonal to the fitting functions. For example, choosing $ a_2$ to minimize $ \bold r\T\bold r$ gives $ 0=\partial\bold r\T\bold r/\partial a_2=2\bold r\T\partial\bold r/\partial a_2$ . This shows that $ \bold r\T$ is perpendicular to $ \partial \bold r / \partial a_2$ which is the rightmost column of the $ \bold Y_a$ matrix. Thus the vector $ \bold r_a$ is orthogonal to all the columns in the $ \bold Y_a$ matrix except the first (because we do not minimize with respect to $ a_0$ ).
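The same statement can be checked numerically. In the sketch below (Python/numpy again; the signal and the filter length are arbitrary illustrations), the residual is orthogonal to every lagged column we minimized over, but not to the first column, whose inner product with the residual equals $ \bold r\T\bold r$ itself:

    import numpy as np

    # Orthogonality of the least-squares residual to the fitting columns.
    rng = np.random.default_rng(1)
    y = rng.standard_normal(500)
    na = 5
    Y = np.zeros((len(y) + na, na))
    for k in range(na):
        Y[k : k + len(y), k] = y          # column k is y delayed by k samples

    c, *_ = np.linalg.lstsq(Y[:, 1:], -Y[:, 0], rcond=None)
    r = Y @ np.concatenate(([1.0], c))    # residual r_a

    print(Y[:, 1:].T @ r)                 # ~0: orthogonal to the minimized columns
    print(Y[:, 0] @ r, r @ r)             # equal and nonzero: not orthogonal to column one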

Our goal is a different theorem, one that is imprecise when applied to the three-coefficient filters displayed in (21) but becomes valid as the filter length tends to infinity, $ \bold a = (1,a_1, a_2, a_3,\cdots)$ , and the matrices become infinitely wide. Actually, all we require is that the last component of $ \bold b$ , namely $ b_n$ , tends to zero. This generally happens because as $ n$ increases, $ y_{t-n}$ becomes a weaker and weaker predictor of $ y_t$ .

Here's a mathematical fact we soon need: For any vectors $ \bold u$ and $ \bold v$ , if $ \bold r\cdot\bold u=\bold 0$ and $ \bold r\cdot\bold v=\bold 0$ , then $ \bold r\cdot(\bold u + \bold v)=\bold 0$ and $ \bold r\cdot(6\bold u - 3\bold v)=\bold 0$ and $ \bold r\cdot(a_1\bold u + a_2\bold v)=\bold 0$ for any $ a_1$ and $ a_2$ .

The matrix $ \bold Y_a$ contains all of the columns that are found in $ \bold Y_b$ except the last (and the last one is not important). This means that $ \bold r_a$ is not only orthogonal to all of $ \bold Y_a$ 's columns (except the first), but $ \bold r_a$ is also orthogonal to all of $ \bold Y_b$ 's columns except the last. Although $ \bold r_a$ isn't really perpendicular to the last column of $ \bold Y_b$ , it doesn't matter, because that column contributes almost nothing to $ \bold r_b$ since $ \vert b_n\vert \ll 1$ . Because $ \bold r_b$ is a linear combination of the columns of $ \bold Y_b$ , and $ \bold r_a$ is (effectively) orthogonal to each of those columns, $ \bold r_a$ is also orthogonal to $ \bold r_b$ itself.

Here is a detail: In choosing the example of equation (21), I have shifted the two fitting problems by only one lag. We would like to shift by more lags and get the same result. For this we need more filter coefficients. By adding many more filter coefficients we are adding many more columns to the right side of $ \bold Y_b$ . That's good because we'll be needing to neglect more columns as we shift $ \bold r_b$ further from $ \bold r_a$ . Neglecting these columns is commonly justified by the experience that ``after short range regressors have had their effect, long range regressors generally find little remaining to predict.'' (Recall that the damped harmonic oscillator from physics, the finite difference equation that predicts the future from the past, uses only two lags.)
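The damped-oscillator remark can be made concrete: a damped sinusoid obeys an exact two-lag recursion, so once two lags have done their work, longer-range regressors have nothing left to predict. A minimal sketch (the decay and frequency values are arbitrary):

    import numpy as np

    # A damped sinusoid is predicted exactly by two lags.
    rho, omega = 0.98, 0.3
    t = np.arange(200)
    y = rho**t * np.cos(omega * t)

    predicted = 2 * rho * np.cos(omega) * y[1:-1] - rho**2 * y[:-2]
    print(np.allclose(y[2:], predicted))  # True: two lags suffice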

Here is the main point: Since $ \bold r_b$ and $ \bold r_a$ both contain the same signal $ \bold r$ but time-shifted, the orthogonality at all shifts means that the autocorrelation of $ \bold r$ vanishes at all lags. An exception, of course, is at zero lag. The autocorrelation does not vanish there because $ \bold r_a$ is not orthogonal to its first column (because we did not minimize with respect to $ a_0$ ).
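The main point can also be seen numerically: a PEF estimated from colored noise produces an output whose autocorrelation is (nearly) an impulse. In the sketch below the coloring filter, the data length, and the PEF length are all illustrative choices:

    import numpy as np

    # Estimate a PEF from colored noise and inspect the output autocorrelation.
    rng = np.random.default_rng(2)
    y = np.convolve(rng.standard_normal(20000), [1.0, 1.5, 0.9, -0.4])  # colored input

    na = 30
    Y = np.zeros((len(y) + na, na))
    for k in range(na):
        Y[k : k + len(y), k] = y          # lagged copies of the input

    c, *_ = np.linalg.lstsq(Y[:, 1:], -Y[:, 0], rcond=None)
    r = Y @ np.concatenate(([1.0], c))    # PEF output

    acorr = np.correlate(r, r, mode="full")
    acorr /= acorr.max()                  # zero lag normalized to 1
    mid = len(r) - 1                      # index of zero lag
    for lag in (0, 1, 2, 5, 10):
        print(lag, round(acorr[mid + lag], 3))   # ~1 at lag 0, ~0 elsewhere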

As we redraw $ \bold 0\approx\bold r_b =\bold Y_b\bold b$ for various lags, we may shift the columns only downward, because shifting them upward would bring in the first column of $ \bold Y_a$ , and the residual $ \bold r_a$ is not orthogonal to that. Thus we have only proven that one side of the autocorrelation of $ \bold r$ vanishes. That is enough, however, because autocorrelation functions are symmetric: if one side vanishes, the other must also.

If $ \bold a$ and $ \bold b$ were two-sided filters like $ (\cdots ,b_{-2}, b_{-1}, 1, b_1, b_2, \cdots)$ the proof would break. If $ \bold b$ were two-sided, $ \bold Y_b$ would catch the nonorthogonal column of $ \bold Y_a$ . Not only is $ \bold r_a$ not proven to be perpendicular to the first column of $ \bold Y_a$ , but it cannot be orthogonal to it because a signal cannot be orthogonal to itself.

The implications of this theorem are far reaching. The residual $ \bold r$ , a convolution of $ \bold y$ with $ \bold a$ , has an autocorrelation that is an impulse function. The Fourier transform of an impulse is a constant, so the spectrum of the residual is ``white''. Because the spectrum of a convolution is the product of the spectra, $ \bold y$ and $ \bold a$ must have mutually inverse spectra.

Since the output of a PEF is white, the PEF itself has a spectrum inverse to its input.
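To illustrate this statement with a case where the answer is known, the sketch below colors white noise with a two-lag recursion and then estimates a PEF from the colored output. The estimated filter recovers the coloring recursion, whose squared spectrum is, by construction, the inverse of the input spectrum. (The recursion coefficients 1.6 and -0.8 and the data length are illustrative.)

    import numpy as np

    # Color white noise with a known recursion, then estimate a PEF from it.
    rng = np.random.default_rng(3)
    n = 50000
    w = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(2, n):
        y[t] = 1.6 * y[t - 1] - 0.8 * y[t - 2] + w[t]   # known coloring recursion

    na = 3                                # (1, a_1, a_2) is enough here
    Y = np.zeros((n + na, na))
    for k in range(na):
        Y[k : k + n, k] = y

    c, *_ = np.linalg.lstsq(Y[:, 1:], -Y[:, 0], rcond=None)
    print(np.concatenate(([1.0], c)))     # ~ [1, -1.6, 0.8]

Since the input spectrum here is $ 1/\vert A(\omega)\vert^2$ for this recursion, the PEF spectrum $ \vert A(\omega)\vert^2$ is indeed the inverse of its input's spectrum.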

An important application of the PEF is missing-data interpolation. We'll see examples later in this chapter. My third book, PVI, has many examples in one dimension, with both synthetic data and field data, including the gap parameter. Next we extend these ideas to two (or more) dimensions.

