Dictionary learning by K-SVD

The K-SVD method (Aharon et al., 2006) is one approach that solves equation 4 with good performance. The atoms of $\mathbf{F}$ are not all obtained at once. Instead, the $K$ columns of $\mathbf{F}$ are updated one by one while $\mathbf{M}$ is held fixed. To update the $k$th column, one can first rewrite the objective function in equation 4 as

\begin{displaymath}\begin{split}
\parallel \mathbf{D} - \mathbf{FM} \parallel_F^2 &= \parallel \mathbf{D} - \sum_{j=1}^{K}\mathbf{f}_j\mathbf{m}_T^j \parallel_F^2 \\
&= \parallel \left(\mathbf{D} - \sum_{j\ne k}\mathbf{f}_j\mathbf{m}_T^j\right) - \mathbf{f}_k\mathbf{m}_T^k \parallel_F^2 \\
&= \parallel \mathbf{E}_k- \mathbf{f}_k\mathbf{m}_T^k \parallel_F^2 ,
\end{split}\end{displaymath} (5)

where $\mathbf{f}_j$ is the $j$th column vector in $\mathbf{F}$ and $\mathbf{m}_T^j$ is the $j$th row vector in $\mathbf{M}$. Here, $[\cdot]_T$ simply indicates a row vector. For simplicity, in equation 5 and the following derivations, I omit the superscript $n$ shown in equation 4. $\mathbf{E}_k$ is the fitting error obtained using all atoms other than the $k$th one and their corresponding coefficient row vectors. Note that in equation 5, $\mathbf{D}$ and $\mathbf{E}_k$ are of size $M\times N$, $\mathbf{F}$ is of size $M\times K$, and $\mathbf{M}$ is of size $K\times N$. Here, $M$ is the length of each training signal, $N$ is the number of training signals, and $K$ is the number of atoms in the dictionary.
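As a sketch of the $\mathbf{E}_k$ term, the following NumPy snippet (with hypothetical toy dimensions and random data, assumed only for illustration) computes the residual in the two equivalent ways suggested by equation 5:

```python
import numpy as np

rng = np.random.default_rng(0)
M_len, N, K = 8, 20, 5                 # signal length, number of signals, number of atoms
D = rng.standard_normal((M_len, N))    # training signals D (columns)
F = rng.standard_normal((M_len, K))    # dictionary F (atoms in columns)
Mcoef = rng.standard_normal((K, N))    # coefficient matrix M

k = 2
# E_k = D - sum_{j != k} f_j m_T^j, written as full residual plus the k-th term
E_k = D - F @ Mcoef + np.outer(F[:, k], Mcoef[k, :])

# Equivalent direct computation, dropping the k-th atom and its row
mask = np.arange(K) != k
E_k_direct = D - F[:, mask] @ Mcoef[mask, :]
assert np.allclose(E_k, E_k_direct)
```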

It is now clear that the $k$th atom in $\mathbf{F}$ is updated by minimizing the misfit between the rank-1 matrix $\mathbf{f}_k\mathbf{m}_T^k$ and the error term $\mathbf{E}_k$, i.e., by finding the best rank-1 approximation of $\mathbf{E}_k$. This rank-1 approximation can be solved using the singular value decomposition (SVD).
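The best rank-1 approximation in the Frobenius norm is given by the leading singular pair (the Eckart-Young theorem). A minimal sketch, assuming a random toy matrix in place of a real $\mathbf{E}_k$:

```python
import numpy as np

rng = np.random.default_rng(1)
E_k = rng.standard_normal((8, 20))     # stand-in for the residual E_k

# SVD: the first singular pair gives the best rank-1 fit in Frobenius norm
U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
f_k = U[:, 0]                          # candidate atom (unit norm by construction)
m_k = s[0] * Vt[0, :]                  # candidate coefficient row
approx = np.outer(f_k, m_k)

# Residual of the rank-1 fit; equals sqrt(sum of the remaining singular values squared)
err = np.linalg.norm(E_k - approx)
```

Note, however, that the row $m_k$ produced this way is generally dense, which is exactly the sparsity problem addressed next.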

A problem with the direct use of SVD for the rank-1 approximation of $\mathbf{E}_k$ is the loss of sparsity in $\mathbf{m}_T^k$: after an SVD of $\mathbf{E}_k$, $\mathbf{m}_T^k$ is likely to be filled. To solve this problem, K-SVD restricts the minimization of equation 5 to the small set of training signals $\mathbf{D}_k=\{\mathbf{d}_i: \mathbf{m}_T^k(i)\ne 0\}$ that actually use the $k$th atom. To achieve this goal, one can define a transformation matrix $\mathbf{R}_k$ that shrinks $\mathbf{E}_k$ and $\mathbf{m}_T^k$ by rejecting the columns of $\mathbf{E}_k$ and the entries of $\mathbf{m}_T^k$ that correspond to zero coefficients. First, one defines the set $r_k=\{i\vert 1\le i \le N, \mathbf{m}_T^k(i)\ne 0 \}$, which indexes the non-zero entries of $\mathbf{m}_T^k$. One then constructs $\mathbf{R}_k$ as a matrix of size $N\times N_r^k$, where $N_r^k$ is the number of elements in $r_k$, with ones on the $(r_k(i),i)$ entries and zeros otherwise.
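The construction of $\mathbf{R}_k$ can be sketched as follows; the coefficient row below is a hypothetical toy example, assumed only to show the selection mechanism:

```python
import numpy as np

# Hypothetical sparse coefficient row m_T^k (N = 6 training signals)
m_k = np.array([0.0, 1.5, 0.0, -0.7, 0.0, 2.1])
N = m_k.size

r_k = np.flatnonzero(m_k)            # r_k: indices i with m_T^k(i) != 0
N_r = r_k.size                       # N_r^k, the support size
R_k = np.zeros((N, N_r))
R_k[r_k, np.arange(N_r)] = 1.0       # ones on the (r_k(i), i) entries

# Right-multiplying by R_k keeps only the selected entries
m_k_R = m_k @ R_k                    # shrunken coefficient vector m_R^k
assert np.allclose(m_k_R, m_k[r_k])
```

The same right-multiplication $\mathbf{E}_k\mathbf{R}_k$ keeps only the residual columns for signals that use the $k$th atom.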

Applying $\mathbf{R}_k$ to both $\mathbf{E}_k$ and $\mathbf{m}_T^k$, the objective function in equation 5 becomes

$\displaystyle \parallel\mathbf{E}_k\mathbf{R}_k- \mathbf{f}_k\mathbf{m}_T^k\mathbf{R}_k \parallel_F^2 = \parallel\mathbf{E}^R_k- \mathbf{f}_k\mathbf{m}_R^k \parallel_F^2$ (6)

and can be minimized by directly using SVD:

$\displaystyle \mathbf{E}^R_k = \mathbf{U}\Sigma\mathbf{V}^T.$ (7)

$\mathbf{f}_k$ is then set to the first column of $\mathbf{U}$, and the coefficient vector $\mathbf{m}_R^k$ to the first column of $\mathbf{V}$ multiplied by the first diagonal entry $\sigma_1$ of $\Sigma$; the entries of $\mathbf{m}_R^k$ are placed back into the non-zero positions of $\mathbf{m}_T^k$, so the sparsity pattern is preserved. After all $K$ columns are updated, one turns to solving equation 3 for $\mathbf{M}$.
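The whole dictionary-update sweep can be sketched as below. This is an illustrative NumPy implementation under stated assumptions (the function name `ksvd_dictionary_update`, the toy dimensions, and the random data are all hypothetical), not the authors' reference code:

```python
import numpy as np

def ksvd_dictionary_update(D, F, Mcoef):
    """One K-SVD sweep: update each atom f_k and its non-zero coefficients."""
    K = F.shape[1]
    for k in range(K):
        support = np.flatnonzero(Mcoef[k, :])      # r_k: signals using atom k
        if support.size == 0:
            continue                                # unused atom; nothing to update
        # Restricted residual E_k R_k: residual without atom k, support columns only
        E_kR = (D - F @ Mcoef + np.outer(F[:, k], Mcoef[k, :]))[:, support]
        U, s, Vt = np.linalg.svd(E_kR, full_matrices=False)
        F[:, k] = U[:, 0]                           # first column of U
        Mcoef[k, support] = s[0] * Vt[0, :]         # sigma_1 times first column of V
    return F, Mcoef

# Toy usage: each restricted update is an optimal rank-1 fit,
# so the overall misfit ||D - F M||_F cannot increase
rng = np.random.default_rng(2)
D = rng.standard_normal((8, 30))
F = rng.standard_normal((8, 5))
Mcoef = rng.standard_normal((5, 30))
Mcoef[rng.random(Mcoef.shape) < 0.6] = 0.0          # impose a sparse support
before = np.linalg.norm(D - F @ Mcoef)
F, Mcoef = ksvd_dictionary_update(D, F, Mcoef)
after = np.linalg.norm(D - F @ Mcoef)
```

Note that the loop recomputes the residual with the atoms already updated in this sweep, which is what makes the objective monotonically non-increasing.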