**Draft**

If you are familiar with linear algebra, then you have probably heard of SVD, a matrix decomposition that factors a given matrix $A$ as $A = U\Sigma V^T$, where $\Sigma$ is a diagonal matrix of singular values. In signal processing, we can think of $A$ as a dictionary $D \in \mathbb{R}^{n\times m}$ whose columns are the atoms $\{d_i\}_{i=1}^m$. Our objective is to find a representation that approximates a given signal with as few atoms as possible.
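As a quick illustration of the factorization above, here is a minimal sketch in NumPy (the matrix values are arbitrary, chosen only for the example):

```python
import numpy as np

# A small matrix whose SVD we compute; the values are arbitrary.
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# NumPy returns U, the singular values s (the diagonal of Sigma), and V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A = U @ Sigma @ V^T to verify the factorization.
A_rec = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rec))  # True
```

Note that `full_matrices=False` asks for the reduced SVD, so the shapes of $U$, $\Sigma$, and $V^T$ multiply back to the shape of $A$ directly.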

A coding over the dictionary is a linear combination of atoms that represents a signal $y \in \mathbb{R}^n$; we write this as $y = Dx$. Sparsity becomes important when $m > n$ and $D$ has full rank, because the system then has infinitely many solutions. For this reason, we impose a sparsity condition, stated as one of the following problems:

- $(P_0) \; \min_x \|x\|_0$ subject to $y = Dx$
- $(P_0, \epsilon) \; \min_x \|x\|_0$ subject to $\|y - Dx\|_2 \leq \epsilon$

The first problem asks for an exact solution to $y = Dx$ that uses the fewest atoms from $D$, i.e., one that minimizes the $\ell_0$-norm of $x$. The second covers the case where we cannot represent $y$ exactly: we require that the distance between $y$ and $Dx$ be at most $\epsilon$ while still minimizing the $\ell_0$-norm. In other words, the solution uses the fewest atoms required to reduce $\|y - Dx\|_2$ to at most $\epsilon$.
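Solving $(P_0)$ exactly is combinatorial, so in practice a greedy heuristic such as orthogonal matching pursuit (OMP) is often used. The text does not prescribe an algorithm, so the following is only a minimal sketch: the dictionary, signal, and sparsity level are made-up illustrative values.

```python
import numpy as np

def omp(D, y, k):
    """Greedy sketch of sparse coding: approximate the sparsest x with y ~ Dx.
    D: dictionary (n x m) with unit-norm columns, y: signal, k: atom budget."""
    n, m = D.shape
    residual = y.copy()
    support = []
    x = np.zeros(m)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit the coefficients on the chosen support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

# Toy overcomplete dictionary with more atoms than dimensions (m > n).
rng = np.random.default_rng(0)
D = rng.standard_normal((8, 20))
D /= np.linalg.norm(D, axis=0)  # normalize each atom

# Synthesize y from 3 atoms, then ask OMP for a 3-sparse code.
x_true = np.zeros(20)
x_true[[2, 7, 15]] = [1.5, -2.0, 0.8]
y = D @ x_true
x_hat = omp(D, y, k=3)
print("residual norm:", np.linalg.norm(y - D @ x_hat))
```

Each iteration adds at most one atom, so the returned code has at most $k$ nonzeros, matching the $\ell_0$ budget in $(P_0, \epsilon)$.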

To touch on some of the motivations for why dictionary learning is useful, imagine trying to classify, compress, or perform other operations on images that can be represented as large sparse matrices. This paper does a good job of going further into the topic.