Interesting Properties of Matrix Norms and Singular Values

$ \DeclareMathOperator*{\argmax}{arg\,max} $

Matrix norms and singular values have some special relationships. Before I forget about them, I'll summarize them in this post.

Definitions

  • Schatten p-Norm

    The Schatten p-norm is defined as the following,1 where $\sigma_i(X)$ denotes the $i$-th singular value of $X$.

    \[\|X\|_{S_p} := \left( \sum_i^n \sigma_i(X)^p \right)^\frac{1}{p}\]
  • Nuclear Norm

    The nuclear norm of a matrix is the special case of the Schatten p-norm with $p=1$, i.e., $\|X\|_{S_1} = \sum_i^n \sigma_i(X)$.

  • Frobenius Norm

    \[\|X\|_F = \sqrt{ \sum_{i=1}^n \sum_{j=1}^n \lvert x_{ij} \rvert^2 }\]
  • Matrix p-Norm

    Matrix p-norm is defined as

    \[\|A\|_p = \sup_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p}\]

    In other words, the matrix p-norm is the largest factor by which $A$ can stretch the p-norm of a vector; equivalently, it is the maximum of $\|Ae\|_p$ over unit vectors $e$.

  • Spectral Norm

    The largest singular value of a matrix, $\sigma_1(X)$.

    This is the special case of the matrix p-norm with $p=2$, and it holds for any matrix $X$. Note that for a symmetric matrix $X$ the spectral norm equals the largest absolute eigenvalue $\max_i \lvert \lambda_i(X) \rvert$; it coincides with the largest eigenvalue itself only when $X$ is positive semi-definite.
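
The definitions above are easy to check numerically. Here is a quick sketch using NumPy; the helper name `schatten_norm` is my own, not a standard API:

```python
import numpy as np

def schatten_norm(X, p):
    """Schatten p-norm: the l_p norm of the singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

X = np.random.randn(4, 4)
s = np.linalg.svd(X, compute_uv=False)

print(schatten_norm(X, 1), np.sum(s))                 # nuclear norm = sum of singular values
print(schatten_norm(X, 2), np.linalg.norm(X, 'fro'))  # Schatten 2-norm = Frobenius norm
print(np.linalg.norm(X, 2), s[0])                     # spectral norm = largest singular value
```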

Lemmas

  1. $ A \in \mathbf{S}^n \implies tr(A) = \sum_i^n \lambda_i $

    The trace of a symmetric matrix $A$ is equal to the sum of its eigenvalues. (Equality with the Schatten 1-norm additionally requires positive semi-definiteness; see Lemma 2.) Let $A \in \mathbf{S}^{n}$ be a symmetric matrix. Then there exist an orthogonal matrix $U$ and a diagonal matrix $\Lambda$ such that $A = U \Lambda U^T$.

    $$ \begin{align} tr(A) & = \sum_i^n e_i^T A e_i \\ & = \sum_i^n e_i^T U \Lambda U^T e_i \\ & = \sum_i^n \sum_j^n \lambda_j u_{ij}^2 \\ & = \sum_j^n \lambda_j \sum_i^n u_{ij}^2 \\ & = \sum_j^n \lambda_j \end{align} $$

    Where $U = [ u_1, u_2, u_3, \dots, u_n]$ and $u_{ij}$ is the $i$-th entry of the column $u_j$. We used the fact that the columns of an orthogonal matrix are unit vectors, so $\sum_i^n u_{ij}^2 = u_j^T u_j = 1$.
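
    A minimal numerical check of Lemma 1 (a sketch using NumPy):

    ```python
    import numpy as np

    B = np.random.randn(5, 5)
    A = (B + B.T) / 2                    # symmetrize to get A in S^n
    eigvals = np.linalg.eigvalsh(A)      # eigenvalues of a symmetric matrix
    print(np.trace(A), np.sum(eigvals))  # the two values agree up to float error
    ```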

  2. $ A \in \mathbf{S}_+^n \implies tr(A) = \sum_i^n \sigma_i = \lVert A \rVert_{S_1} $

    The trace of a positive semi-definite matrix $A$ is equal to the L1 norm of its singular values, which is the Schatten 1-norm (nuclear norm).

    This is a direct extension of Lemma 1: the eigenvalues of a positive semi-definite matrix are nonnegative and coincide with its singular values.

    $$ \begin{align} tr(A) & = \sum_i^n \lambda_i\\ & = \sum_i^n \sigma_i\\ & = \sum_i^n \lvert \sigma_i \rvert\\ & = \|A\|_{S_1} \end{align} $$

    Since the L1 norm of the singular values encourages sparsity of the singular value spectrum, and therefore a low matrix rank, this result is used in many applications such as low-rank matrix completion and matrix approximation.
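
    A sketch checking Lemma 2 with NumPy; `B @ B.T` is one way to construct a random positive semi-definite matrix:

    ```python
    import numpy as np

    B = np.random.randn(5, 5)
    A = B @ B.T                             # positive semi-definite by construction
    s = np.linalg.svd(A, compute_uv=False)  # singular values of A
    print(np.trace(A))                      # trace
    print(np.sum(s))                        # sum of singular values, same value
    print(np.linalg.norm(A, 'nuc'))         # nuclear norm, same value
    ```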

  3. $ \lVert X\rVert_F = \sqrt{ \sum_i^n \sigma_i^2 } = \lVert X\rVert_{S_2} $

    The Frobenius norm of a matrix is equal to the L2 norm of its singular values, or equivalently the Schatten 2-norm.

    $$ \begin{align} \|X\|_F & = \sqrt{ tr(X^T X) }\\ & = \sqrt{ tr(V \Sigma U^T U \Sigma V^T) }\\ & = \sqrt{ tr(V\Sigma^2 V^T)}\\ & = \sqrt{ tr(\Sigma^2 V^T V)}\\ & = \sqrt{ \sum_i \sigma_i^2 }\\ & = \|X\|_{S_2} \end{align} $$

    where $X = U \Sigma V^T$ is the singular value decomposition of $X$ and the fourth line uses the cyclic property of the trace, $tr(ABC) = tr(BCA)$.
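
    A one-line check of Lemma 3 (a NumPy sketch), which also works for rectangular matrices:

    ```python
    import numpy as np

    X = np.random.randn(4, 6)
    s = np.linalg.svd(X, compute_uv=False)
    print(np.linalg.norm(X, 'fro'), np.sqrt(np.sum(s ** 2)))  # identical up to float error
    ```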
  4. $ \lVert A \rVert_1 = \max_j \sum_i^n \lvert a_{ij} \rvert $

    The matrix 1-norm is equal to the largest L1 norm among the columns of the matrix, i.e., the maximum absolute column sum.

    To begin with, note that $\frac{\|Ax\|_1}{\|x\|_1}$ is invariant to scaling of $x$, so the supremum can be taken over the unit ball $\|x\|_1 = 1$. On this ball the objective $\|Ax\|_1$ is convex and piecewise linear, so its maximum is attained at an extreme point of the ball, i.e., at one of the corners $x = \pm e_j$.

    Hence it suffices to compare the L1 norms of the columns, since $Ae_j$ is exactly the $j$-th column of $A$. Reformulating the problem:

    $$ \begin{align} \|A\|_1 & = \sup_{x \neq 0} \frac{\|Ax\|_1}{\|x\|_1} = \max_{\|x\|_1 = 1} \sum_i^n \Big\lvert \sum_j a_{ij} x_j \Big\rvert \\ & \leq \max_{\|x\|_1 = 1} \sum_j \lvert x_j \rvert \sum_i^n \lvert a_{ij}\rvert \\ & \leq \max_j \sum_i^n \lvert a_{ij} \rvert \end{align} $$

    The bound is attained at $x = e_{j^\ast}$, where $j^\ast$ is the index of the column with the largest L1 norm, so equality holds throughout.
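
    NumPy implements this norm directly (`np.linalg.norm(A, 1)` returns the maximum absolute column sum), so the formula is easy to verify in a quick sketch:

    ```python
    import numpy as np

    A = np.random.randn(4, 4)
    print(np.linalg.norm(A, 1))         # matrix 1-norm
    print(np.abs(A).sum(axis=0).max())  # maximum absolute column sum, same value
    ```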
  5. $ \lVert A \rVert_\infty = \max_i \sum_j^n \lvert a_{ij} \rvert $

    The supremum is attained at a corner of the hypercube $\{ x : \|x\|_\infty = 1 \}$: the infinity norm of a vector is the absolute value of its largest element, and the $i$-th entry of $Ax$ is maximized by choosing $x_j = \operatorname{sign}(a_{ij})$. Thus,

    $$ \begin{align} \|A\|_\infty & = \sup_{x \neq 0} \frac{\|Ax\|_\infty}{\|x\|_\infty} = \max_{\|x\|_\infty = 1} \max_i \Big\lvert \sum_j a_{ij} x_j \Big\rvert \\ & = \max_i \sum_j^n \lvert a_{ij} \rvert \\ & = \max_i \|a_i\|_1 \end{align} $$

    Where $a_i^T$ is the $i$-th row of the matrix $A$.
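
    A sketch verifying Lemma 5 and the maximizing corner $x_j = \operatorname{sign}(a_{ij})$ with NumPy:

    ```python
    import numpy as np

    A = np.random.randn(4, 4)
    row_sums = np.abs(A).sum(axis=1)
    i = np.argmax(row_sums)                  # row with the largest absolute sum
    x = np.sign(A[i])                        # corner of the hypercube, ||x||_inf = 1
    print(np.linalg.norm(A, np.inf))         # matrix infinity-norm
    print(row_sums[i], np.abs(A @ x).max())  # both equal the norm
    ```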

  1. http://en.wikipedia.org/wiki/Schatten_norm 
