I’m writing a talk for the Machine Learning Lunch at UMass/Amherst. It has got me thinking about the issue of sparsity in the precision error. Here are the technical details. The precision error when using $n$ models leads to a $n^2$ covariance matrix. Because the matrix is symmetric, it really has $n(n+1)/2$ independent components. You can think of the covariance matrix as a data structure that lives in a $n(n+1)/2$. It does not inhabit the full space because cross-correlations must be less than the variance of the cross-correlated variables.
The autonomous difference equations that were first publishd in the Swiss paper lead to $n(n+1)/2 – n$ independent equations. So the linear algebra system for the precision error covariance matrix is always under-determined by $n$ equations. The $\ell_1$ minimization technique advocated by Donoho would be able to solve this system if enough of the entries in the matrix were zero. This is what I mean by: is the precision error covariance matrix sparse enough?
You can calculate that the percentage of equations you have as the following fraction less than one
$$\frac{n(n+1)/2-n}{n(n+1)/2}.$$
This number is equal to
$$ 1 – \frac{2}{n+1}$$
so it approaches a well-determined system (the value one) with a term that dies as order $1/n$. Doesn’t this mean that if you have enough models ($n$ large enough) you would be able to become sparse enough to reconstruct the covariance matrix?
This means that sparsity of the error of the signal behaves empirically different than the sparsity of the signal. This has profound implications for the wide-applicability of the theory Howard Schultz and I have developed.