Do enough models always guarantee that the precision error will be sparse?

I’m writing a talk for the Machine Learning Lunch at UMass/Amherst. It has got me thinking about the issue of sparsity in the precision error. Here are the technical details. The precision error when using n models leads to a n 2 covariance matrix. Because the matrix is symmetric, it really has n(n+1 )/2 independent components. You can think of the covariance matrix as a data structure that lives in a n(n+1 )/2 . It does not inhabit the full space because cross-correlations must be less than the variance of the cross-correlated variables.

The autonomous difference equations that were first publishd in the Swiss paper lead to n(n+1 )/2 n independent equations. So the linear algebra system for the precision error covariance matrix is always under-determined by n equations. The 1 minimization technique advocated by Donoho would be able to solve this system if enough of the entries in the matrix were zero. This is what I mean by: is the precision error covariance matrix sparse enough?

You can calculate that the percentage of equations you have as the following fraction less than one
n(n+1 )/2 nn(n+1 )/2 .
This number is equal to
1 2 n+1
so it approaches a well-determined system (the value one) with a term that dies as order 1 /n. Doesn’t this mean that if you have enough models (n large enough) you would be able to become sparse enough to reconstruct the covariance matrix?

This means that sparsity of the error of the signal behaves empirically different than the sparsity of the signal. This has profound implications for the wide-applicability of the theory Howard Schultz and I have developed.

Decreasing precision errors with randomness

If I was to rate the things I have learned from computer science, I would place the algorithmic use of randomness right at the top. The uses of randomness in computations is too vast to start a list here. Check out Probability and Computing: Randomized Algorithms and Probabilistic Analysis for many examples. I want to discuss another way of using randomness in computation by discussing the estimation of precision errors in Digital Elevation Models.

I’ll use some simple linear algebra to explain how precision error can be discussed in the language of compressed sensing. The Swiss paper describes how to turn the estimation of the precision error covariance matrix into a linear algebra problem of the
form
S=Φα.
“S” is the signal. In this case, the autonomous difference terms one can calculate from the DEM elevation estimates. This makes “S” the signal because it can be calculated from what we observe — the DEM elevations. The vector α are the precision error covariance terms the robot is trying to estimate without knowing any ground truth. The “reconstruction” matrix Φ(n) tells you how to go between these two quantities. Φis can be calculated exactly and is only a function of the number n of DEMs.

Randomness comes into the error estimation process because there exist many different ways to specify the reconstruction matrix. Take the example of 10 DEMs I have used before since it corresponds to the case I have studied most in my work with Howard Schultz. The autonomous difference equations give us about 5,000 different ways of calculing quantities that do not depend on ground truth. Out of all those many equations, only 45 are linearly independent. Which 45? Any 45. That means that there are many ways of constructing Φ. So many that I can randomly do it by picking equations from the set of 5,000 equations until I get a set of 45 linearly independent ones.

So it becomes possible to check the precision error estimate many different ways. Could one then use this to improve the error estimation process itself? I do not know but I’m investigating that issue today by running experiments with randomly picked independent sets and plotting how the values vary for the same DEMs used as input.