No data is wasted

Compressed sensing caught my attention last year. I was doing a literature search on the Internet to see if anyone else had discussed the autonomous difference equations that Howard Schultz and I had devised to measure the precision errors in Digital Elevation Models (DEMs). One of the basic tenets of compressed sensing is that since many natural signals are sparse, why waste your resources taking many measurements when you can just take fewer to get the same reconstructed signal? For example, our high mega-pixel cameras capture images that we end up compressing anyhow to much smaller files. So why have CCDs with so many pixels?

I have a new hypothesis that would justify all these redundant measurements. Scientists view measurements as two numbers. The guess for the measured quantity, say the temperature of a glass of water, is the one that gets quoted first. But equally important is the error on that temperature guess. So the temperature of the glass should be properly be quoted as $10.0 \pm 0.2$ Celsius degrees, for example. So repeated measurements may not improve the color and intensity estimate for a pixel in a photograph but it dramatically improves our error on that measurement. Measurements should never be wasted! In the case of maps, it means that repeated images of a terrain would not necessarily improve the resolution of the map, but they would have a dramatic effect on the resolution of the error map for our elevation estimates.

I am currently writing a paper for the International Conference on Machine Learning that is studying this hypothesis in the context of DEMs. I’ll post some of the results in a later post if the hypothesis turns out to be correct. I mentioned this hypothesis for my pre-proposal submission to the National Science Foundation’s new Cyber-Enabled Discovery program but I don’t think it will get much traction just yet.

Digital Elevation Model errors are a sparse signal!

I have spoken in previous blogs of how the Terrest system developed by Howard Schultz exploits the asymmetry of computer stereo matching algorithms to produce two Digital Elevation Models (DEMs) from a pair of aerial photographs. This seems like a kind of trickery to many who are exposed to this feature of Terrest since the common practice in map-making from photographs is to produce only one DEM from a photographic pair.

Last Spring we started to develop a theory that explains why this DEM doubling is completely justified. Our approach was to use other photographic pairs to study the correlations between the two DEMs of a photographic pair. We called this model “the correlated-pair error model”. It rests on the assumption that DEMs from different photographic pairs are completely uncorrelated. Mathematically this is expressed by saying that the cross correlation between the errors of two DEMs from unrelated pairs is zero:
$$\left[ \delta_{AB} \delta_{CD} \right] = 0,$$
where the square brackets denote averaging of the elevation error of each DEM, $\delta$, over the whole area of the map.

This cross-correlation is not zero when considering the two DEMs from a photographic pair. For example, with two photographs A and B we can produce two DEMs AB and BA. This non-zero value means that on average the error of one DEM at a particular location is likely to be similar in the other DEM at the same location. The existence of this correlation is what has repelled others from producing two DEMs from a photographic pair.

I recently finished a manuscript for the IEEE Computer Vision and Pattern Recognition 2008 conference that was discussed in an earlier post. A misunderstanding on my part lead me to suspect a result I was obtaining with a calculation so I decided to finally do the compressed sensing calculation I had been speculating would be useful in these situations. What do I mean by that? Let me explain it by considering the particular calculation I did.

I had 10 DEMs that came from five photographic pairs. The covariance matrix of the average error of these DEMs is of dimension 10×10. Because it is a symmetric matrix, it only has 55 independent components. The linear equations in our theory only give us 45 independent equations: too many unknowns, too few equations. This is strictly speaking an unsolvable linear equation system. But if most components in the covariance matrix are zero (the assumption of the correlated pair error model), the system is solvable by something called the prime-dual interior point method. As luck would have it, the scientific software system Mathematica has a function call to solve these kind of problems! It is appropriately called “LinearProgramming”.

So I put in all the data from my 10 DEMs into the function using the 45 equations I had to try to figure out the 55 components of the covariance matrix. Imagine the pleasant surprise I felt when out came the very correlated-pair error model I have been assuming all along as correct. That is, without any assumptions, it turns out that the best way to explain the error in the 10 DEMs is precisely the one where only DEMs from the same photographic pair are correlated but DEMs from different pairs are not. How good is this result? DEMs from the same photographic have a typical cross-correlation of $0.08 m^2$. Across photographic pairs that cross-correlation is of order $10^{-12} m^2.$ I call that as close to zero as one can hope with real data!

This is proof that error is itself a sparse signal, so all the theory of compressed sensing also applies to it.

Autonomous horizontal correlation length in DEM data

The “Swiss paper” that I discussed in an earlier post solved the problem of vertical precision error estimation. I used a set of difference equations that range over $\{l,m\}$ where $l$ and $m$ are integers from 1 to the number of observations. The equations look like the difference of simple averages. My purpose in using them is that I would be able to cancel out the true value and be left with the error in each measurement. Surprisingly, the set of independent equations you get from considering all possible equations of this type can be turned into a well-determined linear algebra problem for the entries in a particular sparse covariance matrix. This allowed me to measure the vertical uncertainty in a composite Digital Elevation Model (DEM) without knowing ground truth. I currently interpret this as meaning that I have an estimate of how good my DEM model is. I could still be off by a scale, rotation, and translation.

But vertical uncertainty is only the first of two important numbers for the quality of a map. Another important one is the horizontal resolution. How fine grained are the readings in the map? Another way to capture this resolution is to ask what is the horizontal correlation length — how far apart do two measurements have to be so that they are de-correlated with each other. A way to study this is with the variogram. I had never heard of a ‘variogram’ until a couple of years ago. It essentially measures the spatial correlation of data by taking the average of the spatial difference of a function:
$$E[(f(x)-f(x+L))^2]$$
One typical behavior of this variogram function is that it starts at zero and rises to a plateau in an exponential fashion:
$$r*(1-\exp(-L/\lambda))$$
The horizontal correlation length is defined to be where the variogram reaches 67% of its final value. In the exponential rise curve this happens when $L=\lambda$.

I have had to put off the calculation of the correlation length until today. I was astounded to get almost text-book like exponential rise curves. The autonomous difference equations work for horizontal correlation lengths also! I found a four postings correlation length for our Twenty-Nine Palms dataset. Howard says this is very good resolution. What surprises me is that this could even be done.

Computation in series/parallel: Estimating the ideal throughput speed in uncertain traffic conditions

One of my favorite games to play when I drive is to try to estimate the ideal throughput speed — the speed at which you can go through traffic without having to change your speed. This involves a couple of behavioral changes: you have to allow more space than normal between you and the car in front, you have to be aware of upcoming red lights, etc. Over the years I have found that some drivers behind me get really annoyed over the amount of space in front of me. They race around and drive as fast as they can to the red light ahead whereupon they have to immediately brake. Sometimes I get to go past these drivers since I approach the green light moving while they have to race me again from a dead stop.

It occurred to me the other day that if the driver behind me noticed that I was doing this, they could, in turn, engage in the same estimation game. Their estimate, however, would be better than mine since I am already smoothing out the flow in front of them. This computational chain could continue behind them, each time getting easier and easier to estimate the ideal throughput speed no matter how uncertain the traffic was in front of us. Is this a computation in series or in parallel? We are doing it at the same time so it is in parallel, but each driver depends on the estimate in front of them so it is in series.

Autonomous estimation of the shape of the landscape

In an earlier post, I asserted that it was possible to get a precise map from photographs that only required a translation and rotation. These operations are not enough. You also need a scale change. This is the well-known relative orientation problem in photogrammetry.

However, the conclusion still remains that it is operationally possible to precisely estimate the shape of the terrain without knowing anything about the actual locations of your imaging sensor. If you couple the photographs with a GPS receiver signal you would then narrow considerably the scale change needed to overlay your constructed map with the actual shape of the landscape. In other words, a lot of GPS measurements would get you close to the true scale needed to overlay the map. Let me explain

One can well imagine a noisy GPS receiver that makes you think that two locations of the aerial camera are closer than they actually are. This would result in the constructed map being at a smaller scale than the real world. Likewise, the GPS readings could make you think that the cameras were farther apart when the photographs were taken. This would inflate your map scale. But in either case, the correct scale would be close to the inferred scale.

Now imagine that you consider more and more photographs with their corresponding GPS reading. Since GPS readings are unbiased (one of their great virtues), it would be extremely unlikely in a probabilistic sense that your inferred scale would be far from the real world scale.

Users can take your map and easily overlay it on another map by performing three operations: translation, rotation, and a small scale change. This is an extremely easy thing to do in comparison to trying to get absolute accuracy from the get go!

The random binary detector with the lowest error cost variation

Last night I did some calculations to see what random detector has the lowest error cost variation. That is, given that the detector will be deployed in an uncertain environment, what random detector should you use to minimize its error cost variation. The math is simple so I will show it all in this post.

As in previous posts, I am restricting myself to the simplest case: a binary classification problem. We have two classes, call them A and B. The variability of the testing environment is codified by the single parameter $f_A$, the frequency of class A. The binary random detector is described by a single parameter also, $d_A$ — the classification rate for class A.

The cost of misclassifying class A as class B will be denoted by $C_{A \rightarrow B}$. The cost of classifying class B as class A by $C_{B \rightarrow A}$. The error cost or loss function in this simple example is then equal to
$$L = C_{A \rightarrow B} f_A (1 – d_A) + C_{B \rightarrow A} (1-f_A) d_A.$$

What random detector setting $d_A$ minimizes $\frac{\partial L}{\partial f_A}$, the loss variation under a changing environment? Calculating the derivative one quickly obtains that zero variation in the cost is attainable when
$$ d_A = \frac{C_{A \rightarrow B}}{C_{A \rightarrow B} + C_{B \rightarrow A}}.$$
Note that this is not the most accurate detector setting for a particular environment, $f_A$.

The strategic advantage of stabilizing errors in uncertain environments

This weekend I was talking to a friend from graduate school about how to stabilize errors in uncertain environments by sacrificing accuracy, the subject of the previous two posts. During the conversation, I realized another advantage of stabilizing errors: it makes it easier to modify your detector/classifier in an adversarial situation.

The “arms race” game is a common one in conflict situations. You have some technological advantage over an opponent. The opponent adapts and changes their strategy. This is different from the issue of a noisy environment. Here the signal used by your detector is intentionally modified to circumvent its performance level. The enemy camouflages their tanks better. New strategies are deployed. All of which leads to an increase in the error rate of your detector.

A user confronted with this situation would want to slow down their error rate increase so it can be below their learning rate. Such a user would be willing to sacrifice accuracy for the ability to adapt to an adversary. There is a strategic advantage to stabilizing errors.