Archive for the 'Compressed Sensing' Category

Positive and negative precision error correlations, real or not?

One of the noisy maps based on the synthetic landscape
noisy-map

Synthetic dataset
ground-truth

All the experiments we have been carrying out with precision error have, so far, been with real data. Because of this, we do not have “ground truth” to determine if the reconstruction is correct. That changed today.

Synthetic experiments are a well-known device for studying models or algorithms. By artificially creating data where one knows exactly what is going on, one can then see if the algorithm one is testing is able to reproduce the artificially created “ground truth”.

I did that today by creating an artificial landscape as shown in the left figure at the top. A single example from the ten noisy versions of this landscape is shown in the figure on the right.

The advantage of using precision error is shown in the figures at the bottom. The figure on the left shows what happens after weighting the maps with the discovered precision error covariance matrix. The picture on the right is the result of simple averaging. The difference is clear. The weighted average is better, and precision error estimation was the way to obtain the weights.

Weighted average using precision error covariance matrix
weighted-average

Simple average of all ten maps
simple-average

ICML accepts precision error via L1 minimization paper

Our technical report on how to recover precision error estimates with 1 -minimization has been accepted by the 2008 International Conference on Machine Learning.

The paper originally got three anonymous reviews. Two were positive, one strongly negative. In our response to the reviews, we agreed with the general criticism by the reviewers that one experimental demonstration is not enough. In our precision error papers so far, we have only been using one dataset — aerial photographs from the Twenty-Nine Palms region in California. So we are going to include some results from North Carolina forest data to show that our technique works for all sorts of images.

Readers of previous posts may note that besides maps, the precision error has been recovered for questions in a multiple-choice-quesiton (MCQ) exam. It would be nice to include this in our ICML paper, but the title of the paper is “Autonomous geometric precision error estimation in low-level computer vision tasks” so it seems incongruous to do so.

The paper was submitted in early January. Afterwards, we realized that our precision error technique for elevation errors in maps applies to any set of models that make scalar predictions about multiple entities. We are now working on a draft for a Science magazine article that will combine the examples from maps and exams to illustrate the wide applicability of our technique.

Student answers versus random answers

An interesting baseline for thinking about precision error is to consider the case of uniformly random answers. The student may be completely ignorant: you gave a college level test to kindergarten kids. Your questions were so hard or so incomprehensible (think Chris Kattan’s mumbling character giving a “uupp-uizzz” (pop-quiz) to this students) that students are just guessing.

Precision error for random answer responses
Fig. 1: Precision error for random guessers

Actual student answers
Fig. 2: Actual student answers

Note how the answer look uniformly gray. There really is no pattern in the students response. This is a uniform group answering this exam — the random answering makes everyone belong to the same group.

This second figure is actual student answers in a test. The grayness of the diagonal squares is varying. Some questions have a precision error lower than random! Others, just as high. The noisy (imprecise questions) are the ones that gave students the most trouble.

Random faster than systematic

I am writing a Mathematica program to produce the precision error signal and reconstruction matrix for an arbitrary number of models. The maximum number I had tried before was ten models because it corresponded to the number of maps we have for the 29 Palms dataset.

My first try consisted of systematically creating all possible permutations of the precision error equations, squaring them, and then storing the coefficients. The program would then systematically look at the equations and augment an independent set every time it found an equation that could not be written as linear combinations of the previous ones in the set.

This worked okay for ten or so models, but I want to produce the full covariance matrix for twenty questions in a multiple-choice exam. No problem, I was making some grilled lamb for Easter dinner yesterday, so I put the computer to work and walked away. Three hours later, the computer was still trying to finish the list of all possible equation permutations! I confess that I have not worked out the combinatorics for the equations yet so perhaps it is of order 20 !. This would be 2,432,902,008,176,640,000 combinations. Compare this to 10 !=3628800 and you can see why the computation got hard quickly.

So my second incantation of the program was to do the computation randomly. Two integers P and Q are picked randomly such that 1 Pm and 1 Qm where m is the number of models. These random numbers are then used to randomly sample the model variables and construct a precision error difference equation. If the equation is independent from the set currently at hand, it is kept, otherwise discarded.

This second version is taking about ten minutes to produce a result. This made me think about how we perceive randomness as haphazardly: “Oh, you are just randomly trying to guess the right answer.” We perceive randomly as wasteful or misguided. The case presented here is just another example of how random is sometimes faster than systematic.

X-raying the geometric precision error of DEMs with Fourier analysis

In a previous post I mentioned a way of Fourier analyzing the geometric precision error of DEMs. Today I realized that the scheme I proposed can only account for part of the error signal. The approach I proposed is correct but it can only capture one particular aspect of the total error. The simplest way of seeing this is to consider the S 2 symmetry group. This would be the one to use for p=2 photographs. From two photographs I can produce two DEMs: AB and BA. The covariance matrix for these two DEMS would be a 2×2 covariance matrix of the form:
(ab bc).
But the representation induced by S 2 on these two DEMs generates the matrices:
(1 0 0 1 )and(0 1 1 0 )
These two matrices cannot capture the three independent degrees of freedom in the 2×2 covariance matrix. Therefore, the induced representation cannot capture all of the possible errors that are observed when two photograps are used to produce two maps. But the representation would allow you to project out that component of the error that is explained by permutations of the images.

You would need at least p=7 photograps to have enough members in S p to completely model the variation in the DEMs observed when you use two photographs to produce a map. To understand the error that cannot be explained by the permutation group you would need to use three photographs to create a DEM. For p photographs this would create p*(p1 )*(p2 ) DEMs. Since we can produce as much or even more than p! DEMs from p photographs, at some point we will always overwhelm the representational power of the symmetry group of p objects (in this case, p photographs). What error remains after we project out the component that can be modelled by S n? I hypothesize that it would be error that can be further Fourier analyzed by using the symmetry group associated with the orientation and positions of the cameras. These parameters are themselves error prone and would, by virtue of their geometry, only induce certain error patterns.

This viewpoint of the errors would therefore view the observed error as one that can be captured by a succesive series of symmetry groups. One component would be that related to the finite group of S p. Another component would be that one induced by translations and rotations of the camera positions and orientations. Like any real theory of errors, this approach would only peel away layers of error — always remaining would be a nugget of error that would require more and more complex models to decompose. The second law of thermodynamics is not violated!

The metaphor to x-raying in the title of this post comes from using Fourier analysis to study X-ray diffraction photographs by crystals. Crystals induce a certain periodicity on the scattered X-rays even when the sample is crushed into a powder. In other words, the randomly scattered blocks of crystal in the powder individually send a perfect difraction pattern. But the X-ray photograph records the mismash of the signals — the picture is blurry. Nonetheless, the bluriness has a symmetry component that comes from the periodic structure of the crystals and therefore Fourier analysis is able to pick the symmetry in the x-ray caused by the crystal periodicity. The Fourier decompositions for geometric errors are doing the same thing. There are many sources of errors in DEMs from aerial photographs. Some come from the fact that you used individual photographs to create the maps. This component of the error can therefore be accounted by studying representations of the symmetry group of p objects. Others come from uncertainty in the position or orientation of the camera when it took the photograph. These are explained by induced representations of non-abelian Lie groups like 3-D rotations in the space of covariance matrices.

Error covariance matrices as images

I submitted my paper on autonomous precision error estimation in 3-D models to the 2008 International Conference on Machine Learning yesterday. One week early, too, a first for me! The format for the paper is the standard double column format and this makes it very hard to have complex equations in the paper. One mathematical object that is hard to display are the covariance matrices for the DEM errors that I keep talking about in these posts. These are nxn matrices of real numbers. One particular example I use comes from images of a desert terrain in the Twenty-Nine Palms area in California. We have four photographs and can therefore produce 12 =4 *3 DEMs. Because of mistakes, two of the DEMs have to be dropped so I end up with 10 DEMs. The resulting covariance matrices are then 10×10 matrices — a hard thing to display in the double-column format since now you have to present 10 numbers in row. So I have hit upon a simple graphical way to present them that saves space but also ends up being more informative to the reader (or me) about the structure of the matrix.

The idea is to turn the 10×10 matrix into a 10×10 pixel image. Each pixel is now a shade of gray. The highest value in the matrix gets the darkest shade, the lowest gets the lighest. Here is an example that illustrates our correlated-pair error modelCovariance matrix for 10 DEMs of a desert terrain in the Twenty-Nine Palms region in California The only terms that are “turned on” are those along the diagonal. In contrast, here is the covariance matrix when you do 1 -minimization and do not assume beforehand that certain DEMs are uncorrelated with each other.Full covariance matrix for 10 DEMs of the Twenty-Nine Palms dataset So the correlated-pair error model is close to the actual covariances but we see that there are some cross-correlations off the diagonal that are on, albeit weaker than those on the block-diagonal defined by the asymmetric DEM pairs.

I apologize for the strange layout of the mages relative to the text of this post but my WordPress instalation does not save changes that I make to the img tag to identify it as requiring it to have text flow around it.
In any case, I hope this illustration makes clear some of the more abstract ideas I have been discussing about errors in DEMs.

Decreasing precision errors with randomness

If I was to rate the things I have learned from computer science, I would place the algorithmic use of randomness right at the top. The uses of randomness in computations is too vast to start a list here. Check out Probability and Computing: Randomized Algorithms and Probabilistic Analysis for many examples. I want to discuss another way of using randomness in computation by discussing the estimation of precision errors in Digital Elevation Models.

I’ll use some simple linear algebra to explain how precision error can be discussed in the language of compressed sensing. The Swiss paper describes how to turn the estimation of the precision error covariance matrix into a linear algebra problem of the
form
S=Φα.
“S” is the signal. In this case, the autonomous difference terms one can calculate from the DEM elevation estimates. This makes “S” the signal because it can be calculated from what we observe — the DEM elevations. The vector α are the precision error covariance terms the robot is trying to estimate without knowing any ground truth. The “reconstruction” matrix Φ(n) tells you how to go between these two quantities. Φis can be calculated exactly and is only a function of the number n of DEMs.

Randomness comes into the error estimation process because there exist many different ways to specify the reconstruction matrix. Take the example of 10 DEMs I have used before since it corresponds to the case I have studied most in my work with Howard Schultz. The autonomous difference equations give us about 5,000 different ways of calculing quantities that do not depend on ground truth. Out of all those many equations, only 45 are linearly independent. Which 45? Any 45. That means that there are many ways of constructing Φ. So many that I can randomly do it by picking equations from the set of 5,000 equations until I get a set of 45 linearly independent ones.

So it becomes possible to check the precision error estimate many different ways. Could one then use this to improve the error estimation process itself? I do not know but I’m investigating that issue today by running experiments with randomly picked independent sets and plotting how the values vary for the same DEMs used as input.

No data is wasted

Compressed sensing caught my attention last year. I was doing a literature search on the Internet to see if anyone else had discussed the autonomous difference equations that Howard Schultz and I had devised to measure the precision errors in Digital Elevation Models (DEMs). One of the basic tenets of compressed sensing is that since many natural signals are sparse, why waste your resources taking many measurements when you can just take fewer to get the same reconstructed signal? For example, our high mega-pixel cameras capture images that we end up compressing anyhow to much smaller files. So why have CCDs with so many pixels?

I have a new hypothesis that would justify all these redundant measurements. Scientists view measurements as two numbers. The guess for the measured quantity, say the temperature of a glass of water, is the one that gets quoted first. But equally important is the error on that temperature guess. So the temperature of the glass should be properly be quoted as 10.0 ±0.2 Celsius degrees, for example. So repeated measurements may not improve the color and intensity estimate for a pixel in a photograph but it dramatically improves our error on that measurement. Measurements should never be wasted! In the case of maps, it means that repeated images of a terrain would not necessarily improve the resolution of the map, but they would have a dramatic effect on the resolution of the error map for our elevation estimates.

I am currently writing a paper for the International Conference on Machine Learning that is studying this hypothesis in the context of DEMs. I’ll post some of the results in a later post if the hypothesis turns out to be correct. I mentioned this hypothesis for my pre-proposal submission to the National Science Foundation’s new Cyber-Enabled Discovery program but I don’t think it will get much traction just yet.

Digital Elevation Model errors are a sparse signal!

I have spoken in previous blogs of how the Terrest system developed by Howard Schultz exploits the asymmetry of computer stereo matching algorithms to produce two Digital Elevation Models (DEMs) from a pair of aerial photographs. This seems like a kind of trickery to many who are exposed to this feature of Terrest since the common practice in map-making from photographs is to produce only one DEM from a photographic pair.

Last Spring we started to develop a theory that explains why this DEM doubling is completely justified. Our approach was to use other photographic pairs to study the correlations between the two DEMs of a photographic pair. We called this model “the correlated-pair error model”. It rests on the assumption that DEMs from different photographic pairs are completely uncorrelated. Mathematically this is expressed by saying that the cross correlation between the errors of two DEMs from unrelated pairs is zero:
[δ ABδ CD]=0 ,
where the square brackets denote averaging of the elevation error of each DEM, δ, over the whole area of the map.

This cross-correlation is not zero when considering the two DEMs from a photographic pair. For example, with two photographs A and B we can produce two DEMs AB and BA. This non-zero value means that on average the error of one DEM at a particular location is likely to be similar in the other DEM at the same location. The existence of this correlation is what has repelled others from producing two DEMs from a photographic pair.

I recently finished a manuscript for the IEEE Computer Vision and Pattern Recognition 2008 conference that was discussed in an earlier post. A misunderstanding on my part lead me to suspect a result I was obtaining with a calculation so I decided to finally do the compressed sensing calculation I had been speculating would be useful in these situations. What do I mean by that? Let me explain it by considering the particular calculation I did.

I had 10 DEMs that came from five photographic pairs. The covariance matrix of the average error of these DEMs is of dimension 10×10. Because it is a symmetric matrix, it only has 55 independent components. The linear equations in our theory only give us 45 independent equations: too many unknowns, too few equations. This is strictly speaking an unsolvable linear equation system. But if most components in the covariance matrix are zero (the assumption of the correlated pair error model), the system is solvable by something called the prime-dual interior point method. As luck would have it, the scientific software system Mathematica has a function call to solve these kind of problems! It is appropriately called “LinearProgramming”.

So I put in all the data from my 10 DEMs into the function using the 45 equations I had to try to figure out the 55 components of the covariance matrix. Imagine the pleasant surprise I felt when out came the very correlated-pair error model I have been assuming all along as correct. That is, without any assumptions, it turns out that the best way to explain the error in the 10 DEMs is precisely the one where only DEMs from the same photographic pair are correlated but DEMs from different pairs are not. How good is this result? DEMs from the same photographic have a typical cross-correlation of 0.08 m 2 . Across photographic pairs that cross-correlation is of order 10 12 m 2 . I call that as close to zero as one can hope with real data!

This is proof that error is itself a sparse signal, so all the theory of compressed sensing also applies to it.

What is more important: precision or accuracy?

I have just uploaded our autonomous precision error estimation Swiss conference paper to the new UMass/Amherst Digital Repository. I work at the Aerial Imaging and Remote Sensing Lab at UMass/Amherst. For years, Howard Schultz has been doubling the number of Digital Elevation Models (DEMs) his Terrest system makes by using the fact that computer stereo matching algorithms are not symmetric in their inputs. The paper above proves mathematically that this is indeed correct and possible if the correlation between DEMs is sparse, i.e. only a few of the DEMs are highly correlated.

We are now writing a proposal to further this work by connecting it with compressed sensing ideas as we briefly hint at the end of the paper above. This has got us thinking about the difference between precision and accuracy in measurement errors. Precision is the width of your error bars, accuracy is where the centroid of the measurement is located relative to the ‘true’ value. So if you had to choose between a precise and an accurate map, which one would you choose?

A precise map is one that captures the shape of the world very well, it has a lot of details. An accurate map is one that tells you where objects are located in the real world. An accurate but imprecise map is located correctly but it is very fuzzy. The landscape looks melted. A precise but inaccurate map is located wrong but has lots of detail — you tried to map a desert patch and the map says you mapped Paris (there is a caveat to this characterization — horizontal correlation — that I don’t want to get into in this post). The practical significance of this difference is enormous. A precise but inaccurate map is just a rigid body transformation and a scale change of the real world (3 + 3 + 1 = 7 unknown parameters) that can be fixed by measuring 3 ground control points (3 + 3 + 3 = 9 measurements). An imprecise but accurate map would require thousands of measurements to recover the detail lost in the fuzzy estimate. Therefore, precise maps are cheaper to make than accurate ones since measuring ground control points is a time consuming expensive task.

Furthermore, for some tasks, precision is all you really care about. For example, scientist Andrea Laliberte at the Jornada Experimental Range in south-central New Mexico is interested in classifying invasive species in the desert. For this task you need precision, not accuracy. Of course, if you wanted to bomb a particular shrub, you would want accuracy. My point is that precision is sometimes good enough and therefore you can sacrifice accuracy. Does this sound familiar?