Archive for the 'Maps' Category

To err is human, to study your errors is glorious

I’ve been sick all week but today has been the worst. In between my sleeping hallucinations I have been thinking a lot about a proposal I’m currently writing on the use of non-commutative harmonic analysis to study mapping error patterns. It has become clear that the approach we are advocating at the AIRS lab is applicable to other areas of machine learning. Before I describe in additional detail what I mean by this, let me present a graphic that abstractly represents the scientific enterprise.

[Figure: An abstract representation of the scientific enterprise]

The work I am describing here lies at the bottom of the picture. It is an algorithmic prescription for understanding the precision of our models given a dataset used to construct those models. I present this diagram to make clear the limitations of our work. It is not a description or explanation of errors in general. It is a technique for probing the error patterns in your system. The hard work is still left to you on how to apply it for a specific system that constructs models from data, and its usefulness is not guaranteed. The technique may tell you nothing interesting about your system.

We can view model creation as a black box. It takes data inputs and produces a model. The crucial point is that in some machine learning situations the number of models we can build is very large. Data is model-redundant. The redundancy is sometimes continuous, for example, the initial position of a camera or the weight of a Lagrangian term. But it can also be discrete: a finite set of documents or photographs. Changing the data inputs can then be used to probe the variation of a system’s model predictions. These variations will not be completely random (i.e. patternless). Some documents are more informative, some photographs give us a better view. Therefore we can use the symmetry groups associated with our data inputs to Fourier analyze the model variation. This model variation or model precision is informative about the quality of the data and can be used to reject bad data or discount lower quality inputs.

Error covariance matrices as images

I submitted my paper on autonomous precision error estimation in 3-D models to the 2008 International Conference on Machine Learning yesterday. One week early, too, a first for me! The paper uses the standard double-column format, which makes it very hard to include complex equations. One mathematical object that is hard to display is the covariance matrix for the DEM errors that I keep talking about in these posts. These are n×n matrices of real numbers. One particular example I use comes from images of a desert terrain in the Twenty-Nine Palms area in California. We have four photographs and can therefore produce 12 = 4 × 3 DEMs. Because of mistakes, two of the DEMs have to be dropped, so I end up with 10 DEMs. The resulting covariance matrices are then 10×10 matrices, a hard thing to display in the double-column format since you have to present 10 numbers in a row. So I have hit upon a simple graphical way to present them that saves space but also ends up being more informative to the reader (or me) about the structure of the matrix.

The idea is to turn the 10×10 matrix into a 10×10 pixel image. Each pixel is now a shade of gray. The highest value in the matrix gets the darkest shade, the lowest gets the lightest. Here is an example that illustrates our correlated-pair error model:

[Figure: Covariance matrix for 10 DEMs of a desert terrain in the Twenty-Nine Palms region in California]

The only terms that are “turned on” are those along the diagonal. In contrast, here is the covariance matrix when you do ℓ1-minimization and do not assume beforehand that certain DEMs are uncorrelated with each other:

[Figure: Full covariance matrix for 10 DEMs of the Twenty-Nine Palms dataset]

So the correlated-pair error model is close to the actual covariances, but we see that there are some cross-correlations off the diagonal that are on, albeit weaker than those on the block diagonal defined by the asymmetric DEM pairs.
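The matrix-to-image mapping described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the paper: the function names and the toy 4×4 matrix are hypothetical, and the plain-text PGM output is just one convenient grayscale format.

```python
def covariance_to_gray(matrix, levels=256):
    """Map matrix entries to gray levels: largest value -> 0 (black),
    smallest value -> levels - 1 (white)."""
    flat = [value for row in matrix for value in row]
    lowest, highest = min(flat), max(flat)
    span = highest - lowest
    image = []
    for row in matrix:
        if span == 0:
            # Constant matrix: nothing to contrast, render it all white.
            image.append([levels - 1 for _ in row])
        else:
            image.append(
                [round((highest - value) / span * (levels - 1)) for value in row]
            )
    return image


def to_pgm_text(image, levels=256):
    """Render the gray levels as a plain-text PGM (P2) image, one pixel per entry."""
    height, width = len(image), len(image[0])
    lines = ["P2", f"{width} {height}", str(levels - 1)]
    lines += [" ".join(str(pixel) for pixel in row) for row in image]
    return "\n".join(lines) + "\n"


# Hypothetical block-diagonal covariance matrix for two DEM pairs (made-up values).
toy = [
    [0.10, 0.08, 0.00, 0.00],
    [0.08, 0.10, 0.00, 0.00],
    [0.00, 0.00, 0.12, 0.08],
    [0.00, 0.00, 0.08, 0.12],
]
toy_image = covariance_to_gray(toy)
print(to_pgm_text(toy_image))
```

Saving the printed text to a `.pgm` file gives an image in which the block-diagonal structure is visible at a glance.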

I apologize for the strange layout of the images relative to the text of this post, but my WordPress installation does not save the changes I make to the img tag to have text flow around it.
In any case, I hope this illustration makes clear some of the more abstract ideas I have been discussing about errors in DEMs.

Fourier theory of DEM precision errors

I’ve finished the experiments with different reconstruction matrices for the DEM precision error and I get a rock solid result independent of which reconstruction matrix I use. So my hypothesis that randomness may be used to decrease the precision error was wrong. In the process, however, I have finally understood how to use the symmetry group $S_n$ to Fourier analyze the covariance matrix. This has led me to consider generalizations of our current approach that rely on the asymmetry of stereo matching algorithms.

The covariance matrix for our current procedure for creating maps is made up of photographic pairs. From two images, A and B, we create DEMs AB and BA. So n photographs lead to $n(n-1)$ DEMs. The resulting covariance matrix can be Fourier analyzed by considering the representation induced by the symmetry group for n objects (in our case the photographs) on the $n(n-1)$-dimensional space. That is, for each element of the group, call it π, we define $M_{AB,CD} = 1$ if $\pi(A)\pi(B) = CD$, and 0 otherwise. This matrix representation can then be decomposed into its irreducible components to carry out the Fourier transform.
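To make the induced representation concrete, here is a minimal Python sketch that builds the matrix M for a given permutation π acting on ordered pairs of photographs. The function names and the four-photograph example are mine, chosen for illustration. The trace of M counts the DEMs that π leaves in place, and it is the character that a decomposition into irreducible components would start from.

```python
from itertools import permutations


def ordered_pairs(photos):
    """All ordered pairs (A, B) with A != B, one per DEM: n*(n-1) of them."""
    return list(permutations(photos, 2))


def induced_matrix(pi, photos):
    """Permutation matrix with M[AB][CD] = 1 when (pi(A), pi(B)) == (C, D).

    `pi` is a dict that maps each photograph label to its image.
    """
    pairs = ordered_pairs(photos)
    index = {pair: position for position, pair in enumerate(pairs)}
    size = len(pairs)
    matrix = [[0] * size for _ in range(size)]
    for (a, b), row in index.items():
        column = index[(pi[a], pi[b])]
        matrix[row][column] = 1
    return matrix


def trace(matrix):
    """Sum of the diagonal: the number of DEMs left fixed by the permutation."""
    return sum(matrix[i][i] for i in range(len(matrix)))


photos = ["A", "B", "C", "D"]
identity = {p: p for p in photos}
swap_ab = {"A": "B", "B": "A", "C": "C", "D": "D"}

print(len(ordered_pairs(photos)))               # number of DEMs
print(trace(induced_matrix(identity, photos)))  # every DEM is fixed
print(trace(induced_matrix(swap_ab, photos)))   # only CD and DC are fixed
```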

The above construction can then, in turn, be generalized by using the asymmetry of stereo matching algorithms. One constructs DEMs of the form ABC. This will not, in general, produce the same DEM as ACB and so on. There will be $n(n-1)(n-2)$ ways of constructing these DEMs. A representation of the group can then be induced by generalizing the rule in the previous paragraph. Bringing more photographs into the chain will induce higher and higher dimensional representations of the symmetry group. But note that all these representations are, by construction, no larger than the $n!$ dimensionality of the symmetry group itself. Higher dimensional representations could be constructed because an arbitrary DEM like ABAC will not be equivalent to the AC DEM, for example. The matching process, being imperfect, will not return to the same pixel when the matching chain is of the form ABA.

None of these more complicated DEM production processes would lead to anything interesting if there were no errors in the matching process. If creating a 3-D model from photographs were perfect, all the DEMs would be error free and the covariance matrix would be proportional to the identity matrix. In other words, the Fourier decomposition of the covariance matrix is interesting because there is a symmetry to the errors. I’ll keep readers updated on the results of this line of inquiry as I obtain concrete results.

Decreasing precision errors with randomness

If I were to rate the things I have learned from computer science, I would place the algorithmic use of randomness right at the top. The uses of randomness in computation are too many to list here. Check out Probability and Computing: Randomized Algorithms and Probabilistic Analysis for many examples. I want to discuss another way of using randomness in computation by discussing the estimation of precision errors in Digital Elevation Models.

I’ll use some simple linear algebra to explain how precision error can be discussed in the language of compressed sensing. The Swiss paper describes how to turn the estimation of the precision error covariance matrix into a linear algebra problem of the form
S = Φα.
“S” is the signal. In this case, it consists of the autonomous difference terms one can calculate from the DEM elevation estimates. This makes “S” the signal because it can be calculated from what we observe: the DEM elevations. The vector α holds the precision error covariance terms the robot is trying to estimate without knowing any ground truth. The “reconstruction” matrix Φ(n) tells you how to go between these two quantities. Φ can be calculated exactly and is only a function of the number n of DEMs.

Randomness comes into the error estimation process because there exist many different ways to specify the reconstruction matrix. Take the example of 10 DEMs I have used before since it corresponds to the case I have studied most in my work with Howard Schultz. The autonomous difference equations give us about 5,000 different ways of calculating quantities that do not depend on ground truth. Out of all those many equations, only 45 are linearly independent. Which 45? Any 45. That means that there are many ways of constructing Φ. So many that I can randomly do it by picking equations from the set of 5,000 equations until I get a set of 45 linearly independent ones.
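The random selection described above can be sketched as follows. The actual autonomous difference equations are not reproduced in this post, so the candidate rows below are a hypothetical stand-in (all differences e_i − e_j in five dimensions, which have rank 4). Only the selection procedure is the point: shuffle the candidates and keep each row that is linearly independent of the rows already kept. Exact fractions avoid floating-point rank decisions.

```python
import random
from fractions import Fraction


def reduce_against(basis, row):
    """Eliminate from `row` the pivot components of the rows already in the basis."""
    row = [Fraction(x) for x in row]
    for pivot_col, basis_row in basis:
        if row[pivot_col] != 0:
            factor = row[pivot_col] / basis_row[pivot_col]
            row = [r - factor * b for r, b in zip(row, basis_row)]
    return row


def pick_independent(candidates, target_rank, rng):
    """Visit candidates in random order and keep those independent of earlier picks."""
    order = list(range(len(candidates)))
    rng.shuffle(order)
    basis, chosen = [], []
    for idx in order:
        reduced = reduce_against(basis, candidates[idx])
        pivot = next((c for c, v in enumerate(reduced) if v != 0), None)
        if pivot is not None:  # nonzero remainder: the row adds a new direction
            basis.append((pivot, reduced))
            chosen.append(idx)
            if len(chosen) == target_rank:
                break
    return chosen


# Hypothetical redundant equation set: 20 rows in 5 unknowns, rank 4.
n = 5
candidates = [
    [1 if k == i else -1 if k == j else 0 for k in range(n)]
    for i in range(n)
    for j in range(n)
    if i != j
]
chosen = pick_independent(candidates, target_rank=n - 1, rng=random.Random(0))
print(chosen)
```

Changing the seed gives a different but equally valid independent set, which is what makes it possible to check an estimate against many different reconstruction matrices.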

So it becomes possible to check the precision error estimate many different ways. Could one then use this to improve the error estimation process itself? I do not know but I’m investigating that issue today by running experiments with randomly picked independent sets and plotting how the values vary for the same DEMs used as input.

No data is wasted

Compressed sensing caught my attention last year. I was doing a literature search on the Internet to see if anyone else had discussed the autonomous difference equations that Howard Schultz and I had devised to measure the precision errors in Digital Elevation Models (DEMs). One of the basic tenets of compressed sensing is that since many natural signals are sparse, why waste your resources taking many measurements when you can just take fewer to get the same reconstructed signal? For example, our high mega-pixel cameras capture images that we end up compressing anyhow to much smaller files. So why have CCDs with so many pixels?

I have a new hypothesis that would justify all these redundant measurements. Scientists view measurements as two numbers. The guess for the measured quantity, say the temperature of a glass of water, is the one that gets quoted first. But equally important is the error on that temperature guess. So the temperature of the glass should properly be quoted as 10.0 ± 0.2 degrees Celsius, for example. So repeated measurements may not improve the color and intensity estimate for a pixel in a photograph, but they dramatically improve our error on that measurement. Measurements should never be wasted! In the case of maps, it means that repeated images of a terrain would not necessarily improve the resolution of the map, but they would have a dramatic effect on the resolution of the error map for our elevation estimates.
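The textbook version of this effect is the standard error of the mean, which shrinks like 1/√N for N independent readings. The sketch below simulates it using the 10.0 ± 0.2 degrees from the example above. The Gaussian noise model and the function name are my assumptions, and this is only the simplest illustration of repeated measurements tightening the error bar, not the DEM error-map calculation itself.

```python
import math
import random
import statistics


def mean_and_standard_error(readings):
    """Return the sample mean and its standard error (sample std / sqrt(N))."""
    mean = statistics.mean(readings)
    spread = statistics.stdev(readings)
    return mean, spread / math.sqrt(len(readings))


# Simulated thermometer: true value 10.0, noise of 0.2 per reading (assumed Gaussian).
rng = random.Random(1)
readings = [rng.gauss(10.0, 0.2) for _ in range(400)]

mean_few, se_few = mean_and_standard_error(readings[:4])
mean_many, se_many = mean_and_standard_error(readings)
print(f"  4 readings: {mean_few:.3f} +/- {se_few:.3f}")
print(f"400 readings: {mean_many:.3f} +/- {se_many:.3f}")
```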

I am currently writing a paper for the International Conference on Machine Learning that is studying this hypothesis in the context of DEMs. I’ll post some of the results in a later post if the hypothesis turns out to be correct. I mentioned this hypothesis for my pre-proposal submission to the National Science Foundation’s new Cyber-Enabled Discovery program but I don’t think it will get much traction just yet.

Digital Elevation Model errors are a sparse signal!

I have spoken in previous blogs of how the Terrest system developed by Howard Schultz exploits the asymmetry of computer stereo matching algorithms to produce two Digital Elevation Models (DEMs) from a pair of aerial photographs. This seems like a kind of trickery to many who are exposed to this feature of Terrest since the common practice in map-making from photographs is to produce only one DEM from a photographic pair.

Last Spring we started to develop a theory that explains why this DEM doubling is completely justified. Our approach was to use other photographic pairs to study the correlations between the two DEMs of a photographic pair. We called this model “the correlated-pair error model”. It rests on the assumption that DEMs from different photographic pairs are completely uncorrelated. Mathematically this is expressed by saying that the cross correlation between the errors of two DEMs from unrelated pairs is zero:
$[\delta_{AB}\,\delta_{CD}] = 0,$
where the square brackets denote averaging of the elevation error of each DEM, δ, over the whole area of the map.

This cross-correlation is not zero when considering the two DEMs from a photographic pair. For example, with two photographs A and B we can produce two DEMs AB and BA. This non-zero value means that on average the error of one DEM at a particular location is likely to be similar in the other DEM at the same location. The existence of this correlation is what has repelled others from producing two DEMs from a photographic pair.

I recently finished a manuscript for the IEEE Computer Vision and Pattern Recognition 2008 conference that was discussed in an earlier post. A misunderstanding on my part led me to suspect a result I was obtaining with a calculation, so I decided to finally do the compressed sensing calculation I had been speculating would be useful in these situations. What do I mean by that? Let me explain it by considering the particular calculation I did.

I had 10 DEMs that came from five photographic pairs. The covariance matrix of the average error of these DEMs is of dimension 10×10. Because it is a symmetric matrix, it only has 55 independent components. The linear equations in our theory only give us 45 independent equations: too many unknowns, too few equations. This is, strictly speaking, an underdetermined linear system with no unique solution. But if most components in the covariance matrix are zero (the assumption of the correlated-pair error model), the system is solvable by something called the primal-dual interior point method. As luck would have it, the scientific software system Mathematica has a function call to solve these kinds of problems! It is appropriately called “LinearProgramming”.
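For readers wondering why a linear programming routine applies here, this is the standard reduction of sparse recovery to a linear program (often called basis pursuit). I am assuming the notation S = Φα used elsewhere on this blog, with Φ of size 45×55. The post does not spell out the exact formulation that was fed to Mathematica, so read this as the textbook version rather than a transcript of the calculation.

```latex
% Counting: a symmetric 10 x 10 matrix has n(n+1)/2 free entries.
\[
  \frac{n(n+1)}{2}\Big|_{n=10} = 55 \ \text{unknowns}, \qquad 45 \ \text{independent equations}.
\]

% Sparse recovery: among all solutions of the underdetermined system,
% pick the one with the smallest l1 norm.
\[
  \min_{\alpha \in \mathbb{R}^{55}} \ \|\alpha\|_1
  \quad \text{subject to} \quad \Phi\alpha = S .
\]

% Split alpha into nonnegative parts, alpha = u - v with u, v >= 0.
% The objective and the constraints are then linear in (u, v):
\[
  \min_{u,\,v \in \mathbb{R}^{55}} \ \mathbf{1}^{\mathsf T}(u + v)
  \quad \text{subject to} \quad \Phi\,(u - v) = S, \qquad u \ge 0, \quad v \ge 0 .
\]
```

At an optimum, u and v are never both nonzero in the same component, so 1ᵀ(u + v) equals ‖α‖₁ and the solution of the linear program is the minimum-ℓ1 solution of the original system.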

So I put in all the data from my 10 DEMs into the function using the 45 equations I had to try to figure out the 55 components of the covariance matrix. Imagine the pleasant surprise I felt when out came the very correlated-pair error model I have been assuming all along as correct. That is, without any assumptions, it turns out that the best way to explain the error in the 10 DEMs is precisely the one where only DEMs from the same photographic pair are correlated but DEMs from different pairs are not. How good is this result? DEMs from the same photographic pair have a typical cross-correlation of $0.08\ \mathrm{m}^2$. Across photographic pairs that cross-correlation is of order $10^{-12}\ \mathrm{m}^2$. I call that as close to zero as one can hope with real data!

This is proof that error is itself a sparse signal, so all the theory of compressed sensing also applies to it.

Autonomous horizontal correlation length in DEM data

The “Swiss paper” that I discussed in an earlier post solved the problem of vertical precision error estimation. I used a set of difference equations that range over {l,m} where l and m are integers from 1 to the number of observations. The equations look like the difference of simple averages. My purpose in using them is that I would be able to cancel out the true value and be left with the error in each measurement. Surprisingly, the set of independent equations you get from considering all possible equations of this type can be turned into a well-determined linear algebra problem for the entries in a particular sparse covariance matrix. This allowed me to measure the vertical uncertainty in a composite Digital Elevation Model (DEM) without knowing ground truth. I currently interpret this as meaning that I have an estimate of how good my DEM model is. I could still be off by a scale, rotation, and translation.

But vertical uncertainty is only the first of two important numbers for the quality of a map. Another important one is the horizontal resolution. How fine grained are the readings in the map? Another way to capture this resolution is to ask what is the horizontal correlation length — how far apart do two measurements have to be so that they are de-correlated with each other. A way to study this is with the variogram. I had never heard of a ‘variogram’ until a couple of years ago. It essentially measures the spatial correlation of data by taking the average of the spatial difference of a function:
$E\left[(f(x) - f(x+L))^2\right]$
One typical behavior of this variogram function is that it starts at zero and rises to a plateau in an exponential fashion:
$r\,(1 - \exp(-L/\lambda))$
The horizontal correlation length is defined to be where the variogram reaches about 63% (that is, 1 − 1/e) of its final value. In the exponential rise curve this happens when L = λ.
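As a rough sketch of how these quantities can be computed for a one-dimensional profile sampled at regular postings: the empirical variogram averages squared differences at each lag, and the correlation length is read off where the curve reaches 1 − 1/e of its plateau, which is exactly where the exponential model sits at L = λ. The function names and the synthetic test curve are illustrative assumptions, not the autonomous difference-equation calculation described in this post.

```python
import math


def empirical_variogram(values, max_lag):
    """Average of (f(x) - f(x + L))^2 over x, for L = 1 .. max_lag postings."""
    result = []
    for lag in range(1, max_lag + 1):
        diffs = [(values[i] - values[i + lag]) ** 2 for i in range(len(values) - lag)]
        result.append(sum(diffs) / len(diffs))
    return result


def exponential_model(lag, plateau, corr_length):
    """Exponential-rise variogram model r * (1 - exp(-L / lambda))."""
    return plateau * (1.0 - math.exp(-lag / corr_length))


def correlation_length(variogram, plateau):
    """Lag (linearly interpolated) where the variogram first reaches
    (1 - 1/e) of the plateau; None if it never gets there."""
    target = plateau * (1.0 - math.exp(-1.0))
    prev_lag, prev_val = 0, 0.0
    for lag, val in enumerate(variogram, start=1):
        if val >= target:
            fraction = (target - prev_val) / (val - prev_val)
            return prev_lag + fraction * (lag - prev_lag)
        prev_lag, prev_val = lag, val
    return None


# Synthetic check: a curve generated with lambda = 4 postings gives back 4.
synthetic = [exponential_model(lag, plateau=1.0, corr_length=4.0) for lag in range(1, 21)]
print(correlation_length(synthetic, plateau=1.0))
```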

I have had to put off the calculation of the correlation length until today. I was astounded to get almost textbook-like exponential rise curves. The autonomous difference equations work for horizontal correlation lengths also! I found a correlation length of four postings for our Twenty-Nine Palms dataset. Howard says this is very good resolution. What surprises me is that this could even be done.

Calming the waters with polarization

Above surface scene scrambling by choppy water interface
Howard Schultz and I have just filed our patent application “SYSTEM AND METHOD FOR IMAGING THROUGH AN IRREGULAR WATER SURFACE” for a fully submerged periscope. Etymologically, “periscope” means “seeing around”. In this case we are able to reconstruct the scene above a choppy water surface by measuring the polarization of the light as it refracts through the air-water interface. The figure on the right shows how a choppy air-water interface scrambles the above-water scenery. Refracting through a water interface induces a polarization change, so if you can measure the full polarization state of the light, you can make a slope map of the surface. This then allows you to unscramble all the rays and magically reconstruct the scene above. Doing so is not easy. The camera has to measure the full Stokes ‘vector’ of the light rays (I,Q,U,V). A typical camera just measures the light intensity I. The linear polarization of the light is specified by the Q and U terms. And the circular polarization is specified by V.
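For readers unfamiliar with Stokes parameters, the sketch below shows the standard way to summarize a measured (I, Q, U, V) for one pixel: the degree of linear polarization, the polarization angle, and the degree of circular polarization. The function name is mine, and this covers only the textbook definitions. The step from polarization to a surface slope map is the subject of the patent application and is not attempted here.

```python
import math


def polarization_summary(I, Q, U, V):
    """Return (degree of linear polarization, polarization angle in radians,
    degree of circular polarization) for one Stokes vector."""
    if I <= 0:
        raise ValueError("total intensity I must be positive")
    degree_linear = math.hypot(Q, U) / I   # sqrt(Q^2 + U^2) / I
    angle = 0.5 * math.atan2(U, Q)         # orientation of the linear component
    degree_circular = abs(V) / I
    return degree_linear, angle, degree_circular


# Fully horizontally polarized light, then half-polarized light at 45 degrees.
print(polarization_summary(1.0, 1.0, 0.0, 0.0))
print(polarization_summary(2.0, 0.0, 1.0, 0.0))
```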

But one can see that this camera will never be real-time in the sense that a statistical computation will always be required. The choppy water will black out certain parts of the scene and even double up the same object on different pixel locations. We are currently working on figuring this all out. Our underwater periscope is a great combination of physics, computer vision, and statistical techniques.

Autonomous estimation of the shape of the landscape

In an earlier post, I asserted that it was possible to get a precise map from photographs that only required a translation and rotation. These operations are not enough. You also need a scale change. This is the well-known relative orientation problem in photogrammetry.

However, the conclusion still remains that it is operationally possible to precisely estimate the shape of the terrain without knowing anything about the actual locations of your imaging sensor. If you couple the photographs with a GPS receiver signal, you would then considerably narrow the scale change needed to overlay your constructed map with the actual shape of the landscape. In other words, a lot of GPS measurements would get you close to the true scale needed to overlay the map. Let me explain.

One can well imagine a noisy GPS receiver that makes you think that two locations of the aerial camera are closer than they actually are. This would result in the constructed map being at a smaller scale than the real world. Likewise, the GPS readings could make you think that the cameras were farther apart when the photographs were taken. This would inflate your map scale. But in either case, the correct scale would be close to the inferred scale.

Now imagine that you consider more and more photographs with their corresponding GPS reading. Since GPS readings are unbiased (one of their great virtues), it would be extremely unlikely in a probabilistic sense that your inferred scale would be far from the real world scale.

Users can take your map and easily overlay it on another map by performing three operations: translation, rotation, and a small scale change. This is an extremely easy thing to do in comparison to trying to get absolute accuracy from the get go!
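Here is a small sketch of such an overlay, simplified to two dimensions so that it fits in a few lines. Map points are treated as complex numbers, in which case translation, rotation, and scale collapse into a single relation `world = a * model + b`: the magnitude of `a` is the scale change, its phase is the rotation, and `b` is the translation. The function name and the sample points are hypothetical, and the full 3-D problem with its 7 parameters needs a different solver.

```python
import cmath


def fit_similarity(src, dst):
    """Least-squares fit of dst ~ a * src + b over matched complex points.

    `a` carries the rotation and scale, `b` the translation.
    """
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    numerator = sum(
        (d - dst_mean) * (s - src_mean).conjugate() for s, d in zip(src, dst)
    )
    denominator = sum(abs(s - src_mean) ** 2 for s in src)
    a = numerator / denominator
    b = dst_mean - a * src_mean
    return a, b


# Hypothetical matched points: the constructed map versus a reference map that is
# scaled by 2, rotated by 90 degrees, and shifted by (3, 4).
model_points = [0j, 1 + 0j, 1j, 2 + 1j]
world_points = [2j * p + (3 + 4j) for p in model_points]

a, b = fit_similarity(model_points, world_points)
print("scale:", abs(a))
print("rotation (radians):", cmath.phase(a))
print("translation:", b)
```

With noisy control points the same least-squares fit returns the best compromise, and every additional matched point tightens the estimate.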

What is more important: precision or accuracy?

I have just uploaded our autonomous precision error estimation Swiss conference paper to the new UMass/Amherst Digital Repository. I work at the Aerial Imaging and Remote Sensing Lab at UMass/Amherst. For years, Howard Schultz has been doubling the number of Digital Elevation Models (DEMs) his Terrest system makes by using the fact that computer stereo matching algorithms are not symmetric in their inputs. The paper above proves mathematically that this is indeed correct and possible if the correlation between DEMs is sparse, i.e. only a few of the DEMs are highly correlated.

We are now writing a proposal to further this work by connecting it with compressed sensing ideas as we briefly hint at the end of the paper above. This has got us thinking about the difference between precision and accuracy in measurement errors. Precision is the width of your error bars, accuracy is where the centroid of the measurement is located relative to the ‘true’ value. So if you had to choose between a precise and an accurate map, which one would you choose?

A precise map is one that captures the shape of the world very well; it has a lot of detail. An accurate map is one that tells you where objects are located in the real world. An accurate but imprecise map is located correctly but it is very fuzzy. The landscape looks melted. A precise but inaccurate map is located wrong but has lots of detail: you tried to map a desert patch and the map says you mapped Paris (there is a caveat to this characterization, horizontal correlation, that I don’t want to get into in this post). The practical significance of this difference is enormous. A precise but inaccurate map is just a rigid body transformation and a scale change of the real world (3 + 3 + 1 = 7 unknown parameters) that can be fixed by measuring 3 ground control points (3 + 3 + 3 = 9 measurements). An imprecise but accurate map would require thousands of measurements to recover the detail lost in the fuzzy estimate. Therefore, precise maps are cheaper to make than accurate ones, since measuring ground control points is a time-consuming, expensive task.

Furthermore, for some tasks, precision is all you really care about. For example, scientist Andrea Laliberte at the Jornada Experimental Range in south-central New Mexico is interested in classifying invasive species in the desert. For this task you need precision, not accuracy. Of course, if you wanted to bomb a particular shrub, you would want accuracy. My point is that precision is sometimes good enough and therefore you can sacrifice accuracy. Does this sound familiar?