Archive for October, 2007

Calming the waters with polarization

Above surface scene scrambling by choppy water interface
Howard Schultz and I have just filed our patent application “SYSTEM AND METHOD FOR IMAGING THROUGH AN IRREGULAR WATER SURFACE” for a fully submerged periscope. The etymology of periscope means “seeing around”. In this case we are able to reconstruct the scene above a choppy water surface by measuring the polarization of the light as it refracts through the air-water interface. The figure on the right shows how a choppy air-water interface scrambles the above water scenery. Refracting through a water interface induces a polarization change, so if you can measure the full polarization state of the light, you can make a slope map of the surface. This then allows you to unscramble all the rays and magically reconstruct the scene above. Doing so is not easy. The camera has to measure the full Stokes ‘vector’ of the light rays (I,Q,U,V). A typical camera just measures the light intensity I. The linear polarization of the light is specified by the Q and U terms. And the circular polarization is specified by V.

But one can see that this camera will never be real-time in the sense that a statistical computation will always be required. The choppy water will black out certain parts of the scene and even double up the same object on different pixel locations. We are currently working on figuring this all out. Our underwater periscope is a great combination of physics, computer vision, and statistical techniques.

Enabling math formulas with itex2MML in WordPress 2.3

Two weeks ago I migrated this blog to WordPress 2.3 and in the process I had to go over the steps for enabling itex2MML. I have collected the necessary steps in a draft of an article itex2MML for WordPress 2.3. The whole thing is based on Jacque Distler’s rewrite of itex2MML. The article is in a rough draft form, but if anyone is interested, I’ll put more details in it. By changing itex2MML, the theme pages, and the WordPress 2.3 files, I was able to get output that has been validated as correct XHTML + MathML, CSS by the W3C consortium validators.

Computation in series/parallel: Estimating the ideal throughput speed in uncertain traffic conditions

One of my favorite games to play when I drive is to try to estimate the ideal throughput speed — the speed at which you can go through traffic without having to change your speed. This involves a couple of behavioral changes: you have to allow more space than normal between you and the car in front, you have to be aware of upcoming red lights, etc. Over the years I have found that some drivers behind me get really annoyed over the amount of space in front of me. They race around and drive as fast as they can to the red light ahead whereupon they have to immediately brake. Sometimes I get to go past these drivers since I approach the green light moving while they have to race me again from a dead stop.

It occurred to me the other day that if the driver behind me noticed that I was doing this, they could, in turn, engage in the same estimation game. Their estimate, however, would be better than mine since I am already smoothing out the flow in front of them. This computational chain could continue behind them, each time getting easier and easier to estimate the ideal throughput speed no matter how uncertain the traffic was in front of us. Is this a computation in series or in parallel? We are doing it at the same time so it is in parallel, but each driver depends on the estimate in front of them so it is in series.

Autonomous estimation of the shape of the landscape

In an earlier post, I asserted that it was possible to get a precise map from photographs that only required a translation and rotation. These operations are not enough. You also need a scale change. This is the well-known relative orientation problem in photogrammetry.

However, the conclusion still remains that it is operationally possible to precisely estimate the shape of the terrain without knowing anything about the actual locations of your imaging sensor. If you couple the photographs with a GPS receiver signal you would then narrow considerably the scale change needed to overlay your constructed map with the actual shape of the landscape. In other words, a lot of GPS measurements would get you close to the true scale needed to overlay the map. Let me explain

One can well imagine a noisy GPS receiver that makes you think that two locations of the aerial camera are closer than they actually are. This would result in the constructed map being at a smaller scale than the real world. Likewise, the GPS readings could make you think that the cameras were farther apart when the photographs were taken. This would inflate your map scale. But in either case, the correct scale would be close to the inferred scale.

Now imagine that you consider more and more photographs with their corresponding GPS reading. Since GPS readings are unbiased (one of their great virtues), it would be extremely unlikely in a probabilistic sense that your inferred scale would be far from the real world scale.

Users can take your map and easily overlay it on another map by performing three operations: translation, rotation, and a small scale change. This is an extremely easy thing to do in comparison to trying to get absolute accuracy from the get go!

The random binary detector with the lowest error cost variation

Last night I did some calculations to see what random detector has the lowest error cost variation. That is, given that the detector will be deployed in an uncertain environment, what random detector should you use to minimize its error cost variation. The math is simple so I will show it all in this post.

As in previous posts, I am restricting myself to the simplest case: a binary classification problem. We have two classes, call them A and B. The variability of the testing environment is codified by the single parameter f A, the frequency of class A. The binary random detector is described by a single parameter also, d A — the classification rate for class A.

The cost of misclassifying class A as class B will be denoted by C AB. The cost of classifying class B as class A by C BA. The error cost or loss function in this simple example is then equal to
L=C ABf A(1 d A)+C BA(1 f A)d A.

What random detector setting d A minimizes Lf A, the loss variation under a changing environment? Calculating the derivative one quickly obtains that zero variation in the cost is attainable when
d A=C ABC AB+C BA.
Note that this is not the most accurate detector setting for a particular environment, f A.

The strategic advantage of stabilizing errors in uncertain environments

This weekend I was talking to a friend from graduate school about how to stabilize errors in uncertain environments by sacrificing accuracy, the subject of the previous two posts. During the conversation, I realized another advantage of stabilizing errors: it makes it easier to modify your detector/classifier in an adversarial situation.

The “arms race” game is a common one in conflict situations. You have some technological advantage over an opponent. The opponent adapts and changes their strategy. This is different from the issue of a noisy environment. Here the signal used by your detector is intentionally modified to circumvent its performance level. The enemy camouflages their tanks better. New strategies are deployed. All of which leads to an increase in the error rate of your detector.

A user confronted with this situation would want to slow down their error rate increase so it can be below their learning rate. Such a user would be willing to sacrifice accuracy for the ability to adapt to an adversary. There is a strategic advantage to stabilizing errors.

A simple example of how to stabilize classification errors in uncertain environments by sacrificing accuracy

I corrected an assertion in my earlier post on the difference between classification rates and classification errors. I mistakenly asserted that for a training dataset with a 70/30 prevalence of binary labels the best random classifier is the 70/30 one. This is not so. The best one is the 100/0 classifier since it will correctly classify the prevalent label all of the time and will therefore have an error rate of 30% on the 70/30 training dataset. This is better than the 70/30 random classifier which has a classification error of 42% on this training dataset.

But this clarification provides a simple example of how sacrificing accuracy leads to stabilizing errors in an uncertain environment. As the previous post shows, the 70/30 classifier has an error rate increase of 4% when tried on data that has a 60/40 ratio. Its error has increased by a relative 9.5% ((46 42 )/42 ). But the better random classifier, the 100/0 one, has its error rate increase by 33% ((40 30 )/30 )! It is overtrained and therefore much more unstable when applied to data that differs from the characteristics of its training set. The moral: increased stability of the error rate can be achieved by sacrificing accuracy.

Of course, these are just specific examples. The way to quantify the benefit of sacrificing accuracy is to consider the average stability of a random classifier under various degrees of variability in the input. This can be done, for example, by considering a beta distribution with various amounts of variability to generate synthetic input data. As the variability knob on the beta distribution is increased one should find that the best strategy for stabilizing error on average is to sacrifice accuracy.

Now suppose that a user/employer comes to you and asks that you design a classifier based on a 70/30 training dataset. Should you give them the 100/0 or 70/30 random classifier? I think the responsible thing to do is to give them the 70/30 classifier. It will also make them happier! I don’t think many users/employers are going to be happy with a classifier that gets one class completely wrong all of the time in spite of your pleas that it is more accurate! Why? Because users always have costs associated with the errors of your classifier. Would you be happy with a spam filter that was 70% accurate but delivered no mail?

What is more important: precision or accuracy?

I have just uploaded our autonomous precision error estimation Swiss conference paper to the new UMass/Amherst Digital Repository. I work at the Aerial Imaging and Remote Sensing Lab at UMass/Amherst. For years, Howard Schultz has been doubling the number of Digital Elevation Models (DEMs) his Terrest system makes by using the fact that computer stereo matching algorithms are not symmetric in their inputs. The paper above proves mathematically that this is indeed correct and possible if the correlation between DEMs is sparse, i.e. only a few of the DEMs are highly correlated.

We are now writing a proposal to further this work by connecting it with compressed sensing ideas as we briefly hint at the end of the paper above. This has got us thinking about the difference between precision and accuracy in measurement errors. Precision is the width of your error bars, accuracy is where the centroid of the measurement is located relative to the ‘true’ value. So if you had to choose between a precise and an accurate map, which one would you choose?

A precise map is one that captures the shape of the world very well, it has a lot of details. An accurate map is one that tells you where objects are located in the real world. An accurate but imprecise map is located correctly but it is very fuzzy. The landscape looks melted. A precise but inaccurate map is located wrong but has lots of detail — you tried to map a desert patch and the map says you mapped Paris (there is a caveat to this characterization — horizontal correlation — that I don’t want to get into in this post). The practical significance of this difference is enormous. A precise but inaccurate map is just a rigid body transformation and a scale change of the real world (3 + 3 + 1 = 7 unknown parameters) that can be fixed by measuring 3 ground control points (3 + 3 + 3 = 9 measurements). An imprecise but accurate map would require thousands of measurements to recover the detail lost in the fuzzy estimate. Therefore, precise maps are cheaper to make than accurate ones since measuring ground control points is a time consuming expensive task.

Furthermore, for some tasks, precision is all you really care about. For example, scientist Andrea Laliberte at the Jornada Experimental Range in south-central New Mexico is interested in classifying invasive species in the desert. For this task you need precision, not accuracy. Of course, if you wanted to bomb a particular shrub, you would want accuracy. My point is that precision is sometimes good enough and therefore you can sacrifice accuracy. Does this sound familiar?

On the difference between classification probability and classification error

Last year I realized that a classifier with robust error performance could be created by feeding the errors of a front-end algorithm to a back-end one. To prove the point that this has nothing to do with the particular algorithms, I used a well-known computer science trick — I considered how this would apply to two random classifiers. Initial conversations with others about this idea got a cool reception.

Today I realized that part of the problem is that the distinction between the classification probability of a binary random classifier and its classification error on a particular dataset is not clearly understood. This distinction is important. So I’ll try to explain it on this posting.

Suppose you have a training set for a binary label {(x i,y i)} where y i{1,1 }, and x i is the observed feature vector for each observation. Furthermore, suppose that for this particular dataset the prevalence of the +1 class is 70% versus 30% for the 1 class. Consider the binary random classifier that selects the +1 class 70% of the time and the 1 class 30%. What is the classification error of this random algorithm for this particular dataset?

Well, it correctly classifies class +1 49% of the time (0.7 *0.7 =0.49 ). Class 1 is classified correctly 9% of time (0.3 *0.3 =0.09 ). So the total classification error rate is 42% (100% - 49% - 9%). Note that the error rate of this random classifier on this dataset is not equal to 100% - 70% or 100% - 30% — the classification error rate is not equal to the classification probability.

What does this have to do with error robustness? Users have to deploy our classification algorithms in environments that are unknown to us. The 70/30 random classifier could be deployed in a situation where the prevalence of the classes was 60/40, somewhat different than our training dataset. The error rate of the random classifier on this dataset would then be 46%. Our error rate has increased. For some classification applications we care more about the cost caused by errors than by our accuracy. For example, when I step into a minefield I care more about not stepping on a mine than being accurate about finding a mine every time I gently push a stick into the ground! Therefore, any algorithm that stabilizes the error rate in uncertain environments would be extremely useful to such users. This is a very different line of research than the usual one that you see in the computer science literature where all the focus is on increasing accuracy not stabilizing errors. I’ll leave for another post how one could do precisely this — stabilize the error rate in uncertain environments while sacrificing accuracy by concatenating classification algorithms.