Archive for the 'precision error application' Category

Robust voting in uncertain environments

Combining the judgments of different recognizers is always better than using the best one alone. This observation is universal in machine learning realms. But it seldom gets used in practice. Why?

For one, it costs more to implement. Instead of one recognizer, you must deploy several. Computing cycles grow linearly with the number of recognizers if they all have roughly proportional cost. In academic circles, you get rewarded for finding the best recognizer/algoritm so there is little incentive to work in using the work of others along with your own. Companies usually haveĀ  proprietary rights to a few recognizers and want to deploy their best one to save development costs, hog less cycles on the customer’s machine, etc.

in certain conditions, however, it actually pays to use different recognizers and to incur the extra development and computing costs. Noisy or uncertain deployment environments can seriously degrade the performance of a well-tuned system. In such situations, the recognizer’s output could be worse than random and it would be impossible to place any certainty on it output.

The situation changes dramatically when we use different recognizers. To see this, consider the case of multiple recognizers, all of which have a probability greater than one half of getting the right answer. One can establish rigorous bounds for the number of recognizers that we would need to guarantee that a simple majority vote would have, say, greater than 99% probability of being right. This is discussed in the article on the Chernof Bound in Wikipedia.

But simple majority voting has its problems. For one, it is incredibly costly. Suppose that your detectors where correct two-thirds of the time. it would take at least 15 detectors to have 90 per cent probability of being correct when you averaged their individual votes

In addition, what if the testing condition starts to differ too much from the training data? The detectors could have degraded performance and 15 detectors may only give you fifty per cent chance of being correct. How would you know that your probability of being correct has dropped?

In future posts, I will explain a scheme using the precision error methodology that I have discussed in previous posts to create a robust voting system that automatically weighs the vote of a collection of systems according to their actual accuracy on the data, and it does this without any ground truth and in a completely data-driven way. Two system may “flip”, one becoming more accurate than the other and the algorithm I propose to combine their decisions would detect this reversal. The practical effect of this scheme is that one can predict, in the field, automatically the probability that one’s collection of recognizers is correct with an efficiency that is higher than simple voting. In compressed sensing, you use less sensors to reconstruct the same signal. I propose something similar, but for decision making: using less decision makers in a robust fashion to make the correct recognition decision.

Not every measurement is perfect

Precision error estimate variance decay

Just to show that not all questions behave as nicely as question 9 in the previous post, here is the plot for question 6 in the same exam.The fit is not as good as for question 9. This is expected, there is no reason why the precision error should decay with a perfect exponential behavior. Nonetheless, it still shows a similar decay constant — about six questions. Remember to click on the image to get the larger image in a zoom pane.

Minimum number of questions needed for an exam

Another application of the precision error covariance matrix is to find out the minimum number of questions needed for an exam. The linear algebra system derived from the precision erorr equations requires at least three scientific models before one can measure the precision error. More is better. But what is enough? How many questions does one need to ask in an exam to be guaranteed that the precision error one measures for the questions is well estimated?

To give an idea of what this number might be, consider first the popular parlor game “Twenty Questions”. One person thinks about something and the rest of the group must guess what it is by asking twenty questions or less. Isn’t it the case that this is almost always possible? Twenty questions is plenty to find out about what someone is thinking even with no prior information. This is, in fact, so automatic that popular toys exist that ask questions and are remarkably good at coming up with what one is thinking in twenty questions or less.

It seems to me, then, that twenty questions is way too many questions to ask in an exam if the purpose of that exam is to figure out if a student is competent in the material covered in class. Using precision error covariance matrices, I can actually calculate what the minimum number of questions are. This can be done by looking at how the precision error estimate varies for a single question as one varies the number of questions it is paired against. As the number of questions it is paired with increase, its precisione error estimate settles down to a value that does not change any further after
a certain threshold is reached. This threshold is the minimum number of questions needed in an exam.

I am now carrying out experiments with the exam data I have to empirically measure the number.

Grading mistake detection with precision error

While making a covariance matrix for eighteen questions in an introductory Physics exam I gave in the Spring of 2006, I discovered another use for the precision error measurements: grading mistake detection.

The figure shown first is my initial try. I computed the student score on each question with the function: f(correct)=1.0 ,f(incorrect)=0.0 .Two graded incorrectly. Note the two dark squares at position 5 and 6.

Further investigation showed that I incorrectly scored the two questions. The second figure shows the matrix after I corrected the grading.All graded correctly Note how the squares at position 5 and 6 are now similar to the others.

The precision error covariance matrix also detects grading anomalies!