Archive for the 'Randomness' Category

Minimum number of questions revisited

To show off the installation of FancyZoom (a trick I learned while visiting the excellent Language Log), I present a graph of the percentage variation in the mean square precision error as a function of the number of questions used to compute it. The image looks small but you can now click on it to obtain a zoomed in version. Try it!

Variance of precision error estimates for question 9 as a function of number of questions used.
Question 9 precision error estimate variance

Note how good the fit is to a shifted exponential function of the form:
a+b*exp(c(n q3 )).
The measurements are the small dots at n q={4,6,7,10,12 }. The fitted values are a=0.06 , b=0.2 , and c=0.43 . The variable c is the decay constant for the variability in the estimate. In particular, if you calculate its inverse 1 /c you get the number of questions beyond three that will give you less than 33% variability in the estimate. This turns out to be about 2 questions. So ten or twelve questions should be enough for this group of students.

Once again, this suggests that teachers are asking too many questions in their multiple choice exams.

ICML accepts precision error via L1 minimization paper

Our technical report on how to recover precision error estimates with 1 -minimization has been accepted by the 2008 International Conference on Machine Learning.

The paper originally got three anonymous reviews. Two were positive, one strongly negative. In our response to the reviews, we agreed with the general criticism by the reviewers that one experimental demonstration is not enough. In our precision error papers so far, we have only been using one dataset — aerial photographs from the Twenty-Nine Palms region in California. So we are going to include some results from North Carolina forest data to show that our technique works for all sorts of images.

Readers of previous posts may note that besides maps, the precision error has been recovered for questions in a multiple-choice-quesiton (MCQ) exam. It would be nice to include this in our ICML paper, but the title of the paper is “Autonomous geometric precision error estimation in low-level computer vision tasks” so it seems incongruous to do so.

The paper was submitted in early January. Afterwards, we realized that our precision error technique for elevation errors in maps applies to any set of models that make scalar predictions about multiple entities. We are now working on a draft for a Science magazine article that will combine the examples from maps and exams to illustrate the wide applicability of our technique.

Student answers versus random answers

An interesting baseline for thinking about precision error is to consider the case of uniformly random answers. The student may be completely ignorant: you gave a college level test to kindergarten kids. Your questions were so hard or so incomprehensible (think Chris Kattan’s mumbling character giving a “uupp-uizzz” (pop-quiz) to this students) that students are just guessing.

Precision error for random answer responses
Fig. 1: Precision error for random guessers

Actual student answers
Fig. 2: Actual student answers

Note how the answer look uniformly gray. There really is no pattern in the students response. This is a uniform group answering this exam — the random answering makes everyone belong to the same group.

This second figure is actual student answers in a test. The grayness of the diagonal squares is varying. Some questions have a precision error lower than random! Others, just as high. The noisy (imprecise questions) are the ones that gave students the most trouble.

Mean precision error equations

Back in December, Howard Schultz and I wrote a paper for the IEEE Computer Vision and Pattern Recognition 2008 conference. The paper, by the way, was rejected by the reviewers (this should form an interesting article at some future time). In the paper we calculate the horizontal decorrelation length for a collection of maps using a variogram technique.

The plots were not coming out right and Howard pointed out that I had not de-meaned the data. In particular, I had not de-meaned the average precision error. That is, the precision error for a model has a mean.

For some reason, I had thought that this mean error could be exactly solved with the precision error equations. It cannot. Given m models, the precision error equations give you m1 equations. So the mean value also has to be recovered with 1 -minimization.

How should one interpret this mean precision error? In the case of exams, the mean precision error for a question could be interpreted as the hardness or ease for the question. Hard questions will get answered correctly by few people, easy ones will be answered correctly by everyone.

This is what I observe in the introductory Physics class exam I have been analyzing. The hardest questions have the smallest mean error. The easiest ones have the largest mean error.

The correct way of defining the covariance matrix involves substracting this mean. That is, the entries should be of the form δ iδ jδ iδ j and not just plain δ iδ j as I have been writing in previous posts.

Who is causing the variations in predictions?

Precision error is a measure of the variation in the predictions of a collection of models. If there are no variations — all the models agree. There is no precision error. One is perfectly in focus as far as one can tell. But, of course, scientific models disagree. So who is to blame? Is it the data or is it the algorithms used to process the data (the models). Having enough models allows you to do decide who is to blame.

Consider the case of a piece of data used to train one of the models that always lead to a disagreement, no matter what algorithm is used to process it. Who is to blame in this case? The optimal choice seems to me to be to decide that the data is bad.

Now consider an algorithm that always disagrees with all the other models no matter what set of predictions are compared. This seems to suggest that the model is wrong, not the data.

In both these cases, the availability of a large number of models is what allows one to distinguish the two cases. Real data will not be as stark as the examples above. Here is where Fourier analysis and probability theory come in. As the number of models increases one is able to disentangle the two. For small number of models, blaming the data or the model would explain the observed variation equally well. As the number of models increase, assigning blame becomes asymmetric!

This is sort of like the “Is it me or is him/her?” question. Comparing ourselves to only one other person does not allow us to decide who is the crazy one. But the more people we interact with, the sooner we realize who is to blame.

I’ll try to come up with a simple example with a few models to illustrate the point mathematically in a later post.

The half life of exam question precision error estimates

It sounds strange to think it. But the precision error formalism I have been discussing is a relation between the data and the models. Who is to blame for the variation that is observed? Is it the data? Is the data noisy? Is it the algorithm used to calculate the model? It could be both.

One way to see this dependency of the error estimation on the data is to consider the diagonal terms in the covariance matrix: the self-correlation in the precision error, the terms of the form δ i 2 . Since this involves only one model, we would expect it to be the error of the model. How does this number vary for a question in an exam as we change the set of questions we compare it against? If we use a few questions, the bias in those questions may skew the bias in our estimate. Another set of a few questions could have a wildly different estimate for the self variance of a question.

As the number of questions that are included in the comparison set are increased, what happens to the fluctuations in the estimates of the self-correlations. Do they decrease? How fast do they decrease? Some preliminary experiments with the exam questions show that with three questions, the estimates vary by 51% of the mean. When eighteen questions are used, they vary by 15%. Fitting these two preliminary data points to an exponential decay curve gives a decay constant for the noise in the precisione error estimate (this sounds so weird — the noise of the noise?). The decay constant one gets from the above figures is fourteen questions.

Fourteen questions seem to be enough to have an estimate of the precision error of each question that is within 27% of the estimate. This seems to suggest that twenty questions is about right after one adds extra questions as insurance padding against data variability.

Minimum number of questions needed for an exam

Another application of the precision error covariance matrix is to find out the minimum number of questions needed for an exam. The linear algebra system derived from the precision erorr equations requires at least three scientific models before one can measure the precision error. More is better. But what is enough? How many questions does one need to ask in an exam to be guaranteed that the precision error one measures for the questions is well estimated?

To give an idea of what this number might be, consider first the popular parlor game “Twenty Questions”. One person thinks about something and the rest of the group must guess what it is by asking twenty questions or less. Isn’t it the case that this is almost always possible? Twenty questions is plenty to find out about what someone is thinking even with no prior information. This is, in fact, so automatic that popular toys exist that ask questions and are remarkably good at coming up with what one is thinking in twenty questions or less.

It seems to me, then, that twenty questions is way too many questions to ask in an exam if the purpose of that exam is to figure out if a student is competent in the material covered in class. Using precision error covariance matrices, I can actually calculate what the minimum number of questions are. This can be done by looking at how the precision error estimate varies for a single question as one varies the number of questions it is paired against. As the number of questions it is paired with increase, its precisione error estimate settles down to a value that does not change any further after
a certain threshold is reached. This threshold is the minimum number of questions needed in an exam.

I am now carrying out experiments with the exam data I have to empirically measure the number.

Grading mistake detection with precision error

While making a covariance matrix for eighteen questions in an introductory Physics exam I gave in the Spring of 2006, I discovered another use for the precision error measurements: grading mistake detection.

The figure shown first is my initial try. I computed the student score on each question with the function: f(correct)=1.0 ,f(incorrect)=0.0 .Two graded incorrectly. Note the two dark squares at position 5 and 6.

Further investigation showed that I incorrectly scored the two questions. The second figure shows the matrix after I corrected the grading.All graded correctly Note how the squares at position 5 and 6 are now similar to the others.

The precision error covariance matrix also detects grading anomalies!

Random faster than systematic

I am writing a Mathematica program to produce the precision error signal and reconstruction matrix for an arbitrary number of models. The maximum number I had tried before was ten models because it corresponded to the number of maps we have for the 29 Palms dataset.

My first try consisted of systematically creating all possible permutations of the precision error equations, squaring them, and then storing the coefficients. The program would then systematically look at the equations and augment an independent set every time it found an equation that could not be written as linear combinations of the previous ones in the set.

This worked okay for ten or so models, but I want to produce the full covariance matrix for twenty questions in a multiple-choice exam. No problem, I was making some grilled lamb for Easter dinner yesterday, so I put the computer to work and walked away. Three hours later, the computer was still trying to finish the list of all possible equation permutations! I confess that I have not worked out the combinatorics for the equations yet so perhaps it is of order 20 !. This would be 2,432,902,008,176,640,000 combinations. Compare this to 10 !=3628800 and you can see why the computation got hard quickly.

So my second incantation of the program was to do the computation randomly. Two integers P and Q are picked randomly such that 1 Pm and 1 Qm where m is the number of models. These random numbers are then used to randomly sample the model variables and construct a precision error difference equation. If the equation is independent from the set currently at hand, it is kept, otherwise discarded.

This second version is taking about ten minutes to produce a result. This made me think about how we perceive randomness as haphazardly: “Oh, you are just randomly trying to guess the right answer.” We perceive randomly as wasteful or misguided. The case presented here is just another example of how random is sometimes faster than systematic.

Precision error vectors are rank-1 tensors

My previous post on precision error tensors was misleading. We tend to think of tensors as complicated mathematical structures. Vectors are rank 1 tensors. Driving home from work today, I realized that I had already shown that precision error vectors can be calculated in our horizontal decorrelation estimation paper. So mathematically speaking, I have already shown that precision error should be treated as tensors. The precision error vector is the rank 1 tensor example. The precision error covariance matrix is the rank-2 tensor. Two examples in the usual tensor progression. At some future time I should calculate the rank-3 tensor. How would one induce representations of the Symmetric group in rank-3 tensors?