The half life of exam question precision error estimates

It sounds strange to think it. But the precision error formalism I have been discussing is a relation between the data and the models. Who is to blame for the variation that is observed? Is it the data? Is the data noisy? Is it the algorithm used to calculate the model? It could be both.

One way to see this dependency of the error estimation on the data is to consider the diagonal terms in the covariance matrix: the self-correlation in the precision error, the terms of the form $\delta_i^2$. Since this involves only one model, we would expect it to be the error of the model. How does this number vary for a question in an exam as we change the set of questions we compare it against? If we use only a few questions, the bias in those questions may skew our estimate. Another set of a few questions could give a wildly different estimate for the self-variance of a question.

As the number of questions included in the comparison set is increased, what happens to the fluctuations in the estimates of the self-correlations? Do they decrease? How fast do they decrease? Some preliminary experiments with the exam questions show that with three questions, the estimates vary by 51% of the mean. When eighteen questions are used, they vary by 15%. Fitting these two preliminary data points to an exponential decay curve gives a decay constant for the noise in the precision error estimate (this sounds so weird: the noise of the noise?). The decay constant one gets from the above figures is fourteen questions.

Fourteen questions seem to be enough to have an estimate of the precision error of each question that is within 27% of the estimate. This seems to suggest that twenty questions is about right after one adds extra questions as insurance padding against data variability.

Minimum number of questions needed for an exam

Another application of the precision error covariance matrix is to find out the minimum number of questions needed for an exam. The linear algebra system derived from the precision error equations requires at least three scientific models before one can measure the precision error. More is better. But what is enough? How many questions does one need to ask in an exam to be guaranteed that the precision error one measures for the questions is well estimated?

To give an idea of what this number might be, consider first the popular parlor game “Twenty Questions”. One person thinks about something and the rest of the group must guess what it is by asking twenty questions or less. Isn’t it the case that this is almost always possible? Twenty questions is plenty to find out about what someone is thinking even with no prior information. This is, in fact, so automatic that popular toys exist that ask questions and are remarkably good at coming up with what one is thinking in twenty questions or less.

It seems to me, then, that twenty questions is way too many questions to ask in an exam if the purpose of that exam is to figure out if a student is competent in the material covered in class. Using precision error covariance matrices, I can actually calculate what the minimum number of questions is. This can be done by looking at how the precision error estimate varies for a single question as one varies the number of questions it is paired against. As the number of questions it is paired with increases, its precision error estimate settles down to a value that does not change any further after a certain threshold is reached. This threshold is the minimum number of questions needed in an exam.

I am now carrying out experiments with the exam data I have to empirically measure the number.

Grading mistake detection with precision error

While making a covariance matrix for eighteen questions in an introductory Physics exam I gave in the Spring of 2006, I discovered another use for the precision error measurements: grading mistake detection.

The first figure shows my initial try. I computed the student score on each question with the function f(correct) = 1.0, f(incorrect) = 0.0.

[Figure: Two graded incorrectly]

Note the two dark squares at positions 5 and 6.

Further investigation showed that I had incorrectly scored the two questions. The second figure shows the matrix after I corrected the grading.

[Figure: All graded correctly]

Note how the squares at positions 5 and 6 are now similar to the others.

The precision error covariance matrix also detects grading anomalies!
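The effect can be imitated with a small simulation. This is only a sketch: it uses the ordinary sample covariance between question scores as a stand-in for the precision error covariance matrix, and the simulated students, the four answer choices, and the helper names are all assumptions made for illustration. Two entries of the answer key are deliberately corrupted, at positions 5 and 6 (indices 4 and 5).

```python
import numpy as np

def score_matrix(responses, answer_key):
    """Score each response with f(correct) = 1.0 and f(incorrect) = 0.0."""
    return (responses == answer_key).astype(float)

def mean_covariance_with_others(scores):
    """For each question, average its covariance with every other question."""
    cov = np.cov(scores, rowvar=False)            # questions x questions
    n_questions = cov.shape[0]
    off_diagonal_sum = cov.sum(axis=1) - np.diag(cov)
    return off_diagonal_sum / (n_questions - 1)

# Hypothetical exam: simulated students, 18 questions, 4 answer choices.
rng = np.random.default_rng(0)
n_students, n_questions, n_choices = 2000, 18, 4
true_key = rng.integers(0, n_choices, size=n_questions)

# Stronger students answer correctly more often (logistic in "ability").
ability = rng.normal(0.0, 1.5, size=n_students)
p_correct = 1.0 / (1.0 + np.exp(-ability))
knows = rng.random((n_students, n_questions)) < p_correct[:, None]
offsets = rng.integers(1, n_choices, size=(n_students, n_questions))
wrong_choice = (true_key + offsets) % n_choices   # never equals the true key
responses = np.where(knows, true_key, wrong_choice)

# Corrupt the key for the questions at positions 5 and 6.
bad_key = true_key.copy()
bad_key[[4, 5]] = (bad_key[[4, 5]] + 1) % n_choices

suspect = mean_covariance_with_others(score_matrix(responses, bad_key))
fixed = mean_covariance_with_others(score_matrix(responses, true_key))
flagged = np.where(suspect < 0)[0]                # questions that disagree with the rest
print("flagged question indices:", flagged)
```

A mis-keyed question rewards the students who do worst on everything else, so its scores run against the other questions. In this toy setup that shows up as a negative average covariance, the analogue of the two dark squares.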

Random faster than systematic

I am writing a Mathematica program to produce the precision error signal and reconstruction matrix for an arbitrary number of models. The maximum number I had tried before was ten models because it corresponded to the number of maps we have for the 29 Palms dataset.

My first try consisted of systematically creating all possible permutations of the precision error equations, squaring them, and then storing the coefficients. The program would then systematically look at the equations and augment an independent set every time it found an equation that could not be written as linear combinations of the previous ones in the set.

This worked okay for ten or so models, but I want to produce the full covariance matrix for twenty questions in a multiple-choice exam. No problem: I was making some grilled lamb for Easter dinner yesterday, so I put the computer to work and walked away. Three hours later, the computer was still trying to finish the list of all possible equation permutations! I confess that I have not worked out the combinatorics for the equations yet, so perhaps it is of order 20!. This would be 2,432,902,008,176,640,000 combinations. Compare this to 10! = 3,628,800 and you can see why the computation got hard quickly.

So my second incarnation of the program was to do the computation randomly. Two integers P and Q are picked randomly such that $1 \le P \le m$ and $1 \le Q \le m$, where m is the number of models. These random numbers are then used to randomly sample the model variables and construct a precision error difference equation. If the equation is independent of the set currently at hand, it is kept; otherwise it is discarded.
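The random strategy can be sketched in Python (the original program is in Mathematica). The precision error equations themselves are not reproduced here; as a stand-in, this sketch assumes each equation is the squared difference $(x_P - x_Q)^2$ expanded over the monomials $x_i x_j$, and it assumes the target size of the independent set is m(m-1)/2, which holds for this toy family only. The function names are hypothetical.

```python
import numpy as np

def monomial_index(m):
    """Map each pair (i, j) with i <= j to a column for the monomial x_i * x_j."""
    pairs = [(i, j) for i in range(m) for j in range(i, m)]
    return {pair: col for col, pair in enumerate(pairs)}

def squared_difference_row(p, q, index):
    """Coefficients of (x_p - x_q)^2 = x_p^2 - 2 x_p x_q + x_q^2 (toy stand-in)."""
    row = np.zeros(len(index))
    row[index[(p, p)]] += 1.0
    row[index[(q, q)]] += 1.0
    row[index[(min(p, q), max(p, q))]] -= 2.0
    return row

def random_independent_set(m, max_draws=100_000, seed=0):
    """Draw random (P, Q) pairs and keep only equations that raise the rank."""
    rng = np.random.default_rng(seed)
    index = monomial_index(m)
    target_rank = m * (m - 1) // 2       # assumption: valid for this toy family
    kept = []
    for _ in range(max_draws):
        if len(kept) == target_rank:
            break
        p, q = (int(v) for v in rng.integers(0, m, size=2))  # 0-based P and Q
        if p == q:
            continue                      # the difference is identically zero
        candidate = squared_difference_row(p, q, index)
        trial = np.vstack(kept + [candidate])
        if np.linalg.matrix_rank(trial) > len(kept):
            kept.append(candidate)        # independent of the current set: keep
    return np.array(kept)

equations = random_independent_set(m=6)
print(equations.shape)
```

The rank test plays the role of the check that an equation cannot be written as a linear combination of the ones already kept. Nothing is enumerated up front, so the cost depends on how many draws are needed to fill the set rather than on the number of permutations.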

This second version is taking about ten minutes to produce a result. This made me think about how we perceive randomness as haphazard: “Oh, you are just randomly trying to guess the right answer.” We perceive random as wasteful or misguided. The case presented here is just another example of how random is sometimes faster than systematic.

Half-life of English irregular verbs

I picked up a copy of this month’s Discover magazine and found an interesting news item on the half-life of English irregular verbs. This piqued my interest since I have been doing some studying of Natural Language Processing to see how precision error could be used in the field.

A Student’s Introduction to English Grammar (co-written by one of the principals at the Language Log) defines irregular verbs as those that do not have a well-defined rule to generate their inflectional forms. The preterite form of “walk” is “walked”. “Walk” is a regular verb that uses the “-ed” rule for forming the preterite and past participle inflections. On the other hand, “fly” is an irregular verb since the preterite form is given by “flew” and the past participle by “flown”.

Erez Lieberman and co-authors did a quantitative study of how often irregular verbs in English turn regular. From historical records (Old English -> Middle English -> Modern English) they were able to determine that the half-life of irregular verbs is proportional to the square root of their frequency. An irregular verb that is 100 times less frequent in daily use than another verb will regularize 10 times faster than the frequently used one.
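The arithmetic behind that last sentence is short enough to check directly. A minimal sketch, assuming only the square-root rule stated above; the function name and the two example frequencies are made up for illustration.

```python
import math

def relative_half_life(freq, reference_freq):
    """Half-life of a verb relative to a reference verb,
    assuming half-life is proportional to sqrt(frequency)."""
    return math.sqrt(freq / reference_freq)

# Hypothetical pair: one verb is 100 times rarer than the other.
ratio = relative_half_life(freq=1e-4, reference_freq=1e-2)
speedup = 1.0 / ratio   # how many times faster the rarer verb regularizes
print(ratio, speedup)
```

The rarer verb's half-life comes out one tenth as long, which is the same thing as regularizing ten times faster.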

The idea of a half-life comes from nuclear physics. Given a sample of n radioactive atoms, the half-life is the average time you have to wait for half the atoms to decay to another type. The half-life of the uranium isotope U-238 is about 4.5 billion years. This, by the way, explains why we can still find U-238 on Earth (which is, itself, about 4.5 billion years old). If the half-life of U-238 were a million years or less, it would all have disappeared by the time we became clever enough to discover radioactivity (about a hundred years ago).
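To put numbers on this, the fraction of a sample that survives after a time t is $(1/2)^{t/T}$, where T is the half-life. A minimal sketch; the elapsed time is taken to be one U-238 half-life, which is an assumption chosen to keep the comparison simple.

```python
import math

def fraction_remaining(elapsed_years, half_life_years):
    """Fraction of the original atoms left after elapsed_years."""
    return 0.5 ** (elapsed_years / half_life_years)

def log10_fraction_remaining(elapsed_years, half_life_years):
    """Base-10 logarithm of the surviving fraction (avoids underflow)."""
    return (elapsed_years / half_life_years) * math.log10(0.5)

elapsed = 4.5e9  # assumed elapsed time in years, equal to one U-238 half-life

u238_left = fraction_remaining(elapsed, 4.5e9)
short_lived_log10 = log10_fraction_remaining(elapsed, 1.0e6)
print(u238_left)            # half of the original U-238 survives
print(short_lived_log10)    # exponent of ten for a one-million-year half-life
```

With a one-million-year half-life the same span covers 4,500 half-lives, and the surviving fraction is a number with more than a thousand zeros after the decimal point.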

The half-life for verbs with a frequency of 1/100 to 1/1000 is estimated to be 5,400 years. Examples of verbs in this frequency bin are “begin” and “help”. “Begin” is still irregular (“began”), but “help” decayed from “holp” to “helped” sometime between Middle English and Modern English, although the Oxford English Dictionary says “holp” is still used in obscure American dialects. The OED quotes Mark Twain in “The Prince and the Pauper” as saying: “Of a truth I was right — he hath holpen in a kitchen.”

The most common verbs, “be” and “have”, have not been observed to decay, but extrapolating using the square root of the frequency rule allows the authors to estimate a half-life of 39,000 years! In other words, English as a language will probably die before “be” becomes regular.

Precision error tensors?

In previous posts I talked about precision error matrices as being tensors. Boy, was I wrong! This is another case of my intuition getting way ahead of my math and science. I know just enough math to shoot myself in the foot with these speculations. I’ll explain.

Matrices are multi-dimensional arrays of numbers. A two-dimensional matrix M needs two indices i and j to specify a component $M_{ij}$. A three-dimensional matrix would need three indices and so on. Tensors can be thought of as matrices but the converse is not true. Not all matrices are tensors. That is where I went wrong.

Tensors are multi-dimensional geometrical objects. Yes, they can be represented by matrices but their true hallmark is that they transform correctly under coordinate transformations. The simplest example of the geometrical nature of tensors can be made with a vector. Take a vector drawn on a sheet of paper. No coordinate system has been drawn on the paper. The vector exists independent of any coordinate system. It has a length, for example, and we need no coordinate system to measure it — just a ruler. Two different coordinate systems can be put on the paper that would result in completely different components for the vector. What makes the vector a tensor is that given a coordinate transformation from one system to the next the vector transforms in such a way that both coordinate systems agree on the length of the vector.

This is the geometrical signature of tensors. Different coordinate systems (observers in the parlance of General Relativity) may have different components for the matrices they use to represent a tensor. But they agree on geometrical properties such as the length of a vector or the area of a polygon.
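The vector-on-paper example can be checked numerically. A minimal sketch, assuming the second coordinate system is the first one rotated by 30 degrees; the specific vector and angle are arbitrary choices.

```python
import numpy as np

def rotation(theta):
    """2D rotation matrix for an angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Components of the same vector as seen by two observers.
v_first = np.array([3.0, 4.0])
v_second = rotation(np.pi / 6).T @ v_first   # axes rotated by 30 degrees

length_first = np.linalg.norm(v_first)
length_second = np.linalg.norm(v_second)
print(v_first, v_second)             # different components
print(length_first, length_second)   # same length
```

The components disagree but the length does not. An arbitrary array of numbers comes with no such transformation rule, which is why a matrix by itself is not yet a tensor.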

My claim that precision error matrices can be made into tensors may be correct, but I definitely have not proven it until I can show that the tensors I define transform properly under coordinate transformations.

Books of the week

I’ve been nibbling on a bunch of books for the past week. They are, in no particular order:

Mirage: Napoleon’s Scientists and the Unveiling of Egypt deals with the scientific side of Napoleon’s famous imperialistic debacle — the 1798 invasion of Egypt. We tend to think of historical knowledge as continuous in time. If we know something now, everyone in the past must have known it. This book shatters that illusion. Ancient Egypt had been lost to humanity for centuries. The savants in the expedition started the recovery of this lost civilization. One young scientist that participated in the expedition was Joseph Fourier. I have read many biographical sketches of Fourier but I do not recall ever reading that he was part of Napoleon’s Egypt “expedition”. One of the categories in this blog, “Fourier analysis”, is named after him. We can thank Fourier for many things but one that comes immediately to mind is the MP3 music file standard.

The World Without Us has been getting a lot of press. It clearly deserves it. The premise — what would happen to the world if we just disappeared overnight — forms a great hook on which to hang all sorts of scientific observations about biology, the durability of materials, the relentless march of entropy, human evolution, and much more. I highly recommend this book.

Variations in student responses to a multiple-choice exam for latent group discovery

Questions in an exam are detectors of student competency. Students are detectors of the correct answers in a test. What is the variation in the student’s model of the correct exam? The precision error equations can be used to construct a covariance matrix for the students instead of the questions. What makes the difference is what is being averaged. When you want to use a test to tell you something about the questions, you average over the students. The covariance matrix is then indexed by the questions. When you want the test to tell you about the students, you average over the questions. The covariance matrix is indexed by students.
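The difference between the two averages is easy to see on a small score table. This sketch uses the ordinary sample covariance as a stand-in for the precision error equations, and the scores are made up for illustration.

```python
import numpy as np

# Hypothetical scores: rows are students, columns are questions (1.0 = correct).
scores = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])

# Average over the students: the matrix is indexed by questions.
by_question = np.cov(scores, rowvar=False)

# Average over the questions: the matrix is indexed by students.
by_student = np.cov(scores, rowvar=True)

print(by_question.shape)   # one row and column per question
print(by_student.shape)    # one row and column per student
```

The same table of scores gives both matrices; only the axis that is averaged away changes.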

All of this suggests that it would be possible to build a completely parameter-less approach to detecting latent groups in students. This would be a different approach from that involved in topic models which use a specific probability distribution — the Dirichlet distribution. In this approach, you would assume a number of groups and arbitrarily assign students to these groups in a probabilistic fashion (60% group 1, 30% group 2, etc.). One can then see how well this group distribution predicts the observed covariance matrix by use of non-commutative harmonic analysis. Group assignment is thereby completely determined by the data — no parameters are needed.

MathML enabled MediaWiki

I have started the update of MediaWiki 1.11.2 to incorporate Blahtex functionality. My goal is to have pages that validate as correct XHTML+MathML. The work is being detailed here.

The update is not trivial. The current hacked version of MediaWiki at BerliOS that incorporates blahtex is based on version 1.7. The MediaWiki installation I am playing with is at the latest version: 1.11.2. To figure out how to concentrate on the relevant parts, I am using Unix utilities diff and wc. Basically, I start by counting lines and try to figure out how many new lines are accounted for by new classes or functions added.

Enabling math in MediaWiki

I have started the painful process of enabling MathML support on MediaWiki. This is crucial for my use of wiki technology in my workflow. Check out the first step I have taken: enabling texvc (which produces either HTML or .png output).

The next step is to connect it with blahtex so I produce true MathML and correct XHTML headers with MathML. This is not trivial, since the hacked blahtex distribution over at BerliOS is based on MediaWiki version 1.7 but the current version is 1.11.2. This means that I have to go through the list of blahtex-modified files in the current version.

I have done something similar for MathML on this blog (using itex2MML). Clearly, the math-enabled web is still far from being the default. When I have time, and if I succeed in creating a strict XHTML+MathML MediaWiki version, I’ll document my steps much as I did for WordPress.
