Posted in Error Theory, Fourier Analysis, Group Theory, Machine Learning | March 13th, 2008 No Comments »
Questions in an exam are detectors of student competency. Students are detectors of the correct answers in a test. What is the variation in the student’s model of the correct exam? The precision error equations can be used to construct a covariance matrix for the students instead of the questions. What makes the difference is what is being averaged. When you want to use a test to tell you something about the questions, you average over the students. The covariance matrix is then indexed by the questions. When you want the test to tell you about the students, you average over the questions. The covariance matrix is indexed by students.
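The switch between the two indexings can be sketched in a few lines. The example below is only an illustration: it uses the ordinary population covariance of raw 0/1 scores, not the precision error equations themselves, and the score table and function names are made up for the example.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def covariance(variables):
    """Population covariance matrix.

    variables[i] holds the observations of variable i; the average
    is taken over the observations, so the result is indexed by variables.
    """
    n_obs = len(variables[0])
    means = [sum(v) / n_obs for v in variables]
    centered = [[x - m for x in v] for v, m in zip(variables, means)]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n_obs for cj in centered]
        for ci in centered
    ]


# Hypothetical score table: one row per student, one column per question
# (1 = answered correctly, 0 = answered incorrectly).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]

# Average over the questions: the matrix is indexed by students (4 x 4).
student_cov = covariance(scores)

# Average over the students: the matrix is indexed by questions (3 x 3).
question_cov = covariance(transpose(scores))
```

The same table produces both matrices; only the axis being averaged over changes.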
All of this suggests that it would be possible to build a completely parameter-less approach to detecting latent groups in students. This would be a different approach from that involved in topic models which use a specific probability distribution — the Dirichlet distribution. In this approach, you would assume a number of groups and arbitrarily assign students to these groups in a probabilistic fashion (60% group 1, 30% group 2, etc.). One can then see how well this group distribution predicts the observed covariance matrix by use of non-commutative harmonic analysis. Group assignment is thereby completely determined by the data — no parameters are needed.
Tags:
latent labels
Posted in MathML | March 7th, 2008 No Comments »
I have started the update of MediaWiki 1.11.2 to incorporate Blahtex functionality. My goal is to have pages that validate as correct XHTML+MathML. The work is being detailed here.
The update is not trivial. The current hacked version of MediaWiki at BerliOS that incorporates blahtex is based on version 1.7. The MediaWiki installation I am playing with is at the latest version: 1.11.2. To figure out how to concentrate on the relevant parts, I am using Unix utilities diff and wc. Basically, I start by counting lines and try to figure out how many new lines are accounted for by new classes or functions added.
Tags:
MediaWiki
Posted in MathML | March 7th, 2008 No Comments »
I have started the painful process of enabling MathML support on MediaWiki. This is crucial for my use of wiki technology in my workflow. Check out the first step I have taken: enabling texvc (which produces either HTML or .png output).
The next step is to connect it with blahtex so that I produce true MathML and correct XHTML headers with MathML. This is not trivial, since the hacked blahtex distribution over at BerliOS targets MediaWiki version 1.7 but the current version is 1.11.2. This means that I have to go through the list of blahtex-modified files in the current version.
I have done something similar for MathML on this blog (using itex2MML). Clearly, the math-enabled web is still far from being the default. When I have time, and if I succeed in creating a strict XHTML+MathML MediaWiki version, I’ll document my steps much as I did for WordPress.
Tags:
MediaWiki
Posted in Error Theory, Machine Learning | March 5th, 2008 No Comments »
One possible application of the precision error tensors framework is to use it as a criterion for selecting the number of clusters needed to describe a dataset. The number of clusters problem refers to the generic problem of deciding how many clusters describe a dataset. Many clustering algorithms exist. Deciding which one is appropriate in a particular task is up to the investigator. Suppose that one has settled on a clustering algorithm. An algorithm like k-clusters has no natural stopping criterion. You dial in how many clusters you want, i.e. you manually set the value of $k$, and the algorithm gives you the data clustered into $k$ groups.
Putting aside the correctness of using a particular algorithm for clustering a specific dataset, we can ask: what number of clusters gives me the smallest precision error? This provides an automatic algorithm for deciding on the optimal number of clusters given the chosen algorithm and the dataset to which it is applied.
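The selection loop can be sketched as follows. This is a minimal sketch under assumptions: `select_num_clusters`, `chunk_cluster` and `toy_error` are hypothetical names, the clustering routine is a crude stand-in, and `toy_error` is a placeholder criterion (within-cluster scatter plus one unit per cluster), not the precision error. Only the outer loop reflects the idea in the text: run the chosen algorithm for each candidate number of clusters and keep the one with the smallest error.

```python
def select_num_clusters(data, candidate_ks, cluster, error):
    """Return (best_k, errors), where best_k minimizes the error criterion.

    cluster(data, k) -> list of labels, one per data point
    error(data, labels) -> float, smaller is better
    """
    errors = {}
    for k in candidate_ks:
        labels = cluster(data, k)
        errors[k] = error(data, labels)
    best_k = min(errors, key=errors.get)
    return best_k, errors


def chunk_cluster(data, k):
    """Stand-in clustering: split the sorted 1-D data into k equal chunks."""
    n = len(data)
    order = sorted(range(n), key=lambda i: data[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n
    return labels


def toy_error(data, labels):
    """Placeholder criterion: within-cluster scatter plus a cost per cluster."""
    groups = {}
    for x, lab in zip(data, labels):
        groups.setdefault(lab, []).append(x)
    scatter = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        scatter += sum((x - mean) ** 2 for x in members)
    return scatter + len(groups)


data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
best_k, errors = select_num_clusters(data, [1, 2, 3], chunk_cluster, toy_error)
```

Swapping in a real clustering algorithm and a real precision error estimate only requires replacing the two callables.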
Tags:
number of clusters problem :
precision error
Posted in Scientific Readings | March 4th, 2008 No Comments »
Author Stewart Mader makes a convincing case in his book Wikipatterns that wikis are a powerful collaboration tool. I have dabbled briefly with wikis. I’ve come to rely more and more on Wikipedia to understand technical terms quickly. I just don’t practice collaboration with them.
This may change if a grant that we currently have pending with the NSF is approved. We proposed a collaboration with New Mexico scientists at the Jornada Experimental Range to develop a photogrammetric system for UAV images. Collaboration management was a mandated section of the proposal and we included a slew of tools that we currently use — revision control, emails, issue tracking — as well as the MediaWiki software in a list of tools we intend to use to facilitate managing the work related to the collaboration.
The more tools like wikis become part of our scientific practice, the more I wonder how anybody got things done in the past. How did scientists communicate before email? I know letters were written. I wrote a few of them back in graduate school. But it seems so strange now to think of writing a letter to someone instead of sending an email.
Tags:
emails :
wikis
Posted in Randomness | March 4th, 2008 No Comments »
My previous post on precision error tensors was misleading. We tend to think of tensors as complicated mathematical structures, but vectors are rank-1 tensors. Driving home from work today, I realized that I had already shown that precision error vectors can be calculated in our horizontal decorrelation estimation paper. So mathematically speaking, I have already shown that precision errors should be treated as tensors. The precision error vector is the rank-1 tensor example. The precision error covariance matrix is the rank-2 tensor. These are two examples in the usual tensor progression. At some future time I should calculate the rank-3 tensor. How would one induce representations of the symmetric group in rank-3 tensors?
Tags:
precision error :
tensors
Posted in Error Theory, Machine Learning | March 1st, 2008 No Comments »
Mathematical objects have dimensions associated with them. The temperature outside my house is measured as a single number or scalar. It is a one-dimensional quantity. This fact can be observed in how mercury thermometers are built: they are a long tube or line. Thermometers are never built as squares.
The position of a house in a city is an example of a two-dimensional quantity: it requires two numbers to specify. This fact is obvious in that maps of cities are usually printed on a sheet of paper, not on a very thin strip of paper. The position of the house is expressed as a vector. This vector can be expressed as an ordered series of numbers of the form $(x_1, x_2)$. Another way to represent the vector is just with the single symbol $x_i$. You tell me the value of $i$ and I go down the list and read off the component $x_i$.
Generalizing further, we can have matrices like the precision error covariance matrix I have been going on and on about all these months. This matrix can be represented by the symbol $C_{ij}$. You now have to tell me two numbers, $i$ and $j$, for me to read off the correct entry in the matrix.
We can keep playing this game forever. It is possible to invent mathematical quantities of the form $T_{ijk}$. Three “indices” need to be specified to read off an entry. You can think of this as a cube of numbers.
Precision error covariance matrices can also be generalized to precision error tensors. Instead of just asking how are the errors between two models correlated, we can ask how are the errors of three models correlated. We can have a cube of cross-correlations between the different model errors!
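To make the cube concrete, here is a small sketch that builds one from the errors of three models. It assumes the error of each model at each sample is already known and computes an ordinary third-order central moment; the function name and the numbers are made up, and this is not meant as the definition of a precision error tensor.

```python
def third_order_moments(errors):
    """Cube of third-order central moments.

    errors[i][t] is the error of model i on sample t.
    Entry T[i][j][k] is the average over samples of the product of the
    centered errors of models i, j and k.
    """
    n_samples = len(errors[0])
    centered = []
    for row in errors:
        mean = sum(row) / n_samples
        centered.append([e - mean for e in row])
    size = len(errors)
    return [
        [
            [
                sum(a * b * c for a, b, c in zip(centered[i], centered[j], centered[k]))
                / n_samples
                for k in range(size)
            ]
            for j in range(size)
        ]
        for i in range(size)
    ]


# Hypothetical errors of three models on four samples.
errors = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
    [2.0, 0.0, -2.0, 0.0],
]
T = third_order_moments(errors)
```

Reading off `T[i][j][k]` takes three indices, exactly like the cube of numbers described above, and the entry does not change when the three indices are permuted.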
Tags:
covariance matrix :
precision error :
tensors
Posted in Error Theory, Machine Learning, Randomness | February 29th, 2008 No Comments »

I have applied the autonomous difference equations to test the quality of ten out of twenty questions I used in a Physics exam I gave in the Spring of 2006 to an introductory class for engineering students. That dark square in position six of the matrix corresponds to the question least likely to be answered correctly. Only 64 students out of 250 answered it correctly. The reason this happened was that I offered a very clever wrong answer that attracted most of the students (the value you get if you forget to take a square root). I have used the precision error covariance matrix to assess the test maker, not the students!
This example also highlights the general applicability of the precision error covariance matrix. There now exist two experimental verifications of its usefulness: digital elevation maps and test assessment.
Tags:
precision error covariance matrix
Posted in Scientific Readings | February 29th, 2008 1 Comment »
The scientist Faraday was self-educated. As a young man he was an apprentice to a bookbinder. He read many of the scientific works he bound. It is said that he bound his own notebooks beautifully. I visited the Faraday museum in 2004, but the notebooks were only accessible to scholars and did not form part of the public display. I, too, keep notebooks of my scientific work, and because of this I have picked up little historical tidbits about the usefulness of notebooks and the devotion of scientists to them.
I am now reading The Telephone Gambit. At its center is an abrupt time gap in Alexander Graham Bell’s notebook just before his invention of the telephone. I have not finished the book yet. But the author makes a convincing case that Bell stole the idea from a patent by Elisha Gray.
Tags:
scientific notebooks
Posted in Randomness | February 26th, 2008 No Comments »
Today I was able to look at the precision error equations for the digital elevation models and see for the first time that they can be trivially generalized to other machine learning fields like information retrieval, information extraction, and bioinformatics. This is so embarrassingly simple that I cannot stop coming up with new variants every hour or so! The pattern is easy to explain, so I think any reader of this should be able to come up with a variant that applies to their area of work. Please let me know if you do; I would like to start a catalogue of the many ways this can be done.
Here is the pattern. I’ll start with information retrieval. Assume you have a system that creates a relevance model of a corpus of documents. These relevance models are judgments of the form $r(d, q)$. The notation is meant to capture that the relevance judgment is a binary decision on whether a particular document $d$ is relevant to a query $q$. Maybe you made that relevance model using maximum likelihood estimates, or maybe you used Latent Dirichlet Allocation. Each way you calculate that relevance judgment is a model. Assume that you have used different algorithms, or different parameter settings, or whatever, to come up with different relevance judgments for a set of test queries.
Each relevance judgment of a specific model can be written as
Now consider the following quantities that can be calculated with these many relevance models
These quantities would not include the value. It would cancel out. So the above equation can be written as
These equations would allow you to recover the precision errors for a collection of information retrieval models!
The pattern can now be generalized ad-infinitum to any machine learning task! You pick the model prediction for which you want to measure the precision error.
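A small numerical sketch of the cancellation may help. It rests on assumptions that are not spelled out in this post: each model prediction is written as the unknown true value plus a model-specific error, the predictions are plain numbers rather than binary judgments, and the errors of the three models are uncorrelated. Under those assumptions the pairwise differences between models contain no trace of the true value, and the error variance of each model can be recovered from the variances of the differences. The function names and data are made up for the illustration.

```python
from statistics import pvariance


def difference(a, b):
    """Sample-by-sample difference between two models' predictions."""
    return [x - y for x, y in zip(a, b)]


def error_variances(m1, m2, m3):
    """Recover each model's error variance from pairwise differences.

    Assumes prediction = truth + error with mutually uncorrelated errors,
    so that Var(m_i - m_j) = Var(e_i) + Var(e_j).
    """
    v12 = pvariance(difference(m1, m2))
    v13 = pvariance(difference(m1, m3))
    v23 = pvariance(difference(m2, m3))
    return (
        (v12 + v13 - v23) / 2,
        (v12 + v23 - v13) / 2,
        (v13 + v23 - v12) / 2,
    )


# Made-up truth, used only to build the example; the estimator never sees it.
truth = [10.0, 12.0, 9.0, 11.0]

# Made-up errors, chosen to be zero-mean and exactly uncorrelated.
e1 = [0.5 * s for s in (1, -1, 1, -1)]
e2 = [1.0 * s for s in (1, 1, -1, -1)]
e3 = [2.0 * s for s in (1, -1, -1, 1)]

models = [[t + e for t, e in zip(truth, err)] for err in (e1, e2, e3)]
variances = error_variances(*models)

# The differences are the same whatever the truth is: it cancels out.
other_truth = [0.0, 0.0, 0.0, 0.0]
other_models = [[t + e for t, e in zip(other_truth, err)] for err in (e1, e2, e3)]
same = difference(models[0], models[1]) == difference(other_models[0], other_models[1])
```

When the errors are correlated, these three equations are no longer enough on their own and the covariance terms have to be carried along as extra unknowns.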