Archive for the 'Mathematics' Category

ICML accepts precision error via L1 minimization paper

Our technical report on how to recover precision error estimates with L1-minimization has been accepted by the 2008 International Conference on Machine Learning.

The paper originally got three anonymous reviews. Two were positive, one strongly negative. In our response to the reviews, we agreed with the general criticism by the reviewers that one experimental demonstration is not enough. In our precision error papers so far, we have only been using one dataset — aerial photographs from the Twenty-Nine Palms region in California. So we are going to include some results from North Carolina forest data to show that our technique works for all sorts of images.

Readers of previous posts may note that besides maps, the precision error has been recovered for questions in a multiple-choice-question (MCQ) exam. It would be nice to include this in our ICML paper, but the title of the paper is “Autonomous geometric precision error estimation in low-level computer vision tasks” so it seems incongruous to do so.

The paper was submitted in early January. Afterwards, we realized that our precision error technique for elevation errors in maps applies to any set of models that make scalar predictions about multiple entities. We are now working on a draft for a Science magazine article that will combine the examples from maps and exams to illustrate the wide applicability of our technique.

Precision error for parse trees

The precision error equations require that “ground truth” cancel out. It is easy to see what that means for elevations in a map. What does it mean for parse trees in a natural language processing task like sentence parsing?

One way to define distance between trees is to consider the total number of reverse operations that bring them back to a common ancestor. Is that number equal to the number one would get by comparing everything to the “true” parsing? That is, the observed parse prediction’s distance is equal to the true parse distance plus the distance created by the error-transformations.

Subtraction makes sense to me in the context of trees: you take everything after the common ancestor. What is addition of parse trees? The union of all edges and vertices. Parse trees are graphs after all.

This addition and subtraction of graphs means that we can use the precision error equations. Parse trees are added and subtracted. In the end, a score is assigned to the difference by counting the number of operations it would take to collapse the resulting graph to disconnected single ancestors.

How do I get a bunch of parsing models to test this idea out?

Variations in student responses to a multiple-choice exam for latent group discovery

Questions in an exam are detectors of student competency. Students are detectors of the correct answers in a test. What is the variation in the student’s model of the correct exam? The precision error equations can be used to construct a covariance matrix for the students instead of the questions. What makes the difference is what is being averaged. When you want to use a test to tell you something about the questions, you average over the students. The covariance matrix is then indexed by the questions. When you want the test to tell you about the students, you average over the questions. The covariance matrix is indexed by students.
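The two ways of averaging can be made concrete with a small sketch. This is only an illustration under stated assumptions: the `scores` table is made up, and the function computes an ordinary sample covariance, not the precision error equations themselves. The point is just to show how the choice of what is averaged decides what the matrix is indexed by.

```python
import numpy as np

# Hypothetical scored exam: rows are students, columns are questions (1 = correct).
scores = np.array([
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)


def covariance_over(matrix, axis):
    """Sample covariance obtained by averaging over `axis`.

    The result is indexed by the *other* axis of `matrix`.
    """
    centered = matrix - matrix.mean(axis=axis, keepdims=True)
    count = matrix.shape[axis]
    if axis == 0:
        return centered.T @ centered / count
    return centered @ centered.T / count


# Average over students: the matrix is indexed by questions (4 x 4 here).
question_cov = covariance_over(scores, axis=0)

# Average over questions: the matrix is indexed by students (3 x 3 here).
student_cov = covariance_over(scores, axis=1)
```

The same table of responses feeds both computations; only the axis being averaged changes.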

All of this suggests that it would be possible to build a completely parameter-less approach to detecting latent groups in students. This would be a different approach from that involved in topic models which use a specific probability distribution — the Dirichlet distribution. In this approach, you would assume a number of groups and arbitrarily assign students to these groups in a probabilistic fashion (60% group 1, 30% group 2, etc.). One can then see how well this group distribution predicts the observed covariance matrix by use of non-commutative harmonic analysis. Group assignment is thereby completely determined by the data — no parameters are needed.

Geometry in the dark

In a previous post I erroneously claimed that the mathematician Hilbert advocated teaching geometry in the dark. Hilbert’s “Foundations of Geometry” axiomatized the subject and carried out its exposition without a single diagram. I found the correct attribution yesterday while re-reading Hofstadter’s foreword to “King of Infinite Space”, a biography of geometer Donald Coxeter.

Hofstadter does not say for certain who advocated the practice of geometry in the dark and can only recollect that it was some 19th-century geometer, possibly Steiner, Plücker, von Staudt, or Feuerbach.

To err is human, to study your errors is glorious

I’ve been sick all week but today has been the worst. In between my sleeping hallucinations I have been thinking a lot about a proposal I’m currently writing on the use of non-commutative harmonic analysis to study mapping error patterns. It has become clear that the approach we are advocating at the AIRS lab is applicable to other areas of machine learning. Before I describe in additional detail what I mean by this, let me present a graphic that abstractly represents the scientific enterprise:

[Figure: An abstract representation of the scientific enterprise]

The work I am describing here lies at the bottom of the picture. It is an algorithmic prescription for understanding the precision of our models given a dataset used to construct those models. I present this diagram to make clear the limitations of our work. It is not a description or explanation of errors in general. It is a technique for probing the error patterns in your system. The hard work is still left to you on how to apply it for a specific system that constructs models from data, and its usefulness is not guaranteed. The technique may tell you nothing interesting about your system.

We can view model creation as a black box. It takes data inputs and produces a model. The crucial point is that in some machine learning situations the number of models we can build is very large. Data is model-redundant. The redundancy is sometimes continuous, for example, the initial position of a camera or the weight of a Lagrangian term. But it can also be discrete: a finite set of documents or photographs. Changing the data inputs can then be used to probe the variation of a system’s model predictions. These variations will not be completely random (i.e. patternless). Some documents are more informative, some photographs give us a better view. Therefore we can use the symmetry groups associated with our data inputs to Fourier analyze the model variation. This model variation or model precision is informative about the quality of the data and can be used to reject bad data or discount lower quality inputs.
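To make the black-box picture concrete, here is a toy sketch of probing model variation by changing a discrete set of inputs. Everything in it is an assumption made for illustration: `build_model` is a stand-in that simply averages its inputs, the numbers are invented, and the leave-one-out comparison is a much cruder probe than the Fourier analysis described above.

```python
import numpy as np


def build_model(inputs):
    """Stand-in black box: the 'model' is just the mean of its inputs."""
    return np.mean(inputs, axis=0)


# Hypothetical data: four inputs, each a noisy measurement of the same three quantities.
inputs = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 1.9, 3.1],
    [0.9, 2.1, 2.9],
    [3.0, 0.0, 5.0],  # deliberately bad input
])

# The model built from all of the inputs.
full_model = build_model(inputs)

# Rebuild the model once per left-out input.
models = np.array([
    build_model(np.delete(inputs, i, axis=0)) for i in range(len(inputs))
])

# How far each rebuilt model moves away from the full model.
shift = np.linalg.norm(models - full_model, axis=1)

# The input whose removal moves the model the most is a candidate for rejection.
suspect = int(np.argmax(shift))
```

In this toy case the variation singles out the last input, which is the kind of signal one could use to discount lower quality data.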

Diagrams in Greek mathematics

The Archimedes Codex is turning out to be a great read on the importance of diagrams in Greek mathematics, the transmission of ancient knowledge to present times and modern document forensic techniques. The book is written by two principals of the Archimedes Palimpsest Project. I just finished reading Reviel Netz’s explanation of why visual thinking has become so reviled in modern mathematics and it was so simple to understand that I want to share it with readers.

As Netz explains it, the problem with diagrammatic proofs hinges on the fact that diagrams do not have the generality of language. For example, if one wants to discuss triangles in general, a diagram of a triangle thwarts that generality since, by construction, it represents a specific triangle. The ambiguity of language is turned to good use by turning it into an encompassing generality.

Netz argues that Greek diagrams are schematic not illustrative. Evidence from the Palimpsest and later medieval documents strongly suggests that Archimedes drew a polygon inscribed inside a circle with circular arcs rather than straight lines in his “Spheres and Cylinders”. He also drew straight lines for sections of a spiral in “On Spirals”. All of this is the reverse of what modern diagrams in editions of Archimedes would do. Our diagrams are illustrative, theirs were schematic.

This schematic versus illustrative distinction is Netz’s explanation for why the Greeks never made a logical mistake in their mathematical works even though the diagrams are central to their exposition.

This bias in 20th-century mathematics against visual proofs (Coxeter mentions that Hilbert thought geometry should be taught in a darkened room!) is now creating a backlash that could bring diagrams back into the heart of proofs. Take a look at Euclid and His Twentieth Century Rivals: Diagrams in the Logic of Euclidean Geometry for how that most non-geometric machine, the computer, is making visual proofs rigorous.

X-raying the geometric precision error of DEMs with Fourier analysis

In a previous post I mentioned a way of Fourier analyzing the geometric precision error of DEMs. Today I realized that the scheme I proposed can only account for part of the error signal. The approach I proposed is correct but it can only capture one particular aspect of the total error. The simplest way of seeing this is to consider the $S_2$ symmetry group. This would be the one to use for $p=2$ photographs. From two photographs I can produce two DEMs: AB and BA. The covariance matrix for these two DEMs would be a symmetric 2×2 matrix of the form:
$$\begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$
But the representation induced by $S_2$ on these two DEMs generates the matrices:
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
These two matrices cannot capture the three independent degrees of freedom in the 2×2 covariance matrix. Therefore, the induced representation cannot capture all of the possible errors that are observed when two photographs are used to produce two maps. But the representation would allow you to project out that component of the error that is explained by permutations of the images.
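The degree-of-freedom count for two photographs can be checked numerically. The sketch below is an illustration under my own assumptions: the covariance values are invented, and I take "project out" to mean the orthogonal (Frobenius) projection onto the span of the two representation matrices, which for this group coincides with averaging the covariance over the group action.

```python
import numpy as np

# The two matrices of the permutation representation of S_2 on the DEMs AB and BA.
identity = np.array([[1, 0], [0, 1]])
swap = np.array([[0, 1], [1, 0]])


def span_dimension(matrices):
    """Dimension of the linear span of a list of equally sized matrices."""
    return int(np.linalg.matrix_rank(np.stack([m.ravel() for m in matrices])))


# The representation matrices span a 2-dimensional space.
rep_dim = span_dimension([identity, swap])

# A basis for symmetric 2x2 matrices spans a 3-dimensional space.
symmetric_basis = [
    np.array([[1, 0], [0, 0]]),
    np.array([[0, 1], [1, 0]]),
    np.array([[0, 0], [0, 1]]),
]
sym_dim = span_dimension(symmetric_basis)


def project_onto_s2(cov):
    """Average the covariance over the S_2 action (swap the two DEMs)."""
    return 0.5 * (cov + swap @ cov @ swap.T)


# Hypothetical covariance for the DEMs AB and BA.
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
projected = project_onto_s2(cov)  # the part explained by permuting the images
residual = cov - projected        # the part the representation cannot capture
```

The residual is the difference between the two variances, which is exactly the third degree of freedom that the identity and swap matrices cannot express.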

You would need at least $p=7$ photographs to have enough members in $S_p$ to completely model the variation in the DEMs observed when you use two photographs to produce a map. To understand the error that cannot be explained by the permutation group you would need to use three photographs to create a DEM. For $p$ photographs this would create $p(p-1)(p-2)$ DEMs. Since we can produce as many as, or even more than, $p!$ DEMs from $p$ photographs, at some point we will always overwhelm the representational power of the symmetry group of $p$ objects (in this case, $p$ photographs). What error remains after we project out the component that can be modelled by $S_p$? I hypothesize that it would be error that can be further Fourier analyzed by using the symmetry group associated with the orientation and positions of the cameras. These parameters are themselves error prone and would, by virtue of their geometry, only induce certain error patterns.

This viewpoint would therefore treat the observed error as one that can be captured by a successive series of symmetry groups. One component would be that related to the finite group $S_p$. Another component would be the one induced by translations and rotations of the camera positions and orientations. Like any real theory of errors, this approach would only peel away layers of error: always remaining would be a nugget of error that would require more and more complex models to decompose. The second law of thermodynamics is not violated!

The metaphor of x-raying in the title of this post comes from using Fourier analysis to study X-ray diffraction photographs of crystals. Crystals induce a certain periodicity on the scattered X-rays even when the sample is crushed into a powder. In other words, the randomly scattered blocks of crystal in the powder individually send a perfect diffraction pattern. But the X-ray photograph records the mishmash of the signals, so the picture is blurry. Nonetheless, the blurriness has a symmetry component that comes from the periodic structure of the crystals and therefore Fourier analysis is able to pick out the symmetry in the X-ray photograph caused by the crystal periodicity. The Fourier decompositions for geometric errors are doing the same thing. There are many sources of errors in DEMs from aerial photographs. Some come from the fact that you used individual photographs to create the maps. This component of the error can therefore be accounted for by studying representations of the symmetry group of $p$ objects. Others come from uncertainty in the position or orientation of the camera when it took the photograph. These are explained by induced representations of non-abelian Lie groups like 3-D rotations in the space of covariance matrices.

Error covariance matrices as images

I submitted my paper on autonomous precision error estimation in 3-D models to the 2008 International Conference on Machine Learning yesterday. One week early, too, a first for me! The format for the paper is the standard double-column format and this makes it very hard to have complex equations in the paper. One kind of mathematical object that is hard to display is the covariance matrix for the DEM errors that I keep talking about in these posts. These are n×n matrices of real numbers. One particular example I use comes from images of a desert terrain in the Twenty-Nine Palms area in California. We have four photographs and can therefore produce 12 = 4 × 3 DEMs. Because of mistakes, two of the DEMs have to be dropped so I end up with 10 DEMs. The resulting covariance matrices are then 10×10 matrices, a hard thing to display in the double-column format since now you have to present 10 numbers in a row. So I have hit upon a simple graphical way to present them that saves space but also ends up being more informative to the reader (or me) about the structure of the matrix.

The idea is to turn the 10×10 matrix into a 10×10 pixel image. Each pixel is now a shade of gray. The highest value in the matrix gets the darkest shade, the lowest gets the lightest. Here is an example that illustrates our correlated-pair error model:

[Figure: Covariance matrix for 10 DEMs of a desert terrain in the Twenty-Nine Palms region in California]

The only terms that are “turned on” are those along the diagonal. In contrast, here is the covariance matrix when you do L1-minimization and do not assume beforehand that certain DEMs are uncorrelated with each other:

[Figure: Full covariance matrix for 10 DEMs of the Twenty-Nine Palms dataset]

So the correlated-pair error model is close to the actual covariances, but we see that there are some cross-correlations off the diagonal that are on, albeit weaker than those on the block diagonal defined by the asymmetric DEM pairs.
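The matrix-to-image mapping described above can be sketched in a few lines. This is my own minimal version, not the code used for the paper's figures: the function name, the linear scaling, and the small example matrix are all assumptions.

```python
import numpy as np


def covariance_to_gray(cov):
    """Map a matrix to 8-bit gray levels: highest value darkest, lowest lightest."""
    cov = np.asarray(cov, dtype=float)
    low, high = cov.min(), cov.max()
    if high == low:
        # A constant matrix has no structure to show; render it as all white.
        return np.full(cov.shape, 255, dtype=np.uint8)
    normalized = (cov - low) / (high - low)  # 0 at the minimum, 1 at the maximum
    return np.round(255 * (1.0 - normalized)).astype(np.uint8)


# Hypothetical 4x4 covariance with two correlated pairs on the block diagonal.
example = np.array([
    [4.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])
image = covariance_to_gray(example)
```

One pixel per matrix entry is enough: the array can be handed to any image viewer that accepts 8-bit grayscale, for example matplotlib's `imshow` with `cmap="gray"`, `vmin=0`, and `vmax=255`.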

I apologize for the strange layout of the images relative to the text of this post, but my WordPress installation does not save the changes I make to the img tag to have text flow around it.
In any case, I hope this illustration makes clear some of the more abstract ideas I have been discussing about errors in DEMs.

Fourier theory of DEM precision errors

I’ve finished the experiments with different reconstruction matrices for the DEM precision error and I get a rock solid result independent of which reconstruction matrix I use. So my hypothesis that randomness may be used to increase the precision error was wrong. In the process, however, I have finally understood how to use the symmetry group $S_n$ to Fourier analyze the covariance matrix. This has led me to consider generalizations of our current approach that rely on the asymmetry of stereo matching algorithms.

The covariance matrix for our current procedure for creating maps is made up of photographic pairs. From two images, A and B, we create DEMs AB and BA. So $n$ photographs lead to $n(n-1)$ DEMs. The resulting covariance matrix can be Fourier analyzed by considering the representation induced by the symmetry group for $n$ objects (in our case the photographs) on the $n(n-1)$-dimensional space. That is, for each element of the group, call it $\pi$, we define $M_{AB,CD} = 1$ if $\pi(A)\pi(B) = CD$ and $0$ otherwise. This matrix representation can then be decomposed into its irreducible components to carry out the Fourier transform.
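The rule for the induced matrices is easy to spell out in code. The sketch below builds them for three photographs; the function names and the choice of three photographs are mine, and it stops short of the decomposition into irreducible components.

```python
from itertools import permutations

import numpy as np


def ordered_pairs(photos):
    """All DEM labels AB built from two distinct photographs, order mattering."""
    return [(a, b) for a in photos for b in photos if a != b]


def induced_matrix(perm, photos):
    """Matrix with M[AB, CD] = 1 if perm(A)perm(B) = CD, and 0 otherwise."""
    pairs = ordered_pairs(photos)
    index = {pair: i for i, pair in enumerate(pairs)}
    matrix = np.zeros((len(pairs), len(pairs)), dtype=int)
    for (a, b), row in index.items():
        matrix[row, index[(perm[a], perm[b])]] = 1
    return matrix


photos = ["A", "B", "C"]

# One matrix per element of the symmetric group on the photographs.
representation = [
    induced_matrix(dict(zip(photos, image)), photos)
    for image in permutations(photos)
]

# The traces (characters) are the usual starting point for finding the irreducibles.
characters = [int(np.trace(m)) for m in representation]
```

For three photographs this gives six 6×6 permutation matrices. The trace is 6 for the identity and 0 for every other permutation, because no other permutation of three photographs leaves an ordered pair of distinct photographs fixed.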

The above construction can then, in turn, be generalized by using the asymmetry of stereo matching algorithms. One constructs DEMs of the form ABC. This will not, in general, produce the same DEM as ACB and so on. There will be $n(n-1)(n-2)$ ways of constructing these DEMs. A representation of the group can then be induced by generalizing the rule in the previous paragraph. Bringing more photographs into the chain will induce higher and higher dimensional representations of the symmetry group. But note that all these representations are, by construction, smaller than or equal to the $n!$ dimensionality of the symmetry group itself. Higher dimensional representations could be constructed because an arbitrary DEM like ABAC will not be equivalent to the AC DEM, for example. The matching process, being imperfect, will not return to the same pixel when the matching chain is of the form ABA.
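The counting in this paragraph can be checked with a few lines of arithmetic. The sketch assumes chains of distinct photographs only, so chains with repeats such as ABAC are not counted; `chain_count` is a name I made up for the illustration.

```python
from math import factorial, perm


def chain_count(n, k):
    """Number of DEM chains built from k distinct photographs out of n, order mattering."""
    return perm(n, k)


n = 4  # four photographs
pair_dems = chain_count(n, 2)    # n(n-1) chains of the form AB
triple_dems = chain_count(n, 3)  # n(n-1)(n-2) chains of the form ABC
group_order = factorial(n)       # number of elements in the symmetric group
```

With four photographs the pair construction gives 12 DEMs and the triple construction gives 24, which already equals 4!. Chains of distinct photographs can never exceed n!, so going beyond it requires chains that revisit a photograph.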

None of these more complicated DEM production processes would lead to anything interesting if there were no errors in the matching process. If creating a 3-D model from photographs were perfect, all the DEMs would be error free and the covariance matrix would be proportional to the identity matrix. In other words, the Fourier decomposition of the covariance matrix is interesting because there is a symmetry to the errors. I’ll keep readers updated on the results of this line of inquiry as I obtain concrete results.

Interesting NIPS paper on group theory and Fourier analysis applied to inference

This week I am at the annual Neural Information Processing Systems Conference, a fascinating conference that combines many of my scientific interests in machine learning, computer vision, statistics, and natural language processing. Last night I visited the poster by Jonathan Huang on efficient inference for distributions on permutations.

The paper considers the problem of how to reason probabilistically in a tracking task where you have sporadic tracking information of objects. Huang and co-authors end up using concepts like irreducible representations and Clebsch-Gordan coefficients. These may be unfamiliar concepts to the reader, but to me they sound like a distant echo of all my physics training since these concepts are all over quantum mechanics and quantum field theory. What a cool paper! I’ll definitely be studying this paper since I have been interested in the issue of permutations with my work on recognizing answer patterns in multiple-choice exams.