To err is human, to study your errors is glorious
I’ve been sick all week, but today has been the worst. Between feverish, half-asleep hallucinations I have been thinking a lot about a proposal I’m currently writing on the use of non-commutative harmonic analysis to study mapping error patterns. It has become clear that the approach we are advocating at the AIRS lab is applicable to other areas of machine learning. Before I describe in more detail what I mean by this, let me present a graphic that abstractly represents the scientific enterprise:
The work I am describing here lies at the bottom of the picture. It is an algorithmic prescription for understanding the precision of our models given the dataset used to construct them. I present this diagram to make clear the limitations of our work. It is not a description or explanation of errors in general. It is a technique for probing the error patterns in your system. The hard work of applying it to a specific system that constructs models from data is still left to you, and its usefulness is not guaranteed. The technique may tell you nothing interesting about your system.
We can view model creation as a black box. It takes data inputs and produces a model. The crucial point is that in some machine learning situations the number of models we can build is very large. Data is model-redundant: many different choices of input yield a valid model. The redundancy is sometimes continuous, for example the initial position of a camera or the weight of a Lagrangian term. But it can also be discrete, as with a finite set of documents or photographs. Changing the data inputs can then be used to probe the variation of a system’s model predictions. These variations will not be completely random (i.e. patternless). Some documents are more informative, some photographs give us a better view. Therefore we can use the symmetry groups associated with our data inputs to Fourier-analyze the model variation. This model variation, or model precision, is informative about the quality of the data and can be used to reject bad data or discount lower-quality inputs.
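To make the discrete case concrete, here is a minimal sketch under strong simplifying assumptions. It is not the non-commutative machinery of the proposal. Instead it uses the simplest commutative stand-in: each of n documents is either included or excluded, so the data inputs form the group Z_2^n, and the Fourier transform over that group is the Walsh-Hadamard transform. The black box `toy_model_error` and its numbers are invented purely for illustration, and `walsh_hadamard` is a hypothetical helper name.

```python
from itertools import product


def walsh_hadamard(f, n):
    """Fourier transform of f over the group Z_2^n (Walsh-Hadamard).

    f maps each inclusion mask (a tuple of n zeros and ones) to a number.
    Returns a dict mapping each character, also a 0/1 tuple, to its coefficient.
    """
    masks = list(product((0, 1), repeat=n))
    coeffs = {}
    for s in masks:
        total = 0.0
        for x in masks:
            # Character value chi_s(x) = (-1)^(s . x)
            sign = -1 if sum(si * xi for si, xi in zip(s, x)) % 2 else 1
            total += sign * f[x]
        coeffs[s] = total / len(masks)
    return coeffs


def toy_model_error(mask):
    """Hypothetical black box: model error given which documents are used.

    Assumed for illustration only: documents 0 and 1 reduce the error,
    document 2 is bad data and increases it.
    """
    return 1.0 - 0.4 * mask[0] - 0.1 * mask[1] + 0.2 * mask[2]


n = 3
errors = {m: toy_model_error(m) for m in product((0, 1), repeat=n)}
coeffs = walsh_hadamard(errors, n)

# First-order coefficients: one per document.
# Positive means including the document lowers the error on average;
# negative means including it raises the error.
for i in range(n):
    character = tuple(1 if j == i else 0 for j in range(n))
    print(f"document {i}: coefficient {coeffs[character]:+.3f}")
```

In this toy setup the coefficient for a single document equals half the difference between the average error without it and the average error with it, so a negative value flags a candidate for rejection or down-weighting. Coefficients on characters involving several documents would capture interactions between them; here they vanish because the toy error is additive.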
