NIPS interesting paper on group theory and Fourier analysis applied to inference

This week I am at the annual Neural Information Processing Systems Conference a fascinating conference that combines many of my scientific interests on machine learning, computer vision, statistics, and natural language processing. Last night I visited the poster by Jonathan Huang on efficient inference for distributions on permutations.

The paper considers the problem of how to reason probabilistically in a tracking task where you have sporadic tracking information of objects. Juang and co-authors end up using concepts like irreducible representations and Clebsch-Gordan coefficients. This may be unfamiliar concepts to the reader but to me they sound like a distant echo of all my physics training since these concepts are all over quantum mechanics and quantum field theory. What a cool paper! I’ll definitely be studying this paper since I have been interested in the issue of permutations with my work on recognizing answer patterns in multiple-choice exams.

Compressed sensing, underdetermined linear systems, and autonomous error estimation

I am co-writing a proposal to the National Science Foundation to apply ideas from compressed sensing to the problem of autonomously estimating mapping errors. This has got me thinking about this quote from Dave Donoho’s paper For Most Large Underdetermined Systems of Linear Equations, the minimal $l_1$-norm solution is also the sparsest solution.:

Many situations in science and technology call for solutions to undetermined systems of equations, i.e. systems of linear equations with fewer equations than unknowns. Examples in array signal processing, inverse problems, and genomic data analysis all come to mind. However,any sensible person working in such fields would have been taught to agree with the statement: “you have a system of linear equations with fewer equations than unknowns. There are infinitely many solutions.” And indeed, they would have been taught well. However, the intuition imbued by that teaching would be misleading.

On closer inspection, many of the applications ask for sparse solutions of such systems, i.e. solutions with few nonzero elements; the interpretation being that we are sure that ‘relatively few’ of the candidate sources, pixels, or genes are turned ‘on’, we just don’t know a priori which ones those are. Finding sparse solutions to such systems would better match the real underlying situation. It would also in many cases have important practical benefits, i.e. allowing us to install fewer antenna elements, make fewer measurements, store less data, or investigate fewer genes.

We have been taught to throw up our hands whenever a problem is labeled as undeterminate — it has fewer equations than unknowns. But this is too pessimistic. What indeterminacy really means is something more like an iron-clad guarantee. To say that a linear system $Ax = y$ is undeterminate means that we cannot solve for a unique $x$ given any $y$. In other words, the null space of the linear operator $A$ has a non-zero dimension. But here is the crucial caveat that underlies compressed sensing as stated by Donoho above –most natural signals are sparse. We do not get to see all possible $y$ outputs. The space of all facial images is much smaller than the space of all possible images. This is precisely the point of manifold learning techniques in machine learning. So indeterminacy just means that we do not a 100% guarantee. But in a very practical sense, this does not matter when considering many natural signals. We can recover the input signal in spite of undeterminacy.

Examples of this phenomena are experienced by computer power users all the time. Text compression algorithms, which are lossless, cannot be a 100% percent successful. It can be proven that for some input string the compression will fail: the “compressed” text will be larger in size than the uncompressed one. However, when was the last time someone using gzip saw this happen? It does not happen because natural language text does not span the space of all possible character strings. A thousand monkeys pounding on keyboards produce mostly gibberish. This sparsity of natural language means that we can implement practical text compression algorithms that will compress text with an effective rate of 100%.

Rediscovering the usefulness of Euler’s formula

I have started to work thru Anton Deitmar’s “A First Course in Harmonic Analysis”. This is background reading for my perusal of the Compressed Sensing literature. On the first chapter he wants to prove the convergence in the $L^2$-norm of Fourier series for periodic functions. To do this he asserts (offhandedly to my untrained eyes) the equality
$$\sum_{k=1}^{n} \cos(2 \pi k x) = \frac{\sin((2n+1)\pi x)}{2\sin(\pi x)} – \frac{1}{2}.$$

It is embarrassing how long it has taken me to prove this on my own. In the end, Euler’s formula,
$$e^{i \theta} = \cos \theta + i \sin \theta,$$
ended up being the right thing to use since it turns the proof into checking the equality of polynomials. My proof is inductive. First prove the $n=1$ case,
$$\cos(2 \pi x) = \frac{\sin(3 \pi x)}{2 \sin(\pi x)} – \frac{1}{2},$$
by rewriting it as,
$$\sin(\pi x) \cos(2 \pi x) = \frac{1}{2} \left( \sin(3 \pi x) – \sin(\pi x) \right)$$
$$\left( \frac{e^{i \theta} – e^{-i \theta}}{2 i} \right) \left( \frac{e^{i 2 \theta} + e^{-i 2 \theta}}{2} \right) = \frac{1}{2} \left( \frac{e^{i 3 \theta} – e^{-i 3 \theta}}{2 i} – \frac{e^{i \theta} – e^{-i \theta}}{2i} \right),$$
the demonstration of the equality becomes a polynomial check. Induction to the case $n+1$ by using the same trick with Euler’s formula leads to the general proof of the formula.