Precision error equations for information retrieval, information extraction, and bioinformatics

Today I was looking at the precision error equations for digital elevation models and saw for the first time that they can be trivially generalized to other machine learning fields such as information retrieval, information extraction, and bioinformatics. This is so embarrassingly simple that I cannot stop coming up with new variants every hour or so! The pattern is easy to explain, so I think any reader should be able to come up with a variant that applies to their own area of work. Please let me know if you do; I would like to start a catalogue of the many ways this can be done.

Here is the pattern. I'll start with information retrieval. Assume you have a system that creates a relevance model of a corpus of documents. A relevance model produces judgments of the form $r(d,q) \in \{0,1\}$. The notation is meant to capture that the relevance judgment is a binary decision on whether a particular document d is relevant to a query q. Maybe you made that relevance model using maximum likelihood estimates, or maybe you used Latent Dirichlet Allocation. Each way you calculate that relevance judgment is a model. Assume that you have used different algorithms, or different parameter settings, or whatever, to come up with n different sets of relevance judgments for a set of test queries.

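The relevance judgment of a specific model i can be written as
$$r_i^{\text{estimated}}(d,q) = r^{\text{true}}(d,q) + \delta_i(d,q)$$
where $\delta_i(d,q)$ is the precision error of model i on the pair (d,q). The true relevance carries no model index: it is the same for every model.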
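To make the decomposition concrete, here is a minimal Python sketch. The toy judgments and the helper name `precision_errors` are made up for illustration; the only point is that, for binary judgments, each error can take just the values -1, 0, or +1.

```python
def precision_errors(estimated, true):
    """Return delta(d, q) = r_estimated(d, q) - r_true(d, q) for every (d, q) pair."""
    return {pair: estimated[pair] - true[pair] for pair in true}


# Hypothetical toy data: keys are (document, query) pairs, values are 0/1 judgments.
r_true = {("d1", "q1"): 1, ("d2", "q1"): 0, ("d3", "q1"): 1}
r_model = {("d1", "q1"): 1, ("d2", "q1"): 1, ("d3", "q1"): 0}

delta = precision_errors(r_model, r_true)
print(delta)
# {('d1', 'q1'): 0, ('d2', 'q1'): 1, ('d3', 'q1'): -1}
```

Here d2 is a false positive (error +1) and d3 is a miss (error -1). Of course, in a real setting you do not have the true judgments to subtract, so the errors cannot be computed this directly.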

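Now consider quantities of the following form, which can be calculated from these many relevance models by averaging over two groups of them, one of size E and one of size M:
$$\sum_{q,d}\left[\left(\frac{1}{E}\sum_{i=1}^{E} r_i(d,q)\right) - \left(\frac{1}{M}\sum_{j=1}^{M} r_j(d,q)\right)\right]$$
These quantities do not include the $r^{\text{true}}$ value: it appears once in each average and cancels in the difference. So the above expression can be written as
$$\sum_{q,d}\left[\left(\frac{1}{E}\sum_{i=1}^{E} \delta_i(d,q)\right) - \left(\frac{1}{M}\sum_{j=1}^{M} \delta_j(d,q)\right)\right]$$
Equations of this kind would allow you to recover the precision errors $\{\delta_i\}$ for a collection of information retrieval models!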
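The cancellation is easy to check numerically. The sketch below assumes the quantity is read as the difference between the average judgment of one group of E models and that of a second group of M models, summed over all (d, q) pairs. The simulated data, the flip probabilities, and the helper names `group_mean_difference` and `noisy_model` are all invented for the illustration.

```python
import random


def group_mean_difference(group_a, group_b, pairs):
    """Sum over (d, q) pairs of (mean over group_a) minus (mean over group_b)."""
    total = 0.0
    for pair in pairs:
        mean_a = sum(m[pair] for m in group_a) / len(group_a)
        mean_b = sum(m[pair] for m in group_b) / len(group_b)
        total += mean_a - mean_b
    return total


random.seed(0)
pairs = [(f"d{k}", "q1") for k in range(50)]
# Simulated ground truth; in practice this is exactly what you do not have.
r_true = {p: random.randint(0, 1) for p in pairs}


def noisy_model(flip_prob):
    """Simulate one model: flip each true judgment with probability flip_prob."""
    est = {p: 1 - r_true[p] if random.random() < flip_prob else r_true[p]
           for p in pairs}
    err = {p: est[p] - r_true[p] for p in pairs}  # delta_i(d, q)
    return est, err


models = [noisy_model(fp) for fp in (0.05, 0.10, 0.20, 0.30, 0.15)]
est_a = [est for est, _ in models[:2]]  # first group, E = 2
est_b = [est for est, _ in models[2:]]  # second group, M = 3
err_a = [err for _, err in models[:2]]
err_b = [err for _, err in models[2:]]

from_judgments = group_mean_difference(est_a, est_b, pairs)  # uses only r_i
from_errors = group_mean_difference(err_a, err_b, pairs)     # uses only delta_i

print(from_judgments, from_errors)
assert abs(from_judgments - from_errors) < 1e-9
```

The first number is computed from the model outputs alone, the second from the errors alone, and they agree up to floating-point rounding. That is the sense in which an observable quantity carries information about the unobserved errors.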

The pattern can now be generalized ad infinitum to any machine learning task! You pick the model prediction for which you want to measure the precision error.
