Who is causing the variations in predictions?

Precision error is a measure of the variation in the predictions of a collection of models. If there are no variations, all the models agree and there is no precision error. One is perfectly in focus, as far as one can tell. But, of course, scientific models disagree. So who is to blame? Is it the data, or is it the algorithms used to process the data (the models)? Having enough models allows you to decide who is to blame.

Consider a piece of training data that always leads to a disagreement, no matter what algorithm is used to process it. Who is to blame in this case? The most sensible choice seems to me to be to decide that the data is bad.

Now consider an algorithm that always disagrees with all the other models, no matter what set of predictions is compared. This suggests that the model is wrong, not the data.

Having a large number of models is what allows one to distinguish these two cases. Real data will not be as stark as the examples above. Here is where Fourier analysis and probability theory come in. As the number of models increases, one is able to disentangle the two. For a small number of models, blaming the data or blaming the model would explain the observed variation equally well. As the number of models increases, assigning blame becomes asymmetric!
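
To make the asymmetry concrete, here is a rough toy sketch. It is not the Fourier or probabilistic analysis mentioned above. It swaps in something much cruder: measure how far each prediction sits from the per-item median across models, then average those deviations by model and by data item. The function name `blame_scores` and all the numbers are made up for illustration.

```python
from statistics import median

def blame_scores(predictions):
    """predictions[i][j] is the prediction of model i on data item j.

    Returns (model_blame, item_blame): the mean absolute deviation from
    the per-item median, averaged over items for each model and over
    models for each item. A crude stand-in, not a real estimator.
    """
    n_models = len(predictions)
    n_items = len(predictions[0])
    # Consensus value for each data item: the median across models.
    medians = [median(predictions[i][j] for i in range(n_models))
               for j in range(n_items)]
    # Deviation of every prediction from that consensus.
    dev = [[abs(predictions[i][j] - medians[j]) for j in range(n_items)]
           for i in range(n_models)]
    model_blame = [sum(row) / n_items for row in dev]
    item_blame = [sum(dev[i][j] for i in range(n_models)) / n_models
                  for j in range(n_items)]
    return model_blame, item_blame

# Two models that disagree everywhere: blame is perfectly symmetric.
two_models = [[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0]]
print(blame_scores(two_models)[0])    # [0.5, 0.5]

# Same disagreement, but four models side with the first one:
# now the odd model out carries all the blame.
five_models = [[1.0, 2.0, 3.0]] * 4 + [[2.0, 3.0, 4.0]]
print(blame_scores(five_models)[0])   # [0.0, 0.0, 0.0, 0.0, 1.0]

# Five models that agree except on the middle item, where they scatter:
# here the blame lands on the data item, not on any one model.
bad_item = [[1.0, 2.0, 3.0],
            [1.0, 0.0, 3.0],
            [1.0, 4.0, 3.0],
            [1.0, 1.0, 3.0],
            [1.0, 3.0, 3.0]]
print(blame_scores(bad_item)[1])
```

With only two models the scores cannot separate them, which is the "equally well" situation. With five, the same disagreement pins the blame on one model in the second example and on one data item in the third.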

This is sort of like the “Is it me or is it him/her?” question. Comparing ourselves to only one other person does not allow us to decide who is the crazy one. But the more people we interact with, the sooner we realize who is to blame.

I’ll try to come up with a simple example with a few models to illustrate the point mathematically in a later post.
