Last year I realized that a classifier with robust error performance could be created by feeding the errors of a front-end algorithm to a back-end one. To prove the point that this has nothing to do with the particular algorithms, I used a well-known computer science trick — I considered how this would apply to two random classifiers. Initial conversations with others about this idea got a cool reception.
Today I realized that part of the problem is that the distinction between the classification probability of a binary random classifier and its classification error on a particular dataset is not clearly understood. This distinction is important. So I’ll try to explain it on this posting.
Suppose you have a training set for a binary label where , and is the observed feature vector for each observation. Furthermore, suppose that for this particular dataset the prevalence of the class is 70% versus 30% for the class. Consider the binary random classifier that selects the class 70% of the time and the class 30%. What is the classification error of this random algorithm for this particular dataset?
Well, it correctly classifies class 49% of the time (). Class is classified correctly 9% of time (). So the total classification error rate is 42% (100% - 49% - 9%). Note that the error rate of this random classifier on this dataset is not equal to 100% - 70% or 100% - 30% — the classification error rate is not equal to the classification probability.
What does this have to do with error robustness? Users have to deploy our classification algorithms in environments that are unknown to us. The 70/30 random classifier could be deployed in a situation where the prevalence of the classes was 60/40, somewhat different than our training dataset. The error rate of the random classifier on this dataset would then be 46%. Our error rate has increased. For some classification applications we care more about the cost caused by errors than by our accuracy. For example, when I step into a minefield I care more about not stepping on a mine than being accurate about finding a mine every time I gently push a stick into the ground! Therefore, any algorithm that stabilizes the error rate in uncertain environments would be extremely useful to such users. This is a very different line of research than the usual one that you see in the computer science literature where all the focus is on increasing accuracy not stabilizing errors. I’ll leave for another post how one could do precisely this — stabilize the error rate in uncertain environments while sacrificing accuracy by concatenating classification algorithms.