Validation of Nearest Neighbor Classifiers

Bibliographic Details
Main Author: Eric Bax
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 1998
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.1800
http://www.math.urich.edu/~ebax/papers/nearest_neighbor.ps
Description

Summary: We develop a probabilistic bound on the error rate of the nearest neighbor classifier formed from a set of labelled examples. The bound is computed using only the examples in the set. A subset of the examples is used as a validation set to bound the error rate of the classifier formed from the remaining examples. Then a bound is computed for the difference in error rates between the original classifier and the reduced classifier. This bound is computed by partitioning the validation set and using each subset to compute bounds for the error rate difference due to the other subsets.

1 Framework

Consider the following machine learning framework. There is an unknown boolean-valued target function, and there is a distribution over the input space of the function. For example, the input distribution could consist of typical satellite images of the North Atlantic Ocean, and the target function could be 1 if the image contains a large iceberg and 0 otherwise. We have a set of in-sample data e.
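The first step described in the summary, holding out a validation set and bounding the error rate of the classifier formed from the remaining examples, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's construction: it applies a standard one-sided Hoeffding bound to the hold-out error rate and omits the second step, in which the validation set is partitioned to bound the difference in error rates between the original and reduced classifiers. The function names, the random split, and the choice of Hoeffding bound are illustrative assumptions.

import numpy as np

def nearest_neighbor_predict(train_x, train_y, query_x):
    # 1-nearest-neighbor rule: each query point takes the label of its
    # closest training point under squared Euclidean distance.
    dists = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    return train_y[dists.argmin(axis=1)]

def holdout_error_bound(x, y, n_validation, delta=0.05, seed=0):
    # Hold out n_validation examples; the remaining examples form the
    # "reduced" nearest neighbor classifier whose error rate we bound.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))
    val_idx, train_idx = perm[:n_validation], perm[n_validation:]

    preds = nearest_neighbor_predict(x[train_idx], y[train_idx], x[val_idx])
    val_error = float(np.mean(preds != y[val_idx]))

    # One-sided Hoeffding bound: with probability at least 1 - delta over
    # the draw of the validation examples, the true error rate of the
    # reduced classifier is at most
    # val_error + sqrt(ln(1/delta) / (2 * n_validation)).
    eps = np.sqrt(np.log(1.0 / delta) / (2.0 * n_validation))
    return val_error, val_error + eps

# Example on synthetic data (hypothetical): 200 labelled examples, 50 held out.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(200, 2))
    y = (x[:, 0] + x[:, 1] > 0).astype(int)
    print(holdout_error_bound(x, y, n_validation=50))

The sketch returns both the empirical hold-out error and the probabilistic upper bound; the paper's contribution is to extend such a hold-out bound to the classifier built from all of the examples, which the code above does not attempt.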