Validation of Nearest Neighbor Classifiers
We develop a probabilistic bound on the error rate of the nearest neighbor classifier formed from a set of labelled examples. The bound is computed using only the examples in the set. A subset of the examples is used as a validation set to bound the error rate of the classifier formed from the remai...
Main Author: | |
---|---|
Other Authors: | |
Format: | Text |
Language: | English |
Published: |
1998
|
Subjects: | |
Online Access: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.1800 http://www.math.urich.edu/~ebax/papers/nearest_neighbor.ps |
Summary: | We develop a probabilistic bound on the error rate of the nearest neighbor classifier formed from a set of labelled examples. The bound is computed using only the examples in the set. A subset of the examples is used as a validation set to bound the error rate of the classifier formed from the remaining examples. Then a bound is computed for the difference in error rates between the original classifier and the reduced classifier. This bound is computed by partitioning the validation set and using each subset to compute bounds for the error rate difference due to the other subsets. 1 Framework Consider the following machine learning framework. There is an unknown boolean-valued target function, and there is a distribution over the input space of the function. For example, the input distribution could consist of typical satellite images of the North Atlantic Ocean, and the target function could be 1 if the image contains a large iceberg and 0 otherwise. We have a set of in-sample data e. |
---|