Similarity of samples and trimming

Bibliographic Details
Published in: Bernoulli
Main Authors: Álvarez Esteban, Pedro César; Barrio Tellado, Eustasio del; Cuesta Albertos, Juan Antonio; Matran Bea, Carlos
Other Authors: Universidad de Cantabria
Format: Article in Journal/Newspaper
Language: English
Published: International Statistical Institute; Chapman and Hall 2012
Online Access: https://hdl.handle.net/10902/29685
https://doi.org/10.3150/11-BEJ351
Description
Summary: We say that two probabilities are similar at level α if they are contaminated versions (up to an α fraction) of the same common probability. We show how this model is related to minimal distances between sets of trimmed probabilities. Empirical versions turn out to present an overfitting effect, in the sense that trimming beyond the similarity level results in trimmed samples that are closer than expected to each other. We show how this can be combined with a bootstrap approach to assess similarity from two data samples. Research partially supported by the Spanish Ministerio de Ciencia e Innovación, Grant MTM2008-06067-C02-01, and 02 and by the Consejería de Educación y Cultura de la Junta de Castilla y León, GR150. The authors would like to thank two anonymous referees for their careful reading of the manuscript, their suggestions and the pointers to relevant references that helped us to greatly improve our original version.
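
Illustration: the similarity model in the summary can be made concrete for distributions on a finite set. This is a minimal sketch under that assumption, not the procedure of the article, which works with trimmed empirical samples and a bootstrap. Two discrete distributions p and q are both contaminated versions (up to a fraction alpha) of some common distribution exactly when the mass they share, the sum of min(p_i, q_i), is at least 1 - alpha. The function names common_mass and similar_at_level are hypothetical and chosen only for this example.

```python
def common_mass(p, q):
    """Mass shared by two discrete distributions on the same finite support.

    Computed as sum_i min(p_i, q_i), which equals 1 minus their
    total variation distance.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    return sum(min(a, b) for a, b in zip(p, q))


def similar_at_level(p, q, alpha, tol=1e-12):
    """Check whether p and q are similar at level alpha (finite-support case).

    Assumption for this sketch: similarity at level alpha means there is a
    common distribution r with p = (1 - e1) * r + e1 * p1 and
    q = (1 - e2) * r + e2 * q1 for some e1, e2 <= alpha. On a finite
    support this holds exactly when the shared mass is at least 1 - alpha.
    """
    return common_mass(p, q) >= 1.0 - alpha - tol


# Hypothetical example: the two distributions share 0.9 of their mass.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.3, 0.3]
print(common_mass(p, q))
print(similar_at_level(p, q, alpha=0.15))
print(similar_at_level(p, q, alpha=0.05))
```

In this example the shared mass is 0.4 + 0.3 + 0.2, so the pair is similar at level 0.15 but not at level 0.05. With samples instead of known distributions the shared mass has to be estimated, which is where the overfitting effect described in the summary becomes relevant.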