Decontamination of Mutually Contaminated Models

International audience. A variety of machine learning problems are characterized by data sets that are drawn from multiple different convex combinations of a fixed set of base distributions. We call this a mutual contamination model. In such problems, it is often of interest to recover these base dis...


Bibliographic Details
Main Authors:	Blanchard, Gilles; Scott, Clayton
Other Authors:	Institut für Mathematik Potsdam, Universität Potsdam, University of Michigan Ann Arbor, University of Michigan System, Samuel Kaski, Jukka Corander
Format:	Conference Object
Language:	English
Published:	HAL CCSD 2014
Subjects:	[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]; Iceland
Online Access:	https://hal.archives-ouvertes.fr/hal-03371264 https://hal.archives-ouvertes.fr/hal-03371264/document https://hal.archives-ouvertes.fr/hal-03371264/file/blanchard14-supp-pdfjam.pdf

id	ftccsdartic:oai:HAL:hal-03371264v1
record_format	openpolar
institution	Open Polar
collection	Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe)
op_collection_id	ftccsdartic
language	English
topic	[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]
topic_facet	[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]
description	International audience. A variety of machine learning problems are characterized by data sets that are drawn from multiple different convex combinations of a fixed set of base distributions. We call this a mutual contamination model. In such problems, it is often of interest to recover these base distributions, or otherwise discern their properties. This work focuses on the problem of classification with multiclass label noise, in a general setting where the noise proportions are unknown and the true class distributions are nonseparable and potentially quite complex. We develop a procedure for decontamination of the contaminated models from data, which then facilitates the design of a consistent discrimination rule. Our approach relies on a novel method for estimating the error when projecting one distribution onto a convex combination of others, where the projection is with respect to a statistical distance known as the separation distance. Under sufficient conditions on the amount of noise and purity of the base distributions, this projection procedure successfully recovers the underlying class distributions. Connections to novelty detection, topic modeling, and other learning problems are also discussed.
author2	Institut für Mathematik Potsdam, Universität Potsdam, University of Michigan Ann Arbor, University of Michigan System, Samuel Kaski, Jukka Corander
format	Conference Object
author	Blanchard, Gilles; Scott, Clayton
author_facet	Blanchard, Gilles; Scott, Clayton
author_sort	Blanchard, Gilles
title	Decontamination of Mutually Contaminated Models
title_short	Decontamination of Mutually Contaminated Models
title_full	Decontamination of Mutually Contaminated Models
title_fullStr	Decontamination of Mutually Contaminated Models
title_full_unstemmed	Decontamination of Mutually Contaminated Models
title_sort	decontamination of mutually contaminated models
publisher	HAL CCSD
publishDate	2014
url	https://hal.archives-ouvertes.fr/hal-03371264 https://hal.archives-ouvertes.fr/hal-03371264/document https://hal.archives-ouvertes.fr/hal-03371264/file/blanchard14-supp-pdfjam.pdf
op_coverage	Reykjavik, Iceland
genre	Iceland
genre_facet	Iceland
op_source	Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS 2014) https://hal.archives-ouvertes.fr/hal-03371264 Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS 2014), 2014, Reykjavik, Iceland. pp.1-9 https://proceedings.mlr.press/v33/blanchard14.html
op_relation	hal-03371264 https://hal.archives-ouvertes.fr/hal-03371264 https://hal.archives-ouvertes.fr/hal-03371264/document https://hal.archives-ouvertes.fr/hal-03371264/file/blanchard14-supp-pdfjam.pdf
op_rights	http://creativecommons.org/licenses/by/ info:eu-repo/semantics/OpenAccess
_version_	1766040489336766464
