Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias

Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics, such as emotio...

Bibliographic Details
Published in:	2023 ACM Conference on Fairness, Accountability, and Transparency
Main Authors:	Wolfe, Robert; Yang, Yiwei; Howe, Bill; Caliskan, Aylin
Format:	Text
Language:	unknown
Published:	2022
Subjects:	Computer Science - Computers and Society; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning; Antarctic; Antarc*
Online Access:	http://arxiv.org/abs/2212.11261 https://doi.org/10.1145/3593013.3594072

id	ftarxivpreprints:oai:arXiv.org:2212.11261
record_format	openpolar
spelling	ftarxivpreprints:oai:arXiv.org:2212.11261 2023-09-05T13:12:22+02:00 Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias Wolfe, Robert Yang, Yiwei Howe, Bill Caliskan, Aylin 2022-12-21 http://arxiv.org/abs/2212.11261 https://doi.org/10.1145/3593013.3594072 unknown http://arxiv.org/abs/2212.11261 ACM FAccT 2023 doi:10.1145/3593013.3594072 Computer Science - Computers and Society Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning text 2022 ftarxivpreprints https://doi.org/10.1145/3593013.3594072 2023-08-16T17:27:25Z Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics, such as emotions, are disregarded and the person is treated as a body. We replicate three experiments in psychology quantifying sexual objectification and show that the phenomena persist in AI. A first experiment uses standardized images of women from the Sexual OBjectification and EMotion Database, and finds that human characteristics are disassociated from images of objectified women: the model's recognition of emotional state is mediated by whether the subject is fully or partially clothed. Embedding association tests (EATs) return significant effect sizes for both anger (d > 0.80) and sadness (d > 0.50), associating images of fully clothed subjects with emotions. GRAD-CAM saliency maps highlight that CLIP gets distracted from emotional expressions in objectified images. A second experiment measures the effect in a representative application: an automatic image captioner (Antarctic Captions) includes words denoting emotion less than 50% as often for images of partially clothed women as for images of fully clothed women. A third experiment finds that images of female professionals (scientists, doctors, executives) are more likely to be associated with sexual descriptions than images of male professionals. A fourth experiment shows that a prompt of "a [age] year old girl" generates sexualized images (as determined by an NSFW classifier) up to 73% of the time for VQGAN-CLIP and Stable Diffusion; the corresponding rate for boys never surpasses 9%. The evidence indicates that language-vision AI models trained on web scrapes learn biases of sexual objectification, which propagate to downstream applications. Comment: 12 pages, 4 figures, 2 tables Text Antarc* Antarctic ArXiv.org (Cornell University Library) Antarctic 2023 ACM Conference on Fairness, Accountability, and Transparency 1174 1185
institution	Open Polar
collection	ArXiv.org (Cornell University Library)
op_collection_id	ftarxivpreprints
language	unknown
topic	Computer Science - Computers and Society Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning
spellingShingle	Computer Science - Computers and Society Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning Wolfe, Robert Yang, Yiwei Howe, Bill Caliskan, Aylin Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
topic_facet	Computer Science - Computers and Society Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning
description	Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics, such as emotions, are disregarded and the person is treated as a body. We replicate three experiments in psychology quantifying sexual objectification and show that the phenomena persist in AI. A first experiment uses standardized images of women from the Sexual OBjectification and EMotion Database, and finds that human characteristics are disassociated from images of objectified women: the model's recognition of emotional state is mediated by whether the subject is fully or partially clothed. Embedding association tests (EATs) return significant effect sizes for both anger (d > 0.80) and sadness (d > 0.50), associating images of fully clothed subjects with emotions. GRAD-CAM saliency maps highlight that CLIP gets distracted from emotional expressions in objectified images. A second experiment measures the effect in a representative application: an automatic image captioner (Antarctic Captions) includes words denoting emotion less than 50% as often for images of partially clothed women as for images of fully clothed women. A third experiment finds that images of female professionals (scientists, doctors, executives) are more likely to be associated with sexual descriptions than images of male professionals. A fourth experiment shows that a prompt of "a [age] year old girl" generates sexualized images (as determined by an NSFW classifier) up to 73% of the time for VQGAN-CLIP and Stable Diffusion; the corresponding rate for boys never surpasses 9%. The evidence indicates that language-vision AI models trained on web scrapes learn biases of sexual objectification, which propagate to downstream applications. Comment: 12 pages, 4 figures, 2 tables
format	Text
author	Wolfe, Robert Yang, Yiwei Howe, Bill Caliskan, Aylin
author_facet	Wolfe, Robert Yang, Yiwei Howe, Bill Caliskan, Aylin
author_sort	Wolfe, Robert
title	Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
title_short	Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
title_full	Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
title_fullStr	Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
title_full_unstemmed	Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
title_sort	contrastive language-vision ai models pretrained on web-scraped multimodal data exhibit sexual objectification bias
publishDate	2022
url	http://arxiv.org/abs/2212.11261 https://doi.org/10.1145/3593013.3594072
geographic	Antarctic
geographic_facet	Antarctic
genre	Antarc* Antarctic
genre_facet	Antarc* Antarctic
op_relation	http://arxiv.org/abs/2212.11261 ACM FAccT 2023 doi:10.1145/3593013.3594072
op_doi	https://doi.org/10.1145/3593013.3594072
container_title	2023 ACM Conference on Fairness, Accountability, and Transparency
container_start_page	1174
op_container_end_page	1185
_version_	1776200020604747776
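
The description above reports embedding association test (EAT) effect sizes as values of d. As a rough illustration only, the sketch below computes a WEAT-style effect size over toy vectors. It assumes the standard formulation (difference in mean differential cosine association between two target sets, divided by the standard deviation over all targets), which may differ in detail from the variant used in the paper. All names (`cosine`, `association`, `eat_effect_size`) and the 2-D vectors are hypothetical; real use would substitute image and text embeddings from a CLIP model.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """Mean similarity of w to attribute set A minus mean similarity to set B."""
    mean_a = statistics.mean([cosine(w, a) for a in attrs_a])
    mean_b = statistics.mean([cosine(w, b) for b in attrs_b])
    return mean_a - mean_b


def eat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """WEAT-style effect size d (assumed formulation, see lead-in)."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    # Sample standard deviation over the associations of all targets.
    pooled_sd = statistics.stdev(s_x + s_y)
    return (statistics.mean(s_x) - statistics.mean(s_y)) / pooled_sd


# Toy stand-ins (hypothetical): X and Y play the role of two groups of image
# embeddings, A and B the role of emotion and non-emotion text embeddings.
X = [[1.0, 0.1], [0.9, 0.2]]
Y = [[0.1, 1.0], [0.2, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]

d = eat_effect_size(X, Y, A, B)
print(f"effect size d = {d:.2f}")
```

Under this formulation a positive d means the first target group is more strongly associated with attribute set A than the second group is, and swapping the two target groups flips the sign.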

Similar Items