AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images

Processing giga-pixel whole slide histopathology images (WSI) is a computationally expensive task. Multiple instance learning (MIL) has become the conventional approach to process WSIs, in which these images are split into smaller patches for further processing. However, MIL-based techniques ignore...


Bibliographic Details
Main Authors: Nakhli, Ramin, Moghadam, Puria Azadi, Mi, Haoyang, Farahani, Hossein, Baras, Alexander, Gilks, Blake, Bashashati, Ali
Format: Text
Language: unknown
Published: 2023
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online Access: http://arxiv.org/abs/2303.00865
id ftarxivpreprints:oai:arXiv.org:2303.00865
record_format openpolar
spelling ftarxivpreprints:oai:arXiv.org:2303.00865 2023-09-05T13:20:41+02:00 AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images Nakhli, Ramin Moghadam, Puria Azadi Mi, Haoyang Farahani, Hossein Baras, Alexander Gilks, Blake Bashashati, Ali 2023-03-01 http://arxiv.org/abs/2303.00865 unknown http://arxiv.org/abs/2303.00865 Computer Science - Computer Vision and Pattern Recognition text 2023 ftarxivpreprints 2023-08-16T17:33:59Z Processing giga-pixel whole slide histopathology images (WSI) is a computationally expensive task. Multiple instance learning (MIL) has become the conventional approach to process WSIs, in which these images are split into smaller patches for further processing. However, MIL-based techniques ignore explicit information about the individual cells within a patch. In this paper, by defining the novel concept of shared-context processing, we designed a multi-modal Graph Transformer (AMIGO) that uses the cellular graph within the tissue to provide a single representation for a patient while taking advantage of the hierarchical structure of the tissue, enabling a dynamic focus between cell-level and tissue-level information. We benchmarked the performance of our model against multiple state-of-the-art methods in survival prediction and showed that it significantly outperforms all of them, including the hierarchical Vision Transformer (ViT). More importantly, we show that our model is highly robust to missing information, to the extent that it achieves the same performance with as little as 20% of the data. Finally, in two different cancer datasets, we demonstrated that our model was able to stratify the patients into low-risk and high-risk groups, while other state-of-the-art methods failed to achieve this goal.
We also publish a large dataset of immunohistochemistry images (InUIT) containing 1,600 tissue microarray (TMA) cores from 188 patients along with their survival information, making it one of the largest publicly available datasets in this context. Comment: Accepted at CVPR 2023 Text inuit ArXiv.org (Cornell University Library)
institution Open Polar
collection ArXiv.org (Cornell University Library)
op_collection_id ftarxivpreprints
language unknown
topic Computer Science - Computer Vision and Pattern Recognition
spellingShingle Computer Science - Computer Vision and Pattern Recognition
Nakhli, Ramin
Moghadam, Puria Azadi
Mi, Haoyang
Farahani, Hossein
Baras, Alexander
Gilks, Blake
Bashashati, Ali
AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images
topic_facet Computer Science - Computer Vision and Pattern Recognition
description Processing giga-pixel whole slide histopathology images (WSI) is a computationally expensive task. Multiple instance learning (MIL) has become the conventional approach to process WSIs, in which these images are split into smaller patches for further processing. However, MIL-based techniques ignore explicit information about the individual cells within a patch. In this paper, by defining the novel concept of shared-context processing, we designed a multi-modal Graph Transformer (AMIGO) that uses the cellular graph within the tissue to provide a single representation for a patient while taking advantage of the hierarchical structure of the tissue, enabling a dynamic focus between cell-level and tissue-level information. We benchmarked the performance of our model against multiple state-of-the-art methods in survival prediction and showed that it significantly outperforms all of them, including the hierarchical Vision Transformer (ViT). More importantly, we show that our model is highly robust to missing information, to the extent that it achieves the same performance with as little as 20% of the data. Finally, in two different cancer datasets, we demonstrated that our model was able to stratify the patients into low-risk and high-risk groups, while other state-of-the-art methods failed to achieve this goal. We also publish a large dataset of immunohistochemistry images (InUIT) containing 1,600 tissue microarray (TMA) cores from 188 patients along with their survival information, making it one of the largest publicly available datasets in this context. Comment: Accepted at CVPR 2023
format Text
author Nakhli, Ramin
Moghadam, Puria Azadi
Mi, Haoyang
Farahani, Hossein
Baras, Alexander
Gilks, Blake
Bashashati, Ali
author_facet Nakhli, Ramin
Moghadam, Puria Azadi
Mi, Haoyang
Farahani, Hossein
Baras, Alexander
Gilks, Blake
Bashashati, Ali
author_sort Nakhli, Ramin
title AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images
title_short AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images
title_full AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images
title_fullStr AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images
title_full_unstemmed AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images
title_sort amigo: sparse multi-modal graph transformer with shared-context processing for representation learning of giga-pixel images
publishDate 2023
url http://arxiv.org/abs/2303.00865
genre inuit
genre_facet inuit
op_relation http://arxiv.org/abs/2303.00865
_version_ 1776201323956404224