UNN-LC High-Resolution Histopathological Lung Tissue Patch Dataset

Bibliographic Details
Main Authors: Shvetsov, Nikita; Kilvær, Thomas Karsten; Dalen, Stig Manfred
Other Authors: University Hospital of North Norway
Format: Other/Unknown Material
Language: English
Published: DataverseNO 2024
Online Access: https://doi.org/10.18710/ZZASBA
Description
Summary: The UNN-LC High-Resolution Histopathological Lung Tissue Patch Dataset is a collection of image patches designed for computational prognostic evaluation of lung cancer. Compiled from a subset of 194 whole-slide images (WSIs) from the University Hospital of North Norway, the dataset represents a variety of lung tissue conditions. Each patch measures 768 × 768 pixels, which supports detailed analysis of tissue morphology.

The dataset was annotated by an oncologist (Thomas Kilvær) and a pathologist (Stig Dalen), with a concerted effort to minimize selection and labeling biases. Stig Dalen annotated patches consisting predominantly of cancer cells, including those with tumor-infiltrating lymphocytes. Thomas Kilvær annotated patches representing normal lung tissue. The two jointly annotated the reactive stroma with tertiary lymphoid structures and the necrosis areas. Annotations were acquired using QuPath software and a custom-developed annotation tool.

The dataset categorizes patches into four classes: necrosis, tumor, stroma_tls, and normal_lung. The necrosis class includes patches of necrotic tissue associated with tumor regions, while the normal_lung class represents areas of healthy lung tissue, including stromal components. The stroma_tls class consists of patches of reactive stroma with dense tissue and lymphocyte aggregates (tertiary lymphoid structures). The tumor class comprises patches with predominantly tumor content and may also include areas with tumor-infiltrating lymphocytes (TILs).

To broaden the scope and improve the class balance of the dataset, additional patches from the LC25000 dataset can be integrated for a more diverse representation of tissue conditions. This can make computational models developed with these data more robust.
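Class imbalance can also be addressed during training, independently of adding LC25000 patches, by weighting each class inversely to its frequency. The sketch below is a minimal illustration and assumes that the four class names listed above are used as label strings. The function name `inverse_frequency_weights` is hypothetical and not part of the dataset release, and the label list in the demo is a toy example, not the dataset's real class distribution.

```python
from collections import Counter

# The four class labels named in the dataset description.
CLASSES = ["necrosis", "tumor", "stroma_tls", "normal_lung"]


def inverse_frequency_weights(labels):
    """Return a weight per class in CLASSES, proportional to 1 / class count.

    Uses w_c = N / (K * n_c), so a perfectly balanced label list gives 1.0
    for every class. Raises ValueError on unknown labels or empty classes.
    """
    counts = Counter(labels)
    unknown = set(counts) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    missing = [c for c in CLASSES if counts[c] == 0]
    if missing:
        raise ValueError(f"classes with no examples: {missing}")
    total = len(labels)
    n_classes = len(CLASSES)
    # Rarer classes receive proportionally larger weights.
    return {c: total / (n_classes * counts[c]) for c in CLASSES}


if __name__ == "__main__":
    # Toy labels for illustration only (hypothetical counts).
    toy = ["tumor"] * 6 + ["normal_lung"] * 3 + ["stroma_tls"] * 2 + ["necrosis"]
    print(inverse_frequency_weights(toy))
```

The resulting weights can be passed, in `CLASSES` order, to a weighted loss function in whichever training framework is used; this is one common choice among several, alongside oversampling of rare classes.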
The dataset is divided into training and testing sets to facilitate and promote reproducibility in the development and ...
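The record does not describe the file layout of the archive, so the following is only a sketch assuming a hypothetical structure `<root>/<split>/<class_name>/<image file>`, with `train` and `test` as split folder names and the four class names as subfolders. Check the actual archive at the DOI above before relying on it. The hypothetical helper `index_split` collects (path, label index) pairs for one split, so that the predefined training/testing division is used as-is instead of being re-split at random, which keeps experiments comparable.

```python
from pathlib import Path

# Label indices follow this order (assumed, not defined by the dataset).
CLASSES = ["necrosis", "tumor", "stroma_tls", "normal_lung"]
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def index_split(root, split):
    """List (path, label_index) pairs for one split.

    Assumes the hypothetical layout root/split/<class_name>/<image>.
    Files with non-image suffixes are skipped.
    """
    samples = []
    for label, class_name in enumerate(CLASSES):
        class_dir = Path(root) / split / class_name
        if not class_dir.is_dir():
            continue  # a class folder may be absent from a given split
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for split in ("train", "test"):
        print(split, len(index_split(root, split)), "patches")
```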