Multi-encoder U-Net for Oral Squamous Cell Carcinoma Image Segmentation

Bibliographic Details
Published in: 2022 IEEE International Symposium on Medical Measurements and Applications (MeMeA)
Main Authors: Pennisi A., Bloisi D. D., Nardi D., Varricchio S., Merolla F.
Format: Conference Object
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2022
Online Access: https://hdl.handle.net/11563/169198
https://doi.org/10.1109/MeMeA54994.2022.9856482
Description
Summary: Oral tumors are responsible for about 170,000 deaths worldwide every year. In this paper, we focus on oral squamous cell carcinoma (OSCC), which accounts for up to 80-90% of all malignant neoplasms of the oral cavity. We present a novel deep learning-based method for segmenting whole slide image (WSI) samples at the pixel level. The proposed method modifies the well-known U-Net architecture with a multi-encoder structure. In particular, our network, called Multi-encoder U-Net, is a multi-encoder, single-decoder network that takes an image as input and splits it into tiles. Each tile is processed by a dedicated encoder that maps it into the latent space, and a convolutional layer then merges the encoded tiles into a single layer. Each layer of the decoder takes as input the previous up-sampled layer and concatenates it with the layer obtained by merging the corresponding layers of the multiple encoders. Experiments have been carried out on the publicly available ORal Cancer Annotated (ORCA) dataset, which contains annotated data from the TCGA repository. Quantitative experimental results, obtained using three different quality metrics, demonstrate the effectiveness of the proposed approach, which achieves 82% pixel-wise accuracy, a 0.82 Dice similarity score, and a 0.72 mean Intersection over Union.
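
The following PyTorch sketch is meant only to illustrate the multi-encoder U-Net idea summarized above: the input is split into a 2x2 grid of tiles, each tile is encoded by its own encoder, a convolution merges the re-assembled tile features at each level, and a single decoder concatenates each up-sampled layer with the merged skip connection. The class names, the 2x2 tiling, the channel widths, and the choice of a 1x1 convolution as the merging layer are assumptions made for illustration and are not taken from the paper.

# Minimal sketch of the multi-encoder U-Net idea described in the abstract.
# All names, the 2x2 tiling, the channel widths, and the 1x1 "merge" conv are
# illustrative assumptions, not the authors' exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch):
    """Two 3x3 conv + ReLU layers, as in a standard U-Net stage."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )


class TileEncoder(nn.Module):
    """One encoder branch; each input tile gets its own copy (own weights)."""
    def __init__(self, in_ch=3, widths=(32, 64, 128)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for w in widths:
            self.stages.append(conv_block(ch, w))
            ch = w
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)           # keep per-level features for the skips
            x = self.pool(x)
        return feats                   # ordered shallow -> deep


class MultiEncoderUNet(nn.Module):
    """Four tile encoders (2x2 split) feeding one shared decoder."""
    def __init__(self, in_ch=3, num_classes=2, widths=(32, 64, 128)):
        super().__init__()
        self.encoders = nn.ModuleList([TileEncoder(in_ch, widths) for _ in range(4)])
        # 1x1 convs that merge the re-assembled tile features at each level
        # (one reading of the "merging" convolutional layer in the abstract).
        self.merge = nn.ModuleList([nn.Conv2d(w, w, 1) for w in widths])
        # Decoder: upsample, concatenate with the merged skip, then conv block.
        self.dec = nn.ModuleList()
        chans = list(widths)[::-1]     # deep -> shallow, e.g. [128, 64, 32]
        for i in range(len(chans) - 1):
            self.dec.append(conv_block(chans[i] + chans[i + 1], chans[i + 1]))
        self.head = nn.Conv2d(chans[-1], num_classes, 1)

    @staticmethod
    def split_tiles(x):
        """Split an image batch into a 2x2 grid of tiles."""
        h, w = x.shape[-2] // 2, x.shape[-1] // 2
        return [x[..., :h, :w], x[..., :h, w:], x[..., h:, :w], x[..., h:, w:]]

    @staticmethod
    def stitch(tiles):
        """Re-assemble four per-tile feature maps into one spatial map."""
        top = torch.cat(tiles[:2], dim=-1)
        bottom = torch.cat(tiles[2:], dim=-1)
        return torch.cat([top, bottom], dim=-2)

    def forward(self, x):
        tiles = self.split_tiles(x)
        per_tile = [enc(t) for enc, t in zip(self.encoders, tiles)]
        # Merge the corresponding encoder levels across the four tiles.
        skips = [m(self.stitch([f[lvl] for f in per_tile]))
                 for lvl, m in enumerate(self.merge)]
        y = skips[-1]                  # deepest merged features start the decoder
        for i, block in enumerate(self.dec):
            skip = skips[-(i + 2)]     # next shallower merged skip connection
            y = F.interpolate(y, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False)
            y = block(torch.cat([y, skip], dim=1))
        return self.head(y)            # per-pixel class logits


if __name__ == "__main__":
    net = MultiEncoderUNet()
    logits = net(torch.randn(1, 3, 256, 256))
    print(logits.shape)                # torch.Size([1, 2, 256, 256])

Stitching the tile features back into a single spatial map before the 1x1 convolution is just one concrete interpretation of "merging the tiles into a single layer"; a channel-wise concatenation followed by a convolution would be an equally plausible reading of the abstract.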