Pointer-Based Divergence Analysis for OpenCL 2.0 Programs

Bibliographic Details
Published in: ACM Transactions on Parallel Computing, vol. 8, no. 4, pp. 1-23
Main Authors: Wang, Shao-Chung; Yu, Lin-Ya; Her, Li-An; Hwang, Yuan-Shin; Lee, Jenq-Kuen
Other Authors: MediaTek; MOST of Taiwan
Format: Article in Journal/Newspaper
Language: English
Published: Association for Computing Machinery (ACM), 2021
ISSN: 2329-4949, 2329-4957
DOI: 10.1145/3470644
Online Access: http://dx.doi.org/10.1145/3470644
https://dl.acm.org/doi/pdf/10.1145/3470644

Abstract

A modern GPU is designed around many large thread groups to achieve high throughput and performance. Within these groups, threads are organized into fixed-size SIMD batches in which the same instruction is applied to vectors of data in lockstep. This architecture suits applications with a high degree of data parallelism, but its performance degrades severely when divergence occurs. Many divergence optimizations have been proposed, and they depend on divergence information about variables and branches. A previous analysis scheme treated pointers and function return values as divergent outright and targeted only OpenCL 1.x.

In this article, we present a novel scheme that reports divergence information for pointer-intensive OpenCL programs. The approach is based on extended static single assignment (SSA) form and adds special functions and annotations drawn from memory SSA and gated SSA. The scheme first constructs extended SSA. It then uses that form to build a divergence relation graph containing all possible points-to relationships of the pointers, together with their initial divergence states. The divergence state of each pointer is determined by propagating divergence states through this graph. The scheme is further extended to interprocedural cases by considering function-related statements. The proposed scheme was implemented in an LLVM compiler and can be applied to OpenCL programs.

We analyzed 10 programs with 24 kernels, totaling 1,306 instructions in LLVM intermediate representation, with 885 variables, 108 branches, and 313 pointer-related statements. The proposed scheme detected 146 divergent pointers in total. This compares with 200 for a scheme in which every pointer is treated as divergent and 155 for the current LLVM default scheme. The corresponding totals of divergent variables were 458, 519, and 482, and of divergent branches 31, 34, and 32. These experimental results indicate that the proposed ...
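
To make the cost of divergence concrete, here is a deliberately simplified model. It is an illustrative assumption, not something taken from the article. A SIMD batch whose lanes disagree on a branch condition issues both sides of the branch one after the other, with inactive lanes masked off. A batch whose lanes all agree issues only one side. The function name `batch_cycles`, the batch width, and the per-path costs are hypothetical.

```python
def batch_cycles(lane_conditions, then_cost, else_cost):
    """Issue cycles for one SIMD batch executing an if/else in lockstep.

    Simplified model (assumption): each side of the branch is issued once
    if at least one lane takes it; lanes that do not take that side are
    masked off but still wait for it to finish.
    """
    takes_then = any(lane_conditions)
    takes_else = not all(lane_conditions)
    return (then_cost if takes_then else 0) + (else_cost if takes_else else 0)


if __name__ == "__main__":
    batch_width = 8                 # hypothetical SIMD width
    lane_ids = range(batch_width)   # stands in for get_local_id(0) within one batch

    # Uniform branch: every lane evaluates the same condition.
    uniform = [True for _ in lane_ids]
    # Divergent branch: the condition depends on the lane id.
    divergent = [lid % 2 == 0 for lid in lane_ids]

    print(batch_cycles(uniform, 10, 20))    # 10
    print(batch_cycles(divergent, 10, 20))  # 30
```

Under this model the divergent batch pays for both paths. This is why proving that a branch or value is uniform, rather than divergent, is useful input for the divergence optimizations the abstract mentions.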
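
The propagation step described in the abstract can be pictured as a reachability problem. The sketch below is a generic worklist propagation over a hand-built graph. It rests on our own assumptions and is not the authors' extended-SSA construction:

- An edge `u -> v` means "if `u` is divergent, `v` must be treated as divergent".
- The value returned by `get_global_id(0)` is the initial divergent source.
- Each pointer's points-to set is supplied as a plain dictionary, as if computed by a separate pointer analysis.

The class, its methods, and all node names are hypothetical.

```python
from collections import defaultdict, deque


class DivergenceGraph:
    """Hypothetical divergence relation graph: edge u -> v means
    'if u is divergent, v must also be treated as divergent'."""

    def __init__(self, points_to):
        self.points_to = points_to    # pointer name -> set of abstract memory locations
        self.succ = defaultdict(set)  # node -> nodes whose divergence it implies
        self.sources = set()          # initially divergent nodes

    def mark_source(self, node):
        self.sources.add(node)

    def add_dep(self, dst, *operands):
        # dst = f(operands): dst is divergent if any operand is divergent
        for op in operands:
            self.succ[op].add(dst)

    def add_store(self, ptr, value):
        # *ptr = value: every location ptr may address gets divergent
        # contents if value is divergent (coarse, one node per location)
        for loc in self.points_to.get(ptr, ()):
            self.succ[value].add(loc)

    def add_load(self, dst, ptr):
        # dst = *ptr: divergent if the address differs across work-items
        # or if any location ptr may address holds divergent contents
        self.succ[ptr].add(dst)
        for loc in self.points_to.get(ptr, ()):
            self.succ[loc].add(dst)

    def propagate(self):
        # Breadth-first worklist: everything reachable from a source is divergent
        divergent = set(self.sources)
        worklist = deque(self.sources)
        while worklist:
            node = worklist.popleft()
            for nxt in self.succ[node]:
                if nxt not in divergent:
                    divergent.add(nxt)
                    worklist.append(nxt)
        return divergent


def build_example():
    # Models this OpenCL fragment (the kernel argument a is uniform):
    #   int gid = get_global_id(0);
    #   __global int *p = a + gid;   // address differs per work-item
    #   __global int *q = a;         // same address for every work-item
    #   int x = *q;
    #   int y = *p;
    points_to = {"a": {"buf_a"}, "p": {"buf_a"}, "q": {"buf_a"}}
    g = DivergenceGraph(points_to)
    g.mark_source("gid")
    g.add_dep("p", "a", "gid")
    g.add_dep("q", "a")
    g.add_load("x", "q")
    g.add_load("y", "p")
    return g


if __name__ == "__main__":
    print(sorted(build_example().propagate()))  # ['gid', 'p', 'y']
```

Consider the baseline in which every pointer is divergent: `q` would be divergent, and in this model that would make `x` divergent as well. With points-to information the sketch keeps both uniform. Locations are tracked coarsely here, one node per buffer. So storing a divergent value through any pointer into `buf_a` would also make `x` divergent.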
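
The relative reductions implied by the reported totals can be computed directly from the abstract's counts. The percentages are our own arithmetic, not figures quoted from the article. They compare totals only and do not show which individual pointers, variables, or branches each scheme flags. The names `REPORTED`, `reduction_pct`, and `summarize` are hypothetical.

```python
# Totals reported in the abstract:
# (proposed scheme, always-divergent pointers, LLVM default scheme)
REPORTED = {
    "divergent pointers": (146, 200, 155),
    "divergent variables": (458, 519, 482),
    "divergent branches": (31, 34, 32),
}


def reduction_pct(proposed, baseline):
    """Relative reduction (in %) of the proposed total versus a baseline total."""
    return 100.0 * (baseline - proposed) / baseline


def summarize(reported):
    # metric -> (reduction vs always-divergent, reduction vs LLVM default), 1 decimal
    rows = {}
    for metric, (proposed, always_div, llvm_default) in reported.items():
        rows[metric] = (
            round(reduction_pct(proposed, always_div), 1),
            round(reduction_pct(proposed, llvm_default), 1),
        )
    return rows


if __name__ == "__main__":
    for metric, (vs_always, vs_llvm) in summarize(REPORTED).items():
        print(f"{metric}: {vs_always}% fewer than always-divergent, "
              f"{vs_llvm}% fewer than LLVM default")
```

For the pointer totals, this yields 27.0% fewer divergent pointers than the always-divergent scheme and 5.8% fewer than the LLVM default.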