Sound garbage collection for C using pointer provenance

Garbage collection (GC) support for unmanaged languages can reduce programming burden in reasoning about liveness of dynamic objects. It also avoids temporal memory safety violations and memory leaks. Sound GC for weakly-typed languages such as C/C++, however, remains an unsolved problem. Current va...

Full description

Bibliographic Details
Published in:Proceedings of the ACM on Programming Languages
Main Authors: Banerjee, Subarno, Devecsery, David, Chen, Peter M., Narayanasamy, Satish
Other Authors: NSF
Format: Article in Journal/Newspaper
Language:English
Published: Association for Computing Machinery (ACM) 2020
Subjects:
Online Access:http://dx.doi.org/10.1145/3428244
https://dl.acm.org/doi/pdf/10.1145/3428244
id cracm:10.1145/3428244
record_format openpolar
spelling cracm:10.1145/3428244 2024-05-12T08:11:58+00:00 Sound garbage collection for C using pointer provenance Banerjee, Subarno Devecsery, David Chen, Peter M. Narayanasamy, Satish NSF 2020 http://dx.doi.org/10.1145/3428244 https://dl.acm.org/doi/pdf/10.1145/3428244 en eng Association for Computing Machinery (ACM) http://www.acm.org/publications/policies/copyright_policy#Background Proceedings of the ACM on Programming Languages volume 4, issue OOPSLA, page 1-28 ISSN 2475-1421 journal-article 2020 cracm https://doi.org/10.1145/3428244 2024-05-01T06:46:49Z Garbage collection (GC) support for unmanaged languages can reduce programming burden in reasoning about liveness of dynamic objects. It also avoids temporal memory safety violations and memory leaks. Sound GC for weakly-typed languages such as C/C++, however, remains an unsolved problem. Current value-based GC solutions examine values of memory locations to discover the pointers, and the objects they point to. The approach is inherently unsound in the presence of arbitrary type casts and pointer manipulations, which are legal in C/C++. Such language features are regularly used, especially in low-level systems code. In this paper, we propose Dynamic Pointer Provenance Tracking to realize sound GC. We observe that pointers cannot be created out-of-thin-air, and they must have provenance to at least one valid allocation. Therefore, by tracking pointer provenance from the source (e.g., malloc) through both explicit data-flow and implicit control-flow, our GC has sound and precise information to compute the set of all reachable objects at any program state. We discuss several static analysis optimizations, that can be employed during compilation aided with profiling, to significantly reduce the overhead of dynamic provenance tracking from nearly 8× to 16% for well-behaved programs that adhere to the C standards. Pointer provenance based sound GC invocation is also 13% faster and reclaims 6% more memory on average, compared to an unsound value-based GC. Article in Journal/Newspaper The Pointers ACM Publications (Association for Computing Machinery) Proceedings of the ACM on Programming Languages 4 OOPSLA 1 28
institution Open Polar
collection ACM Publications (Association for Computing Machinery)
op_collection_id cracm
language English
description Garbage collection (GC) support for unmanaged languages can reduce programming burden in reasoning about liveness of dynamic objects. It also avoids temporal memory safety violations and memory leaks. Sound GC for weakly-typed languages such as C/C++, however, remains an unsolved problem. Current value-based GC solutions examine values of memory locations to discover the pointers, and the objects they point to. The approach is inherently unsound in the presence of arbitrary type casts and pointer manipulations, which are legal in C/C++. Such language features are regularly used, especially in low-level systems code. In this paper, we propose Dynamic Pointer Provenance Tracking to realize sound GC. We observe that pointers cannot be created out-of-thin-air, and they must have provenance to at least one valid allocation. Therefore, by tracking pointer provenance from the source (e.g., malloc) through both explicit data-flow and implicit control-flow, our GC has sound and precise information to compute the set of all reachable objects at any program state. We discuss several static analysis optimizations, that can be employed during compilation aided with profiling, to significantly reduce the overhead of dynamic provenance tracking from nearly 8× to 16% for well-behaved programs that adhere to the C standards. Pointer provenance based sound GC invocation is also 13% faster and reclaims 6% more memory on average, compared to an unsound value-based GC.
author2 NSF
format Article in Journal/Newspaper
author Banerjee, Subarno
Devecsery, David
Chen, Peter M.
Narayanasamy, Satish
spellingShingle Banerjee, Subarno
Devecsery, David
Chen, Peter M.
Narayanasamy, Satish
Sound garbage collection for C using pointer provenance
author_facet Banerjee, Subarno
Devecsery, David
Chen, Peter M.
Narayanasamy, Satish
author_sort Banerjee, Subarno
title Sound garbage collection for C using pointer provenance
title_short Sound garbage collection for C using pointer provenance
title_full Sound garbage collection for C using pointer provenance
title_fullStr Sound garbage collection for C using pointer provenance
title_full_unstemmed Sound garbage collection for C using pointer provenance
title_sort sound garbage collection for c using pointer provenance
publisher Association for Computing Machinery (ACM)
publishDate 2020
url http://dx.doi.org/10.1145/3428244
https://dl.acm.org/doi/pdf/10.1145/3428244
genre The Pointers
genre_facet The Pointers
op_source Proceedings of the ACM on Programming Languages
volume 4, issue OOPSLA, page 1-28
ISSN 2475-1421
op_rights http://www.acm.org/publications/policies/copyright_policy#Background
op_doi https://doi.org/10.1145/3428244
container_title Proceedings of the ACM on Programming Languages
container_volume 4
container_issue OOPSLA
container_start_page 1
op_container_end_page 28
_version_ 1798834221039484928