Sound garbage collection for C using pointer provenance

Garbage collection (GC) support for unmanaged languages can reduce programming burden in reasoning about liveness of dynamic objects. It also avoids temporal memory safety violations and memory leaks. Sound GC for weakly-typed languages such as C/C++, however, remains an unsolved problem. Current va...

Full description

Bibliographic Details
Published in:Proceedings of the ACM on Programming Languages
Main Authors: Banerjee, Subarno, Devecsery, David, Chen, Peter M., Narayanasamy, Satish
Other Authors: NSF
Format: Article in Journal/Newspaper
Language:English
Published: Association for Computing Machinery (ACM) 2020
Subjects:
Online Access:http://dx.doi.org/10.1145/3428244
https://dl.acm.org/doi/pdf/10.1145/3428244
Description
Summary:Garbage collection (GC) support for unmanaged languages can reduce programming burden in reasoning about liveness of dynamic objects. It also avoids temporal memory safety violations and memory leaks. Sound GC for weakly-typed languages such as C/C++, however, remains an unsolved problem. Current value-based GC solutions examine values of memory locations to discover the pointers, and the objects they point to. The approach is inherently unsound in the presence of arbitrary type casts and pointer manipulations, which are legal in C/C++. Such language features are regularly used, especially in low-level systems code. In this paper, we propose Dynamic Pointer Provenance Tracking to realize sound GC. We observe that pointers cannot be created out-of-thin-air, and they must have provenance to at least one valid allocation. Therefore, by tracking pointer provenance from the source (e.g., malloc) through both explicit data-flow and implicit control-flow, our GC has sound and precise information to compute the set of all reachable objects at any program state. We discuss several static analysis optimizations, that can be employed during compilation aided with profiling, to significantly reduce the overhead of dynamic provenance tracking from nearly 8× to 16% for well-behaved programs that adhere to the C standards. Pointer provenance based sound GC invocation is also 13% faster and reclaims 6% more memory on average, compared to an unsound value-based GC.