Summary: | The serious bugs and security vulnerabilities that result from C's lack of bounds checking and unsafe manual memory management are well known, yet C remains in widespread use. Unfortunately, C's arbitrary pointer arithmetic, conflation of pointers and arrays, and programmer-visible memory layout make retrofitting C with memory safety guarantees challenging. Existing approaches suffer from incompleteness, have high runtime overhead, or require non-trivial changes to the C source code. Thus far, these deficiencies have prevented widespread adoption of such techniques. This dissertation proposes mechanisms to provide comprehensive memory safety that works with mostly unmodified C code with low performance overhead. We use a pointer-based approach where we maintain metadata with pointers and check every pointer dereference. To enable compatibility with existing code, we maintain the metadata for the pointers in memory in a disjoint metadata space, leaving the memory layout of the program intact. For detecting spatial violations, we maintain bounds metadata with every pointer. For detecting temporal violations, we also maintain unique identifier metadata with each pointer. This pointer metadata is propagated with pointer operations and checked on pointer dereferences. Coupling disjoint metadata with a pointer-based approach enables comprehensive detection of memory safety violations in unmodified C programs. This dissertation demonstrates the compatibility of this approach by hardening legacy C/C++ code with minimal source code changes. Further, this dissertation shows the effectiveness of the approach by detecting new memory safety errors and previously known memory safety errors in large code bases.
To attain low performance overheads, this dissertation proposes efficient instantiations of this approach (1) within a compiler, (2) within hardware, and (3) with a hybrid hardware-accelerated compiler instrumentation that reduces the overhead of enforcing memory safety, thereby enabling the use of these techniques in deployed systems.
|
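The pointer-based scheme summarized above can be illustrated with a minimal C sketch. It is not the dissertation's implementation: every name here (`ptr_meta`, `meta_store`, `meta_load`, `checked_malloc`, `checked_free`, `deref_ok`) is hypothetical, the shadow table is a small fixed-size hash table chosen for brevity, and the "unique identifier" is modeled as an assumed key plus a lock location that is cleared on free. The point is only that bounds and identifier metadata live apart from the program's own data, so the layout of pointers and objects is unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-pointer metadata (illustrative names, not from the source). */
typedef struct {
    uintptr_t base;   /* first valid byte of the object (spatial) */
    uintptr_t bound;  /* one past the last valid byte (spatial) */
    uint64_t  key;    /* identifier assigned at allocation (temporal) */
    uint64_t *lock;   /* holds the key while the object is live (assumption) */
} ptr_meta;

typedef struct { void **slot; ptr_meta meta; } shadow_entry;

#define SHADOW_SLOTS 1024
#define LOCK_SLOTS   256

/* Disjoint metadata space: indexed by the address where a pointer is stored. */
static shadow_entry shadow[SHADOW_SLOTS];
static uint64_t lock_space[LOCK_SLOTS];  /* never reused in this sketch */
static size_t   next_lock = 0;
static uint64_t next_key  = 1;           /* key 0 means "invalid" */

static size_t shadow_index(void **slot) {
    return (size_t)(((uintptr_t)slot >> 3) % SHADOW_SLOTS);
}

/* Record metadata for the pointer stored at *slot (linear probing). */
static void meta_store(void **slot, ptr_meta m) {
    assert(slot != NULL);  /* NULL marks an empty table entry */
    size_t i = shadow_index(slot);
    for (size_t n = 0; n < SHADOW_SLOTS; n++, i = (i + 1) % SHADOW_SLOTS) {
        if (shadow[i].slot == slot || shadow[i].slot == NULL) {
            shadow[i].slot = slot;
            shadow[i].meta = m;
            return;
        }
    }
    abort();  /* table full; a real design would grow or page the table */
}

/* Fetch metadata for the pointer stored at *slot; all-zero if none recorded. */
static ptr_meta meta_load(void **slot) {
    assert(slot != NULL);
    ptr_meta none = {0, 0, 0, NULL};
    size_t i = shadow_index(slot);
    for (size_t n = 0; n < SHADOW_SLOTS; n++, i = (i + 1) % SHADOW_SLOTS) {
        if (shadow[i].slot == slot) return shadow[i].meta;
        if (shadow[i].slot == NULL) break;
    }
    return none;
}

/* Allocate and produce metadata; the object itself gets no header or padding. */
static void *checked_malloc(size_t size, ptr_meta *out) {
    ptr_meta none = {0, 0, 0, NULL};
    void *p = malloc(size);
    *out = none;  /* without a lock slot, every later check fails safe */
    if (p != NULL && next_lock < LOCK_SLOTS) {
        uint64_t *lock = &lock_space[next_lock++];
        *lock = next_key++;
        out->base  = (uintptr_t)p;
        out->bound = (uintptr_t)p + size;
        out->key   = *lock;
        out->lock  = lock;
    }
    return p;
}

/* Free the object and invalidate its identifier so stale pointers fail checks. */
static void checked_free(void *p, ptr_meta m) {
    if (m.lock != NULL) *m.lock = 0;
    free(p);
}

/* Check an access of `size` bytes at address `addr` against metadata `m`.
   Pointer arithmetic leaves `m` unchanged; only this check consults it. */
static int deref_ok(uintptr_t addr, size_t size, ptr_meta m) {
    int in_bounds = addr >= m.base && addr <= m.bound && size <= m.bound - addr;
    int live = m.lock != NULL && *m.lock == m.key;
    return in_bounds && live;
}
```

In this sketch a compiler pass would call `meta_store` when a pointer is written to memory, `meta_load` when it is read back, and `deref_ok` before each load or store through it. An out-of-bounds access fails the spatial half of the check, and an access through a dangling pointer fails the temporal half because `checked_free` has cleared the lock.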