32. Stack and register scanning

32.1. Introduction

.intro: This is the design of the stack and register scanning module.

.readership: Any MPS developer; anyone porting the MPS to a new platform.

.overview: This module locates and scans references in the control stack and registers of the current thread (the one that has called in to the MPS).

.other: The thread manager module is responsible for scanning the control stack and registers of other threads. See design.mps.thread-manager.if.scan.

.origin: This design was originally proposed in mail.richard.2012-08-03.14-36.

32.2. Requirements

.req.stack.hot: Must locate the hot end of the mutator’s stack. (This is needed for conservative garbage collection of uncooperative code, where references might be stored by the mutator on its stack.)

.req.stack.cold.not: There is no requirement to locate the cold end of the stack. (The mutator supplies this as an argument to mps_root_create_thread().)

.req.stack.platform: Must support the platform’s stack conventions.

.req.stack.platform.full-empty: The implementation must take into account whether the stack is full (the stack pointer points to the last full location) or empty (the stack pointer points to the first empty location).

.req.stack.platform.desc-asc: The implementation must take into account whether the stack is descending (the hot end of the stack is at a lower address than the cold end) or ascending (the hot end of the stack is at a higher address than the cold end).

.req.registers: Must locate and scan all references in the mutator’s root registers, the subset of registers which might contain references that do not also appear on the stack. (This is needed for conservative garbage collection of uncooperative code, where references might appear in registers.)

.req.entry: Should save the mutator’s context (stack and registers) at the point where it enters the MPS. (This avoids scanning registers and stack that belong to the MPS rather than the mutator, leading to unnecessary pinning and zone pollution; see job003525.)

.req.setjmp: The implementation must follow the C Standard in its use of the setjmp() macro. (So that it is reliable and portable.)

.req.assembly.not: The implementation should not use assembly language. (So that it can be developed in tools like Microsoft Visual Studio that don’t support this.)

32.3. Design

.sol.entry-points: To meet .req.entry, the mutator’s registers and stack must be recorded when the mutator enters the MPS, if there is a possibility that the MPS might need to know the mutator context.

.sol.entry-points.fragile: The analysis of which entry points might need to save the context (see .anal.entry-points below) is fragile. It might be incorrect now, or become incomplete if we refactor the internals of tracing and polling. As a defence against errors of this form, StackScan() asserts that the context was saved, but if the client program continues from the assertion, it saves the context anyway and continues.

.sol.registers: Implementations spill the root registers onto the stack so that they can be scanned there.

.sol.registers.root: The root registers are the subset of the callee-save registers that may contain pointers.

.sol.registers.root.justify: The caller-save registers will have been spilled onto the stack by the time the MPS is entered, so will be scanned by the stack scan.

.sol.setjmp: The values in callee-save registers can be found by invoking setjmp(). This forces any of the caller’s callee-save registers into either the jmp_buf or the current stack frame.

.sol.setjmp.scan: Although we might be able to decode the jump buffer in a platform-dependent way, it’s hard to guarantee that an uncooperative compiler won’t temporarily store a reference in any register or stack location. We must conservatively scan the whole of both.

.sol.setjmp.justify: The [C1990] standard specifies that jmp_buf:

is an array type suitable for holding the information needed to restore a calling environment. The environment of a call to the setjmp() macro consists of information sufficient for a call to the longjmp() function to return execution to the correct block and invocation of that block, were it called recursively.

We believe that any reasonable implementation of setjmp() must copy the callee-save registers either into the jump buffer or into the stack frame that invokes it in order to work as described. Otherwise, once the callee-save registers have been overwritten by other function calls, a longjmp() would result in the callee-save registers having the wrong values. A longjmp() can come from anywhere, and so the function using setjmp() can’t rely on callee-save registers being saved by callees.

.sol.stack.hot: We could decode the frame of the function that invokes setjmp() from the jump buffer in a platform-specific way, but we can do something simpler (if more hacky) by calling the stub function StackHot() which takes the address of its argument. So long as this stub function is not inlined into the caller, then on all supported platforms this yields a pointer that is pretty much at the hot end of the frame.

.sol.stack.hot.noinline: The reason that StackHot() must not be inlined is that after inlining, the compiler might place stackOut at a colder stack address than the StackContextStruct, causing the latter not to be scanned. See mail.gdr.2018-07-11.09-48.

.sol.stack.nest: We can take care of scanning the jump buffer itself by storing it in the same stack frame. That way a scan from the hot end determined by .sol.stack.hot to the cold end will contain all of the roots.

.sol.stack.platform: As of version 1.115, all supported platforms are full and descending so the implementation in StackScan() assumes this. New platforms must check this assumption.

.sol.xc.alternative: On macOS, we could use getcontext() from libunwind (see here), but that produces deprecation warnings and introduces a dependency on that library.

32.4. Analysis

.anal.setjmp: The [C1990] standard says:

An invocation of the setjmp macro shall appear only in one of the following contexts:

  • the entire controlling expression of a selection or iteration statement;

  • one operand of a relational or equality operator with the other operand an integral constant expression, with the resulting expression being the entire controlling expression of a selection or iteration statement;

  • the operand of a unary ! operator with the resulting expression being the entire controlling expression of a selection or iteration statement; or

  • the entire expression of an expression statement (possibly cast to void).

And the [C1999] standard adds:

If the invocation appears in any other context, the behavior is undefined.

.anal.entry-points: Here’s a reverse call graph (in the master sources at changelevel 189652) showing which entry points might call StackScan() and so need to record the stack context:

StackScan
 └ThreadScan
   └RootScan
     ├traceScanRootRes
     │ └traceScanRoot
     │   └rootFlip
     │     └traceFlip
     │       └TraceStart
     │         ├PolicyStartTrace
     │         │ └TracePoll
     │         │   ├ArenaStep
     │         │   │ └mps_arena_step
     │         │   └ArenaPoll
     │         │     ├mps_alloc
     │         │     ├mps_ap_fill
     │         │     ├mps_ap_fill_with_reservoir_permit
     │         │     ├mps_ap_alloc_pattern_end
     │         │     ├mps_ap_alloc_pattern_reset
     │         │     └ArenaRelease
     │         │       ├mps_arena_release
     │         │       └ArenaStartCollect
     │         │         ├mps_arena_start_collect
     │         │         └ArenaCollect
     │         │           └mps_arena_collect
     │         └TraceStartCollectAll
     │           ├ArenaStep [see above]
     │           ├ArenaStartCollect [see above]
     │           └PolicyStartTrace [see above]
     └rootsWalk
       └ArenaRootsWalk
         └mps_arena_roots_walk

So the entry points that need to save the stack context are mps_arena_step(), mps_alloc(), mps_ap_fill(), mps_ap_fill_with_reservoir_permit(), mps_ap_alloc_pattern_end(), mps_ap_alloc_pattern_reset(), mps_arena_release(), mps_arena_start_collect(), mps_arena_collect(), and mps_arena_roots_walk().

32.5. Interface

StackContextStruct *StackContext

.if.sc: A structure encapsulating the mutator context.

Res StackScan(ScanState ss, void *stackCold, mps_area_scan_t scan_area, void *closure)

.if.scan: Scan the stack of the current thread, between stackCold and the hot end of the mutator’s stack that was recorded by STACK_CONTEXT_SAVE() when the arena was entered. This will include any roots which were in the mutator’s callee-save registers on entry to the MPS (see .sol.setjmp and .sol.stack.nest). Return ResOK if successful, or another result code if not.

.if.scan.begin-end: This function must be called between STACK_CONTEXT_BEGIN() and STACK_CONTEXT_END().

STACK_CONTEXT_SAVE(StackContext sc)

.if.save: Store the mutator context in the structure sc.

.if.save.macro: This must be implemented as a macro because it needs to run in the stack frame of the entry point (if it runs in some other function it does not necessarily get the mutator’s registers). This necessity to have the definition in scope in mpsi.c, while also having different definitions on different platforms, requires a violation of design.mps.config.no-spaghetti in ss.h.

STACK_CONTEXT_BEGIN(Arena arena)

.if.begin: Start an MPS operation that may need to know the mutator context (see .sol.entry-points). This macro must be used like this:

Res res;
ArenaEnter(arena);
STACK_CONTEXT_BEGIN(arena) {
  res = ArenaStartCollect(...);
} STACK_CONTEXT_END(arena);
ArenaLeave(arena);
return res;

That is, it must be paired with STACK_CONTEXT_END(), and there must be no return between the two macro invocations.

This macro stores the mutator context in a StackContext structure allocated on the stack, and sets arena->stackWarm to the hot end of the current frame (using .sol.stack.hot).

STACK_CONTEXT_END(Arena arena)

.if.end: Finish the MPS operation that was started by STACK_CONTEXT_BEGIN().

This macro sets arena->stackWarm to NULL.

32.6. Implementations

.impl: Generic implementation of StackScan() in ss.c scans the whole area between arena->stackWarm and the cold end of the mutator’s stack, implementing .sol.stack.nest and also the backup strategy in .sol.entry-points.fragile.

Diagram: scanned areas of the stack.

32.7. References

C1990(1,2)

International Standard ISO/IEC 9899:1990. “Programming languages — C”.

C1999

International Standard ISO/IEC 9899:1999. “Programming languages — C”.

Fog

Agner Fog; “Calling conventions for different C++ compilers and operating systems”; Copenhagen University College of Engineering; 2014-08-07.

x86_64_registers

Microsoft Corporation; “Caller/Callee Saved Registers”.