Stack probe

author Gareth Rees
date 2014-10-23
index terms pair: stack probe; design
revision //info.ravenbrook.com/project/mps/custom/cet/main/design/sp.txt#3
status complete design
tag design.mps.sp

Introduction

.intro: This is the design of the stack probe module.

.readership: Any MPS developer; anyone porting the MPS to a new platform.

.overview: This module ensures that the stack cannot overflow while the MPS is holding a lock, so that a mutator can handle stack overflow faults and call into the MPS from the handler.

Requirements

.req.overflow: The mutator should be able to call into the MPS from a stack overflow fault handler. (This is a convenient way to handle stack overflows in dynamic language implementations: if the stack overflow exception and associated backtrace are to be represented as objects, this may require allocation, and hence a call into the MPS.)

.req.complete: In an application where the mutator might call into the MPS from a stack overflow fault handler, then whenever the MPS takes a lock, it must complete the operation and release the lock without running out of stack. (This is because running out of stack would cause a stack overflow fault, causing the mutator to enter the MPS recursively, which would fail because the lock is held.)

Design

.sol.probe: Before taking the arena lock in ArenaEnterLock(), the MPS probes the stack: that is, it checks whether there are at least StackProbeDEPTH words available, and provokes a stack overflow fault if there are not. (This ensures that the fault occurs outside of the arena lock where it can be handled safely.)

.sol.depth: The configuration parameter StackProbeDEPTH specifies the maximum number of words of stack that the MPS might use. (It is simpler, faster, and more reliable, to determine this globally than to try to figure it out dynamically.)

.sol.depth.constraint: Operating systems typically use a single "guard page" to detect stack overflow and grow the stack. (See for example the documentation for Windows.) This means that the probe will be ineffective if it skips over the guard page into the memory beyond. If StackProbeDEPTH is greater than or equal to the number of words per page, the implementation might need to carry out multiple probes. (This constraint is checked in MPMCheck().)

.sol.depth.no-recursion: In order to implement this design, the MPS must have constant bounded stack depth, and therefore, no recursion.

.sol.depth.analysis: Here's a table showing a deep call into the MPS (in the master sources at changelevel 187378), starting in ArenaAccess() at the point where the arena ring lock is taken. The access forces a scan of a segment in an AMC pool, which fixes a reference to an object in an AMC pool's oldspace, which has to be forwarded, and this overflows the forwarding buffer, which requires the arena to allocate a new buffer in an appropriate zone, by searching the splay tree representing free memory.

The "Args" column gives the number of arguments to the function (all arguments to functions in the MPS are word-sized or smaller, since we prohibit passing structures by value), and the "Locals" column gives the number of words in local variables. The value "≤64" for the stack usage of the object format's scan method is the limit that's documented in the manual.

Args Locals Function
5 0 SegAccess()
5 0 SegWholeAccess()
3 8 TraceSegAccess()
4 1 traceScanSeg()
4 9 traceScanSegRes()
4 0 SegScan()
4 5 amcSegScan()
4 0 FormatScan()
3 ≤64 format->scan()
3 0 SegFix()
4 15 amcSegFix()
3 5 BufferFill()
5 11 AMCBufferFill()
5 73 PoolGenAlloc()
6 5 SegAlloc()
4 4 ArenaAlloc()
5 6 PolicyAlloc()
6 10 ArenaFreeLandAlloc()
7 1 LandFindInZones()
7 16 cbsFindInZones()
5 3 cbsFindFirst()
6 7 SplayFindFirst()
3 7 SplaySplay()
4 8 SplaySplitDown()
3 0 SplayZig()
112 ≤258 Total

We expect that a compiler will not need to push all local variables onto the stack, but even in the case where it pushes all of them, this call requires no more than 370 words of stack space.

This isn't necessarily the deepest call into the MPS (the MPS's modular design and class system makes it hard to do a complete analysis using call graph tools), but it's probably close. The value for StackProbeDEPTH is thus chosen to be a round number that's comfortably larger than this.

Interface

void StackProbe(Size depth)

.if.probe: If there are at least depth words of stack available, return. If not, provoke a stack overflow fault.

Issues

.issue.an: The generic implementation is non-functional. This means that it is only suitable for use with programs that do not handle stack overflow faults, or do not call into the MPS from the handler. This is because our customers have only required .req.overflow on Windows so far. If this becomes a requirement on other platforms, the following Standard C implementation might work:

void StackProbe(Size depth) {
  volatile Word w;
  Word *p = &w - depth;
  w = *p;
}

The use of volatile here is to prevent compilers from warning about the variable w being written but never read, or worse, optimizing away the whole statement under the "as if" rule.

Implementations

.impl.an: Generic implementation in span.c. This implementation does nothing. See .issue.an.

.impl.w3i3: Implementation for Windows on IA-32 in spw3i3.c. This uses assembly to get the stack pointer (from the ESP register) and to read the location depth words below the stack pointer.

.impl.w3i6: Implementation for Windows on x86-64 in spw3i6.c. This passes the argument depth*sizeof(Word) to the Windows function _alloca(), for which the documentation says, "A stack overflow exception is generated if the space cannot be allocated."

Document History

  • 2014-10-23 GDR Initial draft.