28. Shield

28.1. Introduction

.intro: This document contains a guide to the MPS Shield. There is no historical initial design, but in its place there are some early ideas and discussions: see .ideas.

.readership: Any MPS developer. Not confidential.

28.2. Overview

.overview: The MPS implements incremental garbage collection using memory barriers implemented by a combination of hardware memory protection and thread control. The MPS needs separate control of collector access and mutator (client) access to memory: the collector must be able to incrementally scan objects, without the mutator being able to see them yet.

Unfortunately common operating systems do not support different access levels (protection maps) for different parts of the same process.

The MPS Shield is an abstraction that does extra work to overcome this limitation, and give the rest of the MPS the illusion that we can control collector and mutator access separately.

28.3. Interface

28.3.1. Mutator access

The shield provides ShieldRaise() and ShieldLower() to forbid or permit the mutator access to object memory segments. Between these two, a segment is said to have the shield raised (.def.raised).

void ShieldRaise(Arena arena, Seg seg, AccessSet mode)

Prevent the mutator accessing the memory segment in the specified mode (AccessREAD, AccessWRITE, or both).

void ShieldLower(Arena arena, Seg seg, AccessSet mode)

Allow the mutator to access the memory segment in the specified mode (AccessREAD, AccessWRITE, or both).

If the mutator attempts an access that hits the shield, the MPS gets an OS-specific hardware protection fault which reaches ArenaAccess(), does whatever work is necessary, then lowers the shield and returns to the mutator.

ShieldRaise() and ShieldLower() do not nest.

28.3.2. Entering the shield

The MPS can only gain exclusive access from inside the shield (.def.inside). To enter the shield, the MPS must call ShieldEnter(), and to leave it, the MPS must call ShieldLeave().

ShieldEnter() and ShieldLeave() are called by ArenaEnter() and ArenaLeave() so almost all of the MPS is is inside the shield.

28.3.3. Collector access to segments

When the MPS wants to access object memory segments from inside the shield, it must wrap any accesses with a ShieldExpose() and ShieldCover() pair. These calls nest. After a call to ShieldExpose() a segment is said to be exposed until the last nested call to ShieldCover(). The shield arranges that the MPS can access the memory while it is exposed.

A segment might for example be exposed during:

  • format-scan (when scanning);

  • format-skip (when marking grains in a non-moving fix);

  • format-isMoved and AddrCopy() (during a copying fix);

  • format-pad (during reclaim).

Note that there is no need to call ShieldExpose() when accessing pool management memory such as bit tables. This is not object memory, is never (legally) accessed by the mutator, and so is never shielded.

Similarly, a pool class that never raises the shield on its segments need never expose them to gain access.

28.3.4. Collector access to the unprotectable

When the MPS wants to access an unprotectable object from inside the shield, it must wrap any accesses with a ShieldHold() and ShieldRelease() pair. This allows access to objects which cannot be shielded by ShieldRaise(), such as:

  • the stack and registers of mutator threads,

  • lockless allocation point structures,

  • areas of memory that can’t be protected by operating system calls,

  • unprotectable roots.

void ShieldHold(Arena arena)

Get exclusive access to the unprotectable.

void ShieldRelease(Arena arena)

Declare that exclusive access is no longer needed.

28.4. Mechanism

On common operating systems, the only way to allow the MPS access is to allow access from the whole process, including the mutator. So ShieldExpose() will suspend all mutator threads to prevent any mutator access, and so will ShieldRaise() on an unexposed segment. The shield handles suspending and resuming threads, and so the rest of the MPS does not need to worry about it.

The MPS can make multiple sequential, overlapping, or nested calls to ShieldExpose() on the same segment, as long as each is balanced by a corresponding ShieldCover() before ShieldLeave() is called. A usage count is maintained on each segment in seg->depth. When the usage count reaches zero, there is no longer any reason the segment should be unprotected, and the shield may reinstate hardware protection at any time.

However, as a performance-improving hysteresis, the shield defers re-protection, maintaining a queue of segments that require attention before mutator threads are resumed (.impl.delay). While a segment is in the queue, it has seg->queued set true.

This hysteresis allows the MPS to proceed with garbage collection during a pause without actually setting hardware protection until it returns to the mutator. This is particularly important on operating systems where the protection is expensive and poorly implemented, such as macOS.

The queue also ensures that no memory protection system calls will be needed for incremental garbage collection if a complete collection cycle occurs during one pause.

28.5. Implementation

.impl.delay: The implementation of the shield avoids suspending threads for as long as possible. When threads are suspended, it maintains a queue of segments where the desired and actual protection do not match. This queue is flushed on leaving the shield.

28.5.1. Definitions

.def.raised: A segment has the shield raised for an access mode after a call to ShieldRaise() and before a call to ShieldLower() with that mode.

.def.exposed: A segment is exposed after a call to ShieldExpose() and before a call to ShieldLower().

.def.synced: A segment is synced if the prot and shield modes are the same, and unsynced otherwise.

.def.depth: The depth of a segment is defined as:

depth ≔ #exposes − #covers, where
#exposes = the number of calls to ShieldExpose() on the segment
#covers = the number of calls to ShieldCover() on the segment

ShieldCover() must not be called without a matching ShieldExpose(), so this figure must always be non-negative.

.def.total.depth: The total depth is the sum of the depth over all segments.

.def.outside: Being outside the shield is being between calls to ShieldLeave() and ShieldEnter(), and similarly .def.inside: being inside the shield is being between calls to ShieldEnter() and ShieldLeave(). [In a multi-threaded MPS this would be per-thread. RB 2016-03-18]

.def.shielded: A segment is shielded if the shield mode is non-zero. [As set by ShieldRaise.]

28.5.2. Properties

.prop.outside.running: The mutator may not be suspended while outside the shield.

.prop.mutator.access: An attempt by the mutator to access shielded memory must be pre-empted by a call to ArenaAccess().

.prop.inside.access: Inside the shield the MPS must be able to access all unshielded segments and all exposed segments.

28.5.3. Invariants

.inv.outside.running: The mutator is not suspended while outside the shield.

.inv.unsynced.suspended: If any segment is not synced, the mutator is suspended.

.inv.unsynced.depth: All unsynced segments have positive depth or are in the queue.

.inv.outside.depth: The total depth is zero while outside the shield.

.inv.prot.shield: The prot mode is never more than the shield mode.

.inv.expose.depth: An exposed segment’s depth is greater than zero.

.inv.expose.prot: An exposed segment is not protected in the mode it was exposed with.

28.5.4. Proof Hints

Hints at proofs of properties from invariants.

.proof.outside: .inv.outside.running directly ensures .prop.outside.running.

.proof.sync: As the depth of a segment cannot be negative

total depth = 0
⇒ for all segments, depth = 0
⇒ all segments are synced (by .inv.unsynced.depth)

.proof.access: If the mutator is running then all segments must be synced (.inv.unsynced.suspend). Which means that the hardware protection (protection mode) must reflect the software protection (shield mode). Hence all shielded memory will be hardware protected while the mutator is running. This ensures .prop.mutator.access.

.proof.inside: .inv.prot.shield and .inv.expose.prot ensure .prop.inside.access.

28.6. Initial ideas

.ideas: There never was an initial design document, but [RB_1995-11-29] and [RB_1995-11-30] contain some initial ideas.

28.7. Improvement Ideas

28.7.1. Mass exposure

.improv.mass-expose: If protection calls have a high overhead it might be good to pre-emptively unprotect large ranges of memory when we expose one segment. With the current design this would mean discovering adjacent shielded segments and adding them to the queue. The collector should take advantage of this by preferentially scanning exposed segments during a pause.

28.7.2. Segment independence

.improv.noseg: The shield is implemented in terms of segments, using fields in the segment structure to represent its state. This forces us to (for example) flush the shield queue when deleting a segment. The shield could keep track of protection and shielding independently, possibly allowing greater coalescing and more efficient and flexible use of system calls (see .improv.mass-expose).

28.7.3. Concurrent collection

.improv.concurrent: The MPS currently does not collect concurrently, however the only thing that makes it not-concurrent is a critical point in the Shield abstraction where the MPS seeks to gain privileged access to memory (usually in order to scan it). The critical point is where ShieldExpose() in shield.c has to call ShieldHold() to preserve the shield invariants. This is the only point in the MPS that prevents concurrency, and the rest of the MPS is designed to support it.

The restriction could be removed if either:

  • the MPS could use a different set of protections to the mutator program

  • the mutator program uses a software barrier

The first one is tricky, and the second one just hasn’t come up in any implementation we’ve been asked to make yet. Given a VM, it could happen, and the MPS would be concurrent.

So, I believe there’s nothing fundamentally non-concurrent about the MPS design. It’s kind of waiting to happen.

(Originally written at <http://news.ycombinator.com/item?id=4524036>.)

28.7.4. Early Resume

.improv.resume: There is a tradeoff between delaying flushing the shield queue (preventing unnecessary protection and allowing us to coalesce) and resuming mutator threads. We could resume threads earlier under some circumstances, such as before reclaim (which does not need to interact with the mutator). Basically, it might be worth resuming the mutator early in a pause if we know that we’re unlikely to suspend it again (no more calls to ShieldRaise() or ShieldExpose() on shielded segments).

28.7.5. Expose modes

.improv.expose-modes: Would it be a good idea for ShieldExpose() to take an AccessSet? It might be good if we didn’t have to raise a write barrier unless we want to write. When scanning (for instance), we may not need to write, so when scanning a segment behind a write barrier we shouldn’t have to call mprotect(). That’s a bit speculative: how often do we scan a segment and not write to it. Alternatively, and more speculatively, we could keep the write barrier up, handle the (possibly nested) trap and then expose the shield. I’m just scraping around for ways to reduce calls to mprotect().

Theoretically we can do this, but:

  1. We’re mostly a moving collector so we’ll almost always want to write to segments we scan. That could change if we do more non-moving collection.

  2. The main cost of protection is changing it at all, not whether we change just read or write. On macOS, the main cost seems to be the TLB flush, which affects wall-clock time of everything on the processor!

28.8. References

RB_1995-11-29

Richard Brooksby. Harlequin. 1995-11-29. “Shield protocol for barriers”.

RB_1995-11-30

Richard Brooksby. Harlequin. 1995-11-30. “Exegesis of Incremental Tracing”.