31. Tracer
31.1. Introduction
Warning
This document is currently a mixture of very old design notes (the preformatted section immediately following) and some newer stuff. It doesn’t yet form anything like a complete picture.
31.2. Architecture
.instance.limit: There is a limit on the number of traces that can exist at any one time, that is, on the number of concurrent traces. This limit is expressed by the symbol TraceLIMIT.
Note
TraceLIMIT is currently set to 1, as the MPS assumes in various places that only a single trace is active at a time. See request.mps.160020 “Multiple traces would not work”. David Jones, 1998-06-15.
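A minimal sketch of how a fixed limit like TraceLIMIT can bound the number of concurrent traces, assuming busy traces are tracked as a bitmask of slots. ArenaSketch, traceSlotCreate(), and traceSlotDestroy() are hypothetical names, not the MPS's own code. With TraceLIMIT set to 1, a second trace cannot be created until the first has been destroyed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* TraceLIMIT as described above; everything else here is hypothetical. */
#define TraceLIMIT 1

typedef struct ArenaSketch {
  unsigned busyTraces;   /* bit i set => trace slot i is in use */
} ArenaSketch;

/* Claim a free trace slot. Returns false when TraceLIMIT traces
   already exist, so no further concurrent trace can be created. */
bool traceSlotCreate(ArenaSketch *arena, unsigned *slotReturn)
{
  unsigned ti;
  assert(arena != NULL && slotReturn != NULL);
  for (ti = 0; ti < TraceLIMIT; ++ti) {
    if ((arena->busyTraces & (1u << ti)) == 0) {
      arena->busyTraces |= 1u << ti;
      *slotReturn = ti;
      return true;
    }
  }
  return false;
}

/* Release a slot previously returned by traceSlotCreate(). */
void traceSlotDestroy(ArenaSketch *arena, unsigned ti)
{
  assert(ti < TraceLIMIT);
  assert((arena->busyTraces & (1u << ti)) != 0);
  arena->busyTraces &= ~(1u << ti);
}
```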
.rate: See mail.nickb.1997-07-31.14-37.
Note
Now revised? See request.epcore.160062 and change.epcore.minnow.160062. David Jones, 1998-06-15.
.exact.legal: Exact references must either point outside the arena (to non-managed address space) or to a tract allocated to a pool. Exact references that are to addresses which the arena has reserved but hasn’t allocated memory to are illegal (such a reference cannot possibly refer to a real object, and so cannot be exact). We check that this is the case in TraceFix().
Note
Depending on the future semantics of PoolDestroy() we might need to adjust our strategy here. See mail.dsm.1996-02-14.18-18 for a strategy of coping gracefully with PoolDestroy().
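The .exact.legal rule can be sketched as a three-way classification. This assumes, for illustration only, a single reserved range [base, limit) divided into pages, with a hypothetical per-page allocTable flag recording whether a page holds a tract. ReservedRange, classifyExactRef(), and the RefClass values are invented names; in TraceFix() the illegal case would be an assertion failure rather than a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: PAGES pages of 2^PAGE_SHIFT bytes each. */
#define PAGE_SHIFT 12
#define PAGES 16

typedef struct {
  uintptr_t base, limit;      /* reserved address range */
  bool allocTable[PAGES];     /* true => page allocated to a pool */
} ReservedRange;

typedef enum { RefOUTSIDE, RefTRACT, RefILLEGAL } RefClass;

RefClass classifyExactRef(const ReservedRange *r, uintptr_t ref)
{
  size_t page;
  assert(r->limit - r->base == ((uintptr_t)PAGES << PAGE_SHIFT));
  if (ref < r->base || ref >= r->limit)
    return RefOUTSIDE;        /* non-managed address space: legal */
  page = (size_t)((ref - r->base) >> PAGE_SHIFT);
  if (r->allocTable[page])
    return RefTRACT;          /* points into a pool's tract: legal */
  return RefILLEGAL;          /* reserved but unallocated: illegal */
}
```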
.fix.fixed.all: ss->fixedSummary is accumulated (in TraceFix()) for all pointers, whether or not they are genuine references. We could accumulate fewer pointers here: if a pointer fails the TractOfAddr() test then we know it isn’t a reference, so we needn’t accumulate it into the fixed summary. The design allows this, but it breaks a useful post-condition on scanning (if the accumulation of ss->fixedSummary were moved, the accuracy of ss->fixedSummary would vary according to the “width” of the white summary). See mail.pekka.1998-02-04.16-48 for improvement suggestions.
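The trade-off in .fix.fixed.all can be illustrated with a small model. This sketch assumes a hypothetical zone layout (a RefSet with one bit per zone, the zone number taken from the address bits above ZONE_SHIFT) and a precomputed isRef flag standing in for the TractOfAddr() test, which is only reached by pointers that pass the zone test. With accumulateAll set, the summary covers every pointer; without it, the result changes with the white set.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t RefSet;          /* hypothetical: one bit per zone */
#define ZONE_SHIFT 20              /* hypothetical zone granularity */
#define WORD_BITS (sizeof(uintptr_t) * CHAR_BIT)

RefSet zoneSetOfAddr(uintptr_t addr)
{
  return (RefSet)1 << ((addr >> ZONE_SHIFT) & (WORD_BITS - 1));
}

/* Summarise the pointers fixed while scanning one segment. */
RefSet scanSummary(const uintptr_t *ptrs, const bool *isRef, size_t n,
                   RefSet white, bool accumulateAll)
{
  RefSet summary = 0;
  size_t i;
  assert((ptrs != NULL && isRef != NULL) || n == 0);
  for (i = 0; i < n; ++i) {
    RefSet zs = zoneSetOfAddr(ptrs[i]);
    /* Only pointers in white zones reach the tract test; the
       alternative design drops those that fail it. */
    if (!accumulateAll && (zs & white) != 0 && !isRef[i])
      continue;
    summary |= zs;
  }
  return summary;
}
```

For example, a non-reference pointer in zone 1 is dropped from the summary when zone 1 is white but kept when it is not, which is the dependence on the “width” of the white summary described above.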
31.3. Analysis
.fix.copy-fail: Fixing can always succeed, even if copying the referenced object has failed (due to lack of memory, for example), by backing off to treating the reference as ambiguous. This assumes that fixing an ambiguous reference doesn’t allocate memory (which is no longer true for AMC, for example). See request.dylan.170560 for a slightly more sophisticated way to proceed when you can no longer allocate memory for copying.
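A minimal sketch of backing off to ambiguous treatment when copying fails, assuming a hypothetical object layout with a forwarding pointer, a pinned flag, and a size field. Obj, fixWithBackoff(), and allocAlwaysFails() are illustrative names only. Pinning here just sets a flag, so the fallback path itself allocates nothing, which is the assumption the paragraph above depends on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object model for illustration only. */
typedef struct Obj {
  struct Obj *forward;  /* non-NULL once the object has been copied */
  bool pinned;          /* set when treated as ambiguously referenced */
  size_t size;          /* total object size in bytes */
} Obj;

typedef void *(*AllocFn)(size_t size);

/* Simulated out-of-memory allocator. */
void *allocAlwaysFails(size_t size)
{
  (void)size;
  return NULL;
}

/* Fix one exact reference. This never fails: if the copy cannot be
   allocated, the object is pinned in place and the reference is left
   unchanged, as an ambiguous reference would be. */
void fixWithBackoff(Obj **refIO, AllocFn alloc)
{
  Obj *old = *refIO;
  Obj *copy;
  assert(old != NULL && old->size >= sizeof(Obj));
  if (old->forward != NULL) {   /* already copied: follow forwarding */
    *refIO = old->forward;
    return;
  }
  if (old->pinned)              /* already pinned: nothing to do */
    return;
  copy = alloc(old->size);
  if (copy == NULL) {
    old->pinned = true;         /* back off: keep the object where it is */
    return;
  }
  memcpy(copy, old, old->size);
  old->forward = copy;
  *refIO = copy;
}
```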
31.4. Ideas
.flip.after: To avoid excessive barrier impact on the mutator immediately after flip, we could scan during flip other objects which are “near” the roots, or otherwise known to be likely to be accessed in the near future.
31.5. Implementation
31.5.1. Speed
.fix: The function implementing the fix operation should be called TraceFix(), and this name is pervasive in the MPS and its documents to describe this function. Nonetheless, optimisation and strict aliasing rules have meant that we need to use the external name for it, _mps_fix2().
.fix.speed: The fix path is critical to garbage collection speed. Abstractly, the fix operation is applied to all references in the non-white heap and all references in the copied heap. Remembered sets cut down the number of segments we have to scan. The zone test cuts down the number of references we call fix on. The speed of the remainder of the fix path is still critical to system performance. Various modifications to and aspects of the system are concerned with maintaining the speed along this path. See design.mps.critical_path.
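The zone test that filters references before the rest of the fix path can be sketched as a shift, a mask, and a bitwise AND. This sketch assumes a scan state holding a zone shift and a white zone set with one bit per zone. ScanState, zoneTestPasses(), and fixRef() are hypothetical, and the counter merely stands in for the more expensive work done after the test passes.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef uintptr_t Word;
#define WORD_WIDTH (sizeof(Word) * CHAR_BIT)

/* Hypothetical scan state. */
typedef struct {
  unsigned zoneShift;       /* log2 of the zone size */
  Word white;               /* one bit per zone containing white objects */
  unsigned long slowCalls;  /* references that got past the zone test */
} ScanState;

static int zoneTestPasses(const ScanState *ss, Word ref)
{
  Word zone = (ref >> ss->zoneShift) & (WORD_WIDTH - 1);
  return (((Word)1 << zone) & ss->white) != 0;
}

void fixRef(ScanState *ss, Word ref)
{
  assert(ss->zoneShift < WORD_WIDTH);
  if (!zoneTestPasses(ss, ref))
    return;          /* cheap rejection: cannot point to a white object */
  ss->slowCalls++;   /* the tract lookup etc. would follow here */
}
```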
.fix.tractofaddr: A reference that passes the zone test is then looked up to find the tract it points to, an operation equivalent to calling TractOfAddr().
.fix.tractofaddr.inline: TraceFix() doesn’t actually call TractOfAddr(). Instead, it expands this operation inline (calling ChunkOfAddr(), then INDEX_OF_ADDR(), checking the appropriate bit in the chunk’s allocTable, and finally looking up the tract in the chunk’s page table). The reason for inlining this code is that we need to know whether the reference points to a chunk (and not just whether it points to a tract) in order to check the .exact.legal condition.
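The inlined lookup described in .fix.tractofaddr.inline can be sketched as follows. The chunk layout, the linear search in chunkOfAddr(), and the names lookupTract() and ArenaSketch are assumptions for illustration (a real arena would likely find chunks more efficiently). The point is that the caller learns both whether the reference is inside a chunk and whether it maps to a tract, which is what the .exact.legal check needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define CHUNK_PAGES 8

typedef struct Tract { int poolId; } Tract;

/* Hypothetical chunk: reserved range, per-page alloc bits, page table. */
typedef struct Chunk {
  uintptr_t base, limit;
  bool allocTable[CHUNK_PAGES];
  Tract pageTable[CHUNK_PAGES];
} Chunk;

typedef struct { Chunk *chunks; size_t count; } ArenaSketch;

static Chunk *chunkOfAddr(const ArenaSketch *arena, uintptr_t addr)
{
  size_t i;
  for (i = 0; i < arena->count; ++i)
    if (addr >= arena->chunks[i].base && addr < arena->chunks[i].limit)
      return &arena->chunks[i];
  return NULL;
}

/* Expanded equivalent of TractOfAddr(). Returns the tract or NULL, and
   reports whether ref lies inside any chunk: outside is legal for an
   exact reference, inside-but-unallocated is not. */
Tract *lookupTract(const ArenaSketch *arena, uintptr_t ref, bool *inChunkReturn)
{
  Chunk *chunk = chunkOfAddr(arena, ref);
  size_t index;
  *inChunkReturn = chunk != NULL;
  if (chunk == NULL)
    return NULL;
  index = (size_t)((ref - chunk->base) >> PAGE_SHIFT); /* like INDEX_OF_ADDR */
  assert(index < CHUNK_PAGES);
  if (!chunk->allocTable[index])
    return NULL;
  return &chunk->pageTable[index];
}
```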
.fix.whiteseg: The reason for looking up the tract is to determine whether the reference is to a white segment.
Note
It is likely to be more efficient to maintain a separate lookup table from address to white segment, rather than indirecting through the chunk and the tract. See job003796.
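One possible shape for the separate lookup table the note suggests, assuming a hypothetical direct-mapped array with one entry per page over a fixed address range. WhiteTable, whiteTableCondemn(), and whiteSegOfAddr() are invented for this sketch and are not taken from job003796. A lookup becomes a subtraction, a shift, and one array load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define TABLE_PAGES 64

typedef struct Seg { uintptr_t base, limit; } Seg;

/* whiteSeg[i] is the white segment containing page i of the covered
   range, or NULL if that page is not in a white segment. */
typedef struct {
  uintptr_t tableBase;
  Seg *whiteSeg[TABLE_PAGES];
} WhiteTable;

/* Record every page of seg as belonging to a white segment. */
void whiteTableCondemn(WhiteTable *wt, Seg *seg)
{
  uintptr_t p;
  assert(seg->base >= wt->tableBase);
  for (p = seg->base; p < seg->limit; p += (uintptr_t)1 << PAGE_SHIFT) {
    size_t i = (size_t)((p - wt->tableBase) >> PAGE_SHIFT);
    assert(i < TABLE_PAGES);
    wt->whiteSeg[i] = seg;
  }
}

Seg *whiteSegOfAddr(const WhiteTable *wt, uintptr_t ref)
{
  size_t i;
  if (ref < wt->tableBase)
    return NULL;
  i = (size_t)((ref - wt->tableBase) >> PAGE_SHIFT);
  return i < TABLE_PAGES ? wt->whiteSeg[i] : NULL;
}
```

A real table would also need clearing when segments stop being white, and some way to cover addresses outside a single fixed range.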
.fix.noaver: AVER() statements in the code add bulk to the code (reducing I-cache efficacy) and add branches to the path (polluting the branch predictors), resulting in a slowdown. Replacing the AVER() statements with AVER_CRITICAL() on the critical path improves the overall speed of the Dylan compiler by as much as 9%. See design.mps.critical_path.
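The effect of replacing AVER() with AVER_CRITICAL() can be sketched with a compile-time switch. The CHECK_CRITICAL macro and these definitions are assumptions for illustration, not the MPS's actual check-level configuration. When critical checks are disabled, AVER_CRITICAL() expands to nothing, so neither the check code nor its branch appears on the critical path, while ordinary AVER() checks elsewhere remain.

```c
#include <assert.h>

/* Hypothetical switch: 0 compiles critical-path checks out. */
#ifndef CHECK_CRITICAL
#define CHECK_CRITICAL 0
#endif

#define AVER(cond) assert(cond)
#if CHECK_CRITICAL
#define AVER_CRITICAL(cond) assert(cond)
#else
#define AVER_CRITICAL(cond) ((void)0)   /* no code, no branch */
#endif

int checksEvaluated = 0;

int expensiveCheck(int x)
{
  checksEvaluated++;
  return x >= 0;
}

/* On the critical path: the check vanishes unless CHECK_CRITICAL. */
int fixPathStep(int x)
{
  AVER_CRITICAL(expensiveCheck(x));
  return x + 1;
}

/* Off the critical path: always checked (when assertions are on). */
int setupStep(int x)
{
  AVER(expensiveCheck(x));
  return x;
}
```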
.fix.nocopy: amcSegFix() used to copy objects by using the format’s copy method. This involved a function call (through an indirection) and, in dylan_copy, a call to dylan_skip (to recompute the length) and a call to memcpy with general parameters. Replacing this with a direct call to memcpy removes these overheads, and the call to memcpy now has aligned parameters. The call to memcpy is inlined by the C compiler. This change results in a 4–5% speed-up in the Dylan compiler.
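A minimal sketch of the direct-copy approach, assuming a hypothetical object format in which the first word of each object holds its length in bytes. objSkip(), copyObject(), and OBJ_ALIGN are illustrative names. The length is computed once and memcpy is called directly with aligned arguments, instead of going through an indirect call to a format copy method that recomputes the length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical alignment: objects are word-aligned, lengths whole words. */
#define OBJ_ALIGN sizeof(void *)

/* Hypothetical format skip: returns the address just past the object. */
const char *objSkip(const char *obj)
{
  size_t length;
  memcpy(&length, obj, sizeof length);   /* first word holds the length */
  return obj + length;
}

/* Copy one object with a single skip and a direct memcpy. */
size_t copyObject(char *to, const char *from)
{
  size_t length = (size_t)(objSkip(from) - from);
  assert(length % OBJ_ALIGN == 0);
  assert((uintptr_t)to % OBJ_ALIGN == 0);
  memcpy(to, from, length);
  return length;
}
```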
.reclaim: Because the reclaim phase of the trace (implemented by TraceReclaim()) examines every segment, it is fairly time-intensive. Richard Tucker’s profiles presented in request.dylan.170551 show a gap between the two varieties variety.hi and variety.wi.
.reclaim.noaver: Accordingly, reclaim methods use AVER_CRITICAL() instead of AVER().
31.6. Life cycle of a trace object
1. TraceCreate() creates a trace in state TraceINIT.
2. Some segments get condemned (made white).
3. TraceStart() gets called, which:
   - Derives an initial reference partition based on the existing white set. The white zone set and the segments’ summaries are used to create an initial grey set.
   - Emits a GCStart() message.
   - Initialises trace->rate by estimating the required scanning rate.
   - Moves the trace into the state TraceUNFLIPPED.
   - Immediately calls traceFlip, which flips the trace and moves it into state TraceFLIPPED.
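The first TraceStart() step can be sketched as a pass over segments that greys any segment whose summary intersects the white zone set, since such a segment might refer to white objects. The Seg structure and traceStartGreyen() are hypothetical, and skipping segments that are themselves white is an assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t RefSet;   /* hypothetical: one bit per zone */

typedef struct {
  RefSet summary;   /* superset of the zones this segment refers to */
  bool white;
  bool grey;
} Seg;

/* Grey every non-white segment that might refer to a white zone.
   Returns the number of segments greyed. */
size_t traceStartGreyen(Seg *segs, size_t n, RefSet whiteZones)
{
  size_t i, greyed = 0;
  assert(segs != NULL || n == 0);
  for (i = 0; i < n; ++i) {
    if (!segs[i].white && (segs[i].summary & whiteZones) != 0) {
      segs[i].grey = true;
      ++greyed;
    }
  }
  return greyed;
}
```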
Whilst a trace is alive, its TraceAdvance() method gets invoked every so often (via TracePoll()) in order to do a step of tracing work. TraceAdvance() is responsible for ticking through the trace’s top-level state machine. Most of the interesting work, the tracing, happens in the TraceFLIPPED state.
The trace transitions through its states in the following sequence: TraceINIT → (TraceUNFLIPPED) → TraceFLIPPED → TraceRECLAIM → TraceFINISHED.
Whilst TraceUNFLIPPED appears in the code, no trace does any work in this state; all traces are immediately flipped to be in the TraceFLIPPED state (see above).
Once the trace is in the TraceFINISHED state it performs no more work and it can be safely destroyed. Generally the callers of TraceAdvance() will destroy the trace.
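The state sequence above can be modelled as a small state machine. The Trace structure, the greyWorkLeft counter (standing in for outstanding scanning work), traceStartSketch(), and traceAdvanceSketch() are hypothetical; only the state names come from the text. As described, no work is done in TraceUNFLIPPED, and the caller may destroy the trace once it reports TraceFINISHED.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
  TraceINIT, TraceUNFLIPPED, TraceFLIPPED, TraceRECLAIM, TraceFINISHED
} TraceState;

typedef struct {
  TraceState state;
  int greyWorkLeft;   /* hypothetical stand-in for grey segments */
} Trace;

/* Effect of TraceStart() on the state: pass through TraceUNFLIPPED
   and flip immediately. */
void traceStartSketch(Trace *trace)
{
  assert(trace->state == TraceINIT);
  trace->state = TraceUNFLIPPED;
  trace->state = TraceFLIPPED;   /* traceFlip would run here */
}

/* One step of work. Returns true once the trace is finished. */
bool traceAdvanceSketch(Trace *trace)
{
  switch (trace->state) {
  case TraceFLIPPED:
    if (trace->greyWorkLeft > 0)
      --trace->greyWorkLeft;        /* scan one grey segment */
    else
      trace->state = TraceRECLAIM;  /* nothing grey left */
    break;
  case TraceRECLAIM:
    trace->state = TraceFINISHED;   /* white segments reclaimed */
    break;
  case TraceFINISHED:
    break;
  default:
    assert(!"trace must be started before it is advanced");
  }
  return trace->state == TraceFINISHED;
}
```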
31.6.1. Making progress: scanning grey segments
Most of the interesting work of a trace, the actual tracing, happens in the TraceFLIPPED state (work would happen in the TraceUNFLIPPED state, but that is not implemented).
The tracer makes progress by choosing a grey segment to scan, and scanning it. The actual scanning is performed by pools.
Note that at all times a reference partition is maintained.
The order in which the trace scans things determines the semantics of certain types of references (in particular, weak and final references). Or, to put it another way, the desired semantics of weak and final references impose certain restrictions on the order in which the trace can scan things.
The tracer uses a system of reference ranks (or just ranks) so that it can impose an order on its scanning work. The ranks are ordered.
The tracer proceeds band by band. The first band is all objects it can reach by following references of the first rank. The second band is all subsequent objects it can reach by following references of the second and first ranks. The third band is all subsequent objects it can reach by following references of the third, second, and first ranks. And so on. The description of the tracer working like this originated in [RHSK_2007-06-25].
A trace keeps track of which band it is tracing. This is returned by the TraceBand() method. Keeping this band information helps it implement the semantics of finalization and weakness. The band was not always stored explicitly, but this hindered the implementation of good finalization semantics (in some circumstances finalization messages were delayed by at least one collection cycle: see job001658).
The band is used when selecting a grey segment to scan (the selection occurs in traceFindGrey()). The tracer attempts first to find segments whose rank is the current band, then segments whose rank is previous to the current band, and so on. If no segments are found, then the current band is exhausted and the current band is incremented to the next rank. When the current band has moved through all the ranks in this fashion, there is no more tracing to be done.
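The band-driven selection in traceFindGrey() can be sketched as below. The rank names and their order are assumed to follow the MPS's ambiguous, exact, final, and weak ranks, but the Seg structure, findGreySketch(), and the linear search are assumptions for illustration. The search prefers the current band's rank, then earlier ranks, and advances the band when nothing grey remains at or below it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { RankAMBIG, RankEXACT, RankFINAL, RankWEAK, RankLIMIT } Rank;

typedef struct { Rank rank; bool grey; } Seg;

/* Return a grey segment to scan, updating *bandIO as bands are
   exhausted. Returns NULL (with *bandIO == RankLIMIT) when tracing
   is complete. */
Seg *findGreySketch(Seg *segs, size_t n, Rank *bandIO)
{
  assert(segs != NULL || n == 0);
  while (*bandIO < RankLIMIT) {
    int r;
    for (r = (int)*bandIO; r >= 0; --r) {   /* current band, then earlier */
      size_t i;
      for (i = 0; i < n; ++i)
        if (segs[i].grey && segs[i].rank == (Rank)r)
          return &segs[i];
    }
    *bandIO = (Rank)(*bandIO + 1);          /* band exhausted: next rank */
  }
  return NULL;
}
```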
31.7. References
- RHSK_2007-06-25
Richard Kistruck. Ravenbrook Limited. 2007-06-25. “The semantics of rank-based tracing”.