This document is currently a mixture of very old design notes (the preformatted section immediately following) and some newer stuff. It doesn't yet form anything like a complete picture.
TRACER design.mps.trace incomplete design drj 1996-09-25 ARCHITECTURE: .instance.limit: There will be a limit on the number of traces that can be created at any one time. This effectively limits the number of concurrent traces. This limitation is expressed in the symbol TRACE_MAX [currently set to 1, see request.mps.160020 "Multiple traces would not work" drj 1998-06-15]. .rate: [see mail.nickb.1997-07-31.14-37]. [Now revised? See request.epcore.160062 and change.epcore.minnow.160062. drj 1998-06-15] .exact.legal: Exact references should either point outside the arena (to non-managed address space) or to a tract allocated to a pool. Exact references that are to addresses which the arena has reserved but hasn't allocated memory to are illegal (the exact reference couldn't possibly refer to a real object). Depending on the future semantics of PoolDestroy we might need to adjust our strategy here. See mail.dsm.1996-02-14.18-18 for a strategy of coping gracefully with PoolDestroy. We check that this is the case in the fixer. It may be sensible to make this check CRITICAL in certain configurations. .fix.fixed.all: ss->fixedSummary is accumulated (in the fixer) for all the pointers whether or not they are genuine references. We could accumulate fewer pointers here; if a pointer fails the TractOfAddr test then we know it isn't a reference, so we needn't accumulate it into the fixed summary. The design allows this, but it breaks a useful post-condition on scanning (if the accumulation of ss->fixedSummary was moved the accuracy of ss->fixedSummary would vary according to the "width" of the white summary). See mail.pekka.1998-02-04.16-48 for improvement suggestions. ANALYSIS: .fix.copy-fail: Fixing can always succeed, even if copying the referenced object has failed (due to lack of memory, for example), by backing off to treating a reference as ambiguous. Assuming that fixing an ambiguous reference doesn't allocate memory (which is no longer true for AMC for example). See request.dylan.170560 for a slightly more sophisticated way to proceed when you can no longer allocate memory for copying. IDEAS: .flip.after: To avoid excessive barrier impact on the mutator immediately after flip, we could scan during flip other objects which are "near" the roots, or otherwise known to be likely to be accessed in the near future. IMPLEMENTATION: Speed .fix: The fix path is critical to garbage collection speed. Abstractly fix is applied to all the references in the non-white heap and all the references in the copied heap. Remembered sets cut down the number of segments we have to scan. The zone test cuts down the number of references we call fix on. The speed of the remainder of the fix path is still critical to system performance. Various modifications to and aspects of the system are concerned with maintaining the speed along this path. .fix.tractofaddr: TractOfAddr is called on every reference that passes the zone test and is on the critical path, to determine whether the segment is white. There is no need to examine the segment to perform this test, since whiteness information is duplicated in tracts, specifically to optimize this test. TractOfAddr itself is a simple class dispatch function (which dispatches to the arena class's TractOfAddr method). Inlining the dispatch and inlining the functions called by VMTractOfAddr makes a small but noticable difference to the speed of the dylan compiler. .fix.noaver: AVERs in the code add bulk to the code (reducing I-cache efficacy) and add branches to the path (polluting the branch pedictors) resulting in a slow down. Removing all the AVERs from the fix path improves the overall speed of the dylan compiler by as much as 9%. .fix.nocopy: AMCFix used to copy objects by using the format's copy method. This involved a function call (through an indirection) and in dylan_copy a call to dylan_skip (to recompute the length) and call to memcpy with general parameters. Replacing this with a direct call to memcpy removes these overheads and the call to memcpy now has aligned parameters. The call to memcpy is inlined by the (C) compiler. This change results in a 4-5% speed-up in the dylan compiler. .reclaim: Because the reclaim phase of the trace (implemented by TraceReclaim) examines every segment it is fairly time intensive. rit's profiles presented in request.dylan.170551 show a gap between the two varieties variety.hi and variety.wi. .reclaim.noaver: Converting AVERs in the loops of TraceReclaim, PoolReclaim, AMCReclaim (LOReclaim? AWLReclaim) will result in a noticeable speed improvement [insert actual speed improvement here].
TraceCreate creates a trace in state TraceINIT
Some segments get condemned (made white).
TraceStart gets called which:
Derives an initial reference partition based on the existing white set. The white zone set and the segments' summaries are used to create an initial grey set.
emits a GCStart message.
initialises trace->rate by estimating the required scanning rate
moves the trace into the state TraceUNFLIPPED
immediately calls traceFlip which flips the trace and moves it into state TraceFLIPPED.
Whilst a trace is alive every so often its
traceQuantum
method gets invoked (via
TracePoll
) in order to do a quantum of tracing work.
traceQuantum
is responsible for ticking through the trace's
top-level state machine. Most of the interesting work, the tracing,
happens in the TraceFLIPPED
state.
The trace transitions through its states in the following sequence:
TraceINIT
->
(TraceUNFLIPPED
) -> TraceFLIPPED
->
TraceRECLAIM
-> TraceFINISHED
.
Whilst TraceUNFLIPPED
appears in the code, no trace does
any work in this state; all traces are immediately flipped to be in the
TraceFLIPPED
state (see above).
Once the trace is in the TraceFINISHED
state it performs
no more work and it can be safely destroyed. Generally the callers of
traceQuantum
will destroy the trace.
Most of the interesting work of a trace, the actual tracing, happens
in the TraceFLIPPED
state (work would happen in
the TraceUNFLIPPED
state, but that is not implemented).
The tracer makes progress by choosing a grey segment to scan, and scanning it. The actual scanning is performed by pools.
Note that at all times a reference partition is maintained.
The order in which the trace scans things determines the semantics of certain types of references (in particular, weak and final references). Or, to put it another way the desired semantics of weak and final references impose certain restrictions on the order in which the trace can scan things.
The tracer uses a system of reference ranks (or just ranks) so that it can impose an order on its scanning work. The ranks are ordered.
The tracer proceeds band by band. The first band is all objects it can reach by following references of the first rank. The second band is all subsequent objects it can reach by following references of the second and first ranks. The third band is all subsequent objects it can reach by following references of the third, second, and first ranks. And so on. The description of the tracer working like this originated in an e-mail message from RHSK [RHSK 2007-06-25].
A trace keep track of which band it is tracing. This is returned by
the TraceBand
method. Keeping this band information helps
it implement the semantics of finalization and weakness. The band used
to not be explicitly stored, but this hindered the implementation of
good finalization semantics (essentially in some circumstances
finalization messages were delayed by at least one collection cycle, see
job001658).
The band is used when selecting a grey segment to scan (the selection
occurs in traceFindGrey
). The tracer attempts to first
find segments whose rank is the current band, then segments whose rank
is previous to the current band, and so on. If there are no segments
found then the current band is exhausted and the current band is
incremented to the next rank. When the current band is moved through
all the ranks in this fashion there is no more tracing to be done.
[RHSK 2007-06-25] | "the semantics of rank-based tracing"; RHSK; <URL: http://info.ravenbrook.com/mail/2007/06/25/11-35-57/0.txt>; 2007-06-25. |
2002-06-07 | RB | Converted from MMInfo database design document. |
2007-07-02 | DRJ | Added notes on tracer progress |
This document is copyright © 1995-2002 Ravenbrook Limited. All rights reserved. This is an open source license. Contact Ravenbrook for commercial licensing options.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement, are disclaimed. In no event shall the copyright holders and contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.
$Id: //info.ravenbrook.com/project/mps/version/1.111/design/trace/index.html#2 $
Ravenbrook / Projects / Memory Pool System / Version 1.111 Product Sources / Design Documents