.. mode: -*- rst -*-
The critical path through the MPS
=================================
:Author: Richard Brooksby
:Organization: Ravenbrook Limited
:Date: 2012-09-07
:Revision: $Id: //info.ravenbrook.com/project/mps/custom/cet/version/1.114/design/critical-path.txt#1 $
:Copyright: See section `Copyright and License`_.
:Index terms:
single: critical path
single: path; critical
single: Memory Pool System; critical path
Introduction
------------
The critical path is a key concept in the design of the `Memory Pool
System `_. Code on the critical
path is usually executed more than any other code in the process. A
change of just one instruction on the critical path can make as much as
a 1% difference in overall run-time. A lot of the design of the MPS is
arranged around making the critical path as short and fast as possible.
This document describes the critical path and explains some of that
design, with reference to more detailed documents.
What makes the critical path critical
-------------------------------------
In order to determine which object can be recycled, the garbage
collector has to frequently examine a very large number of pointers in
the program's objects. It does this by scanning_ memory, both
allocated objects and roots (such as the thread stacks).
This means that the scanning functions must loop over pretty much *every
word in memory* sooner or later. The MPS takes great pains to avoid
scanning memory which does not need scanning, but to get good
performance, scanning must be highly optimised.
What's more, the scanning functions apply an operation called "fix" to
every pointer (or potential pointer) that they find in the objects in
memory. Fixing also attempts to eliminate uninteresting pointers as
fast as possible, but it has to do some work on every object that is
being considered for recycling, and that can be a large proportion of
the objects in existence. The path through fixing must also be highly
optimised, especially in the early stages.
How the MPS avoids scanning and fixing
--------------------------------------
This is just a brief overview of how the MPS is designed to reduce
unnecessary scanning and fixing.
Firstly, the MPS must occasionally decide which objects to try to
recycle. It does this using various facts it knows about the objects,
primarily their age and whether they've survived previous attempts at
recycling them. It then `"condemns"`_ a large number of objects
at once, and each of these objects must be "preserved" by fixing
references to them.
When the MPS condemns objects it chooses sets of objects in a small set
of "zones" in memory (preferably a single zone). The zone of an object
can be determined extremely quickly from its address, without looking at
the object or any other data structure.
The MPS arranges that objects which will probably die at the same time
are in the same zones.
The MPS allocates in "segments". Each segment is of the order of one
"tract" of memory (generally the same as the operating system page
size, usually 4KiB or 8KiB) but may be larger if there are large
objects inside. The MPS maintains a "summary" of the zones pointed to
by all the pointers in a segment from previous scans.
So, once the MPS has decided what to condemn, it can quickly eliminate
all segments which definitely do not point to anything in those zones.
This avoids a large amount of scanning. It is an implementation of a
`remembered set`_, though it is unlike that in most other GCs.
In addition, the fix operation can quickly ignore pointers to the wrong
zones. This is called the "zone check" and is a BIBOP_ technique.
Even if a pointer passes the zone check, it may still not point to a
segment containing condemned objects. The next stage of the fix
operation is to look up the segment pointed to by the pointer and see if
it was condemned. This is a fast lookup.
After that, each pool class must decide whether the pointer is to a
condemned object and do something to preserve it. This code is still
critical. The MPS will have tried to condemn objects that are dead, but
those objects are still likely to be in segments with other objects that
must be preserved. The pool class fix method must quickly distinguish
between them.
Furthermore, many objects will be preserved at least once in their
lifetime, so even the code that preserves an object needs to be highly
efficient. (Programs in languages like ML might not preserve 95% of
their objects even once, but many other programs will preserve nearly
all of theirs many times.)
Where to find the critical path
-------------------------------
Very briefly, the critical path consists of five stages:
#. The scanner, which iterates over pointers in objects. The MPS has
several internal scanners, but the most important ones will be format
scanners in client code registered through ``mps_format_create()``
functions.
.. note::
There needs to be a chapter in the manual explaining how to
write a good scanner. Then that could be linked from here.
#. The first-stage fix, which filters out pointers inline in the
scanner. This is implemented in ``MPS_FIX()`` macros in
mps.h_.
.. _mps.h: ../code/mps.h
#. The second-stage fix, which filters out pointers using general
information about segments. This is ``_mps_fix2`` in
`trace.c <../code/trace.c>`_.
#. The third-stage fix, which filters out pointers using pool-specific
information. Implemented in pool class functions called ``AMCFix()``,
``LOFix()``, etc. in pool*.c.
#. Preserving the object, which might entail
- marking_ it to prevent it being recycled; and/or
- copying_ it and updating the original pointer (or just
updating the pointer, if the object has previously been
copied); and/or
- adding it to a queue of objects to be scanned later, if it
contains pointers.
Found in or near the pool class fix functions.
The format scanner
------------------
The critical path starts when a format scan method is called. That is a
call from the MPS to a client function of type ``mps_fmt_scan_t``
registered with one of the ``mps_format_create()`` functions in mps.h_.
Here is an example of part of a format scanner for scanning contiguous
runs of pointers, from `fmtdy.c <../code/fmtdy.c>`_, the scanner for the `Open Dylan`_
runtime::
static mps_res_t dylan_scan_contig(mps_ss_t mps_ss,
mps_addr_t *base, mps_addr_t *limit)
{
mps_res_t res;
mps_addr_t *p; /* reference cursor */
mps_addr_t r; /* reference to be fixed */
MPS_SCAN_BEGIN(mps_ss) {
p = base;
loop: if(p >= limit) goto out;
r = *p++;
if(((mps_word_t)r&3) != 0) /* pointers tagged with 0 */
goto loop; /* not a pointer */
if(!MPS_FIX1(mps_ss, r)) goto loop;
res = MPS_FIX2(mps_ss, p-1);
if(res == MPS_RES_OK) goto loop;
return res;
out: assert(p == limit);
} MPS_SCAN_END(mps_ss);
return MPS_RES_OK;
}
(To help with understanding optimisation of this code, it's written in
a pseudo-assembler style, with one line roughly corresponding to each
instruction of an idealized intermediate code.)
The MPS C interface provides macros to try to help optimise this code.
The ``mps_ss`` object is a "scan state" and contains data that is used
to eliminate uninteresting pointers now, and record information which
will be used to reduce scanning in future by maintaining the
remembered set.
The macros ``MPS_SCAN_BEGIN()`` and ``MPS_SCAN_END()`` load key data
from the scan state into local variables, and hopefully into processor
registers. This avoids aliasing values that we know won't change when
calls are made to ``_mps_fix2`` later, and so allows the compiler to
keep the scan loop small and avoid unnecessary memory references.
This scanner knows that words not ending in 0b00 aren't pointers to
objects, so it eliminates them straight away. This is a kind of
`reference tag`_ chosen by the client for its object representation.
Next, the pointer is tested using ``MPS_FIX1()``. This performs fast
tests on the pointer without using any other memory. In particular, it
does the "zone check" described in section 3. If a pointer fails these
tests, it isn't interesting and can be skipped. It is very important
to proceed to the next pointer as fast as possible in this case.
Having passed these tests, we need to fix the pointer using other data
in memory, and possibly call the MPS to preserve the object. This is
what ``MPS_FIX2()`` does. The important distinction here is that
``MPS_FIX2()`` can fail and return an error code, which must be
propagated without ado by returning from the scanner. Separating
``MPS_FIX1()`` from ``MPS_FIX2()`` helps keep the error handling code
away from the tight loop with the zone check.
``MPS_FIX*``, the macro/inline part of the fix operation, are referred
to as "fix stage 1" or "the first stage fix" in other documents and
comments.
If these inline checks pass, ``_mps_fix2`` is called. If the MPS has
been built as a separate object file or library, this is where the
function call out of the scan loop happens. Since version 1.110 of the
MPS, we encourage clients to compile the MPS in the same translation
unit as their format code, so that the compiler can be intelligent
about inlining parts of ``_mps_fix2`` in the format scanner. The
instructions for doing this are in `Building the Memory Pool System
<../manual/build.txt>`_, part of the manual.
The second stage fix in the MPM
-------------------------------
If a pointer gets past the first-stage fix filters, it is passed to
``_mps_fix2``, the "second stage fix". The second stage can filter out
yet more pointers using information about segments before it has to
consult the pool class.
The first test is to determine if the address points to a *chunk* (a
contiguous regions of address space managed by the arena). Addresses
that do not point to any chunk (for example, ambiguous references that
are not in fact pointers) are rejected immediately.
When there are many chunks (that is, when the arena has been extended
many times), this test can consume the majority of the garbage
collection time. This is the reason that it's important to give a good
estimate of the amount of address space you will ever occupy with
objects when you initialize the arena.
The second test applied is the "tract test". The MPS looks up the
tract containing the address in the tract table, which is a simple
linear table indexed by the address shifted -- a kind of flat page
table.
If the pointer is in an allocated tract, then the table also contains
a cache of the "white set" -- the set of garbage collection traces for
which the tract is "interesting". If a tract isn't interesting, then
we know that it contains no condemned objects, and we can filter out
the pointer.
If the tract is interesting them it's part of a segment containing
objects that have been condemned. The MPM can't know anything about
the internal layout of the segment, so at this point we dispatch to
the third stage fix.
This dispatch is slightly subtle. We have a cache of the function to
dispatch to in the scan state, which has recently been looked at and
is with luck still in the processor cache. The reason there is a
dispatch at all is to allow for a fast changeover to emergency garbage
collection, or overriding of garbage collection with extra operations.
Those are beyond the scope of this document. Normally, ``ss->fix``
points at ``PoolFix()``, and we rely somewhat on modern processor
`branch target prediction
`_).
``PoolFix()`` is passed the pool, which is fetched from the tract
table entry, and that should be in the cache.
``PoolFix()`` itself dispatches to the pool class. Normally, a
dispatch to a pool class would indirect through the pool class object.
That would be a double indirection from the tract, so instead we have
a cache of the pool's fix method in the pool object. This also allows
a pool class to vary its fix method per pool instance, a fact that is
exploited to optimize fixing in the AMC Pool depending on what kind of
object format it is managing.
The third stage fix in the pool class
-------------------------------------
The final stage of fixing is entirely dependent on the pool class. The
MPM can't, in general, know how the objects within a pool are arranged,
so this is pool class specific code.
Furthermore, the pool class must make decisions based on the "reference
rank" of the pointer. If a pointer is ambiguous (``RankAMBIG``) then it
can't be changed, so even a copying pool class can't move an object.
On the other hand, if the pointer is weak (``RankWEAK``) then the pool fix
method shouldn't preserve the object at all, even if it's condemned.
The exact details of the logic that the pool fix must implement in
order to co-operate with the MPM and other pools are beyond the scope
of this document, which is about the critical path. Since it is on
the critical path, it's important that whatever the pool fix does is
simple and fast and returns to scanning as soon as possible.
The first step, though, is to further filter out pointers which aren't
to objects, if that's its policy. Then, it may preserve the object,
according to its policy, and possibly ensure that the object gets
scanned at some point in the future, if it contains more pointers.
If the object is moved to preserve it (for instance, if the pool class
implements a copying GC), or was already moved when fixing a previous
reference to it, the reference being fixed must be updated (this is
the origin of the term "fix").
As a simple example, ``LOFix()`` is the pool fix method for the Leaf
Only pool class. It implements a marking garbage collector, and does
not have to worry about scanning preserved objects because it is used
to store objects that don't contain pointers. (It is used in compiler
run-time systems to store binary data such as character strings, thus
avoiding any scanning, decoding, or remembered set overhead for them.)
``LOFix()`` filters any ambiguous pointers that aren't aligned, since
they can't point to objects it allocated. Otherwise it subtracts the
segment base address and shifts the result to get an index into a mark
bit table. If the object wasn't marked and the pointer is weak, then
it sets the pointer to zero, since the object is about to be recycled.
Otherwise, the mark bit is set, which preserves the object from
recycling when `LOReclaim` is called later on. `LOFix` illustrates
about the minimum and most efficient thing a pool fix method can do.
Other considerations
--------------------
So far this document has described the ways in which the garbage
collector is designed around optimising the critical path. There are a
few other things that the MPS does that are important.
Firstly, inlining is very important. The first stage fix is inlined
into the format scanner by being implemented in macros in
mps.h_. And to get even better inlining, `we
recommend <../manual/build.txt>`_ that the whole MPS is compiled in a
single translation unit with the client format and that strong global
optimisation is applied.
Secondly, we are very careful with code annotations on the critical
path. Assertions, statistics, and telemetry are all disabled on the
critical path in "hot" (production) builds. (In fact, it's because the
critical path is critical that we can afford to leave annotations
switched on elsewhere.)
Last, but by no means least, we pay a lot of brainpower and measurement
to the critical path, and are very very careful about changing it. Code
review around the critical path is especially vigilant.
And we write long documents about it.
References
----------
.. [MMRef] "The Memory Management Reference"; .
.. _scanning: http://www.memorymanagement.org/glossary/s.html#scan
.. _marking: http://www.memorymanagement.org/glossary/m.html#marking
.. _copying: http://www.memorymanagement.org/glossary/c.html#copying.garbage.collection
.. _`"condemns"`: http://www.memorymanagement.org/glossary/c.html#condemned.set
.. _BIBOP: http://www.memorymanagement.org/glossary/b.html#bibop
.. _`remembered set`: http://www.memorymanagement.org/glossary/r.html#remembered.set
.. _`reference tag`: http://www.memorymanagement.org/glossary/t.html#tag
.. _`Open Dylan`: http://opendylan.org/
Document History
----------------
- 2012-09-07 RB_ First draft.
- 2013-05-10 RB_ Converted to reStructuredText.
.. _RB: http://www.ravenbrook.com/consultants/rb/
Copyright and License
---------------------
Copyright © 2012-2014 Ravenbrook Limited. All rights reserved.
. This is an open source license. Contact
Ravenbrook for commercial licensing options.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
#. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
#. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
#. Redistributions in any form must be accompanied by information on how
to obtain complete source code for this software and any
accompanying software that uses this software. The source code must
either be included in the distribution or be available for no more than
the cost of distribution plus a nominal fee, and must be freely
redistributable under reasonable conditions. For an executable file,
complete source code means the source code for all modules it contains.
It does not include source code for modules or files that typically
accompany the major components of the operating system on which the
executable file runs.
**This software is provided by the copyright holders and contributors
"as is" and any express or implied warranties, including, but not
limited to, the implied warranties of merchantability, fitness for a
particular purpose, or non-infringement, are disclaimed. In no event
shall the copyright holders and contributors be liable for any direct,
indirect, incidental, special, exemplary, or consequential damages
(including, but not limited to, procurement of substitute goods or
services; loss of use, data, or profits; or business interruption)
however caused and on any theory of liability, whether in contract,
strict liability, or tort (including negligence or otherwise) arising in
any way out of the use of this software, even if advised of the
possibility of such damage.**