.. mode: -*- rst -*-
The critical path through the MPS
=================================
:Author: Richard Brooksby
:Organization: Ravenbrook Limited
:Date: 2012-09-07
:Revision: $Id: //info.ravenbrook.com/project/mps/save-errno-win32/design/critical-path.txt#1 $
:Copyright: See section `Copyright and License`_.
:Index terms:
single: critical path
single: path; critical
single: Memory Pool System; critical path
Introduction
------------
The critical path is a key concept in the design of the `Memory Pool
System `_. Code on the critical
path is usually executed more than any other code in the process. A
change of just one instruction on the critical path can make as much as
a 1% difference in overall run-time. A lot of the design of the MPS is
arranged around making the critical path as short and fast as possible.
This document describes the critical path and explains some of that
design, with reference to more detailed documents.
What makes the critical path critical
-------------------------------------
In order to determine which objects can be recycled, the garbage
collector has to frequently examine a very large number of pointers in
the program's objects. It does this by scanning_ memory, both
allocated objects and roots (such as the thread stacks).
This means that the scanning functions must loop over pretty much *every
word in memory* sooner or later. The MPS takes great pains to avoid
scanning memory which does not need scanning, but to get good
performance, scanning must be highly optimised.
What's more, the scanning functions apply an operation called "fix" to
every pointer (or potential pointer) that they find in the objects in
memory. Fixing also attempts to eliminate uninteresting pointers as
fast as possible, but it has to do some work on every object that is
being considered for recycling, and that can be a large proportion of
the objects in existence. The path through fixing must also be highly
optimised, especially in the early stages.
How the MPS avoids scanning and fixing
--------------------------------------
This is just a brief overview of how the MPS is designed to reduce
unnecessary scanning and fixing.
Firstly, the MPS must occasionally decide which objects to try to
recycle. It does this using various facts it knows about the objects,
primarily their age and whether they've survived previous attempts at
recycling them. It then "`condemns`_" a large number of objects
at once, and each of these objects must be "preserved" by fixing
references to them.
When the MPS condemns objects it chooses sets of objects in a small set
of "zones" in memory (preferably a single zone). The zone of an object
can be determined extremely quickly from its address, without looking at
the object or any other data structure.
The MPS arranges that objects which will probably die at the same time
are in the same zones.
The MPS allocates in "segments". Each segment is of the order of one
"tract" of memory (generally the same as the operating system page
size, usually 4 KiB or 8 KiB) but may be larger if there are large
objects inside. The MPS maintains a "summary" of the zones pointed to
by all the pointers in a segment from previous scans.
So, once the MPS has decided what to condemn, it can quickly eliminate
all segments which definitely do not point to anything in those zones.
This avoids a large amount of scanning. It is an implementation of a
`remembered set`_, though it is unlike that in most other garbage
collectors.
In addition, the fix operation can quickly ignore pointers to the wrong
zones. This is called the "zone check" and is a BIBOP_ technique.
Even if a pointer passes the zone check, it may still not point to a
segment containing condemned objects. The next stage of the fix
operation is to look up the segment pointed to by the pointer and see if
it was condemned. This is a fast lookup.
After that, each pool class must decide whether the pointer is to a
condemned object and do something to preserve it. This code is still
critical. The MPS will have tried to condemn objects that are dead,
but those objects are still likely to be in segments with other
objects that must be preserved. The segment class fix method must
quickly distinguish between them.
Furthermore, many objects will be preserved at least once in their
lifetime, so even the code that preserves an object needs to be highly
efficient. (Programs in languages like ML might not preserve 95% of
their objects even once, but many other programs will preserve nearly
all of theirs many times.)
Where to find the critical path
-------------------------------
Very briefly, the critical path consists of five stages:
#. The scanner, which iterates over pointers in objects. The MPS has
several internal scanners, but the most important ones will be
format scanners in client code registered through
``mps_fmt_create_k()``.
.. note::
There needs to be a chapter in the manual explaining how to
write a good scanner. Then that could be linked from here.
#. The first-stage fix, which filters out pointers inline in the
scanner. This is implemented in the ``MPS_FIX1()`` macro in
mps.h_.
.. _mps.h: ../code/mps.h
#. The second-stage fix, which filters out pointers using general
information about segments. This is ``_mps_fix2()`` in trace.c_.
.. _trace.c: ../code/trace.c
#. The third-stage fix, which filters out pointers using
segment-specific information. Implemented in segment class
functions called ``amcSegFix()``, ``loSegFix()``, etc. in pool*.c.
#. Preserving the object, which might entail:
- marking_ it to prevent it being recycled; and/or
- copying_ it and updating the original pointer (or just
updating the pointer, if the object has previously been
copied); and/or
- adding it to a queue of objects to be scanned later, if it
contains pointers.
The format scanner
------------------
The critical path starts when a format scan method is called. That is
a call from the MPS to a client function of type ``mps_fmt_scan_t``
registered with ``mps_fmt_create_k()``.
Here is an example of part of a format scanner for scanning contiguous
runs of pointers, from fmtdy.c_, the scanner for the `Open Dylan`_
runtime::
static mps_res_t dylan_scan_contig(mps_ss_t mps_ss,
mps_addr_t *base, mps_addr_t *limit)
{
mps_res_t res;
mps_addr_t *p; /* reference cursor */
mps_addr_t r; /* reference to be fixed */
MPS_SCAN_BEGIN(mps_ss) {
p = base;
loop: if(p >= limit) goto out;
r = *p++;
if(((mps_word_t)r&3) != 0) /* pointers tagged with 0 */
goto loop; /* not a pointer */
if(!MPS_FIX1(mps_ss, r)) goto loop;
res = MPS_FIX2(mps_ss, p-1);
if(res == MPS_RES_OK) goto loop;
return res;
out: assert(p == limit);
} MPS_SCAN_END(mps_ss);
return MPS_RES_OK;
}
.. _fmtdy.c: ../code/fmtdy.c
(To help with understanding optimisation of this code, it's written in
a pseudo-assembler style, with one line roughly corresponding to each
instruction of an idealized intermediate code.)
The MPS C interface provides macros to try to help optimise this code.
The ``mps_ss`` object is a "scan state" and contains data that is used
to eliminate uninteresting pointers now, and record information which
will be used to reduce scanning in future by maintaining the
remembered set.
The macros ``MPS_SCAN_BEGIN()`` and ``MPS_SCAN_END()`` load key data
from the scan state into local variables, and hopefully into processor
registers. This avoids aliasing values that we know won't change when
calls are made to ``_mps_fix2()`` later, and so allows the compiler to
keep the scan loop small and avoid unnecessary memory references.
This scanner knows that words not ending in 0b00 aren't pointers to
objects, so it eliminates them straight away. This is a kind of
`reference tag`_ chosen by the client for its object representation.
Next, the pointer is tested using ``MPS_FIX1()``. This performs fast
tests on the pointer without using any other memory. In particular, it
does the "zone check" described in section 3. If a pointer fails these
tests, it isn't interesting and can be skipped. It is very important
to proceed to the next pointer as fast as possible in this case.
Having passed these tests, we need to fix the pointer using other data
in memory, and possibly call the MPS to preserve the object. This is
what ``MPS_FIX2()`` does. The important distinction here is that
``MPS_FIX2()`` can fail and return an error code, which must be
propagated without ado by returning from the scanner. Separating
``MPS_FIX1()`` from ``MPS_FIX2()`` helps keep the error handling code
away from the tight loop with the zone check.
``MPS_FIX*``, the macro/inline part of the fix operation, are referred
to as "fix stage 1" or "the first stage fix" in other documents and
comments.
If these inline checks pass, ``_mps_fix2()`` is called. If the MPS has
been built as a separate object file or library, this is where the
function call out of the scan loop happens. Since version 1.110 of the
MPS, we encourage clients to compile the MPS in the same translation
unit as their format code, so that the compiler can be intelligent
about inlining parts of ``_mps_fix2()`` in the format scanner. The
instructions for doing this are in `Building the Memory Pool System
<../manual/build.txt>`_, part of the manual.
The second stage fix in the MPM
-------------------------------
If a pointer gets past the first-stage fix filters, it is passed to
``_mps_fix2()``, the "second stage fix". The second stage can filter
out yet more pointers using information about segments before it has
to consult the pool class.
The first test is to determine if the address points to a *chunk* (a
contiguous region of address space managed by the arena). Addresses
that do not point to any chunk (for example, ambiguous references that
are not in fact pointers) are rejected immediately. See
``ChunkOfAddr()``.
When there are many chunks (that is, when the arena has been extended
many times), this test can consume the majority of the garbage
collection time. This is the reason that it's important to give a good
estimate of the amount of address space you will ever occupy with
objects when you initialize the arena.
The second test applied is the "tract test". The MPS looks up the
tract containing the address in the tract table, which is a simple
linear table indexed by the address shifted---a kind of flat page
table. See ``TractOfAddr()``.
If the pointer is in a tract allocated with garbage collected objects,
then the table also contains a pointer to a "segment", which contains
a bitfield representing the "white set"---the set of garbage
collection traces for which the tract is "interesting". If a segment
isn't interesting, then we know that it contains no condemned objects,
and we can filter out the pointer.
The MPM can't know anything about the internal layout of the segment,
so at this point we dispatch to the third stage fix.
This dispatch is slightly subtle. We have a cache of the function to
dispatch to in the scan state, which has recently been looked at and
is with luck still in the processor cache. The reason there is a
dispatch at all is to allow for a fast changeover to emergency garbage
collection, or overriding of garbage collection with extra operations.
Those are beyond the scope of this document. Normally, ``ss->fix``
points at ``SegFix()``.
``SegFix()`` is passed the segment, which is fetched from the tract
table entry, and that should be in the cache. ``SegFix()`` itself
dispatches to the segment class.
The third stage fix in the segment class
----------------------------------------
The final stage of fixing is entirely dependent on the segment class.
The MPM can't, in general, know how the objects within a segment are
arranged, so this is segment class specific code.
Furthermore, the segment class must make decisions based on the
"reference rank" of the pointer. If a pointer is ambiguous
(``RankAMBIG``) then it can't be changed, so even a copying segment
class can't move an object. On the other hand, if the pointer is weak
(``RankWEAK``) then the segment fix method shouldn't preserve the
object at all, even if it's condemned.
The exact details of the logic that the segment fix must implement in
order to co-operate with the MPM and other pools are beyond the scope
of this document, which is about the critical path. Since it is on
the critical path, it's important that whatever the segment fix does is
simple and fast and returns to scanning as soon as possible.
The first step, though, is to further filter out pointers which aren't
to objects, if that's its policy. Then, it may preserve the object,
according to its policy, and possibly ensure that the object gets
scanned at some point in the future, if it contains more pointers.
If the object is moved to preserve it (for instance, if the pool class
implements a copying collector), or was already moved when fixing a
previous reference to it, the reference being fixed must be updated
(this is the origin of the term "fix").
As a simple example, ``loSegFix()`` is the segment fix method for
segments belonging to the LO (Leaf Object) pool class. It implements a
marking garbage collector, and does not have to worry about scanning
preserved objects because it is used to store objects that don't
contain pointers. (It is used in compiler run-time systems to store
binary data such as character strings, thus avoiding any scanning,
decoding, or remembered set overhead for them.)
``loSegFix()`` filters any ambiguous pointers that aren't aligned,
since they can't point to objects it allocated. Otherwise it subtracts
the segment base address and shifts the result to get an index into a
mark bit table. If the object wasn't marked and the pointer is weak,
then it sets the pointer to zero, since the object is about to be
recycled. Otherwise, the mark bit is set, which preserves the object
from recycling when ``loSegReclaim()`` is called later on.
``loSegFix()`` illustrates about the minimum and most efficient thing
a segment fix method can do.
Other considerations
--------------------
So far this document has described the ways in which the garbage
collector is designed around optimising the critical path. There are a
few other things that the MPS does that are important.
Firstly, inlining is very important. The first stage fix is inlined
into the format scanner by being implemented in macros in
mps.h_. And to get even better inlining, `we
recommend <../manual/build.txt>`_ that the whole MPS is compiled in a
single translation unit with the client format and that strong global
optimisation is applied.
Secondly, we are very careful with code annotations on the critical
path. Assertions, statistics, and telemetry are all disabled on the
critical path in "hot" (production) builds. (In fact, it's because the
critical path is critical that we can afford to leave annotations
switched on elsewhere.)
Last, but by no means least, we pay a lot of brainpower and measurement
to the critical path, and are very very careful about changing it. Code
review around the critical path is especially vigilant.
And we write long documents about it.
References
----------
.. _scanning: https://www.memorymanagement.org/glossary/s.html#scan
.. _marking: https://www.memorymanagement.org/glossary/m.html#marking
.. _copying: https://www.memorymanagement.org/glossary/c.html#copying.garbage.collection
.. _condemns: https://www.memorymanagement.org/glossary/c.html#condemned.set
.. _BIBOP: https://www.memorymanagement.org/glossary/b.html#bibop
.. _remembered set: https://www.memorymanagement.org/glossary/r.html#remembered.set
.. _reference tag: https://www.memorymanagement.org/glossary/t.html#tag
.. _Open Dylan: https://opendylan.org/
Document History
----------------
- 2012-09-07 RB_ First draft.
- 2013-05-10 RB_ Converted to reStructuredText.
.. _RB: https://www.ravenbrook.com/consultants/rb/
Copyright and License
---------------------
Copyright © 2012–2020 `Ravenbrook Limited `_.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.