www.ravenbrook.com - collection.rst

.. highlight:: none


.. index::
   pair: collection framework; design

.. _design-collection:


Collection framework
====================

.. mps:prefix:: design.mps.collection


Introduction
------------

:mps:tag:`intro` This document describes the Collection Framework. It's a
framework for implementing garbage collection techniques and
integrating them into a system of collectors that all cooperate in
recycling garbage.


Overview
--------

:mps:tag:`framework` MPS provides a framework that allows the integration of
many different types of GC strategies and provides many of the basic
services that those strategies use.

:mps:tag:`framework.cover` The framework subsumes most major GC strategies
and allows many efficient techniques, like in-line allocation or
software barriers.

:mps:tag:`framework.overhead` The overhead due to cooperation is low.

.. note::

    But not non-existent. Can we say something useful about it?

:mps:tag:`framework.benefits` The ability to combine collectors contributes
significantly to the flexibility of the system. The reduction in code
duplication contributes to reliability and integrity. The services of
the framework make it easier to write new MM strategies and
collectors.

:mps:tag:`framework.mpm` The Collection Framework is merely a part of the
structure of the MPM. See design.mps.architecture and design.mps.arch
for the big picture.

.. note::

    Those two documents should be combined into one. Pekka P. Pirinen,
    1998-01-15.

Other notable components that the MPM manages to integrate into a
single framework are manually-managed memory and finalization services
(see design.mps.finalize_).

.. _design.mps.finalize: finalize.html

.. note::

    A document describing the design of manually-managed memory is
    missing. Pekka P. Pirinen, 1998-01-15.

:mps:tag:`see-also` This document assumes basic familiarity with the ideas
of pool (see design.mps.arch.pools) and segment (see
design.mps.seg.over).


Collection abstractions
-----------------------

Colours, scanning and fixing
............................

:mps:tag:`state` The framework knows about the three colours of the
tri-state abstraction and free blocks. Recording the state of each
object is the responsibility of the pool, but the framework gets told
about changes in the states and keeps track of colours in each
segment. Specifically, it records whether a segment might contain
white, grey and black objects with respect to each active trace (see
:mps:ref:`.tracer`)

.. note::

    Black not currently implemented. Pekka P. Pirinen, 1998-01-04.

(A segment might contain objects of all colours at once, or none.)
This information is approximate, because when an object changes
colour, or dies, it usually is too expensive to determine if it was
the last object of its former colour.

:mps:tag:`state.transitions` The possible state transitions are as follows::

    free   ---alloc--> black (or grey) or white or none
    none   --condemn-> white
    none   --refine--> grey
    grey   ---scan---> black
    white  ----fix---> grey (or black)
    black  --revert--> grey
    white  --reclaim-> free
    black  --reclaim-> none

:mps:tag:`none-is-black` Outside of a trace, objects don't really have
colour, but technically, the colour is black. Objects are only
allocated grey or white during a trace, and by the time the trace has
finished, they are either dead or black, like the other surviving
objects. We might then reuse the colour field for another trace, so
it's convenient to set the colour to black when allocating outside a
trace. This means that refining the foundation
(analysis.tracer.phase.condemn.refine), actually turns black segments
grey, rather than vice versa, but the principle is the same.

:mps:tag:`scan-fix` "Scanning" an object means applying the "fix" function
to all references in that object. Fixing is the generic name for the
operation that takes a reference to a white object and makes it
non-white (usually grey, but black is a possibility, and so is
changing the reference as we do for weak references). Typical examples
of fix methods are copying the object into to-space or setting its
mark bit.

:mps:tag:`cooperation` The separation of scanning and fixing is what allows
different GC techniques to cooperate. The scanning is done by a method
on the pool that the scanned object resides in, and the fixing is done
by a method on the pool that the reference points to.

:mps:tag:`scan-all` Pools provide a method to scan all the grey objects in a
segment.


Reference sets
..............

:mps:tag:`refsets` The cost of scanning can be significantly reduced by
storing remembered sets. We have chosen a very compact and efficient
implementation, called reference sets, or refsets for short (see
idea.remember).

.. note::

    design.mps.refset is empty! Perhaps some of this should go there.
    Pekka P. Pirinen, 1998-02-19.

This makes the cost of maintaining them low, so we maintain them for
all references out of all scannable segments.

:mps:tag:`refsets.approx` You might describe refsets as summaries of all
references out of an area of memory, so they are only approximations
of remembered sets. When a refset indicates that an interesting
reference might be present in a segment, we still have to scan the
segment to find it.

:mps:tag:`refsets.scan` The refset information is collected during scanning.
The scan state protocol provides a way for the pool and the format
scan methods to cooperate in this, and to pass this information to the
tracer module which checks it and updates the segment (see
design.mps.scan_).

.. _design.mps.scan: scan.html

.. note::

    Actually, there's very little doc there. Pekka P. Pirinen,
    1998-02-17.

:mps:tag:`refsets.maintain` The MPS tries to maintain the refset information
when it moves or changes object.

:mps:tag:`refsets.pollution` Ambiguous references and pointers outside the
arena will introduce spurious zones into the refsets. We put up with
this to keep the scanning costs down. Consistency checks on refsets
have to take this into account.

:mps:tag:`refsets.write-barrier` A write-barrier are needed to keep the
mutator from invalidating the refsets when writing to a segment. We
need one on any scannable segment whose refset is not a superset of
the mutator's (and that the mutator can see). If we know what the
mutator is writing and whether it's a reference, we can just add that
reference to the refset (figuring out whether anything can be removed
from the refset is too expensive). If we don't know or if we cannot
afford to keep the barrier up, the framework can union the mutator's
refset to the segment's refset.

:mps:tag:`refset.mutator` The mutator's refset could be computed during root
scanning in the usual way, and then kept up to date by using a
read-barrier. It's not a problem that the mutator can create new
pointers out of nothing behind the read-barrier, as they won't be real
references. However, this is probably not cost-effective, since it
would cause lots of barrier hits. We'd need a read-barrier on every
scannable segment whose refset is not a subset of the mutator's (and
that the mutator can see). So instead we approximate the mutator's
refset with the universal refset.


The tracer
----------

:mps:tag:`tracer` The tracer is an engine for implementing multiple garbage
collection processes. Each process (called a "trace") proceeds
independently of the others through five phases as described in
analysis.tracer. The following sections describe how the action of
each phase fits into the framework. See design.mps.trace_ for details

.. _design.mps.trace: trace.html

.. note::

    No, there's not much there, either. Possibly some of this section
    should go there. Pekka P. Pirinen, 1998-02-18.

:mps:tag:`combine` The tracer can also combine several traces for some
actions, like scanning a segment or a root. The methods the tracer
calls to do the work get an argument that tells them which traces they
are expected to act for.

.. note::

    Extend this.

:mps:tag:`trace.begin` Traces are started by external request, usually from
a client function or an action (see design.mps.action).

:mps:tag:`trace.progress` The tracer gets time slices from the arena to work
on a given trace.

.. note::

    This is just a provisional arrangement, in lieu of real progress
    control. Pekka P. Pirinen, 1998-02-18.

In each slice, it selects a small amount of work to do, based on the
state of the trace, and does it, using facilities provided by the
pools.

:mps:tag:`trace.scan` A typical unit of work is to scan a single segment.
The tracer can choose to do this for multiple traces at once, provided
the segment is grey for more than one trace.

:mps:tag:`trace.barrier` Barrier hits might also cause a need to scan :mps:a
segment (see :mps:ref:`.hw-barriers.hit`). Again, the tracer can
:mps:choose to combine traces, when it does this.

:mps:tag:`mutator-colour` The framework keeps track of the colour of the
mutator separately for each trace.


The condemn phase
.................

:mps:tag:`phase.condemn` The agent that creates the trace (see
:mps:ref:`.trace.begin`) determines the condemned set and colours it white.
The tracer then examines the refsets on all scannable segments, and if
it can deduce some segment cannot refer to the white set, it's
immediately coloured black, otherwise the pool is asked to grey any
objects in the segment that might need to be scanned (in copying
pools, this is typically the whole segment).

:mps:tag:`phase.condemn.zones` To get the maximum benefit from the refsets,
we try to arrange that the zones are a minimal superset (for example,
generations uniquely occupy zones) and a maximal subset (there's
nothing else in the zone) of the condemned set. This needs to be
arranged at allocation time (or when copying during collection, which
is much like allocation)

.. note::

    Soon, this will be handled by segment loci, see design.mps.locus_.

    .. _design.mps.locus: locus.html

:mps:tag:`phase.condemn.mutator` At this point, the mutator might reference
any objects, that is, it is grey. Allocation can be in any colour,
most commonly white.

.. note::

    More could be said about this.


The grey mutator phase
......................

:mps:tag:`phase.grey-mutator` Grey segments are chosen according to some
sort of progress control and scanned by the pool to make them black.
Eventually, the tracer will decide to flip or it runs out of grey
segments, and proceeds to the next phase.

.. note::

    Currently, this phase has not been implemented; all traces flip
    immediately after condemn. Pekka P. Pirinen, 1998-02-18.

:mps:tag:`phase.grey-mutator.copy` At this stage, we don't want to copy
condemned objects, because we would need an additional barrier to keep
the mutator's view of the heap consistent (see
analysis.async-gc.copied.pointers-and-new-copy).

:mps:tag:`phase.grey-mutator.ambig` This is a good time to get all ambiguous
scanning out of the way, because we usually can't do any after the
flip and because it doesn't cause any copying.

.. note::

     Write a detailed explanation of this some day.


The flip phase
..............

:mps:tag:`phase.flip` The roots (see design.mps.root_) are scanned. This has
to be an atomic action as far as the mutator is concerned, so all
threads are suspended for the duration.

.. _design.mps.root: root.html

:mps:tag:`phase.flip.mutator` After this, the mutator is black: if we use a
strong barrier (analysis.async-gc.strong), this means it cannot refer
to white objects. Allocation will be in black (could be grey as well,
but there's no point to it).


The black mutator phase
.......................

:mps:tag:`phase.black-mutator` Grey segments are chosen according to some
sort of progress control and scanned by the pool to make them black.
Eventually, the tracer runs out of segments that are grey for this
trace, and proceeds to the next phase.

:mps:tag:`phase.black-mutator.copy` At this stage white objects can be
relocated, because the mutator cannot see them (as long as a strong
barrier is used, as we must do for a copying collection, see
analysis.async-gc.copied.pointers).


The reclaim phase
.................

:mps:tag:`phase.reclaim` The tracer finds the remaining white segments and
asks the pool to reclaim any white objects in them.

:mps:tag:`phase.reclaim.barrier` Once a trace has started reclaiming
objects, the others shouldn't try to scan any objects that are white
for it, because they might have dangling pointers in them.

.. note::

    Needs cross-reference to document that is yet to be written.

    Currently, we reclaim atomically, but it could be incremental, or
    even overlapped with a new trace on the same condemned set.
    Pekka P. Pirinen, 1997-12-31.


Barriers
--------

.. note::

    An introduction and a discussion of general principles should go
    here. This is a completely undesigned area.


Hardware barriers
.................

:mps:tag:`hw-barriers` Hardware barrier services cannot, by their very
nature, be independently provided to each trace. A segment is either
protected or not, and we have to set the protection on a segment if
any trace needs a hardware barrier on it.

:mps:tag:`hw-barriers.supported` The framework currently supports
segment-oriented Appel-Ellis-Li barriers
(analysis.async-gc.barrier.appel-ellis-li), and write-barriers for
keeping the refsets up-to-date. It would not be hard to add Steele
barriers (analysis.async-gc.barrier.steele.scalable).

:mps:tag:`hw-barriers.hit` When a barrier hit happens, the arena determines
which segment it was on. The segment colour info is used to determine
whether it had trace barriers on it, and if so, the appropriate
barrier action is performed, using the methods of the owning pool. If
the segment was write-protected, its refset is unioned with the refset
of the mutator.

.. note:: In practice this is ``RefSetUNIV``.

:mps:tag:`hw-barriers.hit.multiple` Fortunately, if we get a barrier hit on
a segment with multiple trace barriers on it, we can scan it for all
the traces that it had a barrier for.

.. note:: Needs link to unwritten section under :mps:ref:`.combine`.


Software barriers
.................

.. note::

    Write something about software barriers.