www.ravenbrook.com - object-debug.rst

.. highlight:: none


.. index::
   pair: debugging; design

.. _design-object-debug:


Debugging features for client objects
=====================================

.. mps:prefix:: design.mps.object-debug


Introduction
------------

:mps:tag:`intro` This is the design for all the various debugging features
that MPS clients (and sometimes MPS developers) can use to discover
what is happening to their objects and the memory space.

:mps:tag:`readership` MPS developers.


Overview
--------

:mps:tag:`over.fenceposts` In its current state, this document mostly talks
about fenceposts, straying a little into tagging where theses features
have an effect on each other.

.. note::

    There exist other documents that list other required features, and
    propose interfaces and implementations. These will eventually be
    folded into this one. Pekka P. Pirinen, 1998-09-10.


Requirements
------------

:mps:tag:`req.fencepost` Try to detect overwrites and underwrites of
allocated blocks by adding fenceposts (source req.product.??? VC++,
req.epcore.fun.debug.support).  [TODO: Locate the relevant product
requirement.  RB 2023-02-23]

:mps:tag:`req.fencepost.size` The fenceposts should be at least 4 bytes on
either side or 8 bytes if on one side only, with an adjustable content
(although VC++ only has 4 bytes with pattern 0xFDFDFDFD, having
unwisely combined the implementation with other debug features).

:mps:tag:`req.fencepost.check` There should be a function to check all the
fenceposts (source req.epcore.fun.debug.support).

:mps:tag:`req.free-block` Try to detect attempts to write and read free
blocks.

:mps:tag:`req.walk` There should be a way to map ("walk") a user function
over all allocated objects (except PS VM objects), possibly only in a
separate debugging variety/mode (source req.epcore.fun.debug.support).

:mps:tag:`req.tag` There should be a way to store at least a word of user
data (a "tag", borrowing the SW term) with every object in debugging
mode, to be used in memory dumps (source req.product.??? VC++).
[TODO: Locate the relevant product requirement.  RB 2023-02-23]

:mps:tag:`req.tag.walk` The walking function (as required by :mps:ref:`.req.walk`)
should have access to this data (source req.epcore.fun.debug.support).

:mps:tag:`req.dump.aver` It must be possible to perform a memory dump after
an :c:func:`AVER()` has fired. Naturally, if the information required for
the dump has been corrupted, it will fail, as softly as possible
(source @@@@).

:mps:tag:`req.portable` Client code that uses these features must be easily
portable to all the supported platforms. (Source: job003749_.)

.. _job003749: https://www.ravenbrook.com/project/mps/issue/job003749/

.. note::

    There are more requirements, especially about memory dumps and
    allocation locations. Pekka P. Pirinen, 1998-09-10.


Solution ideas
--------------

:mps:tag:`note.assumptions` I've tried not to assume anything about the
coincidence of manual/automatic, formatted/unformatted, and
ap/mps_alloc. I think those questions deserve to be decided on their
own merits. instead of being constrained by a debug feature.

:mps:tag:`fence.content.repeat` The content of a fencepost could be
specified as a byte/word which used repeatedly to fill the fencepost.

:mps:tag:`fence.content.template` The content could be given as a template
which is of the right size and is simply copied onto the fencepost.

:mps:tag:`fence.content.template.repeat` The content could be given as a
template which is copied repeatedly until the fencepost is full. (This
would avoid the need to specify different templates on different
architectures, and so help meet :mps:ref:`.req.portable`.)

:mps:tag:`fence.walk` :mps:ref:`.req.fencepost.check` requires the ability to find
all the allocated objects. In formatted pools, this is not a problem.
In unformatted pools, we could use the walker. It's a feasible
strategy to bet that any pool that might have to support fenceposting
will also have a walking requirement.

:mps:tag:`fence.tag` Fenceposting also needs to keep track which objects
have fenceposts. unless we manage to do them all. It would be easiest
to put this in the tags.

:mps:tag:`fence.check.object` A function to check the fenceposts on a given
object would be nice.

:mps:tag:`fence.ap` AP's could support fenceposting transparently by having
a mode where :c:func:`mps_reserve()` always goes out-of-line and fills in the
fenceposts (the pool's :c:func:`BufferFill()` method isn't involved). This
would leave the MPS with more freedom of implementation, especially
when combined with some of the other ideas. We think doing a function
call for every allocation is not too bad for debugging.

:mps:tag:`fence.outside-ap` We could also let the client insert their own
fenceposts outside the MPS allocation mechanism. Even if fenceposting
were done like this, we'd still want it to be an MPS feature, so we'd
offer sample C macros for adding the size of the fencepost and filling
in the fencepost pattern. Possibly something like this (while we could
still store the parameters in the pool or allocation point, there
seems little point in doing so in this case, and having them as
explicit parameters to the macros allows the client to specify
constants to gain effiency)::

    #define mps_add_fencepost(size, fp_size)
    #define mps_fill_fenceposts(obj, size, fp_size, fp_pattern)

The client would need to supply their own fencepost checking function,
obviously, but again we could offer one that matches the sample
macros.

:mps:tag:`fence.tail-only` In automatic pools, the presence of a fencepost
at the head of the allocated block results in the object reference
being an internal pointer. This means that the format or the pool
would need to know about fenceposting and convert between references
and pointers. This would slow down the critical path when fenceposting
is used. This can be ameliorated by putting a fencepost at the tail of
the block only: this obviates the internal pointer problem and could
provide almost the same degree of checking (provided the size was
twice as large), especially in copying pools, where there are normally
no gaps between allocated blocks. In addition to the inescapable
effects on allocation and freeing (including copying and reclaim
thereunder), only scanning would have to know about fenceposts.

:mps:tag:`fence.tail-only.under` Walking over all the objects in the pool
would be necessary to detect underwrites, as one couldn't be sure that
there is a fencepost before any given object (or where it's located
exactly). If the pool were doing the checking, it could be sure: it
would know about alignments and it could put fenceposts in padding
objects (free blocks will have them because they were once allocated)
so there'd be one on either side of any object (except at the head of
a segment, which is not a major problem, and could be fixed by adding
a padding object at the beginning of every segment). This requires
some cleverness to avoid splinters smaller than the fencepost size,
but it can be done.

:mps:tag:`fence.wrapper` On formatted pools, fenceposting could be
implemented by "wrapping" the client-supplied format at creation time.
The wrapper can handle the conversion from the fenceposted object and
back. This will be invisible to the client and gives the added benefit
that the wrapper can validate fenceposts on every format operation,
should it desire. That is, the pool would see the fenceposts as part
of the client object, but the client would only see its object; the
format wrapper would translate between the two. Note that hiding the
fenceposts from scan methods, which are required to take a contiguous
range of objects, is a bit complicated.

:mps:tag:`fence.client-format` The MPS would supply such a wrapper, but
clients could also be allowed to write their own fenceposted formats
(provided they coordinate with allocation, see below). This would make
scanning fenceposted segments more efficient.

:mps:tag:`fence.wrapper.variable` Furthermore, you could create different
classes of fencepost within a pool, because the fencepost itself could
have a variable format. For instance, you might choose to have the
fencepost be minimal (one to two words) for small objects, and more
detailed/complex for large objects (imagining that large objects are
likely vector-ish and subject to overruns). You could get really fancy
and have the fencepost class keyed to the object class (for example,
different allocation points create different classes of fenceposting).

:mps:tag:`fence.wrapper.alloc` Even with a wrapped format, allocation and
freeing would still have know about the fenceposts. If allocation
points are used, either MPS-side (:mps:ref:`.fence.ap`) or client-side
(:mps:ref:`.fence.outside-ap`) fenceposting could be used, with the obvious
modifications.

:mps:tag:`fence.wrapper.alloc.format` We could add three format methods, to
adjust the pointer and the size for alloc and free, to put down the
fenceposts during alloc, and to check them; to avoid slowing down all
allocation, this would require some MOPping to make the format class
affect the choice of the alloc and free methods (see
`mail.pekka.1998-06-11.18-18`_).

.. _mail.pekka.1998-06-11.18-18: https://info.ravenbrook.com/project/mps/mail/1998/06/11/18-18/0.txt

:mps:tag:`fence.wrapper.alloc.size` We could just communicate the size of
the fenceposts between the format and the allocation routines, but
then you couldn't use variable fenceposts
(:mps:ref:`.fence.wrapper.variable`).

.. note::

    All this applies to copying and reclaim in a straight-forward
    manner, I think.

:mps:tag:`fence.pool.wrapper` Pools can be wrapped as well. This could be a
natural way to represent/implement the fenceposting changes to the
Alloc and Free methods. [@@@@alignment]

:mps:tag:`fence.pool.new-class` We could simply offer a debugging version of
each pool class (e.g., :c:func:`mps_pool_class_mv_debug()`). As we have seen,
debugging features have synergies which make it advantageous to have a
coordinated implementation, so splitting them up would not just
complicate the client interface, it would also be an implementation
problem; we can turn features on or off with pool init parameters.

:mps:tag:`fence.pool.abstract` We could simply use pool init parameters only
to control all debugging features (optargs would be useful here).
While there migh be subclasses and wrappers internally, the client
would only see a single pool class; in the internal view, this would
be an abstract class, and the parameters would determine which
concrete class actually gets instantiated.

:mps:tag:`tag.out-of-line` It would be nice if tags were stored out-of-line,
so they can be used to study allocation patterns and fragmentation
behaviours. Such an implementation of tagging could also easily be
shared among several pools.


Architecture
------------

:mps:tag:`pool` The implementation is at the pool level, because pools
manage allocated objects. A lot of the code will be generic,
naturally, but the data structures and the control interfaces attach
to pools. In particular, clients will be able to use tagging and
fenceposting separately on each pool.

:mps:tag:`fence.size` Having fenceposts of adjustable size and pattern is
useful. Restricting the size to an integral multiple of the [pool or
format?] alignment would simplify the implementation but breaks
:mps:ref:`.req.portable`.

:mps:tag:`fence.template` We use templates (:mps:ref:`.fence.content.template`) to
fill in the fenceposts, but we do not give any guarantees about the
location of the fenceposts. This leaves us the opportunity to do
tail-only fenceposting, if we choose.

:mps:tag:`fence.slop` [see impl.c.dbgpool.FenceAlloc @@@@]

:mps:tag:`fence.check.free` We check the fenceposts when freeing an object.

:mps:tag:`unified-walk` Combine the walking and tagging requirements
(:mps:ref:`.req.tag.walk` and @@@@) into a generic facility for walking and
tagging objects with just one interface and one name: tagging. Also
combine the existing formatted object walker into this metaphor, but
allowing the format and tag parameters of the step function be
optional.

.. note::

    This part has not been implemented yet Pekka P. Pirinen,
    1998-09-10.

:mps:tag:`init` It simplifies the implementation of both tagging and
fenceposting if they are always on, so that we don't have to keep
track of which objects have been fenceposted and which have not, and
don't have to have three kinds of tags: for user data, for
fenceposting, and for both. So we determine this at pool init time
(and let fenceposting turn on tagging, if necessary).

:mps:tag:`pool-parameters` Fencepost templates and tag formats are passed in
as pool parameters.

:mps:tag:`modularity` While a combined generic implementation of tags and
fenceposts is provided, it is structured so that each part of it could
be implemented by a pool-specific mechanism with a minimum of new
protocol.

.. note::

    This will be improved, when we figure out formatted pools -- they
    don't need tags for fenceposting.

:mps:tag:`out-of-space` If there's no room for tags, we just fail to
allocate the tag. We free the block allocated for the object and fail
the allocation, so that the client gets a chance to do whatever
low-memory actions they might want to do.

This breaks the one-to-one relationship between tags and objects, so
some checks cannot be made, but we do count the "lost" tags.

.. note::

    Need to hash out how to do fenceposting in formatted pools.


Client interface
----------------

:mps:tag:`interface.fenceposting.check`
:c:func:`mps_pool_check_fenceposts()` is a function to check all
fenceposts in a pool (:c:func:`AVER()` if a problem is found)

.. note::

    From here on, these are tentative and incomplete.

.. c:function:: mps_res_t mps_fmt_fencepost_wrap(mps_fmt_t *format_return, mps_arena_t arena, mps_fmt_t format, ...)

:mps:tag:`interface.fenceposting.format` A function to wrap a format
(class) to provide fenceposting.

.. c:type:: void (*mps_fmt_adjust_fencepost_t)(size_t *size_io)

:mps:tag:`interface.fenceposting.adjust` A format method to adjust size of a
block about to be allocted to allow for fenceposts.

.. c:type:: void (*mps_fmt_put_fencepost_t)(mps_addr_t * addr_io, size_t size)

:mps:tag:`interface.fenceposting.add` A format method to add a fencepost
around a block about to be allocated. The :c:macro:`NULL` method adds a tail
fencepost.

.. c:type:: mps_bool_t (*mps_fmt_check_fenceposts_t)(mps_addr_t)

:mps:tag:`interface.fenceposting.checker` A format method to check the
fenceposts around an object. The :c:macro:`NULL` method checks tails.

.. c:function:: mps_res_t mps_alloc_dbg(mps_addr_t *, mps_pool_t, size_t, ...)
.. c:function:: mps_res_t mps_alloc_dbg_v(mps_addr_t *, mps_pool_t, size_t, va_list)

:mps:tag:`interface.tags.alloc` Two functions to extend the existing
:c:func:`mps_alloc()` (request.???.??? proposes to remove the varargs)
[TODO: Locate the relevant Harlequin request.  RB 2023-02-23]

.. c:type:: void (*mps_objects_step_t)(mps_addr_t addr, size_t size, mps_fmt_t format, mps_pool_t pool, void *tag_data, void *p)

:mps:tag:`interface.tags.walker.type` Type of walker function for
:c:func:`mps_pool_walk()` and :c:func:`mps_arena_walk()`.

:mps:tag:`interface.tags.walker` Functions to walk all the allocated
objects in an arena (only client pools in this case),
``format`` and ``tag_data`` can be :c:macro:`NULL` (``tag_data`` really wants
to be ``void *``, not :c:type:`mps_addr_t`, because it's stored
together with the internal tag data in an MPS internal pool)


Examples
--------

:mps:tag:`example.debug-alloc` ::

    #define MPS_ALLOC_DBG(res_io, addr_io, pool, size)
      MPS_BEGIN
        static mps_tag_A_s _ts = { __FILE__, __LINE__ };

        *res_io = mps_alloc(addr_io, pool, size, _ts_)
      MPS_END


Implementation
--------------

:mps:tag:`new-pool` The client interface to control fenceposting
consists of the new classes :c:func:`mps_pool_class_mv_debug()`,
:c:func:`mps_pool_class_epdl_debug()`, and
:c:func:`mps_pool_class_epdr_debug()`, and their new init parameter of
type :c:type:`mps_pool_debug_option_s`.

.. note::

    This is a temporary solution, to get it out without writing lots
    of new interface. Pekka P. Pirinen, 1998-09-10.

:mps:tag:`new-pool.impl` The debug pools are implemented using the "class
wrapper" :c:func:`EnsureDebugClass()`, which produces a subclass with
modified ``init``, ``finish``, ``alloc``, and ``free`` methods. These
methods are implemented in the generic debug class code
(``impl.c.dbgpool``), and are basically wrappers around the superclass
methods (invoked through the ``pool->class->super`` field). To find
the data stored in the class for the debugging features, they use the
``debugMixin`` method provided by the subclass. So to make a debug
subclass, three things should be provided: a structure definition of
the instance containing a :c:type:`PoolDebugMixinStruct`, a pool class
function that uses :c:func:`EnsureDebugClass()`, and a ``debugMixin`` method
that locates the :c:type:`PoolDebugMixinStruct` within an instance.

:mps:tag:`tags.splay` The tags are stored in a splay tree of tags
allocated from a subsidiary MFS pool. The client needs to specify the
(maximum) size of the client data in a tag, so that the pool can be
created.

.. note::

    Lots more should be said, eventually. Pekka P. Pirinen,
    1998-09-10.