Ravenbrook / Projects / Memory Pool System / Master Product Sources / Design Documents

Memory Pool System Project


         THE DESIGN OF A NEW MANUAL-VARIABLE MEMORY POOL CLASS
                           design.mps.poolmvt
                              draft design
                       P T Withington 1998-02-13


INTRODUCTION:

This is a second-generation design for a pool that manually manages
variable-sized objects. It is intended as a replacement for poolmv (except in
its control pool role) and poolepdl, and it is intended to satisfy the
requirements of the Dylan "misc" pool and the product malloc/new drop-in
replacement.

[This form should include these fields, rather than me having to create them
"by hand"]

.readership: MM developers

.source: req.dylan(6), req.epcore(16), req.product(2)

.background: design.mps.poolmv(0), design.mps.poolepdl(0),
design.product.soft.drop(0), paper.wil95(1), paper.vo96(0), paper.grun92(1),
paper.beck82(0), mail.ptw.1998-02-25.22-18(0)

.hist.-1: Initial email discussion mail.ptw.1998-02-04.21-27(0), ff.
.hist.0: Draft created 1998-02-13 by P. T. Withington from email RFC
mail.ptw.1998-02-12.03-36, ff.
.hist.1: Revised 1998-04-01 in response to email RFC
mail.ptw.1998-03-23.20-43(0), ff.
.hist.2: Revised 1998-04-15 in response to email RFC
mail.ptw.1998-04-13.21-40(0), ff.
.hist.3: Erroneously incremented version number
.hist.4: Revised 1998-05-06 in response to review review.design.mps.poolmv2.2
(0)


DEFINITIONS

.def.alignment: Alignment is a constraint on an object's address, typically to
be a power of 2 (see also, glossary.alignment )

.def.bit-map: A bitmap is a boolean-valued vector (see also, glossary.bitmap ).

.def.block: A block is a contiguous extent of memory.  In this document, block
is used to mean a contiguous extent of memory managed by the pool for the pool
client, typically a subset of a segment (compare with .def.segment).

.def.cartesian-tree: A cartesian tree is a binary tree ordered by two keys
(paper.stephenson83(0)).

.def.crossing-map: A mechanism that supports finding the start of an object
from any address within the object, typically only required on untagged
architectures (see also, glossary.crossing.map ).

.def.footer: A block of descriptive information describing and immediately
following another block of memory (see also .def.header).

.def.fragmentation: Fragmented memory is memory reserved to the program but not
usable by the program because of the arrangement of memory already in use (see
also, glossary.fragmentation ).

.def.header: A block of descriptive information describing and immediately
preceding another block of memory (see also, glossary.in-band.header ).

.def.in-band: From "in band signalling", when descriptive information about a
data structure is stored in the data structure itself (see also,
glossary.in-band.header ).

.def.out-of-band: When descriptive information about a data structure is stored
separately from the structure itself (see also, glossary.out-of-band.header ).

.def.refcount: A refcount is a count of the number of users of an object (see
also, glossary.reference.count ).

.def.segment: A segment is a contiguous extent of memory.  In this document,
segment is used to mean a contiguous extent of memory managed by the MPS arena
(design.mps.arena(1)) and subdivided by the pool to provide blocks (see
.def.block) to its clients.

.def.splay-tree: A splay tree is a self-adjusting binary tree (paper.st85(0),
paper.sleator96(0)).

.def.splinter: A splinter is a fragment of memory that is too small to be
useful (see also, glossary.splinter )

.def.subblock: A subblock is a contiguous extent of memory.  In this document,
subblock is used to mean a contiguous extent of memory manage by the client for
its own use, typically a subset of a block (compare with .def.block).


ABBREVIATIONS

.abbr.abq: ABQ = Available Block Queue

.abbr.ap: AP = Allocation Point

.abbr.cbs: CBS = Coalescing Block Structure

.abbr.mps: MPS = Memory Pool System

.abbr.mv: MV = Manual-Variable

.abbr.ps: PS = PostScript



OVERVIEW:

mvt is intended to satisfy the requirements of the clients that need
manual-variable pools, improving on the performance of the existing
manual-variable pool implementations, and reducing the duplication of code that
currently exists. The expected clients of mvt are: Dylan (currently for its
misc pool), EP (particularly the dl pool, but all pools other than the PS
object pool), and Product (initially the malloc/new pool, but also other manual
pool classes).


REQUIREMENTS:

.req.cat: Requirements are categorized per guide.req(2).

.req.risk: req.epcore(16) is known to be obsolete, but the revised document has
not yet been accepted.


Critical Requirements

.req.fun.man-var: The pool class must support manual allocation and freeing of
variable-sized blocks (source: req.dylan.fun.misc.alloc,
req.epcore.fun.{dl,gen,tmp,stat,cache,trap}.{alloc,free},
req.product.fun.{malloc,new,man.man}).

.non-req.fun.gc: There is not a requirement that the pool class support
formatted objects, scanning, or collection objects; but it should not be
arbitrarily precluded.

.req.fun.align: The pool class must support aligned allocations to
client-specified alignments.  An individual instance need only support a single
alignment; multiple instances may be used to support more than one alignment
(source: req.epcore.attr.align).

.req.fun.reallocate: The pool class must support resizing of allocated blocks
(source req.epcore.fun.dl.promise.free, req.product.dc.env.{ansi-c,cpp}).

.non-req.fun.reallocate.in-place: There is not a requirement blocks must be
resized in place (where possible); but it seems like a good idea.

.req.fun.thread: Each instance of the pool class must support multiple threads
of allocation (source req.epcore.fun.dl.multi, req.product.dc.env.{ansi-c,cpp}).

.req.attr.performance: The pool class must meet or exceed performance of
"competitive" allocators (source: rec.epcore.attr.{run-time,tp},
req.product.attr.{mkt.eval, perform}).  [Dylan does not seem to have any
requirement that storage be allocated with a particular response time or
throughput, just so long as we don't block for too long. Clearly there is a
missing requirement.]

.req.attr.performance.time: By inference, the time overhead must be competetive.

.req.attr.performance.space: By inference, the space overhead must be
competetive.

.req.attr.reliability: The pool class must have "rock-solid reliability"
(source: req.dylan.attr.rel.mtbf, req.epcore.attr.rel, req.product.attr.rel).

.req.fun.range: The pool class must be able to manage blocks ranging in size
from 1 byte to all of addressable memory
(req.epcore.attr.{dl,gen,tmp,stat,cache,trap}.obj.{min,max}.  The range
requirement may be satisfied by multiple instances each managing a particular
client-specified subrange of sizes.  [Dylan has requirements
req.dylan.attr.{capacity,obj.max}, but no requirement that such objects reside
in a manual pool.]

.req.fun.debug: The pool class must support debugging erroneous usage by client
programs (source: req.epcore.fun.{dc.variety, debug.support},
req.product.attr.{mkt.eval,perform}).  Debugging is permitted to incur
additional overhead.

.req.fun.debug.boundaries: The pool class must support checking for accesses
outside the boundaries of live objects.

.req.fun.debug.log: The pool class must support logging of all allocations and
deallocations.

.req.fun.debug.enumerate: The pool class must support examining all allocated
objects.

.req.fun.debug.free: The pool class must support detecting incorrect,
overlapping, and double frees.

.req.fun.tolerant: The pool class must support tolerance of erroneous usage
(source req.product.attr.use.level.1).


Essential Requirements

.req.fun.profile: The pool class should support memory usage profiling (source:
req.product.attr.{mkt.eval, perform}).

.req.attr.flex: The pool class should be flexible so that it can be tuned to
specific allocation and freeing patterns (source:
req.product.attr.flex,req.epcore.attr.{dl,cache,trap}.typ).  The flexibility
requirement may be satisfied by multiple instances each optimizing a specific
pattern.

.req.attr.adapt: The pool class should be adaptive so that it can accommodate
changing allocation and freeing patterns (source:
req.epcore.fun.{tmp,stat}.policy, req.product.attr.{mkt.eval,perform}).


Nice Requirements

.req.fun.suballocate: The pool class may support freeing of any aligned,
contiguous subset of an allocated block (source req.epcore.fun.dl.free.any,
req.product.attr.{mkt.eval,perform}).




ARCHITECTURE:

.arch.overview: The pool has several layers: client allocation is by Allocation
Points (APs). .arch.overview.ap: APs acquire storage from the pool
available-block queue (ABQ). .arch.overview.abq: The ABQ holds blocks of a
minimum configurable size: "reuse size".  .arch.overview.storage: The ABQ
acquires storage from the arena or from the coalescing-block structure (CBS).
.arch.overview.storage.contiguous: The arena storage is requested to be
contiguous to maximize opportunities for coalescing (Loci will be used when
available). .arch.overview.cbs: The CBS holds blocks freed by the client until,
through coalescing, they have reached the reuse size, at which point they are
made available on the ABQ.

.arch.ap: The pool will use allocation points as the allocation interface to
the client. .arch.ap.two-phase: Allocation points will request blocks from the
pool and suballocate those blocks (using the existing AP, compare and
increment, 2-phase mechanism) to satisfy client requests. .arch.ap.fill: The
pool will have a configurable "fill size" that will be the preferred size block
used to fill the allocation point. .arch.ap.fill.size: The fill size should be
chosen to amortize the cost of refill over a number of typical reserve/commit
operations, but not so large as to exceed the typical object population of the
pool. .arch.ap.no-fit: When an allocation does not fit in the remaining space
of the allocation point, there may be a remaining fragment.
.arch.ap.no-fit.sawdust: If the fragment is below a configurable threshold
(minimum size), it will be left unused (but returned to the CBS so it will be
reclaimed when adjacent objects are freed); .arch.ap.no-fit.splinter:
otherwise, the remaining fragment will be (effectively) returned to the head of
the available-block queue, so that it will be used as soon as possible (i.e.,
by objects of similar birthdate). .arch.ap.no-fit.oversize: If the requested
allocation exceeds the fill size it is treated exceptionally (this may indicate
the client has either misconfigured or misused the pool and should either
change the pool configuration or create a separate pool for these exceptional
objects for best performance). .arch.ap.no-fit.oversize.policy: Oversize blocks
are assumed to have exceptional lifetimes, hence are allocated to one side and
do not participate in the normal storage recycling of the pool.
.arch.ap.refill.overhead: If reuse size is small, or becomes small due to
.arch.adapt, all allocations will effectively be treated exceptionally (the AP
will trip and a oldest-fit block will be chosen on each allocation).  This mode
will be within a constant factor in overhead of an unbuffered pool.

.arch.abq: The available block queue holds blocks that have coalesced
sufficiently to reach reuse size.  arch.abq.reuse.size: A multiple of the
quantum of virtual memory is used as the reuse size (.anal.policy.size).
.arch.abq.fifo: It is a FIFO queue (recently coalesced blocks go to the tail of
the queue, blocks are taken from the head of the queue for reuse).
.arch.abq.delay-reuse: By thus delaying reuse, coalescing opportunities are
greater. .arch.abq.high-water: It has a configurable high water mark, which
when reached will cause blocks at the head of the queue to be returned to the
arena, rather than reused. .arch.abq.return: When the MPS supports it, the pool
will be able to return free blocks from the ABQ to the arena on demand.
.arch.return.segment: arch.abq.return can be guaranteed to be able to return a
segment by setting reuse size to twice the size of the segments the pool
requests from the arena.

.arch.cbs: The coalescing block structure holds blocks that have been freed by
the client. .arch.cbs.optimize: The data structure is optimized for coalescing.
.arch.cbs.abq: When a block reaches reuse size, it is added to the ABQ.
.arch.cbs.data-structure: The data structures are organized so that a block can
be on both the CBS and ABQ simultaneously to permit additional coalescing, up
until the time the block is removed from the ABQ and assigned to an AP.

.arch.fragmentation.internal: Internal fragmentation results from The pool will
request large segments from the arena to minimize the internal fragmentation
due to objects not crossing segment boundaries.

.arch.modular: The architecture will be modular, to allow building variations
on the pool by assembling different parts. .arch.modular.example: For example,
it should be possible to build pools with any of the freelist mechanisms, with
in-band or out-of-band storage (where applicable), that do or do not support
derived object descriptions, etc.

.arch.modular.initial: The initial architecture will use
.mech.freelist.splay-tree for the CBS, .sol.mech.storage.out-of-band,
.sol.mech.desc.derived, and .sol.mech.allocate.buffer.

.arch.segregate: The architecture will support segregated allocation through
the use of multiple allocation points. The client will choose the appropriate
allocation point either at run time, or when possible, at compile time.

.arch.segregate.initial: The initial architecture will segregate allocations
into two classes: large and small. This will be implemented by creating two
pools with different parameters.

.arch.segregate.initial.choice: The initial architecture will provide glue code
to choose which pool to allocate from at run time. If possible this glue code
will be written in a way that a good compiler can optimize the selection of
pool at compile time. Eventually this glue code should be subsumed by the
client or generated automatically by a tool.

.arch.debug: Debugging features such as tags, fenceposts, types, creators will
be implemented in a layer above the pool and APs.  A generic pool debugging
interface will be developed to support debugging in this outer layer.

.arch.debug.initial: The initial architecture will have counters for
objects/bytes allocated/freed and support for detecting overlapping frees.

.arch.dependency.loci: The architecture depends on the arena being able to
efficiently provide segments of varying sizes without excessive fragmentation.
The locus mechanism should satisfy this dependency. (See .anal.strategy.risk)

.arch.dependency.mfs: The architecture internal data structures depend on
efficient manual management of small, fixed-sized objects (2 different sizes).
The MFS pool should satisfy this dependency.

.arch.contingency: Since the strategy we propose is new, it may not work.
.arch.contingency.pathalogical: In particular, pathological allocation patterns
could result in fragmentation such that no blocks recycle from the CBS to ABQ.
.arch.contingency.fallback: As a fallback, there will be a pool creation
parameter for a high water mark for the CBS.
.arch.contingency.fragmentation-limit: When the free space in the CBS as a
percentage of all the memory managed by the pool (a measure of fragmentation)
reaches that high water mark, the CBS will be searched oldest-fit before
requesting additional segments from the arena. .arch.contingency.alternative:
We also plan to implement .mech.freelist.cartesian-tree as an alternative CBS,
which would permit more efficient searching of the CBS.

.arch.parameters: The architecture supports several parameters so that multiple
pools may be instantiated and tuned to support different object cohorts. The
important parameters are: reuse size, minimum size, fill size, ABQ high water
mark, CBS fragmentation limit (see .arch.contingency.fragmentation-limit).
.arch.parameters.client-visible: The client-visible parameters of the pool are
the minimum object size, the mean object size, the maximum object size, the
reserve depth and fragmentation limit.  The minimum object size determines when
a splinter is kept on the head of the ABQ (.arch.ap.no-fit.splinter).  The
maximum object size determines the fill size (.arch.ap.fill-size) and hence
when a block is allocated exceptionally (.arch.ap.no-fit.oversize). The mean
object size is the most likely object size.  The reserve depth is a measure of
the hysteresis of the object population.  The mean object size, reserve depth
and, maximum object size are used to determine the size of the ABQ
(.arch.abq.high-water).  The fragmentation limit is used to determine when
contingency mode is used to satisfy an allocation request (.arch.contingency).

.arch.adapt: We believe that an important adaptation to explore is tying the
reuse size inversely to the fragmentation (as measured in
.arch.contingency.fragmentation-limit). .arch.adapt.reuse: By setting reuse
size low when fragmentation is high, smaller blocks will be available for
reuse, so fragmentation should diminish. .arch.adapt.overhead: This will result
in higher overhead as the AP will need to be refilled more often, so reuse size
should be raised again as fragmentation diminishes. .arch.adapt.oldest-fit: In
the limit, if reuse size goes to zero, the pool will implement a "oldest-fit"
policy: the oldest free block of sufficient size will be used for each
allocation.

.arch.adapt.risk: This adaptation is an experimental policy and should not be
delivered to clients until thoroughly tested.





ANALYSIS:

.anal.discard: We have discarded many traditional solutions based on experience
and analysis in paper.wil95(1). In particular, managing the free list as a
linear list arranged by address or size and basing policy on searching such a
linear list in a particular direction, from a particular starting point, using
fit and/or immediacy as criteria. We believe that none of these solutions is
derived from considering the root of the problem to be solved (as described in
.strategy), although their behavior as analyzed by Wilson gives several
insights.

.anal.strategy: For any program to run in the minimum required memory (with
minimal overhead -- we discard solutions such as compression for now),
fragmentation must be eliminated. To eliminate fragmentation, simply place
blocks in memory so that they die "in order" and can be immediately coalesced.
This ideal is not achievable, but we believe we can find object attributes that
correlate with deathtime and exploit them to approximate the ideal. Initially
we believe birth time and type (as approximated by size) will be useful
attributes to explore.

.anal.strategy.perform: To meet .req.attr.performance, the implementation of
.sol.strategy must be competitive in both time and space.

.anal.strategy.risk: The current MPS segment substrate can cause internal
fragmentation which an individual pool can do nothing about. We expect that
request.epcore.170193.sugg.loci will be implemented to remove this risk.

.anal.policy: Deferred coalescing, when taken to the extreme will not minimize
the memory consumption of a program, as no memory would ever be reused. Eager
reuse appears to lead to more fragmentation, whereas delayed reuse appears to
reduce fragmentation (paper.wil95(1)). The systems studied by Wilson did not
directly address deferring reuse. Our proposed policy is to reuse blocks when
they reach a (configurable) size. We believe that this policy along with the
policy of segregating allocations by death time, will greatly reduce
fragmentation. .anal.policy.risk: This policy could lead to pathologic behavior
if allocations cannot be successfully segregated.

.anal.policy.allocate.segregate: This policy has some similarities to
CustomAlloc (paper.grun92(1)). CustomAlloc segregates objects by size classes,
and then within those classes chooses a different allocator depending on
whether that size class has a stable or unstable population. Classes with
stable population recycle storage within the class, whereas classes with
unstable populations return their storage to the general allocation pool for
possible reuse by another class. CustomAlloc, however, requires profiling the
application and tuning the allocator according to those profiles. Although we
intend to support such tuning, we do not want to require it.

.anal.policy.reallocate: For reallocation, .fun.suballocate can be used to free
the remainder if a block is made smaller. Doing so will cause the freed block
to obey .sol.policy.allocate [i.e., the freed block will not be treated
specially, it will be subject to the normal policy on reuse]. Copying can be
used if a block is made larger. paper.vo96(0) reports success in
over-allocating a block the first time it is resized larger, presumably because
blocks that are resized once tend to be resized again and over-allocating may
avoid a subsequent copy. If each object that will be reallocated can be given
its own allocation point until its final reallocation, the allocation point can
be used to hold released or spare storage.

.anal.policy.size: We believe that this will take advantage of the underlying
virtual memory system's ability to compact the physical memory footprint of the
program by discarding free fragments that align with the virtual memory
quantum.  (In a VM system one can approximate compaction by sparse mapping.  If
every other page of a segment is unused, the unused pages can be unmapped,
freeing up physical memory that can be mapped to a new contiguous vm range.)

.anal.mech.freelist: The literature (paper.grun92(1), paper.vo96(0)) indicate
that .sol.freelist.cartesian-tree provides a space-efficient implementation at
some cost in speed. .sol.freelist.splay-tree is faster but less
space-efficient. .sol.freelist.bitmap is unstudied. Many of the faster
allocators maintain caches of free blocks by size to speed allocation of
"popular" sizes. We intend to initially explore not doing so, as we believe
that policy ultimately leads to fragmentation by mixing objects of varying
death times. Instead we intend to use a free list mechanism to support fast
coalescing, deferring reuse of blocks until a minimum size has been reached.

anal.mech.allocate.optimize-small: Wilson (paper.wil95(1)) notes that small
blocks typically have short lifetimes and that overall performance is improved
if you optimize the management of small blocks, e.g.,
sol.mech.allocate.lookup-table for all small blocks.  We believe that
.sol.mech.allocate.buffer does exactly that.

.anal.mech.allocate.optimize-new: Wilson (paper.wil95(1)) reports some benefit
from "preserving wilderness", that is, when a block of memory must be requested
from the system to satisfy an allocation, only the minimum amount of that block
is used, the remainder is preserved (effectively by putting it at the tail of
the free list). This mechanism may or may not implement .sol.policy.allocate.
We believe a better mechanism is to choose to preserve or not, based on
.sol.policy.allocate.




IDEAS:

.sol: Many solution ideas for manual management of variable-sized memory blocks
are enumerated by paper.wil95(1). Here we list the most promising, and some of
our own.


Strategy

.sol.strategy: To run a program in the minimal required memory, with minimal
overhead, utilize memory efficiently. Memory becomes unusable when fragmented.
Strategy is to minimize fragmentation. So place blocks where they won't cause
fragmentation later.

.sol.strategy.death: objects that will die together (in time) should be
allocated together (in space); thus they will coalesce, reducing fragmentation.

.sol.strategy.death.birth: assume objects allocated near each other in time
will have similar deathtimes (paper.beck82(0))
.sol.strategy.death.type: assume objects of different type may have different
deathtimes, even if born together
.sol.strategy.death.predict: find and use program features to predict deathtimes

.sol.strategy.reallocate: reallocation implies rebirth, or at least a change in
lifetime

.sol.strategy.debug: as much of the debugging functionality as possible should
be implemented as a generally available MPS utility; the pool will provide
support for debugging that would be expensive or impossible to allocate outside
the pool


Policy

[Policy is an implementable decision procedure, hopefully approximating the
strategy.]

.sol.policy.reuse: defer reusing blocks, to encourage coalescing
.sol.policy.split: when a block is split to satisfy an allocation, use the
remainder as soon as possible
.sol.policy.size: prevent .policy.reuse from consuming all of memory by
choosing a (coalesced) block for reuse when it reaches a minimum size
.sol.policy.size.fixed: use the quantum of virtual memory (e.g., one page) as
minimum size
.sol.policy.size.tune: allow tuning minimum size
.sol.policy.size.adapt: adaptively change minimum size
.sol.policy.allocate: allocate objects with similar birthdate and lifetime
together
.sol.policy.allocate.segregate: segregate allocations by type
.sol.policy.allocate.segregate.size: use size as a substitute for type
.sol.policy.allocate.segregate.tune: permit tuning of segregation
.sol.policy.allocate.segregate.adapt: adaptively segregate allocations

.sol.policy.reallocate: implement reallocation in a central mechanism outside
of the pool, create a generic pool interface in support of same.

.sol.policy.debug: implement a pool debugging interface
.sol.policy.debug.counters: implement debugging counters in the pool that are
queried with a generic interface
.sol.policy.debug.verify: implement debugging error returns on overlapping frees


Mechanism

[Mechanisms are algorithms or data structures used to implement policy.]

.sol.mech.free-list: mechanisms that can be used to describe the free list

.sol.mech.free-list.cartesian-tree: Using address and size as keys supports
fast coalescing of adjacent blocks and fast searching for optimal-sized blocks.
Unfortunately, because the shape of the tree is constrained by the second key,
it can become unbalanced. This data structure is used in the SunOS 4.1 malloc
(paper.grun92(1)).
.sol.mech.free-list.splay-tree: The amortized cost of a splay tree is
competitive with balanced binary trees in the worst case, but can be
significantly better for regular patterns of access because recently-accessed
keys are moved to the root of the tree and hence can be re-accessed quickly.
This data structure is used in the System Vr4 malloc (paper.vo96(0)). (For a
complete analysis of the splay tree algorithm time bounds see paper.st85(0).)
.sol.mech.free-list.bit-map: Using address as an index and fix-sized blocks,
the booleans can represent whether a block is free or not. Adjacent blocks can
be used to construct larger blocks. Efficient algorithms for searching for runs
in a vector are known. This data structure is used in many file system disk
block managers.
.sol.mech.free-list.refcount: A count of the number of allocated but not freed
subblocks of a block can be used to determine when a block is available for
reuse. This is an extremely compact data structure, but does not support
subblock reuse.
.sol.mech.free-list.hybrid: Bitmaps appear suited particularly to managing
small, contiguous blocks. The tree structures appear suited particularly to
managing varying-sized, discontiguous blocks. A refcount can be very efficient
if objects can be placed accurately according to death time. A hybrid mechanism
may offer better performance for a wider range of situations.

.sol.mech.storage: methods that can be used to store the free list description

.sol.mech.storage.in-band: The tree data structures are amenable to being
stored in the free blocks themselves, minimizing the space overhead of
management. To do so imposes a minimum size on free blocks and reduces the
locality of the data structure.
.sol.mech.storage.out-of-band: The bit-map data structure must be stored
separately.

.sol.mech.desc: for an allocated block to be freed, its base and bound must be
known

.sol.mech.desc.derived: Most clients can supply the base of the block. Some
clients can supply the bound.
.sol.mech.desc.in-band: When the bound cannot be supplied, it can be stored as
an in-band "header". If neither the base nor bound can be supplied (e.g., the
client may only have an interior pointer to the block), a header and footer may
be required.
.sol.mech.desc.out-of-band: In un-tagged architectures, it may be necessary to
store the header and footer out-of-band to distinguish them from client data.
Out-of-band storage can improve locality and reliability. Any of the free-list
structures can also be used to describe allocated blocks out-of-band.
.sol.mech.desc.crossing-map: An alternative for untagged architectures is to
store a "crossing map" which records an encoding of the start of objects and
then store the descriptive information in-band.

.sol.mech.allocate: mechanisms that can be used to allocate blocks (these
typically sit on top of a more general free-list manager)

.sol.mech.allocate.lookup-table: Use a table of popular sizes to cache free
blocks of those sizes.
.sol.mech.allocate.buffer: Allocate from contiguous blocks using compare and
increment.
.sol.mech.allocate.optimize-small: Use a combination of techniques to ensure
the time spent managing a block is small relative to the block's lifetime;
assume small blocks typically have short lifetimes.
.sol.mech.allocate.optimize-new: When "virgin" memory is acquired from the
operating system to satisfy a request, try to preserve it (i.e., use only what
is necessary)
.sol.mech.allocate.segregate.size: use size as a substitute for type

.sol.mech.reallocate: use .req.fun.suballocate to return unused memory when a
block shrinks, but differentiate this from an erroneous overlapping free by
using separate interfaces.




IMPLEMENTATION:

The implementation consists of the following separable modules:


Coalescing Block Structure

.impl.c.cbs: The initial implementation will use .sol.mech.free-list.splay-tree
and sol.mech.storage.out-of-band. For locality, this storage should be managed
as a linked free list of splay nodes suballocated from blocks acquired from a
pool shared by all CBS's. Must support creation and destruction of an empty
tree. Must support search, insert and delete by key of type Addr.  Must support
finding left and right neighbors of a failed search for a key. Must support
iterating over the elements of the tree with reasonable efficiency. Must
support storing and retrieving a value of type Size associated with the key.
Standard checking and description should be provided. See design.mps.splay(0)
and design.mps.cbs(0).


Available Block Queue

.impl.c.abq: The initial implementation will be a queue of fixed size
(determined at pool creation time from the high water mark). Must support
creation and destruction of an empty queue. Must support insertion at the head
or tail of the queue (failing if full), peeking at the head of the queue, and
removal of the head (failing if empty) or any element of the queue (found by a
search). Standard checking and description should be provided.


Pool Implementation

.impl.c: The initial implementation will use the above modules to implement a
buffered pool. Must support creation and destruction of the pool. Creation
takes parameters: minimum size, mean size, maximum size, reserve depth and
fragmentation limit. Minimum, mean, and maximum size are used to calculate the
internal fill and reuse sizes. Reserve depth and mean size are used to
calculate the ABQ high water mark.  Fragmentation limit is used to set the CBS
contingency mode.  Must support buffer initialization, filling and emptying.
Must support freeing. Standard checking and description should be provided.
[Eventually, it should support scanning, so it can be used with collected
pools, but no manual pool currently does.]

.impl.c.future: The implementation should not preclude "buffered free"
(mail.ptw.1997-12-05.19-07(0), ff.) being added in the future.

.impl.c.parameters: The pool parameters are calculated as follows from the
input parameters: minimum, mean, and maximum size are taked directly from the
parameters.  .impl.c.parameter.fill-size: The fill size is set to the maximum
size times the reciprocal of the fragmentation limit, aligned to the arena
alignment.  .imple.c.parameter.reuse-size: The reuse size is set to twice the
fill size (see .arch.abq.return.segment, .impl.c.free.merge.segment).
.impl.c.parameter.abq-limit: The ABQ high-water limit is set to the reserve
depth times the mean size (that is, the queue should hold as many reuse blocks
as would take to cover the population hysteresis if the population consisted
solely of mean-sized blocks, see .arch.abq.high-water).
.impl.c.parameter.avail-limit: The CBS high-water limit is implemented by
comparing the available free space to an "available limit".  The available
limit is updated each time a segment is allocated from or returned to the arena
by setting it to the total size of the pool times the fragmentation limit
divide vy 100 (see .arch.contingency.fallback).

.impl.c.ap.fill: An AP fill request will be handled as follows:
o If the request is larger than fill size, attempt to request a segment from
the arena sufficient to satisfy the request
o Use any previously returned splinter (from .impl.c.ap.empty), if large enough
o Attempt to retrieve a free block from the head of the ABQ (removing it from
ABQ and CBS if found).
o If above fragmentation limit, attempt to find a block on the CBS, using
oldest-fit search
o Attempt to request a segment of fill size from the arena
o Attempt to find a block on the CBS, using oldest-fit search
o Otherwise, fail

.impl.c.ap.empty: An AP empty request will be handled as follows:
o If remaining free is less than min size, return it to the CBS
o If the remaining free is larger than any previous splinter, return that
splinter to the CBS and save this one for use by a subsequent fill
o Otherwise return the remaining block to the CBS

.impl.c.free: When blocks are returned to the CBS a search is made for adjacent
blocks that can be merged. If not, the block is simply inserted in the CBS. If
a merge occurs between two blocks on the ABQ, the ABQ must be adjusted to
reflect the merge. .impl.c.free.exception: Exceptional blocks are returned
directly to the arena.

.impl.c.free.merge: If a merge occurs and the merged block is larger than reuse
size:
o If the ABQ is full, remove the block at the head of the ABQ from the ABQ and
CBS and return it to the arena(*)
o Insert the newly merged block at the tail of the ABQ, leaving it on the CBS
for further merging

.impl.c.free.merge.segment: (*) Merged blocks may not align with arena
segments. If necessary, return the interior segments of a block to the arena
and return the splinters to the CBS.  .impl.c.free.merge.segment.reuse: If the
reuse size (the size at which blocks recycle from the CBS to the ABQ) is at
least twice the fill size (the size of segments the pool allocates from the
arena), we can guarantee that there will always be a returnable segment in
every ABQ block. .impl.c.free.merge.segment.overflow: If the reuse size is set
smaller (see .arch.adapt), there may not be a returnable segment in an ABQ
block, in which case the ABQ has "overflowed".  Whenever this occurs, the ABQ
will be refilled by searching the CBS for dropped reusable blocks when needed.

.impl.c.free.merge.segment.risk: The current segment structure does not really
support what we would like to do.  Loci should do better:  support reserving
contiguous address space and mapping/unmapping any portion of that address
space.

.impl.c.free.merge.alternative: Alternatively, if the MPS segment substrate
permitted mapping/unmapping of pages, the pool could use very large segments
and map/unmap pages as needed.


AP Dispatch

.impl.c.multiap: The initial implementation will be a glue layer that selects
among several AP's for allocation according to the predicted deathtime (as
approximated by size) of the requested allocation. Each AP will be filled from
a pool instance tuned to the range of object sizes expected to be allocated
from that AP. [For bonus points provide an interface that creates a batch of
pools and AP's according to some set of expected object sizes. Eventually
expand to understand object lifetimes and general lifetime prediction keys.]

impl.c.multiap.sample-code: This glue code is not properly part of the pool or
MPS interface.  It is a layer on top of the MPS interface, intended as sample
code for unsophisticated clients.  Sophisticated clients will likely want to
choose among multiple AP's more directly.




TESTING:

.test.component: Components .impl.c.splay, .impl.c.cbs, and .impl.c.abq will be
subjected to individual component tests to verify their functionality.

.test.regression: All tests applied to poolmv (design.mps.poolmv(0)) and
poolepdl (design.mps.poolepdl(0)) will be applied to poolmvt to ensure that mvt
is at least as functional as the pools it is replacing.

.test.qa: Once poolmvt is integrated into the MPS, the standard MPS QA tests
will be applied to poolmvt prior to each release.

.test.customer: Customer acceptance tests will be performed on a per-customer
basis before release to that customer (cf., proc.release.epcore(2).test)


TEXT:

Possible tweaks (from mail.pekka.1998-04-15.13-10(0)):

1. Try to coalesce splinters returned from AP's with the front (or any) block
on the ABQ.
2. Sort ABQ in some other way to minimize splitting/splinters. E.g., proximity
to recently allocated blocks.

A. References

B. Document History

2002-06-07 RB Converted from MMInfo database design document.

C. Copyright and License

This document is copyright © 1995-2002 Ravenbrook Limited. All rights reserved. This is an open source license. Contact Ravenbrook for commercial licensing options.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Redistributions in any form must be accompanied by information on how to obtain complete source code for the this software and any accompanying software that uses this software. The source code must either be included in the distribution or be available for no more than the cost of distribution plus a nominal fee, and must be freely redistributable under reasonable conditions. For an executable file, complete source code means the source code for all modules it contains. It does not include source code for modules or files that typically accompany the major components of the operating system on which the executable file runs.

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement, are disclaimed. In no event shall the copyright holders and contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.


$Id: //info.ravenbrook.com/project/mps/version/1.105/design/poolmvt/index.html#1 $

Ravenbrook / Projects / Memory Pool System / Master Product Sources / Design Documents