.intro: This is the design for impl.c.cbs, which implements a data structure for the management of non-intersecting memory ranges, with eager coalescence.
.readership: This document is intended for any MM developer.
.source: design.mps.poolmv2, design.mps.poolmvff.
.overview: The “coalescing block structure” is a set of addresses (or a subset of address space), with provision for efficient management of contiguous ranges, including insertion and deletion, high level communication with the client about the size of contiguous ranges, and detection of protocol violations.
.hist.0: This document was derived from the outline in design.mps.poolmv2(2). Written by Gavin Matthews 1998-05-01.
.hist.1: Updated by Gavin Matthews 1998-07-22 in response to approval comments in change.epcore.anchovy.160040 There is too much fragmentation in trapping memory.
.hist.2: Updated by Gavin Matthews (as part of change.epcore.brisling.160158: MVFF cannot be instantiated with 4-byte alignment) to document new alignment restrictions.
.hist.3: Converted from MMInfo database design document. Richard Brooksby, 2002-06-07.
.hist.4: Converted to reStructuredText. Gareth Rees, 2013-04-14.
.def.range: A (contiguous) range of addresses is a semi-open interval on address space.
.def.isolated: A contiguous range is isolated with respect to some property it has, if adjacent elements do not have that property.
.def.interesting: A block is interesting if it is of at least the minimum interesting size specified by the client.
.req.set: Must maintain a set of addresses.
.req.fast: Common operations must have a low amortized cost.
.req.add: Must be able to add address ranges to the set.
.req.remove: Must be able to remove address ranges from the set.
.req.size: Must report concisely to the client when isolated contiguous ranges of at least a certain size appear and disappear.
.req.iterate: Must support the iteration of all isolated contiguous ranges. This will not be a common operation.
.req.protocol: Must detect protocol violations.
.req.debug: Must support debugging of client code.
.req.small: Must have a small space overhead for the storage of typical subsets of address space and not have abysmal overhead for the storage of any subset of address space.
.req.align: Must support an alignment (the alignment of all addresses specifying ranges) of down to sizeof(void *) without losing memory.
.header: CBS is used through impl.h.cbs.
.type.cbs: CBS is the main datastructure for manipulating a CBS. It is intended that a CBSStruct be embedded in another structure. No convenience functions are provided for the allocation or deallocation of the CBS.
.type.cbs.block: CBSBlock is the data-structure that represents an isolated contiguous range held by the CBS. It is returned by the new and delete methods described below.
.type.cbs.method: The following methods are provided as callbacks to advise the client of certain events. The implementation of these functions should not cause any CBS function to be called on the same CBS. In this respect, the CBS module is not re-entrant.
.type.cbs.change.size.method: CBSChangeSizeMethod is the function pointer type, four instances of which are optionally registered via CBSInit.
These callbacks are invoked under CBSInsert(), CBSDelete(), or CBSSetMinSize() in certain circumstances. Unless otherwise stated, oldSize and newSize will both be non-zero, and different. The accessors CBSBlockBase(), CBSBlockLimit(), and CBSBlockSize() may be called from within these callbacks, except within the delete callback when newSize is zero. See .impl.callback for implementation details.
.type.cbs.iterate.method: CBSIterateMethod is a function pointer type for a client method invoked by the CBS module for every isolated contiguous range in address order, when passed to the CBSIterate() or CBSIterateLarge() functions. The function returns a boolean indicating whether to continue with the iteration.
.function.cbs.init: CBSInit() is the function that initialises the CBS structure. It performs allocation in the supplied arena. Four methods are passed in as function pointers (see .type.* above), any of which may be NULL. It receives a minimum size, which is used when determining whether to call the optional methods. The mayUseInline Boolean indicates whether the CBS may use the memory in the ranges as a low-memory fallback (see .impl.low-mem). The alignment indicates the alignment of ranges to be maintained. An initialised CBS contains no ranges.
.function.cbs.init.may-use-inline: If mayUseInline is set, then alignment must be at least sizeof(void *). In this mode, the CBS will never fail to insert or delete ranges, even if memory for control structures becomes short. Note that, in such cases, the CBS may defer notification of new/grow events, but will report available blocks in CBSFindFirst() and CBSFindLast(). Such low memory conditions will be rare and transitory. See .align for more details.
.function.cbs.finish: CBSFinish() is the function that finishes the CBS structure and discards any other resources associated with the CBS.
.function.cbs.insert: CBSInsert() is the function used to add a contiguous range specified by [base,limit) to the CBS. If any part of the range is already in the CBS, then ResFAIL is returned, and the CBS is unchanged. This function may cause allocation; if this allocation fails, and any contingency mechanism fails, then ResMEMORY is returned, and the CBS is unchanged.
.function.cbs.insert.callback: CBSInsert() will invoke callbacks as follows:
.function.cbs.delete: CBSDelete() is the function used to remove a contiguous range specified by [base,limit) from the CBS. If any part of the range is not in the CBS, then ResFAIL is returned, and the CBS is unchanged. This function may cause allocation; if this allocation fails, and any contingency mechanism fails, then ResMEMORY is returned, and the CBS is unchanged.
.function.cbs.delete.callback: CBSDelete() will invoke callbacks as follows:
.function.cbs.iterate: CBSIterate() is the function used to iterate all isolated contiguous ranges in a CBS. It receives a pointer, unsigned long closure pair to pass on to the iterator method, and an iterator method to invoke on every range in address order. If the iterator method returns FALSE, then the iteration is terminated.
.function.cbs.iterate.large: CBSIterateLarge() is the function used to iterate all isolated contiguous ranges of size greater than or equal to the client indicated minimum size in a CBS. It receives a pointer, unsigned long closure pair to pass on to the iterator method, and an iterator method to invoke on every large range in address order. If the iterator method returns FALSE, then the iteration is terminated.
.function.cbs.set.min-size: CBSSetMinSize() is the function used to change the minimum size of interest in a CBS. This minimum size is used to determine whether to invoke the client callbacks from CBSInsert() and CBSDelete(). This function will invoke either the new or delete callback for all blocks that are (in the semi-open interval) between the old and new values. oldSize and newSize will be the same in these cases.
.function.cbs.describe: CBSDescribe() is a function that prints a textual representation of the CBS to the given stream, indicating the contiguous ranges in order, as well as the structure of the underlying splay tree implementation. It is provided for debugging purposes only.
.function.cbs.block.base: The CBSBlockBase() function returns the base of the range represented by the CBSBlock. This function may not be called from the delete callback when the block is being deleted entirely.
Note
The value of the base of a particular CBSBlock is not guaranteed to remain constant across calls to CBSDelete() and CBSInsert(), regardless of whether a callback is invoked.
.function.cbs.block.limit: The CBSBlockLimit() function returns the limit of the range represented by the CBSBlock. This function may not be called from the delete callback when the block is being deleted entirely.
Note
The value of the limit of a particular CBSBlock is not guaranteed to remain constant across calls to CBSDelete() and CBSInsert(), regardless of whether a callback is invoked.
.function.cbs.block.size: The CBSBlockSize() function returns the size of the range represented by the CBSBlock. This function may not be called from the delete callback when the block is being deleted entirely.
Note
The value of the size of a particular CBSBlock is not guaranteed to remain constant across calls to CBSDelete() and CBSInsert(), regardless of whether a callback is invoked.
.function.cbs.block.describe: The CBSBlockDescribe() function prints a textual representation of the CBSBlock to the given stream. It is provided for debugging purposes only.
.function.cbs.find.first: The CBSFindFirst() function locates the first block (in address order) within the CBS of at least the specified size, and returns its range. If there are no such blocks, it returns FALSE. It optionally deletes the top, bottom, or all of the found range, depending on the findDelete argument (this saves a separate call to CBSDelete(), and uses the knowledge of exactly where we found the range), which must come from this enumeration:
enum {
CBSFindDeleteNONE, /* don't delete after finding */
CBSFindDeleteLOW, /* delete precise size from low end */
CBSFindDeleteHIGH, /* delete precise size from high end */
CBSFindDeleteENTIRE /* delete entire range */
};
.function.cbs.find.last: The CBSFindLast() function locates the last block (in address order) within the CBS of at least the specified size, and returns its range. If there are no such blocks, it returns FALSE. Like CBSFindFirst(), it optionally deletes the range.
.function.cbs.find.largest: The CBSFindLargest() function locates the largest block within the CBS, and returns its range. If there are no blocks, it returns FALSE. Like CBSFindFirst(), it optionally deletes the range (specifying CBSFindDeleteLOW or CBSFindDeleteHIGH has the same effect as CBSFindDeleteENTIRE).
.align: When mayUseInline is specified to permit inline data structures and hence avoid losing memory in low memory situations, the alignments that the CBS supports are constrained by three requirements:
All alignments that meet these requirements are aligned to sizeof(void *), so we take that as the minimum alignment.
.impl: Note that this section is concerned with describing various aspects of the implementation. It does not form part of the interface definition.
.impl.callback: The size change callback protocol concerns the mechanism for informing the client of the appearance and disappearance of interesting ranges. The intention is that each range has an identity (represented by the CBSBlock). When blocks are split, the larger fragment retains the identity. When blocks are merged, the new block has the identity of the larger fragment.
.impl.callback.delete: Consider the case when the minimum size is minSize, and CBSDelete() is called to remove a range of size middle. The two (possibly non-existant) neighbouring ranges have (possibly zero) sizes left and right. middle is part of the CBSBlock middleBlock.
.impl.callback.delete.delete: The delete callback will be called in this case if and only if:
left + middle + right >= minSize && left < minSize && right < minSize
That is, the combined range is interesting, but neither remaining fragment is. It will be called with the following parameters:
.impl.callback.delete.new: The new callback will be called in this case if and only if:
left >= minSize && right >= minSize
That is, both remaining fragments are interesting. It will be called with the following parameters:
.impl.callback.delete.shrink: The shrink callback will be called in this case if and only if:
left + middle + right >= minSize && (left >= minSize || right >= minSize)
That is, at least one of the remaining fragments is still interesting. It will be called with the following parameters:
.impl.callback.insert: Consider the case when the minimum size is minSize, and CBSInsert() is called to add a range of size middle. The two (possibly non-existant) neighbouring blocks are leftBlock and rightBlock, and have (possibly zero) sizes left and right.
.impl.callback.insert.delete: The delete callback will be called in this case if and only if:
left >= minSize && right >= minSize
That is, both neighbours were interesting. It will be called with the following parameters:
.impl.callback.insert.new: The new callback will be called in this case if and only if:
left + middle + right >= minSize && left < minSize && right < minSize
That is, the combined block is interesting, but neither neighbour was. It will be called with the following parameters:
.impl.callback.insert.grow: The grow callback will be called in this case if and only if:
left + middle + right >= minSize && (left >= minSize || right >= minSize)
That is, at least one of the neighbours was interesting. It will be called with the following parameters:
.impl.splay: The CBS is principally implemented using a splay tree (see design.mps.splay). Each splay tree node is embedded in a CBSBlock that represents a semi-open address range. The key passed for comparison is the base of another range.
.impl.splay.fast-find: CBSFindFirst() and CBSFindLast() use the update/refresh facility of splay trees to store, in each CBSBlock, an accurate summary of the maximum block size in the tree rooted at the corresponding splay node. This allows rapid location of the first or last suitable block, and very rapid failure if there is no suitable block.
.impl.find-largest: CBSFindLargest() simply finds out the size of the largest block in the CBS from the root of the tree (using SplayRoot()), and does SplayFindFirst() for a block of that size. This is O(log(n)) in the size of the free list, so it’s about the best you can do without maintaining a separate priority queue, just to do CBSFindLargest(). Except when the emergency lists (see .impl.low-mem) are in use, they are also searched.
.impl.low-mem: Low memory situations cause problems when the CBS tries to allocate a new CBSBlock structure for a new isolated range as a result of either CBSInsert() or CBSDelete(), and there is insufficient memory to allocation the CBSBlock structure:
.impl.low-mem.no-inline: If mayUseInline is FALSE, then the range is not added to the CBS, and the call to CBSInsert() or CBSDelete() returns ResMEMORY.
.impl.low-mem.inline: If mayUseInline is TRUE:
.impl.low-mem.inline.block: If the range is large enough to contain an inline block descriptor consisting of two pointers, then it is kept on an emergency block list. The CBS will eagerly attempt to add this block back into the splay tree during subsequent calls to CBSInsert() and CBSDelete(). The CBS will also keep its emergency block list in address order, and will coalesce this list eagerly. Some performance degradation will be seen when the emergency block list is in use. Ranges on this emergency block list will not be made available to the CBS’s client via callbacks. CBSIterate() and CBSIterateLarge() will not iterate over ranges on this list.
.impl.low-mem.inline.block.structure: The two pointers stored are to the next such block (or NULL), and to the limit of the block, in that order.
.impl.low-mem.inline.grain: Otherwise, the range must be large enough to contain an inline grain descriptor consisting of one pointer, then it is kept on an emergency grain list. The CBS will eagerly attempt to add this grain back into either the splay tree or the emergency block list during subsequent calls to CBSInsert() and CBSDelete(). The CBS will also keep its emergency grain list in address order. Some performance degradation will be seen when the emergency grain list is in use. Ranges on this emergency grain list will not be made available to the CBS’s client via callbacks. CBSIterate() and CBSIterateLarge() will not iterate over ranges on this list.
.impl.low-mem.inline.grain.structure: The pointer stored is to the next such grain, or NULL.
.impl.cbs.block: The block contains a base-limit pair and a splay tree node.
.impl.cbs.block.special: The base and limit may be equal if the block is halfway through being deleted.
.impl.cbs.block.special.just: This conflates values and status, but is justified because block size is very important.
.test: The following testing will be performed on this module:
.test.cbstest: There is a stress test for this module in impl.c.cbstest. This allocates a large block of memory and then simulates the allocation and deallocation of ranges within this block using both a CBS and a BT. It makes both valid and invalid requests, and compares the CBS response to the correct behaviour as determined by the BT. It also iterates the ranges in the CBS, comparing them to the BT. It also invokes the CBSDescribe() method, but makes no automatic test of the resulting output. It does not currently test the callbacks.
.test.pool: Several pools (currently MVT (Manual Variable Temporal) and MVFF (Manual Variable First Fit)) are implemented on top of a CBS. These pool are subject to testing in development, QA, and are/will be heavily exercised by customers.
.future.not-splay: The initial implementation of CBSs is based on splay trees. It could be revised to use any other data structure that meets the requirements (especially .req.fast).
.future.hybrid: It would be possible to attenuate the problem of .risk.overhead (below) by using a single word bit set to represent the membership in a (possibly aligned) word-width of grains. This might be used for block sizes less than a word-width of grains, converting them when they reach all free in the bit set. Note that this would make coalescence slightly less eager, by up to (word-width - 1).
.risk.overhead: Clients should note that the current implementation of CBSs has a space overhead proportional to the number of isolated contiguous ranges. [Four words per range.] If the CBS contains every other grain in an area, then the overhead will be large compared to the size of that area. [Four words per two grains.] See .future.hybrid for a suggestion to solve this problem. An alternative solution is to use CBSs only for managing long ranges.
Note
The following relates to a pending re-design and does not yet relate to any working source version. GavinM 1998-09-25
The CBS system provides its services by combining the services provided by three subsidiary CBS modules:
The three sub-modules have a lot in common. Although their methods are not invoked via a dispatcher, they have been given consistent interfaces, and consistent internal appearance, to aid maintenance.
Methods supported by sub-modules (not all sub-modules support all methods):
The CBS supplies the following utilities:
Internally, the CBS* sub-modules each have an internal structure CBS*Block that represents an isolated range within the module. It supports the following methods (for sub-module internal use):