MPS Configuration
author | Gavin Matthews |
copyright | See Copyright and License. |
date | 1998-02-27 |
index terms | pair: locus manager; design |
revision | //info.ravenbrook.com/project/mps/version/1.114/design/locus.txt#1 |
status | incomplete design |
tag | design.mps.locus |
Introduction
.intro: The locus manager coordinates between the pools and takes the burden of having to be clever about tract/group placement away from the pools, preserving trace differentiability and contiguity where appropriate.
.source: mail.gavinm.1998-02-05.17-52(0), mail.ptw.1998-02-05.19-53(0), mail.pekka.1998-02-09.13-58(0), and mail.gavinm.1998-02-09.14-05(0).
.readership: Any MPS developer.
Overview
The MPS manages three main resources:
- storage;
- address space;
- time.
The locus manager manages address space at the arena level.
When a pool wants some address space, it expresses some preferences to the locus manager. The locus manager and the arena (working together) try to honour these preferences, and decide what address space the pool gets.
Preferences are expressed by the SegPref argument to SegAlloc(). Note that, when they call SegAlloc(), pools are asking for address space and writeable storage simultaneously, in a single call. There is currently no way for pools to reserve address space without requesting storage.
Why is it important to manage address space?
Trace differentiability
Carefully chosen addresses are used by reference tracing systems (ie. automatic pools), to categorise objects into clumps; and to summarise and cheaply find references between clumps.
Different clumps will become worth collecting at different times (the classic example, of course, is generations in a generational collector). For these partial collections to be efficient, it must be cheap to keep these clumps differentiable, cheap to condemn (Whiten) a particular clump, and cheap to find a good conservative approximation to all inward references to a clump (both initially to construct the Grey set, and to make scanning the Grey set efficient).
This is what the MPS zone mechanism is all about.
The locus manager manages the mapping from clumps to zones.
To specify a clump, pools use the SegPrefGen argument to SegPrefExpress().
Note
The name is misleading: generations are only one sort of clump. Richard Kistruck, 2007-04-24.
Prevent address space fragmentation (within the arena)
Address space is not infinite.
In some use cases, the MPS is required to remain efficient when using very nearly all available address space and storage. For example, with the client-arena class, where the only address space available is that of the storage available.
Even with the VM arena class, typical storage sizes (as of 2007) can make 32-bit address space constrained: the client may need several gigabytes, which leaves little spare address space.
Address space fragmentation incurs failure when there is no way to allocate a big block of address space. The big block may be requested via the MPS (by the client), or by something else in the same process, such as a third-party graphics library, image library, etc.
Address space fragmentation incurs cost when:
- desired large-block requests (such as for buffering) are denied, causing them to be re-requested as a smaller block, or as several smaller blocks;
- possible operating-system costs in maintaining a fragmented mapping?
Prevent storage fragmentation (within tracts and segments)
Storage is not infinite: it is allocated in multiples of a fixed-size tract. Small lonely objects, each retaining a whole tract, cause storage fragmentation.
Non-moving pools manage this fragmentation with placement strategies that use:
- co-located death (in space and time);
- segment merging and splitting.
These pool-level strategies always care about contiguity of object storage. They also often care about the ordering of addresses, because pool code uses an address-ordered search when choosing where to place a new object. For these two reasons, the address chosen (by the locus manager and arena) for new tracts is important.
Certain specialised pools, and/or some client programs that use them, have carefully tuned segment sizes, positioning, and search order. Be careful: seemingly inconsequential changes can catastrophically break this tuning.
Pools can specify a preference for High and Low ends of address space, which implies a search-order. Pools could also specify clumping, using either SegPrefGen or SegPrefZoneSet.
Discovering the layout
The locus manager is not given advance notice of how much address space will be required with what preferences. Instead, the locus manager starts with an empty layout, and adapts it as more requests come in over time. It is attempting to discover a suitable layout by successive refinement. This is ambitious.
Definitions
.note.cohort: We use the word "cohort" in its usual sense here, but we're particularly interested in cohorts that have properties relevant to tract placement. It is such cohorts that the pools will try to organize using the services of the locus manager. Typical properties would be trace differentiability or (en masse) death-time predictability. Typical cohorts would be instances of a non-generational pool, or generations of a collection strategy.
.def.trace.differentiability: Objects (and hence tracts) that are collected, may or may not have "trace differentiability" from each other, depending on their placement in the different zones. Objects (or pointers to them) can also have trace differentiability (or not) from non-pointers in ambiguous references; in practice, we will be worried about low integers, that may appear to be in zones 0 or -1.
Requirements
.req.cohort: Tract allocations must specify the cohort they allocate in. These kind of cohorts will be called loci, and they will have such attributes as are implied by the other requirements. Critical.
.req.counter.objects: As a counter-requirement, pools are expected to manage objects. Objects the size of a tract allocation request (segment-sized) are exceptional. Critical. .req.counter.objects.just: This means the locus manager is not meant to solve the problems of allocating large objects, and it isn't required to know what goes on in pools.
.req.contiguity: Must support a high level of contiguity within cohorts when requested. This means minimizing the number of times a cohort is made aware of discontiguity. Essential (as we've effectively renegotiated this in SW, down to a vague hope that certain critical cohorts are not too badly fragmented). .req.contiguity.just: TSBA.
.req.contiguity.specific: It should be possible to request another allocation next to a specific tract on either side (or an extension in that direction, as the case may be). Such a request can fail, if there's no space there. Nice. It would also be nice to have one for "next to the largest free block".
.req.differentiable: Must support the trace differentiability of segments that may be condemned separately. Due to the limited number of zones, it must be possible to place several cohorts into the same zone. Essential.
.req.differentiable.integer: It must be possible to place collectable allocations so that they are trace-differentiable from small integers. Essential.
.req.disjoint: Must support the disjointness of pages that have different VM properties (such as mutable/immutable, read-only/read-write, and different lifetimes). Optional.
Note
I expect the implementation will simply work at page or larger granularity, so the problem will not arise, but Tucker insisted on stating this as a requirement. Pekka P. Pirinen, 1998-10-28.
.req.low-memory: The architecture of the locus manager must not prevent the design of efficient applications that often use all available memory. Critical. .req.low-memory.expl: This basically says it must be designed to perform well in low-memory conditions, but that there can be configurations where it doesn't do as well, as long as this is documented for the application programmer. Note that it doesn't say all applications are efficient, only that if you manage to design an otherwise efficient application, the locus manager will not sink it.
.req.address: Must conserve address space in VM arenas to a reasonable extent. Critical.
.req.inter-pool: Must support the association of sets of tracts in different pools into one cohort. Nice.
.req.ep-style: Must support the existing EP-style of allocation whereby allocation is from one end of address space either upwards or downwards (or a close approximation thereto with the same behavior). .req.ep-style.just: We cannot risk disrupting a policy with well-known properties when this technology is introduced.
.req.attributes: There should be a way to inform the locus manager about various attributes of cohorts that might be useful for placement: deathtime, expected total size, and so on. Optional. It's a given that the cohorts must then have these attributes, within the limits set in the contract of the appropriate interface. .req.attributes.action: The locus manager should use the attributes to guide its placement decisions. Nice.
.req.blacklisting: There should be a way of maintaining at least one blacklist for pages (or some other small unit), that can not/should not be allocated to collectable pools. Optional.
Note
How to do blacklist breaking for ambiguous refs?
.req.hysteresis: There should be a way to indicate which cohorts fluctuate in size and by how much, to guide the arena hysteresis to hold on to suitable pages. Optional.
Analysis
.anal.sw: Almost any placement policy would be an improvement on the current SW one.
.anal.cause-and-effect: The locus manager doesn't usually need to know why things need to be differentiable, disjoint, contiguous, and so on. Abstracting the reason away from the interface makes it more generic, more likely to have serendipitous new uses. Attributes described by a quantity (deathtime, size, etc.) are an exception to this, because we can't devise a common measure.
.anal.stable: The strategy must be stable: it must avoid repeated recomputation, especially the kind that switches between alternatives with a short period (repeated "bites" out the same region or flip-flopping between two regions).
.anal.fragmentation: There's some call to avoid fragmentation in cohorts that don't need strict contiguity, but this is not a separate requirement, since fragmentation is a global condition, and can only be ameliorated if there's a global strategy that clumps allocations together.
.anal.deathtime: Cohorts with good death-time clumping of their objects could use some locality of tract allocation, because it increases the chances of creating large holes in the address space (for other allocation to use). OTOH. many cohorts will not do multiple frees in short succession, or at least cannot reasonably be predicted to do so. This locality is not contiguity, nor is it low fragmentation, it's just the requirement to place the new tracts next to the tract where the last object was allocated in the cohort. Note that the placement of objects is under the control of the pool, and the locus manager will not know it, therefore this requirement should be pursued by requesting allocation next to a particular tract (which we already have a requirement for).
.anal.asymmetrical: The strategy has to be asymmetrical with respect to cohorts growing and shrinking. The reason of this asymmetry is that it can choose where to grow, but it cannot choose where to shrink (except in a small way by growing with good locality).
Interface
.interface.locus: A cohort will typically reside on multiple tracts (and the pools will avoid putting objects of other cohorts on them), so there should be an interface to describe the properties of the cohort, and associate each allocation request with the cohort. We shall call such an object, created to represent a cohort, a locus (pl. loci).
.interface.locus.pool: Loci will usually be created by the pool that uses it. Some of the locus attributes will be inherited from client-specified pool attributes [this means there will be additional pool attributes].
.interface.detail: This describes interface in overview; for details, see implementation section and code, or user doc.
Loci
Res LocusCreate(Locus *locusReturn, LocusAttrs attrs, ZoneGroup zg, LocusAllocDesc adesc)
.function.create: A function to create a locus: adesc contains the information about the allocation sequences in the locus, zg is used for zone differentiability, and attrs encodes the following:
- .locus.contiguity: A locus can be contiguous. This means performing as required in .req.contiguity, non-contiguous allocations can be freely placed anywhere (but efficiency dictates that similar allocations are placed close together and apart from others).
- .locus.blacklist: Allocations in the locus will avoid blacklisted pages (for collectable segments).
- .locus.zero: Allocations in the locus are zero-filled.
Note
Other attributes will be added, I'm sure.
.interface.zone-group: The locus can be made a member of a zone group. Passing ZoneGroupNONE means it's not a member of any group (allocations will be placed without regard to zone, except to keep them out of stripes likely to be needed for some group).
Note
I propose no mechanism for managing zone groups at this time, since it's only used internally for one purpose. Pekka P. Pirinen, 2000-01-17.
.interface.size: An allocation descriptor (LocusAllocDesc) contains various descriptions of how the locus will develop over time (inconsistent specifications are forbidden, of course):
.interface.size.typical-alloc: Size of a typical allocation in this locus, in bytes. This will mainly affect the grouping of non-contiguous loci.
.interface.size.large-alloc: Typical large allocation that the manager should try to allow for (this allows some relief from .req.counter.objects), in bytes. This will mainly affect the size of gaps that will be allotted adjoining this locus.
- .interface.size.direction: Direction of growth: up/down/none.
Only useful if the locus is contiguous.
.interface.size.lifetime: Some measure of the lifetime of tracts (not objects) in the cohort.
Note
Don't know the details yet, probably only useful for placing similar cohorts next to each other, so the details don't actually matter. Pekka P. Pirinen, 2000-01-17.
.interface.size.deathtime: Some measure of the deathtime of tracts (not objects) in the cohort.
Note
Ditto. Pekka P. Pirinen, 2000-01-17.
.function.init: LocusInit() is like LocusCreate(), but without the allocation. This is the usual interface, since most loci are embedded in a pool or something.
.function.alloc: ArenaAlloc() to take a locus argument. ArenaAllocHere() is like it, plus it takes a tract and a specification to place the new allocation immediately above/below a given tract; if that is not possible, it returns ResFAIL (this will make it useful for reallocation functionality).
ArenaSetTotalLoci(Arena arena, Size nLoci, Size nZoneGroups)
.function.set-total: A function to tell the arena the expected number of (non-miscible client) loci, and of zone groups.
Peaks
mps_res_t mps_peak_create(mps_peak_t*, mps_arena_t)
.function.peak.create: A function to create a peak. A newly-created peak is open, and will not be used to guide the strategy of the locus manager.
mps_res_t mps_peak_describe_pool(mps_peak_t, mps_pool_t, mps_size_desc_t)
.function.peak.add: A function to add a description of the state of one pool into the peak. Calling this function again for the same peak and pool instance will replace the earlier description.
.function.peak.add.size: The size descriptor contains a total size in bytes or percent of arena size.
Note
Is this right? Pekka P. Pirinen, 2000-01-17.
.function.peak.add.remove: Specifying a NULL size will remove the pool from the peak. The client is not allowed to destroy a pool that is mentioned in any peak; it must be first removed from the peak, or the peak must be destroyed. This is to ensure that the client adjusts the peaks in a manner that makes sense to the application; the locus manager can't know how to do that.
mps_res_t mps_peak_close(mps_peak_t)
.function.peak.close: A function to indicate that all the significant pools have been added to the peak, and it can now be used to guide the locus manager. For any pool not described in the peak, the locus manager will take its current size at any given moment as the best prediction of its size at the peak.
.function.peak.close.after: It is legal to add more descriptions to the peak after closing, but this will reopen the peak, and it will have to be closed before the locus manager will use it again. The locus manager uses the previous closed state of the peak, while this is going on.
void mps_peak_destroy(mps_peak_t)
.function.peak.destroy: A function to destroy a peak.
.interface.ep-style: This satisfies .req.ep-style by allowing SW to specify zero size for most pools (which will cause them to be place next to other loci with the same growth direction).
Note
Not sure this is good enough, but we'll try it first. Pekka P. Pirinen, 2000-01-17.
Architecture
Data objects
.arch.locus: To represent the cohorts, we have locus objects. Usually a locus is embedded in a pool instance, but generations are separate loci.
.arch.locus.attr: contiguity, blacklist, zg, current region, @@@@
.arch.locus.attr.exceptional: The client can define a typical large allocation for the locus. Requests substantially larger than that are deemed exceptional.
.arch.zone-group: To satisfy .req.differentiable, we offer zone groups. Each locus can be a member of a zone group, and the locus manager will attempt to place allocations in this locus in different zones from all the other zone groups. A zone-group is represented as @@@@.
.arch.page-table: A page table is maintained by the arena, as usual to track association between tracts, pools and segments, and mapping status for VM arenas.
.arch.region: All of the address space is divided into disjoint regions, represented by region objects. These objects store their current limits, and high and low watermarks of currently allocated tracts (we hope there's usually a gap of empty space between regions). The limits are actually quite porous and flexible.
.arch.region.assoc: Each region is associated with one contiguous locus or any number of non-contiguous loci (or none). We call the first kind of region "contiguous". .arch.locus.assoc: Each locus remembers all regions where it has tracts currently, excepting the badly-placed allocations (see below). It is not our intention that any locus would have very many, or that loci that share regions would have any reason to stop doing do.
.arch.region.more: Various quantities used by the placement computation are also stored in the regions and the loci. Regions are created (and destroyed) by the placement recomputation. Regions are located in stripes (if it's a zoned region), but they can extend into neighboring stripes if an exceptionally large tract allocation is requested (to allow for large objects).
.arch.chunk: Arenas may allocate more address space in additional chunks, which may be disjoint from the existing chunks. Inter-chunk space will be represented by dummy regions. There are also sentinel regions at both ends of the address space. See design.mps.arena.chunk.
Overview of strategy
.arch.strategy.delay: The general strategy is to delay placement decisions until they have to be made, but no later.
.arch.strategy.delay.until: Hence, the locus manager only makes placement decisions when an allocation is requested (frees and other operations might set a flag to cause the next allocation to redecide). This also allows the client to change the peak and pool configuration in complicated ways without causing a lot of recomputation, by doing all the changes without allocating in the middle (unless the control pool needs more space because of the changes).
.arch.strategy.normal: While we want the placement to be sophisticated, we do not believe it is worth the effort to consider all the data at each allocation. Hence, allocations are usually just placed in one of the regions used previously (see .arch.alloc) without reconsidering the issues.
.arch.strategy.normal.limit: However, the manager sets precautionary limits on the regions to ensure that the placement decisions are revisited when an irrevocable placement is about to be made.
.arch.strategy.create: The manager doesn't create new regions until they are needed for allocation (but it might compute where they could be placed to accommodate a peak).
Allocation
.arch.alloc: Normally, each allocation to a locus is placed in its current region. New regions are only sought when necessary to fulfill an allocation request or when there is reason to think the situation has changed significantly (see .arch.significant).
.arch.alloc.same: An allocation is first attempted next to the previous allocation in the same locus, respecting growth direction. If that is not possible, a good place in the current region is sought. .arch.alloc.same.hole: At the moment, for finding a good place within a region, we just use the current algorithm, limited to the region. In future, the placement within regions will be more clever.
.arch.alloc.extend: If there's no adequate hole in the current region and the request is not exceptional, the neighboring regions are examined to see if the region could be extended at one border. (This will basically only be done if the neighbor has shrunk since the last placement recomputation, because the limit was set on sophisticated criteria, and should not be changed without justification.) .arch.alloc.extend.here: When an allocation is requested next to a specific tract (ArenaAllocHere()), we try to extend a little harder (at least for change_size, perhaps not for locality).
.arch.alloc.other: If no way can be found to allocate in the current region, other regions used for this locus are considered in the same way, to see if space can be found there. [Or probably look at other regions before trying to extend anything?]
.arch.alloc.recompute: When no region of this locus has enough space for the request, or when otherwise required, region placement is recomputed to find a new region for the request (which might be the same region, after extension).
.arch.alloc.current: This region where the allocation was placed then becomes the current region for this locus, except when the request was exceptional, or when the region chosen was "bad" (see @@@@).
.arch.significant: Significant changes to the parameters affecting placement are deemed to have happened at certain client calls and when the total allocation has changed substantially since the last recomputation. Such conditions set a flag that causes the next allocation to recompute even if its current region is not full (possibly second-guess the decision to recompute after some investigation of the current state?).
Deallocation
.arch.free: Deallocation simply updates the counters in the region and the locus. For some loci, it will make the region of the deallocation the current region. .arch.free.remove: If a region becomes entirely empty, it is deleted (and the neighbors limits might be adjusted).
Note
This is quite tricky to get right.
Region placement recomputation
.arch.gap: When doing placement computations, we view the arena as a sequence of alternating region cores and gaps (which can be small, even zero-sized). Initially, we'll take the core of a region to be the area between the high and low watermark, but in the future we might be more flexible about that.
Note
Edge determination is actually a worthwhile direction to explore.
.arch.reach: The gap between two cores could potentially end up being allocated to either region, if they grow in that direction, or one or neither, if they don't. The set of states that the region assignment could reach by assigning the gaps to their neighbors is called the reach of the current configuration.
.arch.placement.object: The object of the recomputation is to find a configuration of regions that is not too far from the current configuration and that keeps all the peaks inside its reach; if that is not possible, keep the nearest ones in the reach and then minimize the total distance from the rest.
.arch.placement.hypothetical: The configurations that are considered will include hypothetical placements for new regions for loci that cannot fit in their existing regions at the peak. This is necessary to avoid choosing a bad alternative.
.arch.placement.interesting: The computation will only consider new regions of loci that are deemed interesting, that is, far from their peak state. This will reduce the computational burden and avoid jittering near a peak.
Note
Details missing.
Implementation
[missing]
Notes
.idea.change: Even after the first segment, be prepared to change your mind, if by the second segment a lot of new loci have been created.
.distance: If the current state is far from a peak, there's time to reassign regions and for free space to appear (in fact, under the steady arena assumption, enough free space will appear).
.clear-pool: Need to have a function to deallocate all objects in a pool, so that PoolDestroy() won't have to be used for that purpose.
Document History
- 1998-02-27 Gavin Matthews. Incomplete design. Originally written as part of change.dylan.box-turtle.170569. Much developed since.
- 1998-10-28 Pekka P. Pirinen. Wrote the real requirements after some discussion.
- 1998-12-15 Pekka P. Pirinen. Deleted Gavin's design and wrote a new one.
- 2002-06-07 RB Converted from MMInfo database design document.
- 2007-04-24 Richard Kistruck. Added Guide: Manage arena address space, why, discover layout.
- 2013-05-23 GDR Converted to reStructuredText.
Copyright and License
Copyright © 2013-2014 Ravenbrook Limited. All rights reserved. <http://www.ravenbrook.com/>. This is an open source license. Contact Ravenbrook for commercial licensing options.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Redistributions in any form must be accompanied by information on how to obtain complete source code for this software and any accompanying software that uses this software. The source code must either be included in the distribution or be available for no more than the cost of distribution plus a nominal fee, and must be freely redistributable under reasonable conditions. For an executable file, complete source code means the source code for all modules it contains. It does not include source code for modules or files that typically accompany the major components of the operating system on which the executable file runs.
This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement, are disclaimed. In no event shall the copyright holders and contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.