THE DESIGN OF THE MPS TELEMETRY MECHANISM
design.mps.telemetry
incomplete design
richard 1997-07-07

INTRODUCTION:
This documents the design of the telemetry mechanism within the MPS.
.readership: This document is intended for any MPS developer.
.source: Various meetings and brainstorms, including meeting.general.1997-03-04(0), mail.richard.1997-07-03.17-01(0), mail.gavinm.1997-05-01.12-40(0).
Document History
.hist.0: 1997-04-11 GavinM Rewritten
.hist.1: 1997-07-07 GavinM Rewritten again after discussion in Pool Hall.

OVERVIEW:
Telemetry permits the emission of events from the MPS. These can be used to drive a graphical tool, or to debug, or whatever. The system is flexible and robust, but doesn't require heavy support from the client.

REQUIREMENTS:
.req.simple: It must be possible to generate code both for the MPS and any tool without using complicated build tools.
.req.open: We must not constrain the nature of events before we are certain of what we want them to be.
.req.multi: We must be able to send events to multiple streams.
.req.share: It must be possible to share event descriptions between the MPS and any tool.
.req.version: It must be possible to version the set of events so that any tool can detect whether it can understand the MPS.
.req.back: Tools should be able to understand older and newer versions of the MPS, so far as is appropriate.
.req.type: It must be possible to transmit a rich variety of types to the tool, including doubles and strings.
.req.port: It must be possible to transmit and receive events between different platforms.
.req.control: It must be possible to control whether and what events are transmitted at least at a coarse level.
.req.examine: There should be a cheap means to examine the contents of logs.
.req.pm: The event mechanism should provide for post mortem to detect what significant events led up to death.
.req.perf: Events should not have a significant effect on performance when unwanted.
.req.small: Telemetry streams should be small.
.req.avail: Events should be available in all varieties, subject to performance requirements.
.req.impl: The plinth support for telemetry should be easy to write and flexible.
.req.robust: The telemetry protocol should be robust against some forms of corruption, e.g. packet loss.
.req.intern: It should be possible to support string-interning.

ARCHITECTURE:
.arch: Event annotations are scattered throughout the code, but there is a central registration of event types and properties. Events are written to a buffer via a specialist structure, and are optionally written to the plinth. Events can take any number of parameters of a range of types, indicated as a format both in the annotation and the registry.

ANALYSIS:
.anal: The proposed order of development, with summary of requirements impact is as follows:
                        v     c e
                s       e     o x           r i
                i   m s r     n a     s a   o n
                m o u h s t p t m   p m v i b t b
                p p l a i y o r i   e a a m u e a
                l e t r o p r o n p r l i p s r c
                e n i e n e t l e m f l l l t n k
.sol.format     0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 0 0  Merged.
.sol.struct     0 0 0 0 0 + 0 0 0 0 + - 0 0 0 0 0  Merged.
.sol.string     0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 + 0  Merged.
.sol.relation   + 0 0 + 0 0 0 0 + 0 0 + 0 0 0 0 0  Merged.
.sol.dumper     0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0 0  Merged.
.sol.kind       0 - 0 0 0 0 0 + 0 + 0 0 0 0 0 0 0  Merged.
.sol.control    0 0 0 0 0 0 0 + 0 0 + 0 0 0 0 0 0  Merged.
.sol.variety    0 0 0 0 0 0 0 0 0 + + 0 + 0 0 0 0
[ Not yet ordered. ]
.sol.buffer     0 0 0 0 0 0 0 + 0 + + 0 0 0 0 0 0
.sol.traceback  0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0
.sol.client     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + 0
.sol.head       0 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 0
.sol.version    0 0 0 0 + 0 0 0 0 0 0 0 0 0 0 0 +
.sol.exit       0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0
.sol.block      0 0 0 0 0 0 0 0 0 0 + - 0 0 + 0 0
.sol.code       0 0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 +
.sol.msg        0 0 + 0 0 0 + 0 0 0 0 0 0 + + 0 0
.file-format: One of the objectives of this plan is to minimise the impact of the changes to the log file format. This is to be achieved firstly by completing all necessary support before changes are initiated, and secondly by performing all changes at the same time.

IDEAS:
.sol.format: Event annotations indicate the types of their arguments, e.g. EVENT_WD for a Word and a double. (.req.type)
.sol.struct: Copy event data into a structure of the appropriate type, e.g. EventWDStruct. (.req.type, .req.perf, but not .req.small because of padding)
.sol.string: Permit at most one string per event, at the end, and use the char[1] hack, and specialised code; deduce the string length from the event length and also NUL-terminate. (.req.type, .req.intern)
.sol.buffer: Enter all events initially into internal buffers, and conditionally send them to the message stream. (.req.pm, .req.control, .req.perf)
.sol.variety: In optimized varieties, have internal events (see .sol.buffer) for a subset of events and no external events; in normal varieties have all internal events, and the potential for external events. (.req.avail, .req.pm, .req.perf)
.sol.kind: Divide events by some coarse type into around 6 groups, probably related to frequency. (.req.control, .req.pm, but not .req.open)
.sol.control: Hold flags to determine which events are emitted externally. (.req.control, .req.perf)
.sol.dumper: Write a simple tool to dump event logs as text. (.req.examine)
.sol.msg: Redesign the plinth interface to send and receive messages, based on any underlying IPC mechanism, e.g.
append to file, TCP/IP, messages, shared memory. (.req.robust, .req.impl, .req.port, .req.multi)
.sol.block: Buffer the events and send them as fixed size blocks, commencing with a timestamp, and ending with padding. (.req.robust, .req.perf, but not .req.small)
.sol.code: Commence each event with two bytes of event code, and two bytes of length. (.req.small, .req.back)
.sol.head: Commence each event stream with a platform-independent header block giving information about the session, version (see .sol.version), and file format; file format will be sufficient to decode the (platform-dependent) rest of the file. (.req.port)
.sol.exit: Provide a mechanism to flush events in the event of graceful sudden death. (.req.pm)
.sol.version: Maintain a three part version number for the file comprising major (incremented when the format of the entire file changes (other than platform differences)), median (incremented when an existing event changes its form or semantics), and minor (incremented when a new event type is added); tools should normally fail when the median or major is unsupported. (.req.version, .req.back)
.sol.relation: Event types will be defined in terms of a relation specifying their name, code, optimised behaviour (see .sol.variety), kind (see .sol.kind), and format (see .sol.format); both the MPS and tool can use this by suitable #define hacks. (.req.simple, .req.share, .req.examine, .req.small (no format information in messages))
.sol.traceback: Provide a mechanism to output recent events (see .sol.buffer) as a form of backtrace when AVERs fire or from a debugger, or whatever. (.req.pm)
.sol.client: Provide a mechanism for user events. (.req.intern)

IMPLEMENTATION:
Annotation
.annot: An event annotation is of the form:
    EVENT3(FooCreate, pointer, address, word);
.annot.string: If there is a string in the format, it must be the last parameter (and hence there can be only one).
There is currently a maximum string length, defined by EventMaxStringLength in impl.h.eventcom.
.annot.type: The event type should be given as the first parameter to the event macro, as registered in impl.h.eventdef.
.annot.param: The parameters of the event should be given as the remaining parameters of the event macro, in order as indicated in the event parameters definition in impl.h.eventdef.
Registration
.reg: All event types and parameters should be registered in impl.h.eventdef, in the form of higher-order list macros.
.reg.just: This use of higher-order macros enables great flexibility in the use of this file.
.reg.rel: The event type registration is of the form:
    EVENT(X, FooCreate, 0x1234, TRUE, Arena)
.reg.type: The first parameter of the relation, after the leading X, is the event type. This needs no prefix, and should correspond to that used in the annotation.
.reg.code: The second parameter is the event code, a 16-bit value used to represent this event type. Codes should not be re-used for new event types, to allow interpretation of event log files of all ages.
.reg.always: The third parameter is a boolean value indicating whether this event type should be implemented in all varieties. See .control.buffer. Unless your event is on the critical path (typically per reference or per object), you will want this to be TRUE.
.reg.kind: The fourth parameter is a kind keyword indicating what category this event falls into. See .control. The possible values are:
    Arena -- per space or arena or global
    Pool -- pool-related
    Trace -- per trace or scan
    Seg -- per segment
    Ref -- per reference or fix
    Object -- per object or allocation
    User -- invoked by the user through the MPS interface
This list can be seen in impl.h.eventcom.
[.reg.doc: Add a docstring column.
RB 2012-09-03]
.reg.params: The event parameters registration is of the form:
    #define EVENT_FooCreate_PARAMS(PARAM, X) \
      PARAM(X, 0, P, firstParamPointer) \
      PARAM(X, 1, U, secondParamUnsigned)
.reg.param.index: The first column is the index, and must start at zero and increase by one for each row.
.reg.param.sort: The second column is the parameter "sort", which, when appended to EventF, yields a type for the parameter. It is a letter from the following list:
    P -- void *
    A -- Addr
    W -- Word
    U -- unsigned int
    S -- char *
    D -- double
    B -- Bool
The corresponding event parameter must be assignment compatible with the type.
.param.types: When an event has parameters whose type is not in the above list, use the following guidelines: All C pointer types not representing strings use P; Size, Count, Index use W; others should be obvious.
.reg.param.name: The third column is the parameter name. It should be a valid C identifier and is used for debugging display and human readable output.
[.reg.param.doc: Add a docstring column. RB 2012-09-03]
.reg.dup: It is permissible for one event type to be used for more than one annotation. There are generally two reasons for this:
    - Variable control flow for successful function completion;
    - Platform/Otherwise-dependent implementations of a function.
Note that all annotations for one event type must have the same format (as implied by .reg.format).
Control
.control: There are two types of event control, buffer and output.
.control.buffer: Buffer control affects whether particular events are implemented at all, and is controlled statically by variety using the always value (see .reg.always) for the event type. The hot variety compiles out annotations with always=FALSE. The cool variety does not, so it always buffers a complete set of events.
.control.output: Output control affects whether events written to the internal buffer are output via the plinth.
This is set on a per-kind basis (see .reg.kind), using a control bit table stored in EventKindControl. By default, all event kinds are off. You may switch some kinds on using a debugger. For example, to enable Pool events using gdb (see impl.h.eventcom for numeric codes):
    rb@silverbird$ gdb ./xci3gc/cool/amcss
    (gdb) break GlobalsInit
    (gdb) run
    ...
    (gdb) print EventKindControl |= 2
    $2 = 2
    (gdb) continue
    ...
    (gdb) quit
    rb@silverbird$ ./xci3gc/cool/eventcnv -v | sort | head
    0000178EA03ACF6D PoolInit 9C1E0 9C000 0005E040
    0000178EA03C2825 PoolInitMFS 9C0D8 9C000 1000 C
    0000178EA03C2C27 PoolInitMFS 9C14C 9C000 1000 44
    0000178EA03C332C PoolInitMV 9C080 9C000 1000 20 10000
    0000178EA03F4DB4 BufferInit 2FE2C4 2FE1B0 0
    0000178EA03F4EC8 BufferInitSeg 2FE2C4 2FE1B0 0
    0000178EA03F57DA AMCGenCreate 2FE1B0 2FE288
    0000178EA03F67B5 BufferInit 2FE374 2FE1B0 0
    0000178EA03F6827 BufferInitSeg 2FE374 2FE1B0 0
    0000178EA03F6B72 AMCGenCreate 2FE1B0 2FE338
.control.env: The initial value of EventKindControl is read from the C environment when the ANSI Plinth is used, and so event output can be controlled like this:
    MPS_TELEMETRY_CONTROL=127 amcss
or like this
    MPS_TELEMETRY_CONTROL="Pool Arena" amcss
where the variable is set to a space-separated list of names defined by EventKindENUM.
.control.just: These controls are coarse, but very cheap.
.control.external: The MPS interface function mps_telemetry_control can be used to change EventKindControl.
.control.tool: The tools will be able to control EventKindControl.
Debugging
.debug.buffer: Each event kind is logged in a separate buffer, EventBuffer[kind].
.debug.buffer.reverse: The events are logged in reverse order from the top of the buffer, with the last logged event at EventLast[kind]. This allows recovery of the list of recent events using the event->any.size field.
.debug.dump: The contents of all buffers can be dumped with the EventDump function from a debugger, e.g.
    gdb> print EventDump(mps_lib_get_stdout())
.debug.describe: Individual events can be described with the EventDescribe function, e.g.
    gdb> print EventDescribe(EventLast[3], mps_lib_get_stdout())
.debug.core: The event buffers are preserved in core dumps and can be used to work out what the MPS was doing before a crash. Since the kinds correspond to frequencies, ancient events may still be available in some buffers, even if they have been flushed to the output stream. Some digging may be required.
Dumper Tool
.dumper: A primitive dumper tool is available in impl.c.eventcnv. For details, see guide.mps.telemetry.
Allocation Replayer Tool
.replayer: A tool for replaying an allocation sequence from a log is available in impl.c.replay. For details, see design.mps.telemetry.replayer.
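Illustrative sketch
The registration scheme of .reg.rel and .reg.params, and the per-kind output control of .control.output, can be illustrated with a small C sketch. This is not the code in impl.h.eventdef. EVENT_LIST, BarDestroy and its code 0x1235, the Kind* enumeration, EventInfoStruct, EventFooCreateStruct, EventNameOfCode and EventKindEnabled are hypothetical names chosen for illustration. Mapping each kind to the bit at its position in the .reg.kind list is an assumption, inferred from the gdb example in which Pool events are enabled with EventKindControl |= 2.

```c
#include <assert.h>
#include <stddef.h>

#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* Hypothetical stand-in for the relation in impl.h.eventdef.  FooCreate and
 * 0x1234 are the example from .reg.rel; BarDestroy and 0x1235 are invented. */
#define EVENT_LIST(EVENT, X) \
  EVENT(X, FooCreate,  0x1234, TRUE,  Arena) \
  EVENT(X, BarDestroy, 0x1235, FALSE, Pool)

/* Parameter relation for FooCreate, as shown in .reg.params. */
#define EVENT_FooCreate_PARAMS(PARAM, X) \
  PARAM(X, 0, P, firstParamPointer) \
  PARAM(X, 1, U, secondParamUnsigned)

/* Kinds in the order listed under .reg.kind.  Using the list position as
 * the bit position in the control word is an assumption. */
enum { KindArena, KindPool, KindTrace, KindSeg, KindRef, KindObject,
       KindUser, KindLIMIT };

/* Use 1: expand the relation into an enumeration of event codes. */
#define EVENT_ENUM(X, name, code, always, kind) Event##name##Code = code,
enum { EVENT_LIST(EVENT_ENUM, X) EventCodeLIMIT };

/* Use 2: expand the same relation into a lookup table, as a dumper
 * tool might do to print event names. */
typedef struct EventInfoStruct {
  const char *name;
  unsigned code;
  int always;
  int kind;
} EventInfoStruct;

#define EVENT_ROW(X, name, code, always, kind) \
  { #name, code, always, Kind##kind },
static const EventInfoStruct eventTable[] = { EVENT_LIST(EVENT_ROW, X) };

/* Use 3: expand the parameter relation into structure fields, following
 * the "EventF + sort" rule of .reg.param.sort.  Any header fields (such
 * as the size mentioned in .debug.buffer.reverse) are omitted here. */
typedef void *EventFP;
typedef unsigned int EventFU;
#define PARAM_FIELD(X, index, sort, ident) EventF##sort f##index;
typedef struct { EVENT_FooCreate_PARAMS(PARAM_FIELD, X) } EventFooCreateStruct;

/* Use 4: count the parameters by expanding each row to "+ 1". */
#define PARAM_COUNT(X, index, sort, ident) + 1
enum { EventFooCreateParamCount = 0 EVENT_FooCreate_PARAMS(PARAM_COUNT, X) };

/* Return the name registered for an event code, or NULL if unknown. */
const char *EventNameOfCode(unsigned code)
{
  size_t i;
  for (i = 0; i < sizeof eventTable / sizeof eventTable[0]; ++i)
    if (eventTable[i].code == code)
      return eventTable[i].name;
  return NULL;
}

/* Return 1 if the bit for the given kind is set in control, else 0. */
int EventKindEnabled(unsigned control, int kind)
{
  assert(0 <= kind && kind < KindLIMIT);
  return (int)((control >> kind) & 1u);
}
```

Under these assumptions, EventKindEnabled(2, KindPool) returns 1 and EventKindEnabled(2, KindArena) returns 0, which matches the gdb session above, and a control value of 127 enables all seven kinds. The point of the sketch is that the event list is written once and each consumer supplies its own EVENT or PARAM macro, which is how both the MPS and a tool could share one description (.req.share) without extra build tools (.req.simple).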
A. References
B. Document History
2002-06-07 | RB | Converted from MMInfo database design document. |
2012-09-03 | RB | Removed basic untruths and added some discussion of debugging, though this starts to resemble a manual rather than a design document, and needs to be reworked. |
C. Copyright and License
This document is copyright © 1995-2002 Ravenbrook Limited. All rights reserved. This is an open source license. Contact Ravenbrook for commercial licensing options.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Redistributions in any form must be accompanied by information on how to obtain complete source code for this software and any accompanying software that uses this software. The source code must either be included in the distribution or be available for no more than the cost of distribution plus a nominal fee, and must be freely redistributable under reasonable conditions. For an executable file, complete source code means the source code for all modules it contains. It does not include source code for modules or files that typically accompany the major components of the operating system on which the executable file runs.
This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement, are disclaimed. In no event shall the copyright holders and contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.