Ravenbrook / Projects / Memory Pool System / Master Product Sources / Design Documents

Memory Pool System Project


             DESIGN OF THE POSIX THREAD EXTENSIONS FOR MPS
                         design.mps.pthreadext
                               draft doc
                            tony 2000-02-01

INTRODUCTION

.readership: Any MPS developer.

.intro: This is the design of the Pthreads extension module, which provides 
some low-level threads support for use by MPS (notably suspend and resume).


DEFINITIONS

.pthreads: The term "Pthreads" means an implementation of the POSIX 
1003.1c-1995 thread standard. (Or the Single UNIX Specification, Version 2, aka 
USV2 or UNIX98.)

.context: The "context" of a thread is a (platform-specific) OS-defined 
structure which describes the current state of the registers for that thread.


REQUIREMENTS

.req.suspend: A means to suspend threads, so that they don't make any progress. 
.req.suspend.why: Needed by the thread manager so that other threads registered 
with an arena can be suspended (see design.mps.thread-manager).  Not directly 
provided by Pthreads.

.req.resume: A means to resume suspended threads, so that they are able to make 
progress again.  .req.resume.why: Needed by the thread manager.  Not directly 
provided by Pthreads.

.req.suspend.multiple: Allow a thread to be suspended on behalf of one arena 
when it has already been suspended on behalf of one or more other arenas. 
.req.suspend.multiple.why: The thread manager contains no design for 
cooperation between arenas to prevent this.

.req.resume.multiple: Allow requests to resume a thread on behalf of each arena 
which had previously suspended the thread.  The thread must only be resumed 
when requests from all such arenas have been received.  
.req.resume.multiple.why: A thread manager for an arena must not permit a 
thread to make progress before it explicitly resumes the thread.

.req.suspend.context: Must be able to access the context for a thread when it 
is suspended.

.req.suspend.protection: Must be able to suspend a thread which is currently 
handling a protection fault (i.e., an arena access).  Such a thread might even 
own an arena lock.

.req.legal: Required to use Pthreads / POSIX APIs in a legal manner.



ANALYSIS


.anal.suspend: Thread suspension is inherently asynchronous.  MPS needs to be 
able to suspend another thread without prior knowledge of the code that thread 
is running.  (I.e., we can't rely on cooperation between threads.)  The only 
asynchronous communication available on POSIX is via signals - so the suspend 
and resume mechanism must ultimately be built from signals.

.anal.signal.safety: POSIX imposes some restrictions on what a signal handler 
function might do when invoked asynchronously (see 
<URL:http://www.opengroup.org/onlinepubs/007908799/xsh/sigaction.html>, and 
search for the string "reentrant").  In summary, a small number of POSIX 
functions are defined to be "async-signal safe", which means they may be 
invoked without restriction in signal handlers.  All other POSIX functions are 
considered to be unsafe.  Behaviour is undefined if an unsafe function is 
interrupted by a signal and the signal handler then proceeds to call another 
unsafe function.  See mail.tony.1999-08-24.15-40(0)and followups for some 
further analysis.

.anal.signal.safety.implication: Since we can't assume that we won't attempt to 
suspend a thread while it is running an unsafe function, we must limit the use 
of POSIX functions in the suspend signal handler to those which are designed to 
be "async-signal safe".  One of the few such functions related to 
synchronization is sem_post.

.anal.signal.example: An example of how to suspend threads in POSIX was posted 
to newsgroup comp.programming.threads in August 1999.  The code that was posted 
was written by David Butenhof, and may be found here:

Some further discussion about the code in the newsgroup is recorded here:


.anal.signal.linux-hack: In the current implementation of Linux Pthreads, it 
would be possible to implement suspend/resume using SIGSTOP and SIGCONT.  This 
is, however, nonportable and will probably stop working on Linux at some point.

.anal.component: There is no known way to meet the requirements above in a way 
which cooperates with another component in the system which also provides its 
own mechanism to suspend and resume threads.  The best bet for achieving this 
is to provide the functionality in shared low-level component which may be used 
by MPS and other clients.  This will require some discussion with other 
potential clients and/or standards bodies.  .anal.component:  NB., such 
cooperation is actually a requirement for Dylan (req.dylan.dc.env.self), though 
this is not a problem, since all the Dylan components share the MPS mechanism.



INTERFACE

.if.pthreadext.abstract: A thread is represented by the abstract type 
PThreadext. A PThreadext object corresponds directly with a PThread (of type 
pthread_t). There may be more than one PThreadext object for the same PThread.

.if.pthreadext.structure: The structure definition of PThreadext 
(PThreadextStruct) is exposed by the interface so that it may be embedded in a 
client datastructure (e.g. ThreadStruct). This means that all storage 
management can be left to the client (which is important because there might be 
multiple arenas involved). Clients may not access the fields of a 
PThreadextStruct directly.

.if.init: Initializes a PThreadext object for a thread with the given id:
void PThreadextInit(PThreadext pthreadext, pthread_t id)

.if.check: Checks a PThreadext object for consistency:
Bool PThreadextCheck(PThreadext pthreadext)
Note that this function takes the mutex, so it must not be called with the 
mutex held (doing so will probably deadlock the thread).

.if.suspend: Suspends a PThreadext object (puts it into a suspended state). 
Meets .req.suspend.*. The object must not already be in a suspended state. If 
the function returns ResOK, the context of the thread is returned in 
contextReturn, and the corresponding PThread will not make any progress until 
it is resumed:
Res PThreadextSuspend(PThreadext pthreadext, struct sigcontext **contextReturn)

.if.resume: Resumes a PThreadext object. Meets .req.resume.*. The object must 
already be in a suspended state. Puts the object into a non-suspended state. 
Permits the corresponding PThread to make progress again, (although that might 
not happen immediately if there is another suspended PThreadext object 
corresponding to the same thread):
Res PThreadextResume(PThreadext pthreadext)

.if.finish: Finishes a PThreadext object:
void PThreadextFinish(PThreadext pthreadext)


IMPLEMENTATION


.impl.pthreadext: The structure definition for a PThreadext object is:
typedef struct PThreadextStruct {
  Sig sig;                         /* design.mps.sig */
  pthread_t id;                    /* Thread ID */
  struct sigcontext *suspendedScp; /* sigcontext if suspended */
  RingStruct threadRing;           /* ring of suspended threads */
  RingStruct idRing;               /* duplicate suspensions for id */
} PThreadextStruct;

.impl.field.id: The id field shows which PThread the object corresponds to.

.impl.field.scp: The suspendedScp field contains the context when in a 
suspended state. Otherwise it is NULL.

.impl.field.threadring: The threadRing field is used to chain the object onto 
the suspend ring when it is in the suspended state (see .impl.suspend-ring). 
When not in a suspended state, this ring is single.

.impl.field.idring: The idRing field is used to group the object with other 
objects corresponding to the same PThread (same id field) when they are in the 
suspended state. When not in a suspended state, or when this is the only 
PThreadext object with this id in the suspended state, this ring is single.

.impl.global.suspend-ring: The module maintains a global suspend-ring - a ring 
of PThreadext objects which are in a suspended state.  This is primarily so 
that it's possible to determine whether a thread is curently suspended anyway 
because of another PThreadext object, when a suspend attempt is made.

.impl.global.victim: The module maintains a global variable which is used to 
indicate which PThreadext is the current victim during suspend operations. This 
is used to communicate information between the controlling thread and the 
thread being suspended (the victim). The variable has value NULL at other times.

.impl.static.mutex: We use a lock (mutex) around the suspend and resume 
operations. This protects the state data (suspend-ring etc. see impl.global.*). 
Since only one thread can be suspended at a time, there's no possibility of two 
arenas suspending each other by concurrently suspending each other's threads.

.impl.static.semaphore: We use a semaphore to synchronize between the 
controlling and victim threads during the suspend operation. See .impl.suspend 
and .impl.suspend-handler).

.impl.static.init: The static data and global variables of the module are 
initialized on the first call to PThreadextSuspend, using pthread_once to avoid 
concurrency problems. We also enable the signal handlers at the same time (see 
.impl.suspend-handler and .impl.resume-handler).

.impl.suspend: PThreadextSuspend first ensures the module is initialized (see 
.impl.static.init). After this, it claims the mutex (see .impl.static.mutex). 
It then checks to see whether thread of the target PThreadext object has 
already been suspended on behalf of another PThreadext object. It does this by 
iterating over the suspend ring. 

.impl.suspend.already-suspended: If another object with the same id is found on 
the suspend ring, then the thread is already suspended. The context of the 
target object is updated from the other object, and the other object is linked 
into the idRing of the target. 

.impl.suspend.not-suspended: If the thread is not already suspended, then we 
forcibly suspend it using a technique similar to Butenhof's (see 
.anal.signal.example): First we set the victim variable (see 
.impl.global.victim) to indicate the target object. Then we send the signal 
PTHREADEXT_SIGSUSPEND to the thread (see .impl.signals), and wait on the 
semaphore for it to indicate that it has received the signal and updated the 
victim variable with the context. If either of these operations fail (e.g. 
because of thread termination) we unlock the mutex and return ResFAIL.

.impl.suspend.update: Once we have ensured that the thread is definitely 
suspended, we add the target PThreadext object to the suspend ring, unlock the 
mutex, and return the context to the caller.

.impl.suspend-handler: The suspend signal handler is invoked in the target 
thread during a suspend operation, when a PTHREADEXT_SIGSUSPEND signal is sent 
by the controlling thread (see .impl.suspend.not-suspended). The handler 
determines the context (received as a parameter, although this may be 
platform-specific) and stores this in the victim object (see 
.impl.global.victim). The handler then masks out all signals except the one 
that will be received on a resume operation (PTHREADEXT_SIGRESUME) and 
synchronizes with the controlling thread by posting the semaphore. Finally the 
handler suspends until the resume signal is received (using sigsuspend).

.impl.resume: PThreadextResume first claims the mutex (see .impl.static.mutex). 
It then checks to see whether thread of the target PThreadext object has also 
been suspended on behalf of another PThreadext object (in which case the id 
ring of the target object will not be single). 

.impl.resume.also-suspended: If the thread is also suspended on behalf of 
another PThreadext, then the target object is removed from the id ring. 

.impl.resume.not-also: If the thread is not also suspended on behalf of another 
PThreadext, then the thread is resumed using the technique proposed by Butenhof 
(see .anal.signal.example). I.e. we send it the signal PTHREADEXT_SIGRESUME 
(see .impl.signals) and expect it to wake up. If this operation fails (e.g. 
because of thread termination) we unlock the mutex and return ResFAIL.

.impl.resume.update: Once the target thread is in the appropriate state, we 
remove the target PThreadext object from the suspend ring, set its context to 
NULL and unlock the mutex.

.impl.resume-handler: The resume signal handler is invoked in the target thread 
during a resume operation, when a PTHREADEXT_SIGRESUME signal is sent by the 
controlling thread (see .impl.resume.not-also). The resume signal handler 
simply returns. This is sufficient to unblock the suspend handler, which will 
have been blocking the thread at the time of the signal. The Pthreads 
implementation ensures that the signal mask is restored to the value it had 
before the signal handler was invoked.

.impl.finish: PThreadextFinish supports the finishing of objects in the 
suspended state, and removes them from the suspend ring and id ring as 
necessary. It must claim the mutex for the removal operation (to ensure 
atomicity of the operation). Finishing of suspended objects is supported so 
that clients can dispose of resources if a resume operation fails (which 
probably means that the PThread has terminated).

.impl.signals: The choice of which signals to use for suspend and restore 
operations may need to be platform-specific. Some signals are likely to be 
generated and/or handled by other parts of the application and so should not be 
used (e.g. SIGSEGV). Some implementations of PThreads use some signals for 
themselves, so they may not be used; e.g. LinuxThreads uses SIGUSR1 and SIGUSR2 
for its own purposes. The design abstractly names the signals 
PTHREADEXT_SIGSUSPEND and PTHREAD_SIGRESUME, so that they may be easily mapped 
to appropriate real signal values. Candidate choices are SIGXFSZ and SIGPWR.


ATTACHMENTS
   "posix.txt"
   "susp.c"

A. References

B. Document History

2002-06-07 RB Converted from MMInfo database design document.

C. Copyright and License

This document is copyright © 1995-2002 Ravenbrook Limited. All rights reserved. This is an open source license. Contact Ravenbrook for commercial licensing options.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Redistributions in any form must be accompanied by information on how to obtain complete source code for the this software and any accompanying software that uses this software. The source code must either be included in the distribution or be available for no more than the cost of distribution plus a nominal fee, and must be freely redistributable under reasonable conditions. For an executable file, complete source code means the source code for all modules it contains. It does not include source code for modules or files that typically accompany the major components of the operating system on which the executable file runs.

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement, are disclaimed. In no event shall the copyright holders and contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.


$Id: //info.ravenbrook.com/project/mps/version/1.105/design/pthreadext/index.html#1 $

Ravenbrook / Projects / Memory Pool System / Master Product Sources / Design Documents