31. POSIX thread extensions

31.1. Introduction

.readership: Any MPS developer.

.intro: This is the design of the Pthreads extension module, which provides some low-level threads support for use by MPS (notably suspend and resume).

31.2. Definitions

.pthreads: The term “Pthreads” means an implementation of the POSIX 1003.1c-1995 thread standard. (Or the Single UNIX Specification, Version 2, aka USV2 or UNIX98.)

.context: The “context” of a thread is a (platform-specific) OS-defined structure which describes the current state of the registers for that thread.

31.3. Requirements

.req.suspend: A means to suspend threads, so that they don’t make any progress.

.req.suspend.why: Needed by the thread manager so that other threads registered with an arena can be suspended (see design.mps.thread-manager). Not directly provided by Pthreads.

.req.resume: A means to resume suspended threads, so that they are able to make progress again. .req.resume.why: Needed by the thread manager. Not directly provided by Pthreads.

.req.suspend.multiple: Allow a thread to be suspended on behalf of one arena when it has already been suspended on behalf of one or more other arenas. .req.suspend.multiple.why: The thread manager contains no design for cooperation between arenas to prevent this.

.req.resume.multiple: Allow requests to resume a thread on behalf of each arena which had previously suspended the thread. The thread must only be resumed when requests from all such arenas have been received. .req.resume.multiple.why: A thread manager for an arena must not permit a thread to make progress before it explicitly resumes the thread.

.req.suspend.context: Must be able to access the context for a thread when it is suspended.

.req.suspend.protection: Must be able to suspend a thread which is currently handling a protection fault (i.e., an arena access). Such a thread might even own an arena lock.

.req.legal: Must use the Pthreads and other POSIX APIs in a legal manner.

31.4. Analysis

.anal.suspend: Thread suspension is inherently asynchronous. MPS needs to be able to suspend another thread without prior knowledge of the code that thread is running. (That is, we can’t rely on cooperation between threads.) The only asynchronous communication available on POSIX is via signals – so the suspend and resume mechanism must ultimately be built from signals.

.anal.signal.safety: POSIX imposes some restrictions on what a signal handler function might do when invoked asynchronously (see the sigaction documentation, and search for the string “reentrant”). In summary, a small number of POSIX functions are defined to be “async-signal safe”, which means they may be invoked without restriction in signal handlers. All other POSIX functions are considered to be unsafe. Behaviour is undefined if an unsafe function is interrupted by a signal and the signal handler then proceeds to call another unsafe function. See mail.tony.1999-08-24.15-40 and followups for some further analysis.

.anal.signal.safety.implication: Since we can’t assume that we won’t attempt to suspend a thread while it is running an unsafe function, we must limit the use of POSIX functions in the suspend signal handler to those which are designed to be “async-signal safe”. One of the few such functions related to synchronization is sem_post().

.anal.signal.example: An example of how to suspend threads in POSIX was posted to newsgroup comp.programming.threads in August 1999 [Lau_1999-08-16]. The code in the post was written by David Butenhof, who contributed some comments on his implementation [Butenhof_1999-08-16]

.anal.signal.linux-hack: In the current implementation of Linux Pthreads, it would be possible to implement suspend/resume using SIGSTOP and SIGCONT. This is, however, nonportable and will probably stop working on Linux at some point.

.anal.component: There is no known way to meet the requirements above in a way which cooperates with another component in the system which also provides its own mechanism to suspend and resume threads. The best bet for achieving this is to provide the functionality in shared low-level component which may be used by MPS and other clients. This will require some discussion with other potential clients and/or standards bodies.

.anal.component.dylan: Note that such cooperation is actually a requirement for Dylan (req.dylan.dc.env.self), though this is not a problem, since all the Dylan components share the MPS mechanism.

31.5. Interface

PThreadextStruct *PThreadext

.if.pthreadext.abstract: A thread is represented by the abstract type PThreadext. A PThreadext object corresponds directly with a PThread (of type pthread_t). There may be more than one PThreadext object for the same PThread.

.if.pthreadext.structure: The structure definition of PThreadext (PThreadextStruct) is exposed by the interface so that it may be embedded in a client datastructure (for example, ThreadStruct). This means that all storage management can be left to the client (which is important because there might be multiple arenas involved). Clients may not access the fields of a PThreadextStruct directly.

void PThreadextInit(PThreadext pthreadext, pthread_t id)

.if.init: Initializes a PThreadext object for a thread with the given id.

Bool PThreadextCheck(PThreadext pthreadext)

.if.check: Checks a PThreadext object for consistency. Note that this function takes the mutex, so it must not be called with the mutex held (doing so will probably deadlock the thread).

Res PThreadextSuspend(PThreadext pthreadext, struct sigcontext **contextReturn)

.if.suspend: Suspends a PThreadext object (puts it into a suspended state). Meets .req.suspend. The object must not already be in a suspended state. If the function returns ResOK, the context of the thread is returned in contextReturn, and the corresponding PThread will not make any progress until it is resumed:

Res PThreadextResume(PThreadext pthreadext)

.if.resume: Resumes a PThreadext object. Meets .req.resume. The object must already be in a suspended state. Puts the object into a non-suspended state. Permits the corresponding PThread to make progress again, (although that might not happen immediately if there is another suspended PThreadext object corresponding to the same thread):

void PThreadextFinish(PThreadext pthreadext)

.if.finish: Finishes a PThreadext object.

31.6. Implementation

struct PThreadextStruct PThreadextStruct

.impl.pthreadext: The structure definition for a PThreadext object is:

struct PThreadextStruct {
  Sig sig;                         /* design.mps.sig */
  pthread_t id;                    /* Thread ID */
  struct sigcontext *suspendedScp; /* sigcontext if suspended */
  RingStruct threadRing;           /* ring of suspended threads */
  RingStruct idRing;               /* duplicate suspensions for id */
};

.impl.field.id: The id field shows which PThread the object corresponds to.

.impl.field.scp: The suspendedScp field contains the context when in a suspended state. Otherwise it is NULL.

.impl.field.threadring: The threadRing field is used to chain the object onto the suspend ring when it is in the suspended state (see .impl.global.suspend-ring). When not in a suspended state, this ring is single.

.impl.field.idring: The idRing field is used to group the object with other objects corresponding to the same PThread (same id field) when they are in the suspended state. When not in a suspended state, or when this is the only PThreadext object with this id in the suspended state, this ring is single.

.impl.global.suspend-ring: The module maintains a global suspend-ring – a ring of PThreadext objects which are in a suspended state. This is primarily so that it’s possible to determine whether a thread is curently suspended anyway because of another PThreadext object, when a suspend attempt is made.

.impl.global.victim: The module maintains a global variable which is used to indicate which PThreadext is the current victim during suspend operations. This is used to communicate information between the controlling thread and the thread being suspended (the victim). The variable has value NULL at other times.

.impl.static.mutex: We use a lock (mutex) around the suspend and resume operations. This protects the state data (the suspend-ring the victim: see .impl.global.suspend-ring and .impl.global.victim respectively). Since only one thread can be suspended at a time, there’s no possibility of two arenas suspending each other by concurrently suspending each other’s threads.

.impl.static.semaphore: We use a semaphore to synchronize between the controlling and victim threads during the suspend operation. See .impl.suspend and .impl.suspend-handler).

.impl.static.init: The static data and global variables of the module are initialized on the first call to PThreadextSuspend(), using pthread_once() to avoid concurrency problems. We also enable the signal handlers at the same time (see .impl.suspend-handler and .impl.resume-handler).

.impl.suspend: PThreadextSuspend() first ensures the module is initialized (see .impl.static.init). After this, it claims the mutex (see .impl.static.mutex). It then checks to see whether thread of the target PThreadext object has already been suspended on behalf of another PThreadext object. It does this by iterating over the suspend ring.

.impl.suspend.already-suspended: If another object with the same id is found on the suspend ring, then the thread is already suspended. The context of the target object is updated from the other object, and the other object is linked into the idRing of the target.

.impl.suspend.not-suspended: If the thread is not already suspended, then we forcibly suspend it using a technique similar to Butenhof’s (see .anal.signal.example): First we set the victim variable (see .impl.global.victim) to indicate the target object. Then we send the signal PTHREADEXT_SIGSUSPEND to the thread (see .impl.signals), and wait on the semaphore for it to indicate that it has received the signal and updated the victim variable with the context. If either of these operations fail (for example, because of thread termination) we unlock the mutex and return ResFAIL.

.impl.suspend.update: Once we have ensured that the thread is definitely suspended, we add the target PThreadext object to the suspend ring, unlock the mutex, and return the context to the caller.

.impl.suspend-handler: The suspend signal handler is invoked in the target thread during a suspend operation, when a PTHREADEXT_SIGSUSPEND signal is sent by the controlling thread (see .impl.suspend.not-suspended). The handler determines the context (received as a parameter, although this may be platform-specific) and stores this in the victim object (see .impl.global.victim). The handler then masks out all signals except the one that will be received on a resume operation (PTHREADEXT_SIGRESUME) and synchronizes with the controlling thread by posting the semaphore. Finally the handler suspends until the resume signal is received, using sigsuspend().

.impl.resume: PThreadextResume() first claims the mutex (see .impl.static.mutex). It then checks to see whether thread of the target PThreadext object has also been suspended on behalf of another PThreadext object (in which case the id ring of the target object will not be single).

.impl.resume.also-suspended: If the thread is also suspended on behalf of another PThreadext, then the target object is removed from the id ring.

.impl.resume.not-also: If the thread is not also suspended on behalf of another PThreadext, then the thread is resumed using the technique proposed by Butenhof (see .anal.signal.example). I.e. we send it the signal PTHREADEXT_SIGRESUME (see .impl.signals) and expect it to wake up. If this operation fails (for example, because of thread termination) we unlock the mutex and return ResFAIL.

.impl.resume.update: Once the target thread is in the appropriate state, we remove the target PThreadext object from the suspend ring, set its context to NULL and unlock the mutex.

.impl.resume-handler: The resume signal handler is invoked in the target thread during a resume operation, when a PTHREADEXT_SIGRESUME signal is sent by the controlling thread (see .impl.resume.not-also). The resume signal handler simply returns. This is sufficient to unblock the suspend handler, which will have been blocking the thread at the time of the signal. The Pthreads implementation ensures that the signal mask is restored to the value it had before the signal handler was invoked.

.impl.finish: PThreadextFinish() supports the finishing of objects in the suspended state, and removes them from the suspend ring and id ring as necessary. It must claim the mutex for the removal operation (to ensure atomicity of the operation). Finishing of suspended objects is supported so that clients can dispose of resources if a resume operation fails (which probably means that the PThread has terminated).

.impl.signals: The choice of which signals to use for suspend and restore operations may need to be platform-specific. Some signals are likely to be generated and/or handled by other parts of the application and so should not be used (for example, SIGSEGV). Some implementations of PThreads use some signals for themselves, so they may not be used; for example, LinuxThreads uses SIGUSR1 and SIGUSR2 for its own purposes, and so do popular tools like Valgrind that we would like to be compatible with the MPS. The design therefore abstractly names the signals PTHREADEXT_SIGSUSPEND and PTHREAD_SIGRESUME, so that they may be easily mapped to appropriate real signal values. Candidate choices are SIGXFSZ and SIGXCPU.

.impl.signals.config: The identity of the signals used to suspend and resume threads can be configured at compilation time using the preprocessor constants CONFIG_PTHREADEXT_SIGSUSPEND and CONFIG_PTHREADEXT_SIGRESUME respectively.

31.7. Attachments

[missing attachment “posix.txt”]

[missing attachment “susp.c”]

31.8. References

Butenhof_1999-08-16

Dave Butenhof. comp.programming.threads. 1999-08-16. “Re: Problem with Suspend & Resume Thread Example”.

Lau_1999-08-16

Raymond Lau. comp.programming.threads. 1999-08-16. “Problem with Suspend & Resume Thread Example”.