www.ravenbrook.com - thread-safety.rst

.. index::
   pair: thread safety; design

.. _design-thread-safety:


Thread safety in the MPS
========================

.. mps:prefix:: design.mps.thread-safety


Introduction
------------

:mps:tag:`intro` This describes how thread safety is achieved in the MPS.


Overview
--------

:mps:tag:`over` The MPS is expected to run in an environment with multiple
threads calling into the MPS. The initial approach is very simple.
Some of the code is known to operate with exclusive access to the data
it manipulates, so this code is safe. For the rest of the code, shared
data structures are locked by the use of a single binary lock
(design.mps.lock(0)) per arena. This lock is claimed on entry to the
MPS and released on exit from it. So there is at most a single thread
(per arena) running "inside" the MPS at a time.


Requirements
------------

:mps:tag:`req.mt` Code must work correctly in presence of multiple threads
all calling into the MPS.

:mps:tag:`req.perf` Performance should not be unreasonably hindered.


Architecture
------------

:mps:tag:`arch.arena` Arena Lock: no shared data between arenas.

:mps:tag:`arch.global.binary` Global binary lock: protects mutable data
shared between arenas -- that is, the arena ring, see
design.mps.arena.static.ring.lock.

:mps:tag:`arch.global.recursive` Global recursive lock: protects static data
which must be initialized once -- for example, pool classes, see
design.mps.protocol.impl.init-lock.

:mps:tag:`arch.other` Other: data not shared.

:mps:tag:`arch.static` Static data: sigs: shared-non-mutable always inited
to same thing.

:mps:tag:`arena-entry` Each arena has a single lock. Externally visible
calls fall into two categories. Simple: arena lock not held. Lock is
claimed on entry, and released on exit. Recall: These are callable
only after a call-back from the MPS. In this case a arena lock is
already held.

:mps:tag:`interface` The definition of the interface should guarantee safe
use of calls (from a locking point of view). For example, a buffer
must be exclusive to a thread.

:mps:tag:`buffers` The buffer code is designed not to need a lock in the
fast case. A lock is only claimed on the exceptional reserve, trip and
commit cases (fill and trip?). A buffer contains references to shared
data (via pool field). Accessing this shared data must involve a lock.

:mps:tag:`deadlock` A strict ordering is required between the global and
arena locks to prevent deadlock. The binary global lock may not be
claimed while either the arena or recursive global lock is held; the
arena lock may not be claimed while the recursive global lock is held.
Each arena lock is independent of all other arena locks; that is, a
thread may not attempt to claim more than one arena lock at a time.


Analysis
--------

:mps:tag:`anal.simple` To have the code functioning correctly it should be
easy to change correctly. So a simple approach is desirable. We have
to also ensure that performance is not unreasonably downgraded.

Performance cost of locking
...........................

:mps:tag:`lock-cost` The cost of locking in performance terms are:

- :mps:tag:`lock-cost.overhead` the overhead of claiming and releasing locks;

- :mps:tag:`lock-cost.pause` the pauses caused by one thread being blocked
  on another thread.

- :mps:tag:`lock-cost.wait` the time wasted by one thread being blocked on
  another thread.

:mps:tag:`anal.perf.signif` :mps:ref:`.lock-cost.pause` is significant if there are
MPS functions that take a long time. Using more locks, e.g. having a
lock per pool as well as a lock per arena, is a way of decreasing the
locking conflict between threads (.lock-cost.pause and
:mps:ref:`.lock-cost.wait`). However this could increase
:mps:ref:`.lock-cost.overhead` significantly.

:mps:tag:`anal.perf.work` But all MPS functions imply a small work-load
unless a collection is taking place. In the case of a collection, in
practice and certainly in the near future, all threads will most
likely be suspended while the collection work is going on. (The pages
being scanned will need to be unprotected which implies the mutator
will have to be stopped.) We also have to remember that unless we are
running on genuine multiprocessor :mps:ref:`.lock-cost.wait` is irrelevant.

:mps:tag:`anal.perf.alloc` During typical use we expect that it is
allocation that is the most frequent activity. Allocation buffers
(design.mps.buffer) are designed to allow allocation in concurrent
threads without needing a lock. So the most significant time a thread
spends in the MPS will be on a buffer-fill or during a collection. The
next most significant use is likely to be buffer create and deletion,
as a separate buffer will be required for each thread.

:mps:tag:`anal.perf.lock` So overall the performance cost of locking is, I
estimate, most significantly the overhead of calling the locking
functions. Hence it would be undesirable from a performance point of
view to have more than one lock.

Recursive vs binary locks
.........................

:mps:tag:`anal.reentrance` The simplest way to lock the code safely is to
define which code runs inside or outside the lock. Calling from the
outside to the inside implies a lock has to be claimed. Returning
means the lock has to be released. Control flow from outside to
outside and from inside to inside needs no locking action. To
implement this a function defined on the external interface needs to
claim the lock on entry and release it on exit. Our code currently
uses some external functions with the lock already held. There are two
ways to implement this:

#. :mps:tag:`recursive` Each external function claims a recursive lock.

   - simple;
   - have to worry about locking depth;
   - extra locking overhead on internal calls of external functions;

#. :mps:tag:`binary` Each external function claims a binary lock. Replace
   each internal call of an external function with a call to a newly
   defined internal one.

   - more code
   - slightly easier to reason about

:mps:tag:`anal.strategy` It seems that the :mps:ref:`.recursive` strategy is the
easiest to implement first, but could be evolved into a :mps:ref:`.binary`
strategy. (That evolution has now happened. tony 1999-08-31).


Ideas
-----

:mps:tag:`sol.arena-lock` Lock per arena which locks all MPS structures
associated with the arena, except allocation buffers.

:mps:tag:`sol.init` Shared static data may not be changed. It is initialised
before being read, and if re-initalised the values written must be
identical to those already there. Essentially only read-only shared
static data is allowed.

:mps:tag:`sol.fine-grain` Use finer grained locks, for example, a lock per
per pool instance. Arena lock locks only operations on arena. Pool
locks are claimed per pool. An ordering on pool instances would avoid
deadlock.

:mps:tag:`sol.global` Use global locks for genuinely global data which must
be updated dynamically. An ordering between global and arena locks
would avoid deadlock.


Implementation
--------------

Use MPS locks (design.mps.lock) to do locking.

Locking Functions
.................

:c:func:`ArenaEnter()` and :c:func:`ArenaLeave()` are used to claim and release the
arena lock. To implement this:

- There is a lock for every arena. The arena class init function
  allocates the lock as well as the arena itself.
- :c:func:`ArenaInit()` calls :c:func:`LockInit()` on the lock and initializes the
  pointer to it from the arena.
- :c:func:`ArenaDestroy()` calls :c:func:`LockFinish()` on it.
- :c:func:`ArenaEnter()` claims the lock.
- :c:func:`ArenaLeave()` releases the lock.

Shared and non-shared data
..........................

Non-shared data is data for which no other thread has a handle on it.
shared-non-mutable data is data which is never changed after
initialisation. It may be re-initialised, if re-initialisation does
not change its value. atomically updatable data is data which is not
locked, but may be shared because it is in a consistent state before
and after an update.

A function is "safe" if it may safely execute without exclusive access
to the data it manipulates.

A "safe" function may:

- call other safe functions;
- manipulate non-shared data;
- read shared-non-mutable data;
- claim the arena lock around code which may manipulate shared data in
  the arena.

Each function in the external MPS interface falls into one of the
following categories:

- calls :c:func:`ArenaEnter()` on entry and :c:func:`ArenaLeave()` on exit;
- uses :c:func:`PoolArena()` to identify the arena, before claiming the lock;
- uses :c:func:`BufferPool()` and :c:func:`PoolArena()` to identify the arena, before
  claiming the lock;
- is not defined as external but is listed for explicitness;
- only claims the lock in otherwise unsafe situations (buffer code?);
- may be called externally but only in a situation where the arena
  lock is already held;
- is the unique accessor of its data.

So :c:func:`PoolArena()` and :c:func:`BufferPool()` must be "safe". ``pool->arena`` is
shared-non-mutable. ``buffer->pool`` is shared-non-mutable.

Validation
..........

We have to be careful about validation. Any function that is called
from a arena-safe function without the arena-lock held, must itself be
safe, or manipulating non-shared data.

For example, calling :c:func:`PoolIsValid()` before claiming the lock would be
wrong if :c:func:`PoolIsValid()` is unsafe. Defining it to be safe would
involve locking it, which if done in all similar situations would be
very expensive.

Possibly remove validation from accessor methods; replace with
signature check and :c:func:`IsValid()` calls in callers of accessor
functions.

Annotations?:
- safe
- non-shared
- shared-non-mutable

Safe functions
..............

Arena

- :c:func:`ArenaCreate()` -- no shared data; no lock; calls :c:func:`LockInit()`.
- :c:func:`ArenaDestroy()` -- no shared data; no lock (should only finish
  arena after use); calls :c:func:`LockFinish()`.
- :c:func:`ArenaDescribe()` -- lock.

Root (for the purposes of locking this module can be thought of as external)

- :c:func:`RootCreate()` -- calls create
- :c:func:`RootCreateTable()` -- calls create
- create -- lock
- :c:func:`RootDestroy()` -- lock
- :c:func:`RootDescribe()` -- lock

will be attached to arena, can lock now.


Pool               

- :c:func:`PoolCreate()` / :c:func:`PoolCreateV()` -- lock (Create calls CreateV which locks).
- :c:func:`PoolDestroy()` -- lock
- :c:func:`PoolAlloc()` -- lock
- :c:func:`PoolFree()` -- lock
- :c:func:`PoolArena()` -- accesses shared-non-mutable data only
- :c:func:`PoolDescribe()` -- lock

Format

- :c:func:`FormatCreate()` -- lock
- :c:func:`FormatDestroy()` -- lock

Buffer

- :c:func:`BufferCreate()` -- lock
- :c:func:`BufferDestroy()` -- lock
- :c:func:`BufferFill()` -- lock
- :c:func:`BufferTrip()` -- lock
- :c:func:`BufferPool()` -- accesses shared-non-mutable data only
- :c:func:`BufferDescribe()` -- lock
- :c:func:`BufferCommit()` -- "unsafe": buffer may be used by single thread
  only. (but safe wrt arena)
- :c:func:`BufferReserve()` -- "unsafe": also

PoolClass (only shared data is static and non-mutable)

- :c:func:`PoolClass()`
- :c:func:`PoolClassAMC()`
- :c:func:`PoolClassMV()`
- :c:func:`PoolClassMFS()`

Sig (as with :c:type:`PoolClass`, relies on static data reinitialised to
constant value)

Collect

- :c:func:`Collect()` -- lock

Thread

- :c:func:`ThreadRegister()` -- lock
- :c:func:`ThreadDeregister()` -- lock