.. highlight:: none


.. index::
   pair: hexadecimal; transliterating

.. _design-guide.hex.trans:


Transliterating the alphabet into hexadecimal
=============================================

.. mps:prefix:: guide.hex.trans


Introduction
------------

:mps:tag:`scope` This document explains how to represent the alphabet as
hexadecimal digits.

:mps:tag:`readership` This document is intended for anyone devising
arbitrary constants which may appear in hex-dumps.

:mps:tag:`sources` This transliteration was supplied by Richard Kistruck
[RHSK-1997-04-07]_ based on magic number encodings for object signatures
used by Richard Brooksby [RB-1996-02-12]_, the existence of which was
inspired by the structure marking used in the Multics operating system
[THVV-1995]_.


Transliteration
---------------

:mps:tag:`forward` The chosen transliteration is as follows::

    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    ABCDEF9811C7340BC6520F3812

:mps:tag:`backward` The backwards transliteration is as follows::

    0 OU
    1 IJY
    2 TZ
    3 MW
    4 N
    5 S
    6 R
    7 L
    8 HX
    9 G
    A A
    B BP
    C CKQ
    D D
    E E
    F FV

:mps:tag:`pad` If padding is required (to fill a hex constant length), you
should use 9's, because G is rare and can usually be inferred from
context.

:mps:tag:`punc` There is no formal scheme for spaces, or punctuation. It is
suggested that you use 9 (as :mps:ref:`.pad`).


Justification
--------------

:mps:tag:`letters` The hexadecimal letters (A-F) are all formed by
similarity of sound. B and P sound similar, as do F and V, and C, K, &
Q can all sound similar.

:mps:tag:`numbers` The numbers (0-9) are all formed by similarity of shape
(but see :mps:ref:`.trans.t`). Nevertheless, 1=IJY retains some similarity of
sound.

:mps:tag:`trans.t` T is an exception to :mps:ref:`.numbers`, but is such a common
letter that it deserves it.


Notes
-----

:mps:tag:`change` This transliteration differs from the old transliteration
used for signatures (see design.mps.sig_), as follows: J:6->1;
L:1->7; N:9->4; R:4->6; W:8->3; X:5->8; Y:E->I.

.. _design.mps.sig: sig.html

:mps:tag:`problem.mw` There is a known problem that M and W are both common,
map to the same digit (3), and are hard to distinguish in context.

:mps:tag:`find.c` It is possible to find all 8-digit hexadecimal constants
and how many times they're used in C files, using the following Perl
script::

    perl5 -n -e 'BEGIN { %C=(); } if(/0x([0-9A-Fa-f]{8})/) { $C{$1} = +[] if(
    !defined($C{$1})); push(@{$C{$1}}, $ARGV); } END { foreach $H (sort(keys(%C)))
    { printf "%3d %s %s\n", scalar(@{$C{$H}}), $H, join(", ", @{@C{$H}}); } }' *.c
    *.h

:mps:tag:`comment` It is a good idea to add a comment to any constant
declaration indicating the English version and which letters were
selected (by capitalisation), e.g.::

    #define SpaceSig        ((Sig)0x5195BACE) /* SIGnature SPACE */


References
----------

.. [RB-1996-02-12]
   "Signature magic numbers" (e-mail message);
   `Richard Brooksby`_;
   Harlequin;
   1996-12-02 12:05:30Z.

.. _`Richard Brooksby`: mailto:rb@ravenbrook.com

.. [RHSK-1997-04-07]
   "Alpha-to-Hex v1.0 beta";
   Richard Kistruck;
   Ravenbrook;
   1997-04-07 14:42:02+0100;
   <https://info.ravenbrook.com/project/mps/mail/1997/04/07/13-44/0.txt>.

.. [THVV-1995]
   "Structure Marking";
   Tom Van Vleck;
   multicians.org_;
   <http://www.multicians.org/thvv/marking.html>.

.. _multicians.org: http://www.multicians.org/