MPS/MMREF import procedures
Nick Levine, Ravenbrook Limited, 2001-08-15
1. INTRODUCTION
This is the procedure for the one-off (but see section 1.1 for
qualification of that) import of data from the HOPE/RCS hierarchy
[Global Graphics 2001-08-13] into Ravenbrook's Perforce repository
under /project/mps.
This procedure takes four .tar.gz archives from Global Graphics,
representing the revision history of the MPS and MMRef projects, and
merges those projects into Ravenbrook's Perforce repository.
(This procedure includes the import of data from the MMRef project as it
is related and any faults in the procedure for one project will
probably need fixing identically in the other. Also, we're going to make
interleaving of changes even nastier if we do the two separately.)
The purpose is to allow Ravenbrook to develop and maintain the MPS and
MMRef projects.
The intended readership is Ravenbrook development staff.
This document is not confidential.
1.1. Status
The procedure was first run on 2001-08-17. It took about 2.5 hours,
most of which was spent doing the backups described in section 3.1,
items 3 and 4 below. Sections 3.4, 3.5 and 3.6 were carried out while
this was going on; this part of the procedure took less than an hour.
This document was updated after the procedure was run, as the merge
with our central system had not been fully tested in advance.
Numerous problems emerged after the merge (see for example [NB
2001-08-24]); after investigation we settled on surgery:
a) of RCS files to remove checkpoint labels, as these confuse the
rcstoperf script
b) of the resulting checkpoint, which still contained bogus and
broken entries [NB 2001-09-10].
We may decide to restore checkpoint labels later as a separate
operation.
This document now also describes how to merge the imported material
into Ravenbrook's Perforce repository given that a previous (but
broken) version of that material is already there.
2. HOW THE MERGE WORKS
2.1. Paths and branches
Paths in the original look like
mps/src/misc.h
mmref/src/diagrams/address.png
For the trunk branch (revisions of the form 1.x), these get mapped to
//info.ravenbrook.com/project/mps/branch/2001-08-13/trunk/src/misc.h
//info.ravenbrook.com/project/mmref/branch/2001-08-13/trunk/src/diagrams/address.png
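The trunk mapping above amounts to inserting a fixed prefix after the
project name. A minimal sketch in shell follows; the function name
map_trunk_path is hypothetical and is not part of the import tools.

```shell
# Hypothetical helper (not part of the import tools): map an original
# path to the trunk path it ends up at in the depot.
map_trunk_path() {
    project=${1%%/*}    # first path component: mps or mmref
    rest=${1#*/}        # everything after the project name
    printf '//info.ravenbrook.com/project/%s/branch/2001-08-13/trunk/%s\n' \
        "$project" "$rest"
}

map_trunk_path mps/src/misc.h
map_trunk_path mmref/src/diagrams/address.png
```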
2.2. Branches
There are no branches in the MMRef project. In the MPS, each branch
gets mapped to
//info.ravenbrook.com/project/mps/branch/DATE/NAME/...
where DATE is NDL's best guess as to when the branch was made [NDL
2001-08-14b] and NAME has been generated by the RCS to Perforce script,
from branch labels in the RCS files.
2.3. Changelists and dates
Changelists will run sequentially from the most recent change in
Ravenbrook's repository. This means that changelists will be out of
sequence with respect to date (but within each project things will be in
order). Renumbering changelists would be unacceptable because it would
invalidate references to existing change numbers in the information
system and in material we've published.
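As a sketch of the arithmetic: the first imported changelist gets the
number one above the highest existing change. The helper below is
hypothetical and assumes that "p4 counters" reports the counter on a
line of the form "change = N"; check this against the real output
before relying on it.

```shell
# Hypothetical helper: read "p4 counters" output on stdin and print the
# number the first imported changelist would get (highest change + 1).
# Assumes the counter is reported on a line of the form "change = N".
next_change() {
    awk '$1 == "change" && $2 == "=" { print $3 + 1 }'
}

# Sample input standing in for real "p4 counters" output.
printf 'change = 1234\njob = 17\n' | next_change
```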
3. PROCEDURE
The outline is: back up Perforce; unpack the archives from Global
Graphics; run the RCS to Perforce script to fake up a checkpoint and
build a depot; edit the checkpoint to fix paths in the depot; restore
from the edited checkpoint and test; merge checkpoints and depots;
restore.
3.1. Getting started
Note: second time through (to repair the checkpoint created first
time), we already know the range of change numbers affected. It is
not necessary to turn off robots or the Perforce server until the
merge is ready (section 3.8).
1. Turn off the infomail robot on raven:
su infomail
crontab -r
Turn off the infosys robot on sparrow:
su infosys
crontab -r
2. Run "p4 counters" and then halt the server (this stops the change
number from being bumped up while I'm working). Remember the value
for the change counter; we'll need it in section 3.5, item 3.
p4 -u root admin stop
3. Make a tape backup of sparrow [GDR 2000-10-04].
4. Build a checkpoint. For safety, copy the checkpoint, db.* files
and the depot under /home/p4/repository to a different location.
export DATE=`date "+%Y%m%dT%T"`
p4d -r /home/p4/repository -jd checkpt.$DATE
cp -R /home/p4/repository/{checkpt.$DATE,db.*,info.ravenbrook.com} /home/ndl/backup.$DATE
Note after the event: this copy took forever. It would probably
have been quicker to build a tar file.
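The note above suggests a tar file instead of the copy. A minimal
sketch of what that might look like follows. It has not been run
against the real repository. The function name backup_repository is
hypothetical, and the timestamp format (no colons) is my assumption,
chosen so the file name is safe on most filesystems.

```shell
# Hypothetical alternative to the "cp -R" above: stream the checkpoint,
# db.* files and depot into one compressed tar file.
# Usage: backup_repository REPOSITORY_DIR DESTINATION_DIR
backup_repository() {
    repo=$1
    dest=$2
    stamp=$(date "+%Y%m%dT%H%M%S")
    archive="$dest/backup.$stamp.tar.gz"
    # The subshell keeps the caller's working directory unchanged.
    # Note: only a gzip failure is detected here; check tar's own
    # messages by eye.
    (cd "$repo" && tar cf - checkpt.* db.* info.ravenbrook.com) |
        gzip -c > "$archive" || return 1
    printf '%s\n' "$archive"
}
```

For example, "backup_repository /home/p4/repository /home/ndl" would
print the name of the archive it wrote.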
3.2. Environment
Running on gannet (NT) with cygwin. (I have not tried this on any
other environment.)
I needed to set
PATH=/usr/bin:$PATH
so that I would get cygwin versions of find and sort (as opposed to
win32 versions).
We made minor fixes to the rcstoperf.sh script, as follows:
1. An additional command-line option (-halt) which allows you to do a
copy and nothing else (so that we can check the intermediate
results).
2. A minor bug fix (-r missing from a p4d command).
The modified tool is [NDL 2001-08-15].
3.3. Preconditions
1. An empty directory to do all this in. All relative pathnames are
relative to this directory. I assume you're in this directory at
the start of each section below.
2. A clean (i.e. unused) Perforce server. I used version 2000.2 as it
matches our central server. Unpack it in p4d/ and add this
directory to PATH.
3. The four tar files from
/project/mps/import/2001-08-13/MM-rcs/ and
/project/mmref/import/2001-08-13/MM-rcs/,
in rcs/
4. The contents of /project/mps/tool/rcs-import/ and
/project/mps/tool/second-rcs-import/, in tool/
5. This procedure uses two scripts written in Common Lisp. These
have been tested against version 4.1.20 of LispWorks (Windows NT
"professional" release).
3.4. Install the RCS files
1. Unpack and group (as per [NDL 2001-08-14a]):
cd rcs
for tar in HOMEmm.tar MMQA.tar MMsrc.tar; do tar xf $tar; done
mkdir mps
mv mm mps/home
mv mmqa mps/qa
mv src mps
tar xf MMref.tar
mkdir mmref
mv mm mmref/src
rm *.tar
2. Remove checkpoint labels:
Start LispWorks. Compile and load tool/remove-checkpoint-labels.lisp.
Run
(rewrite-rcs-files <directory>)
where <directory> is a string naming the full path to the rcs/
directory, for example:
(rewrite-rcs-files "e:/tmp/p4/all/rcs/")
This function descends through subdirectories, assuming all files it
meets are RCS files, removing all labels corresponding to
checkpoints, and rewriting each file in-place.
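Since the label-removal function assumes that every file it meets is
an RCS file, it may be worth checking first that nothing else is lying
around under rcs/. A minimal sketch, assuming that all the RCS files
carry the usual ",v" suffix; the function name list_non_rcs_files is
hypothetical.

```shell
# Hypothetical pre-check: print every file under the given directory
# whose name does not end in ",v" (the usual RCS file suffix).
list_non_rcs_files() {
    find "$1" -type f ! -name '*,v' -print
}
```

Running "list_non_rcs_files rcs" should print nothing if the hierarchy
contains only RCS files.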
3.5. Run the (modified) rcstoperf script
I prefer to run it in stages and verify the results as it goes.
1. tool/rcstoperf.sh -copy -halt rcs p4d
Copies the RCS hierarchy. OK, you could copy these files by hand, but
this is a quick sanity check that things are where they ought to be.
You should end up with a directory p4d/depot/IMPORT with subdirectories
mps/home, mps/qa, mps/src and mmref/src. (When we edit the checkpoint
in section 3.6, item 6, we'll change all the paths starting IMPORT.)
2. tool/rcstoperf.sh -nocopy -extract rcs p4d
This builds a (large) file p4d/tmp.extract. This looks a bit like a
checkpoint; it contains all the metadata from the RCS files.
3. tool/rcstoperf.sh -nocopy -next CHANGE -changes -meta rcs p4d
Where CHANGE is one of the following:
a) If the RCS changes are being added to Perforce for the first
time, CHANGE is 1 plus the highest changelevel in Ravenbrook's
Perforce server (see step 3.1, item 2).
b) If the RCS changes have already been added to Perforce and we
are now attempting to fix problems (as described in section 1.1
above), CHANGE is the same as last time. In this case, it is
essential to verify after this operation that the value of the
change counter in the first line of tmp.meta is the same as last
time. The operations in section 3.8 below assume this.
This call to rcstoperf builds a checkpoint file p4d/tmp.meta. On
gannet this checkpoint needs immediate surgery before we can proceed
(see comments about awk in [Perforce 1999]).
4. The perl script to restore missing "@" characters is in tool/:
mv p4d/tmp.meta p4d/tmp.meta.at
perl tool/terminal-ats.pl p4d/tmp.meta.at > p4d/tmp.meta
5. tool/rcstoperf.sh -nocopy -load rcs p4d
We should now have a more compact checkpoint file (tmp.checkpt)
and a bunch of db.* files, all in p4d/.
6. Edit the checkpoint, removing systematic errors.
Start LispWorks (or continue to use the old image). Compile and load
tool/edit-checkpoint.lisp. Run
(edit-checkpoint <checkpoint>)
where <checkpoint> is a string naming the full path to the
checkpoint file created above (tmp.checkpt). For example:
(edit-checkpoint "e:/tmp/p4/all/p4d/tmp.checkpt")
For each file / branch combination, this does the following:
a) Remove db.rev and db.revcx entries with lowest changelist
numbers (they're bogus), confirming that the remaining db.rev
entry with the lowest changelist number is correctly numbered in
the "lbrRev" field
b) Subtract 1 from revision values for all remaining entries
c) Change "action" on earliest remaining db.rev / db.revcx entries
to 0 ("add")
d) Store changelist from earliest remaining db.rev / db.revcx
entries in db.integ (in place of zero there at present)
e) Change "resolved" to 2 ("automatically as part of branch") in
all db.integ entries
f) Change "how" from 3 ("branch into") to 11 ("dirty branch into")
in first of each db.integ pair
This procedure generates a new checkpoint file, tmp.checkpt.out,
which is significantly shorter than before (but with the same
number of changes).
7. Install this checkpoint.
cd p4d/
rm db.*
p4d -r . -jr tmp.checkpt.out
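Two of the checks this section asks for (the change counter in the
first line of tmp.meta, and the number of changes being the same before
and after editing) can be sketched in shell. The helper names are
hypothetical. Both assume that each checkpoint record starts at the
beginning of a line with "@pv@ N @db.TABLE@", and the first assumes
that the counter value is the last field of the first line; confirm
both against the real files.

```shell
# Hypothetical helpers for eyeballing checkpoint files. They assume each
# record starts at the beginning of a line with "@pv@ N @db.TABLE@".

# Print the last field of the first line, which for tmp.meta is assumed
# to be the value of the change counter.
first_line_counter() {
    head -n 1 "$1" | awk '{ print $NF }'
}

# Count the db.change records in a checkpoint file.
count_changes() {
    grep -c '^@pv@ [0-9][0-9]* @db\.change@' "$1"
}
```

Running count_changes on the checkpoint before and after editing should
give the same number.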
3.6. Rename the branches
We need to do some relocation within the depot (as per [NDL
2001-08-14b]).
1. Start the Perforce server. Create yourself a client.
2. mkdir tmp
3. tool/branches.sh > tmp/branches.txt
This creates a file with two columns separated by whitespace; the
branch name and an estimate of the date the branch was created in [ISO
8601] format. (Perforce got the branch names from labels in the RCS
files at step 5 in section 3.5.)
(This is a non-scalable report; give it a minute or two to run!)
4. awk '{printf "s+//depot/%s/mps/+//info.ravenbrook.com/project/mps/branch/%s/%s/+\n", $1,$2,$1}' tmp/branches.txt > tmp/uniq.sed
This converts the map from branch name to date into a sed script that
carries out the path rewriting for all the branches.
5. Edit the tail of tmp/uniq.sed:
In the last line, change
DATE/main
to
2001-08-13/trunk
Make a copy of the last line, with "mmref" substituted for the two
occurrences of "mps".
Add one further line:
s+//depot/IMPORT/+//info.ravenbrook.com/IMPORT/+
In an attempt to learn from earlier defects, I note that an inability
to spell ravenbrook correctly has annoying consequences downstream.
Add the following, which translate users:
s/@gavinm@/@grm@/
s/@nickb@/@nb@/
s/@richard@/@rb@/
(Make sure there's a final newline, otherwise sed won't see the last
line of the file.)
6. sed -f tmp/uniq.sed p4d/tmp.checkpt.out > p4d/checkpt.mm
This will take around 10 minutes; the new checkpoint file will be
about half as big again as the old one. While it's running, carry out
step 10.
7. Halt the Perforce server, remove the db files and journal
rm p4d/db.* p4d/journal
8. We need to fix up the (few) binary files in the distribution.
The only binary files have extension .png, and they're all on version
1.1.
Edit p4d/checkpt.mm. Every line matching the regexp
db\.rev@.*\.png@
should be changed: the lines have the pattern:
"@pv@ 3 @db.rev@", $file, $rev, 0, ..., 0
Both zeros shown should be changed to 257. Retain the trailing space
at the end of the lines.
(Note: [Perforce 1999] says only to change the first 0. Experiments
have shown that this leads to crlf corruption of binary files.)
Now go to the directory containing these diagrams and unpack them:
cd p4d/depot/IMPORT/mmref/src/diagrams
co *,v
(Gannet doesn't have rcs installed. I built a tarball of the ,v files,
ftp'd it to raven, unpacked it there and copied the .png files back.)
Finally, move the files thus:
for p in *.png; do mkdir $p,d; mv $p $p,d/1.1; done
9. p4d -r p4d -jr checkpt.mm
10. mv p4d/depot p4d/info.ravenbrook.com
(Cygwin does this as a deep copy. Use the desktop for real-time
results.)
11. Restart Perforce. Create a depot info.ravenbrook.com. Create
yourself a client. Try it out. (Check: for non-corruption of png
files; for plausibility of files, branches, changes, etc.)
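To illustrate the path rewriting in this section, here is a worked
example of generating the sed script and applying it to one path. The
awk command is the one from step 4; the wrapper names, the branch name
"example_branch" and its date are invented for illustration. The second
helper is a quick test for the final newline that step 5 asks for.

```shell
# Hypothetical wrapper around the awk command from step 4: read
# "NAME DATE" lines and print one sed substitution per branch.
make_branch_sed() {
    awk '{printf "s+//depot/%s/mps/+//info.ravenbrook.com/project/mps/branch/%s/%s/+\n", $1,$2,$1}' "$1"
}

# Succeed if the file's last character is a newline. Command
# substitution strips a trailing newline, leaving the empty string.
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}

# Worked example with an invented branch name and date.
work=$(mktemp -d)
printf 'example_branch 1999-01-02\n' > "$work/branches.txt"
make_branch_sed "$work/branches.txt" > "$work/uniq.sed"
ends_with_newline "$work/uniq.sed" && echo "final newline present"
echo '@//depot/example_branch/mps/src/misc.h@' | sed -f "$work/uniq.sed"
```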
3.7. Merge, version A
Follow this section only if merging for the first time. Otherwise
follow section 3.8 below.
1. Copy the fabricated depot and checkpoint to sparrow and unpack
them.
cd p4d
tar cf - info.ravenbrook.com checkpt.mm | gzip -c > mm.tar.gz
FTP the tarball to sparrow and unpack it.
cd /home/p4/repository
gunzip -c mm.tar.gz | tar xvf -
(We know that this won't overwrite any files because the new depot
is all under info.ravenbrook.com/IMPORT/.)
2. Restore from the RCS checkpoint.
p4d -r /home/p4/repository -jr checkpt.mm
3. Start Perforce server and test it.
4. Restart infomail and infosys.
On raven:
su infomail
crontab /home/infomail/etc/crontab
On sparrow:
su infosys
crontab /home/infosys/etc/crontab
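Rather than relying on knowing that the tarball won't overwrite
anything, it is possible to check before unpacking. A minimal sketch
follows. The function name list_conflicts is hypothetical, and it
assumes that tar lists directory entries with a trailing slash (GNU tar
does).

```shell
# Hypothetical pre-check: print each non-directory entry of a gzipped
# tarball that already exists under the target directory.
# Usage: list_conflicts TARBALL TARGET_DIR
list_conflicts() {
    gunzip -c "$1" | tar tf - | while IFS= read -r entry; do
        case $entry in
            */) continue ;;    # directory entries may already exist
        esac
        if [ -e "$2/$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
}
```

Running "list_conflicts mm.tar.gz /home/p4/repository" on sparrow
should print nothing if the unpack is safe.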
3.8. Merge, version B
Follow this section only if merging for a second time. First time
around, follow section 3.7 above.
1. If you haven't already carried out the steps in section 3.1, do
so now.
2. Copy the fabricated depot to sparrow and unpack it:
cd p4d
tar cf - info.ravenbrook.com | gzip -c > mm.tar.gz
FTP the tarball to sparrow and unpack it.
cd /home/p4/repository
mv info.ravenbrook.com/IMPORT ./IMPORT.bak
gunzip -c mm.tar.gz | tar xvf -
(We know that this won't overwrite any files because both the new and
previously new depots are all under info.ravenbrook.com/IMPORT/.)
3. Set up a working space:
Make an empty directory "sparrow" on gannet.
Copy the checkpoint dumped in section 3.1, item 4 above into
sparrow/ on gannet, naming it checkpt.start
Copy the fabricated checkpoint from last time into sparrow/ on
gannet, naming it checkpt.old
Copy the fabricated checkpoint from this time into sparrow/,
naming it checkpt.new
4. We want to generate a new checkpoint file, which has the
following differences from checkpt.start:
a) all but one of the changes which were introduced by checkpt.old
removed, the exception being the removal of the "change" counter,
which is bogus.
b) all the changes introduced on mps/branch and mmref/branch since
then removed (these are db.have entries caused by people syncing
their clients; note that we exclude additions of m*/branch/index.html
as these two files are not part of the RCS merge).
c) all but one of the changes introduced by checkpt.new added, the
exception being the resetting of the "change" counter, which is now
bogus.
cd sparrow/
p4d -r . -jr checkpt.start
sed s/^@pv@/@dv@/ checkpt.old | tail -n +2 > checkpt.dv1
p4d -r . -jr checkpt.dv1
p4d -r . -jd checkpt.midway
grep 'project/m[^/]*/branch/[12]' checkpt.midway | sed s/^@pv@/@dv@/ > checkpt.dv2
p4d -r . -jr checkpt.dv2
tail -n +2 checkpt.new > checkpt.new-no-counter
p4d -r . -jr checkpt.new-no-counter
p4d -r . -z -jd checkpt.final.gz
5. Advise users that their client spaces will need fixing up by
hand.
awk -F/ '{print $3}' checkpt.dv2 |sort -u
This generates a list of clients which had synced some or all of the
affected branches. Since (a) what they had synced is probably bogus
and (b) checkpt.final does not know about these syncs, the owners of
these clients will need to remove these branch directories by hand:
rm -rf project/m*/branch/[12]*
6. Copy (FTP) the new checkpoint (checkpt.final.gz) to
/home/p4/repository on sparrow.
7. Restore from the RCS checkpoint.
cd /home/p4/repository
rm db.*
p4d -r . -z -jr checkpt.final.gz
8. Start Perforce server and test it.
9. Restart infomail and infosys, as in item 4 of section 3.7 above.
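The checkpoint surgery in item 4 relies on turning "put" records (@pv@)
into "delete" records (@dv@) and replaying them, and item 5 pulls
client names out of the resulting db.have records. A small worked
example follows. The helper names and the sample records are invented
and simplified for illustration; real records may carry more fields.

```shell
# Hypothetical helper: turn checkpoint "put" records (@pv@) into
# "delete" records (@dv@), dropping the first line, as done for
# checkpt.old above.
to_delete_records() {
    sed 's/^@pv@/@dv@/' "$1" | tail -n +2
}

# Print the distinct client names from db.have records, as in item 5.
list_clients() {
    awk -F/ '{print $3}' "$1" | sort -u
}

# Invented, simplified records for illustration only.
work=$(mktemp -d)
cat > "$work/sample" <<'EOF'
@pv@ 0 @db.counters@ @change@ 6001 
@pv@ 1 @db.have@ @//bob-ws/project/mps/branch/2001-08-13/trunk/src/misc.h@ @//info.ravenbrook.com/project/mps/branch/2001-08-13/trunk/src/misc.h@ 1 0 
@pv@ 1 @db.have@ @//alice-ws/project/mps/branch/2001-08-13/trunk/src/misc.h@ @//info.ravenbrook.com/project/mps/branch/2001-08-13/trunk/src/misc.h@ 1 0 
EOF
to_delete_records "$work/sample" > "$work/sample.dv"
list_clients "$work/sample.dv"
```

With "/" as the field separator, the client name is the third field
because the client path starts with "//".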
A. REFERENCES
[GDR 2000-10-04] "Backup procedures for sparrow.ravenbrook.com";
Gareth Rees; Ravenbrook Limited; 2000-10-04.
<URL:/doc/2000/10/04/backup-procedure/>
[Global Graphics 2001-08-13] "Memory Management RCS files"; Global
Graphics 2001-08-13. <URL:/project/mps/import/2001-08-13/MM-rcs/>
[ISO 8601] "ISO 8601:2000 Data elements and interchange formats --
Information interchange -- Representation of dates and times"; ISO;
1988-06-15.
[NB 2001-08-24] "MPS import to Perforce seems to be broken" (email
message); Nick Barnes; Ravenbrook Limited; 2001-08-24.
<URL://info.ravenbrook.com/mail/2001/08/24/13-35-45/0.txt>
[NB 2001-09-10] "Re: MPS import to Perforce seems to be broken" (email
message); Nick Barnes; Ravenbrook Limited; 2001-09-10.
<URL://info.ravenbrook.com/mail/2001/09/10/11-22-47/0.txt>
[NDL 2001-08-14a] "Re: Pekka P. Pirinen: Re: schedule of items to be
assigned" (email message); Nick Levine; Ravenbrook Limited;
2001-08-14. <URL:
http://info.ravenbrook.com/mail/2001/08/14/14-35-29/0.txt>
[NDL 2001-08-14b] "Re: Pekka P. Pirinen: Re: schedule of items to be
assigned" (email message); Nick Levine; Ravenbrook Limited;
2001-08-14. <URL:
//info.ravenbrook.com/mail/2001/08/14/18-15-36/0.txt>
[NDL 2001-08-15] "RCS to Perforce"; Nick Levine; Ravenbrook Limited;
2001-08-15; <URL:/project/mps/tool/rcs-import/>.
[Perforce 1999] "Problems with the rcstoperf.sh conversion script";
Perforce Inc; 1999. <URL:
/project/mps/import/1999/RCS-to-Perforce/note031.html>
B. DOCUMENT HISTORY
2001-08-15 NDL Created.
2001-08-16 NDL Fixed instructions for binary files.
2001-08-16 GDR Expanded to specify the whole procedure.
2001-08-17 NDL Note procedure has been carried out. Fixed bits we
couldn't test in advance.
2001-09-12 NDL Revised in the light of previous failures.
2001-09-13 NDL Merge process for second pass (section 3.8).
2002-06-20 NDL Removed confidentiality notice and updated the
copyright / license.
C. COPYRIGHT AND LICENSE
Copyright (C) 2001-2002 Ravenbrook Limited <http://www.ravenbrook.com/>.
All rights reserved. This is an open source license. Contact
Ravenbrook for commercial licensing options.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Redistributions in any form must be accompanied by information on how
to obtain complete source code for this software and any accompanying
software that uses this software. The source code must either be
included in the distribution or be available for no more than the cost
of distribution plus a nominal fee, and must be freely redistributable
under reasonable conditions. For an executable file, complete source
code means the source code for all modules it contains. It does not
include source code for modules or files that typically accompany the
major components of the operating system on which the executable file
runs.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, OR NON-INFRINGEMENT, ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDERS AND CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
$Id: //info.ravenbrook.com/project/mps/procedure/rcs-import/index.txt#11 $