Ravenbrook / Projects / Perforce Defect Tracking Integration / Master Product Sources / Design

Perforce Defect Tracking Integration Project

Replicator design

Gareth Rees, Ravenbrook Limited, 2000-09-13

1. Introduction

This document describes the design, data structures and algorithms of the Perforce defect tracking integration's replicator daemon.

The purpose of this document is to make it possible for people to maintain the replicator, and to adapt it to work on new platforms and with new defect tracking systems, to meet requirements 20 and 21.

This document will be modified as the product is developed.

The readership of this document is the product developers.

This document is not confidential.

2. Design notes

2.1. Structure of the replicator

For each pair consisting of a defect tracking server and Perforce server where there is replication going on there is a replicator object. The replicator object in Python belongs to the replicator class in the replicator module, or to a subclass.

These replicator objects do not communicate with each other. This makes their design and implementation simple. (There may be a loss of efficiency by having multiple connections to a defect tracking server, or making multiple queries to find cases that have changed, but I believe that the gain in simplicity is worth the risk of loss of performance.)

The replicator object is completely independent of the defect tracking system: all defect tracking system specific code is in a separate object [RB 2000-11-08]. This makes it easier to port the integration to a new defect tracking system (requirement 21).

Each replicator object is paired with a defect tracker object, which represents the connection to the defect tracking system. The defect tracker object in Python belongs to a subclass of the defect_tracker class in the replicator module [RB 2000-10-16, 7.1].

The defect tracker object will in turn use some interface to connect to the defect tracking system. This may be an API from the defect tracking vendor, or a direct connection to the database.

The structure of the replicator is illustrated in figure 1.

Figure 1. The replicator structure
Diagram of the replicator structure

2.2. Replicator identifier

Each replicator has a unique identifier. This is a string of up to 32 characters that matches the syntax of an identifier in C (only letters, digits and underscores, must start with a letter or underscore). The replicator identifier can be used as a component of other identifiers where it is necessary to distinguish between different replicators. The replicator identifier makes it possible to support organizations with multiple defect tracking servers and/or multiple Perforce servers (requirements 96, 97 and 98).
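
The identifier syntax described above can be checked mechanically. The following is a minimal sketch, assuming that "letters" means ASCII letters; the function name `is_valid_identifier` is hypothetical and is not part of the P4DTI sources.

```python
import re

# Up to 32 characters; only letters, digits and underscores; must start
# with a letter or underscore (the syntax of an identifier in C).
# Assumption: "letters" means ASCII letters only.
_IDENTIFIER_RE = re.compile(r'[A-Za-z_][A-Za-z0-9_]{0,31}\Z')


def is_valid_identifier(identifier):
    """Return True if the string is a syntactically valid identifier."""
    return _IDENTIFIER_RE.match(identifier) is not None
```

The same check applies to the Perforce server identifier, which has the same syntax.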

When the integration is installed, the administrator must extend the Perforce jobspec with a P4DTI-rid field, which contains the identifier of the replicator that replicates that job (see section 4.3). The integration must extend the defect tracking system's issue table the first time it runs with a field that will contain a replicator identifier. This field will not be filled in until the issue is selected for replication; see section 2.9.

A consequence of this design is that each job is replicated to one and only one issue (and vice versa).

2.3. Perforce server identifier

Each Perforce server has a unique identifier. This is a string of up to 32 characters that matches the syntax of an identifier in C (only letters, digits and underscores, must start with a letter or underscore). The server identifier makes it possible to support organizations with multiple Perforce servers (requirements 97 and 98).

The integration must extend the defect tracking system's issue table the first time it runs with a field that will contain the Perforce server identifier of the server the issue is replicated to. This field will not be filled in until the issue is selected for replication; see section 2.9.

Note that the design of the replicator means that each replicator corresponds to exactly one Perforce server. However, this is an incidental feature of the implementation, not a principle on which you can depend. So make sure you always bear in mind the possibility that a replicator may replicate to multiple Perforce servers.

At initialization time, each defect tracker object will provide to the defect tracking system the Perforce servers it supports replication to (for example, it may put this information in a table in the defect tracking database). This allows the defect tracking system to present the name of the server that each issue is replicated to.

2.4. Mapping between jobs and issues

The replicator needs to find the issue corresponding to a job and the job corresponding to an issue. At installation time, the administrator must extend the Perforce jobspec with a P4DTI-issue-id field which, if the job is being replicated, will contain a string from which the defect tracker object can deduce the identifier of the corresponding issue (see section 4.2). (I expect this to be the issue identifier itself, if it is a string, or a string conversion, if it is a number, but any string representation is allowed.) The integration must extend the defect tracking system's issue table the first time it runs with a field that will contain the name of the corresponding job.

The choice of jobname for new jobs that correspond to issues is up to the defect tracker object.

We don't use the jobname to represent the mapping, because we need to support migration from just using jobs without renaming the existing jobs, to meet requirement 95.

It may not even be a good idea to create jobs with special names because it would look like we're using the name, and we're not. We don't want to confuse users or administrators or developers who won't read this paragraph. On the other hand, for people who use both systems, it would be useful to be able to see at a glance which issue a job corresponds to.

2.5. Storing associated filespecs in Perforce

Associated filespecs are stored in a field in the job.

At installation time, the administrator must create a P4DTI-filespecs field in the job to store the associated filespecs; see section 4.1.

2.6. Identifying changed entities in Perforce

In Perforce, changed entities are identified using the p4 logger command, available in Perforce 2000.1. The logger must be started by setting the logger counter to zero with p4 counter logger 0. It is a bad idea to do this more than once; see [Seiwald 2000-09-11].

The output of p4 logger gives a list of changed changelists and jobs that looks like this:

435 job job000034
436 change 1234
437 job job000016

Changes to the fixes relation show up in the logger output as changes to the associated changelist and job. Changes to the associated filespecs relation show up as changes to jobs.
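
As an illustration of how this output might be consumed, here is a minimal sketch that turns logger text into tuples. It assumes the three-column textual format shown above (sequence number, entity type, entity name); the function name `parse_logger_output` is hypothetical.

```python
def parse_logger_output(text):
    """Parse logger output into (sequence, kind, name) tuples.

    Assumes one entry per line, in the form "435 job job000034".
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError('unexpected logger line: %r' % line)
        sequence, kind, name = fields
        entries.append((int(sequence), kind, name))
    return entries


output = "435 job job000034\n436 change 1234\n437 job job000016\n"
entries = parse_logger_output(output)
changed_jobs = [name for _, kind, name in entries if kind == 'job']
highest_sequence = max(sequence for sequence, _, _ in entries)
```

Here `changed_jobs` lists the jobs named in the log and `highest_sequence` is the value a poll would later record in its counter.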

We keep a counter (named p4dti- plus the replicator id) that records the highest log entry we've dealt with. At the start of a poll, we use p4 logger -t p4dti-rid to get fresh log entries. At the end of a successful poll we update the counter to the highest sequence number we read with p4 logger -c sequence-number -t p4dti-rid. If the highest sequence number we read is in fact the last entry in the log, this has the side effect of clearing the log. See p4 help undoc.

To avoid replicating changes made by the replicator back to the defect tracker, and so doubling the frequency of conflicts (see for example job000042), the replicator keeps track of how many times it has updated each job in the course of a poll, either through editing the job directly, or through creating, modifying, or deleting a fix. On the next poll the corresponding number of log entries can be discarded; then a job is replicated only if it has log entries remaining.
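
The bookkeeping described above might look something like the following sketch. The representation is an assumption made for illustration: `logged_jobs` holds one job name per log entry, and `own_updates` maps each job name to the number of updates the replicator itself made during the previous poll.

```python
from collections import Counter


def jobs_to_replicate(logged_jobs, own_updates):
    """Return the set of jobs that still have log entries remaining
    after discarding one entry for each update the replicator made."""
    remaining = Counter(logged_jobs)
    remaining.subtract(own_updates)  # discard the replicator's own entries
    return {job for job, count in remaining.items() if count > 0}


# job000034 appears twice in the log but the replicator updated it only
# once, so one entry remains and the job is replicated. job000016 was
# touched only by the replicator, so it is skipped.
to_replicate = jobs_to_replicate(
    ['job000034', 'job000016', 'job000034'],
    {'job000034': 1, 'job000016': 1})
```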

It is important to distinguish changes made by users of Perforce from changes replicated from the defect tracking system, so that these changes are not replicated back again (this would not necessarily be harmful, but it would double the likelihood of inconsistency, since there would be twice as many replications, and so possibly fail to meet requirement 1).

2.7. Who last modified a job?

In Perforce 2002.1 or later, the P4DTI-user field in a job gives the user who last updated that job (either through editing the job directly or through issuing a fix against it).

In older versions of Perforce, there's no reliable way to determine who last modified a job; see job000016.

So the replicator makes its best guess at who last modified a job by using more information from Perforce, as follows:

1. Is there a fix record, submitted more recently than the job has been modified, by someone other than the replicator? If so, take the person who submitted the most recent such fix as the modifier.
2. If not, does the P4DTI-user field contain a user other than the replicator? If so, take them as the modifier.
3. If not, take some suitable user (the job owner, say) as the modifier.

This approach can be fooled by making a fix and then deleting it before the replicator polls.
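
The three-step guess above can be sketched as follows. The data representation is hypothetical: a job is taken to be a dictionary with `P4DTI-user`, `modified` and `owner` keys, and each fix a dictionary with `user` and `date` keys, where dates are comparable values such as integers.

```python
def guess_modifier(job, fixes, replicator_user):
    """Make a best guess at who last modified the job."""
    # Step 1: the most recent fix made after the job was last modified,
    # by someone other than the replicator.
    recent = [fix for fix in fixes
              if fix['date'] > job['modified']
              and fix['user'] != replicator_user]
    if recent:
        return max(recent, key=lambda fix: fix['date'])['user']
    # Step 2: the P4DTI-user field, unless it names the replicator.
    if job['P4DTI-user'] != replicator_user:
        return job['P4DTI-user']
    # Step 3: fall back to some suitable user; here, the job owner.
    return job['owner']
```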

Changes to changelists are replicated from Perforce to the defect tracking system only, so there is no need to make this distinction.

Note 1: If someone edits the same job twice in Perforce before the replicator can replicate it, then the replicator cannot determine what the intermediate state was. This has consequences when the defect tracker has a workflow model: suppose that a job status changes from A to B (which corresponds to transition T) and then from B to C (which corresponds to transition U). But the replicator sees only the status change from A to C, which doesn't correspond to any transition. So the workflow can't be consistently recorded in the defect tracker. There's nothing the replicator can do about this: the intermediate state of the job is not recorded in Perforce.

2.8. Identifying deleted entities

The integration does not support deletion of jobs and defect tracking issues. Deletion of jobs and issues is a bad idea anyway, since you lose information about the history of activity in the system.

Because of the possibility of deletion of fixes, the replicator fetches all fix records in both systems when replicating an issue; it computes the differences between the lists and replicates the additions, updates and deletions.
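
Computing the differences between the two lists of fixes might be sketched like this. The representation is an assumption for illustration: each side's fixes are given as a dictionary mapping a changelist number to the fix status.

```python
def diff_fixes(source_fixes, target_fixes):
    """Return (additions, updates, deletions) that would make the
    target's fixes match the source's fixes."""
    additions = {change: status
                 for change, status in source_fixes.items()
                 if change not in target_fixes}
    updates = {change: status
               for change, status in source_fixes.items()
               if change in target_fixes and target_fixes[change] != status}
    deletions = sorted(change for change in target_fixes
                       if change not in source_fixes)
    return additions, updates, deletions


additions, updates, deletions = diff_fixes(
    {1: 'closed', 2: 'open', 3: 'open'},   # fixes in the source system
    {2: 'closed', 3: 'open', 4: 'open'})   # fixes in the target system
```

In this example, the fix for changelist 1 is added, the fix for changelist 2 is updated, and the fix for changelist 4 is deleted; the fix for changelist 3 already matches.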

2.9. Starting replication

The replicator initiates the replication of unreplicated issues by applying a policy function, which is configurable by the administrator of the replication. We want to support organizations which have multiple Perforce servers (requirement 96). It may not be possible to tell which Perforce server an issue should be replicated to until some point in the workflow (perhaps when the issue is assigned to a project or to a developer). So each replicator should examine each unreplicated issue each time that issue changes, and apply the policy function to determine if the issue should be replicated.

Justification for this decision was given in [GDR 2000-10-04] and is repeated here:

There are three solutions to the problem of getting started with replication (that is, deciding which cases to replicate, and which Perforce server to replicate them to, when there are multiple Perforce servers):

1. The replicator identifier and Perforce server fields in the case are editable by the defect tracker user, who picks the appropriate Perforce server at some point.
2. The defect tracker picks a replicator and Perforce server at some point, by applying a policy function configured by the administrator of the integration.
3. Each replicator identifies cases that are not yet set up for replication and decides whether it should replicate them, by applying a policy function configured by the administrator of the integration.

Solution 1 is the least appropriate. The defect tracker user may not have the knowledge to make the correct choice. The point of the integration is to make things easier for users, and selection of Perforce server should be automated if possible. By exposing the Perforce server field to the user, we run into other difficulties: should the field be editable? What if the user makes a mistake? Best to avoid these complexities.

Solutions 2 and 3 are similar, but solution 3 keeps the integration configuration in one place (in the replicator) where it is easier to manage than if it is split between the replicator and defect tracker. It is also the solution that depends least on support from the defect tracking vendor.
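
To make the chosen solution concrete, here is a minimal sketch of a replicator applying a policy function to unreplicated issues. Everything in it is hypothetical: issues are represented as dictionaries, an issue is treated as unreplicated when its `rid` entry is `None`, and the example policy keys on an invented `project` field.

```python
def select_issues_to_replicate(changed_issues, replicate_p):
    """Return the unreplicated issues that the policy function accepts."""
    return [issue for issue in changed_issues
            if issue.get('rid') is None and replicate_p(issue)]


def replicate_widget_issues(issue):
    """Example policy: replicate an issue once it has been assigned to
    the (invented) 'widget' project."""
    return issue.get('project') == 'widget'


changed = [
    {'id': 1, 'rid': None, 'project': None},      # not yet assigned
    {'id': 2, 'rid': None, 'project': 'widget'},  # selected by the policy
    {'id': 3, 'rid': 'r0', 'project': 'widget'},  # already replicated
]
selected = select_issues_to_replicate(changed, replicate_widget_issues)
```

Because the policy is re-applied each time an unreplicated issue changes, issue 1 would be picked up later if it were assigned to the project.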

2.10. Design principle: no internal state

The replicator has no internal state. Each database has its own record of how its entities relate to entities in the other database, and what the replicator has replicated. These records are updated only when they are known to be consistent with the other system. This makes it possible to check the consistency of the whole system using a separate tool, increasing our confidence in the correctness of the replication algorithm. It also means that the replicator is robust against problems occurring in the middle of a replication (such as the network going down, or the replicator's machine crashing, or the replicator running out of memory): if it can't complete a replication, then as far as it is concerned, it hasn't yet done it. So the next time it runs, it will try the replication again.

This design principle helps to meet requirement 1.
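
One way to picture this principle is a poll that records its progress in Perforce only after the replication has succeeded, using the logger commands from section 2.6. This is a sketch under stated assumptions, not the actual replicator code: `run_p4` is a hypothetical callable that runs a Perforce command and returns its textual output, and `replicate` is a hypothetical callable that raises an exception on failure.

```python
def poll(run_p4, rid, replicate):
    """Read fresh log entries, replicate them, then update the counter.

    If replicate() raises, the counter is left untouched, so the next
    poll sees the same log entries and tries again.
    """
    counter = 'p4dti-' + rid
    output = run_p4('logger -t ' + counter)
    entries = [line.split() for line in output.splitlines() if line.strip()]
    if not entries:
        return None  # nothing has changed since the last poll
    replicate(entries)
    highest = max(int(entry[0]) for entry in entries)
    run_p4('logger -c %d -t %s' % (highest, counter))
    return highest
```

The only record of progress is the counter held by the Perforce server, so a crash between reading the log and updating the counter loses nothing.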

3. Replication algorithm

See [NB 2000-11-24] for the design decision that matches the steps in this algorithm to method calls in the defect_tracker class [RB 2000-10-16, 7.1].

Get the set of changed jobs in Perforce and the set of new and changed issues in the defect tracking system. The latter involves looking for new, changed and deleted filespec and fix records as well, and getting the issue with which the record is associated.
For each corresponding pair (job, issue):
1. Decide whether to replicate from Perforce to the defect tracker; replicate from the defect tracker to Perforce; or do nothing, as follows:
  
  If the job has changed but not the issue, replicate from Perforce to the defect tracker.
  
  If the issue has changed but not the job, replicate from the defect tracker to Perforce.
  
  If neither the job nor the issue has changed, do nothing.
  
  If both have changed, apply a policy function to decide what to do. The administrator might set up a rule here that says, "Perforce is always right" or "the defect tracker is always right", or something more complex. The default rule is to overwrite the job with the issue.
2. To replicate from Perforce to the defect tracker:
  1. Get all the fixes and filespecs for the job and the issue (the filespecs for the job are in the P4DTI-filespecs field in the job; see section 4.1).
  2. If the defect tracker supports workflow transitions, choose an appropriate transition [GDR 2000-11-17]:
    1. Has the job status changed? If not, the transition is some default "update" transition, as specified in the defect tracking object's configuration.
    2. Otherwise, apply some function to all the data to work out what workflow transition to apply. This will typically be a function of the old state and the new state.
      
      This function may not always be able to get it right, since it may not be able to work out the intention of the user who edited the job in Perforce, or the edits they made may not correspond to a transition, or multiple changes have happened in Perforce before the replicator noticed, and the sum of these changes doesn't correspond to any valid transition; see note 1 in section 2.7.
  3. Apply the transition to the issue in the defect tracker so that it matches the job in Perforce. If the defect tracker has no transitions, just update the issue.
  4. If the transition or update succeeded, update the fixes and filespecs in the defect tracker (if necessary) to match those in Perforce.
3. To replicate from the defect tracker to Perforce:
  1. Get all the fixes and filespecs for the job and the issue.
  2. Update the fixes in Perforce so that they match the fixes in the defect tracker.
  3. Update the job in Perforce so that it matches the issue and its associated filespecs.
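
The decision in step 1 and the choice of transition in step 2 can be sketched as two small functions. The names and constants are hypothetical, and the table mapping (old status, new status) pairs to transition names is an assumed representation of "a function of the old state and the new state".

```python
P4_TO_DT = 'replicate from Perforce to the defect tracker'
DT_TO_P4 = 'replicate from the defect tracker to Perforce'
NOTHING = 'do nothing'


def default_conflict_policy(job, issue):
    """The default rule: overwrite the job with the issue."""
    return DT_TO_P4


def decide_action(job_changed, issue_changed, job=None, issue=None,
                  conflict_policy=default_conflict_policy):
    """Decide which way to replicate a corresponding (job, issue) pair."""
    if job_changed and issue_changed:
        return conflict_policy(job, issue)
    if job_changed:
        return P4_TO_DT
    if issue_changed:
        return DT_TO_P4
    return NOTHING


def choose_transition(old_status, new_status, transitions, default='update'):
    """Choose a workflow transition for a job status change.

    Returns the default "update" transition if the status is unchanged,
    and None if no transition matches the change.
    """
    if old_status == new_status:
        return default
    return transitions.get((old_status, new_status))
```

A `None` result from `choose_transition` corresponds to the case where the edits made in Perforce do not match any valid transition.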

4. Additions to the Perforce jobspec

The replicator requires the fields below to be present in the Perforce jobspec. The field numbers for these added fields are not important. They are presented here for illustration only.

4.1. P4DTI-filespecs

Fields:
    191 P4DTI-filespecs text 0 optional

The P4DTI-filespecs field contains a list of filespecs that are associated with the job, one per line.

4.2. P4DTI-issue-id

Fields:
    192 P4DTI-issue-id word 0 required
Preset-P4DTI-issue-id:
    None

The P4DTI-issue-id field contains a string from which the defect tracker object can deduce the identifier of the corresponding issue, or None if the job is not replicated.

4.3. P4DTI-rid

Fields:
    193 P4DTI-rid word 32 required
Preset-P4DTI-rid:
    None

The P4DTI-rid field contains the identifier of the replicator that replicates this job, or None if the job is not replicated.

4.4. P4DTI-user

Fields:
    194 P4DTI-user word 32 always
Preset-P4DTI-user:
    $user

The P4DTI-user field is the Perforce user who last modified the job.
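
The following sketch shows how code might interpret these fields. It assumes, for illustration only, that a job is available as a dictionary keyed by field name, with the preset value `None` appearing as the string `'None'`; the helper names are hypothetical.

```python
def job_filespecs(job):
    """Return the job's associated filespecs as a list, one per line of
    the P4DTI-filespecs field."""
    text = job.get('P4DTI-filespecs', '')
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_replicated_by(job, rid):
    """Return True if the job is replicated by the replicator whose
    identifier is rid."""
    return (job.get('P4DTI-rid', 'None') == rid
            and job.get('P4DTI-issue-id', 'None') != 'None')


job = {
    'P4DTI-filespecs': '//depot/project/foo/...\n//depot/project/bar.c\n',
    'P4DTI-issue-id': '1234',
    'P4DTI-rid': 'r0',
    'P4DTI-user': 'alice',
}
```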

5. Configuration

See [RB 2000-11-20a], [RB 2000-11-20b], [RB 2000-11-20c], [RB 2000-11-28a], [RB 2000-11-28b] for the original design decisions for the P4DTI configuration.

5.1. Configuration architecture

Each user function of the P4DTI corresponds to a Python script. There are three of these:

`check.py`	Check that the defect tracker database and the Perforce jobs system are consistent and produce a report stating any inconsistencies.
`refresh.py`	Delete all Perforce jobs and then replicate all issues modified since the start date from the defect tracking system to Perforce [RB 2000-11-29].
`run.py`	Start the replicator.

Each of these scripts has the same basic pattern: it imports the object r (the replicator object) from the init.py module, and calls a method of that object. For example, here's the complete executable contents of the run.py script:

from init import r
r.run()

The init.py module has three functions:

To construct an object dt to represent the defect tracker;
To construct an object r to represent the replicator; and
To set up the Perforce jobspec so that issues can be replicated.

5.2. Configuration dataflow

Figure 2 shows the dataflow during configuration of the P4DTI.

Figure 2. Dataflow during configuration

Diagram of dataflow during configuration

This is what happens during initialization:

The configuration comes initially from the user settings in the config.py module. This module is accessed by importing config_loader.py, which contains code to check the P4DTI_CONFIG environment variable.

from config_loader import config
The dt_name configuration parameter names a defect tracker. This name, converted to lowercase, is used to import the configuration generator for that defect tracker, which must be called configure_defect_tracker.py, and the defect tracker module.

dt_name = string.lower(config.dt_name)
configuration_module = __import__('configure_' + dt_name)
dt_module = __import__('dt_' + dt_name)
The config module is passed to the configuration function in the configuration generator module, which returns a Perforce jobspec and a revised configuration module.

jobspec, config = configuration_module.configuration(config)
The revised configuration is passed to the defect tracker constructor to build the dt object representing the defect tracker.

dt = dt_module.__dict__['dt_' + dt_name](config)
The revised configuration is used to build a Perforce interface.

p4_interface = p4.p4(
    client = ('p4dti-%s' % socket.gethostname()),
    client_executable = config.p4_client_executable,
    password = config.p4_password,
    port = config.p4_port,
    user = config.p4_user,
    logger = config.logger)
The Perforce jobspec is updated.

p4_interface.run('jobspec -i', jobspec)
The revised configuration is passed to the replicator constructor to build the object r representing the replicator. The replicator is initialized (this also initializes the defect tracker by calling its init() method).

r = replicator.replicator(dt, p4_interface, config)
r.init()

The replicator object is now ready for whatever action is next requested (checking consistency, refreshing Perforce jobs, or starting the replicator).
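
The excerpts above use `string.lower`, which is not available in Python 3. As a hedged illustration only, the step that derives module names from `dt_name` and loads the defect tracker class might be written on Python 3 as below. The function names are hypothetical; the sketch relies on the naming convention described above, in which the defect tracker class has the same name as its module.

```python
import importlib


def module_names(dt_name):
    """Return the names of the configuration generator module and the
    defect tracker module for the named defect tracker."""
    name = dt_name.lower()
    return 'configure_' + name, 'dt_' + name


def load_defect_tracker(dt_name):
    """Import both modules; return the configuration generator module
    and the defect tracker class."""
    configure_name, dt_module_name = module_names(dt_name)
    configuration_module = importlib.import_module(configure_name)
    dt_module = importlib.import_module(dt_module_name)
    # The defect tracker class is looked up by the module's own name.
    return configuration_module, getattr(dt_module, dt_module_name)
```

For a defect tracker named `Example`, `module_names` would give `configure_example` and `dt_example`.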

A. References

[GDR 2000-09-07]	"Replicator design notes 2000-09-07" (e-mail message); Gareth Rees; Ravenbrook Limited; 2000-09-08 15:59:19 GMT.
[GDR 2000-10-04]	"Design decision: starting replication" (e-mail message); Gareth Rees; Ravenbrook Limited; 2000-10-04 16:31:12 GMT.
[GDR 2000-11-17]	"Re: TeamShare/Perforce integration planning meeting, 2000-08-16" (e-mail message); Gareth Rees; Ravenbrook Limited; 2000-11-17 17:00:25 GMT.
[NB 2000-11-24]	"Defect tracker design discussion" (e-mail messages); Nick Barnes; Ravenbrook Limited; 2000-11-24.
[RB 2000-08-10]	"Replication mapping design notes" (e-mail message); Richard Brooksby; Ravenbrook Limited; 2000-08-10 11:27:03 GMT.
[RB 2000-08-30]	"Design document structure" (e-mail message); Richard Brooksby; Ravenbrook Limited; 2000-08-30.
[RB 2000-10-05]	"P4DTI Project Design Document Procedure"; Richard Brooksby; Ravenbrook Limited; 2000-10-05.
[RB 2000-10-16]	"Perforce Defect Tracking Integration Integrator's Guide"; Richard Brooksby; Ravenbrook Limited; 2000-10-16.
[RB 2000-11-08]	"Replicator architecture design discussion, 2000-11-01/02"; Richard Brooksby; Ravenbrook Limited; 2000-11-08.
[RB 2000-11-20a]	"Automatic configuration design" (e-mail message); Richard Brooksby; Ravenbrook Limited; 2000-11-20.
[RB 2000-11-20b]	"Re: Automatic configuration design" (e-mail message); Richard Brooksby; Ravenbrook Limited; 2000-11-20.
[RB 2000-11-20c]	"Re: Automatic configuration design" (e-mail message); Richard Brooksby; Ravenbrook Limited; 2000-11-20.
[RB 2000-11-28a]	"Case of state names" (e-mail message); Richard Brooksby; Ravenbrook Limited; 2000-11-28.
[RB 2000-11-28b]	"Distinguished state to map to 'closed'" (e-mail message); Richard Brooksby; Ravenbrook Limited; 2000-11-28.
[RB 2000-11-28c]	"Abolishing the role of resolver" (e-mail message); Richard Brooksby; Ravenbrook Limited; 2000-11-28.
[RB 2000-11-29]	"Setting items to be replicated" (e-mail message); Richard Brooksby; Ravenbrook Limited; 2000-11-29.
[Seiwald 2000-09-11]	"Re: Is 'p4 counter logger 0' idempotent?" (e-mail message); Christopher Seiwald; Perforce Software; 2000-09-11 16:45:04 GMT.

B. Document History

2000-09-13	GDR	Created based on [RB 2000-08-10], [RB 2000-08-18], [RB 2000-08-30] and [GDR 2000-09-07].
2000-09-14	GDR	Improved definition of `P4DTI-filespecs` and `P4DTI-status` fields.
2000-09-17	GDR	Added some references to requirements.
2000-10-04	GDR	Added design decision from [GDR 2000-10-04].
2000-10-10	GDR	Made changes identified in review on 2000-10-09.
2000-10-15	GDR	Applied design decision on conflict resolution [GDR 2000-10-05].
2000-12-01	RB	Updated references to the "SAG" to "AG" since it's now called the Administrator's Guide.
2001-03-02	RB	Transferred copyright to Perforce under their license.
2001-03-13	GDR	Deleted the recording of conflicts and the need for manual conflict resolution. Conflict resolution is always immediate [RB 2000-11-28c].
2001-03-21	GDR	Added configuration architecture and design.
2001-03-22	GDR	Added design principle: no internal state.
2002-01-28	GDR	Improved design for determining changed entities and job modifier to take advantage of Perforce 2002.1 accurately recording who last modified a job.

Redistribution and use of this document in any form, with or without modification, is permitted provided that redistributions of this document retain the above copyright notice, this condition and the following disclaimer.

This document is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright holders and contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this document, even if advised of the possibility of such damage.

$Id: //info.ravenbrook.com/project/p4dti/master/design/replicator/index.html#41 $
