Ravenbrook / Projects / Perforce Defect Tracking Integration / Version 1.0 Product Sources / Design

Perforce Defect Tracking Integration Project

Replicator design

Gareth Rees, Ravenbrook Limited, 2000-09-13

1. Introduction

This document describes the design, data structures and algorithms of the Perforce defect tracking integration's replicator daemon.

The purpose of this document is to make it possible for people to maintain the replicator, and to adapt it to work on new platforms and with new defect tracking systems, to meet requirements 20 and 21.

This document will be modified as the product is developed.

The readership of this document is the product developers.

This document is not confidential.

2. Design notes

2.1. Structure of the replicator

There is one replicator object for each pair of defect tracking server and Perforce server between which replication takes place. The replicator object in Python belongs to the replicator class in the replicator module, or to a subclass.

These replicator objects do not communicate with each other. This makes their design and implementation simple. (There may be a loss of efficiency by having multiple connections to a defect tracking server, or making multiple queries to find cases that have changed, but I believe that the gain in simplicity is worth the risk of loss of performance.)

The replicator object is completely independent of the defect tracking system: all defect tracking system specific code is in a separate object. This makes it easier to port the integration to a new defect tracking system (requirement 21).

Each replicator object is paired with a defect tracker object, which represents the connection to the defect tracking system. The defect tracker object in Python belongs to a subclass of the defect_tracker class in the replicator module.

The defect tracker object will in turn use some interface to connect to the defect tracking system. This may be an API from the defect tracking vendor, or a direct connection to the database.

The structure of the replicator is illustrated in figure 1.

Figure 1. The replicator structure
[Diagram of the replicator structure]
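
The structure described in this section can be sketched in Python as follows. This is only an illustration: the class names replicator and defect_tracker come from the text above, but every method name, the constructor arguments and the in_memory_tracker subclass are hypothetical and are not taken from the product sources.

```python
class defect_tracker:
    """Base class for a connection to a defect tracking system.

    The two methods are hypothetical; a real subclass would talk to a
    vendor API or directly to the defect tracking database.
    """

    def changed_issues(self):
        raise NotImplementedError

    def update_issue(self, issue_id, fields):
        raise NotImplementedError


class replicator:
    """Replicates between one defect tracker object and one Perforce server.

    It holds no defect-tracker-specific code; everything of that kind
    is delegated to the paired defect_tracker object.
    """

    def __init__(self, rid, dt, p4_port):
        self.rid = rid          # replicator identifier
        self.dt = dt            # paired defect tracker object
        self.p4_port = p4_port  # address of the Perforce server (assumed form)


class in_memory_tracker(defect_tracker):
    """Toy subclass showing where system-specific code would live."""

    def __init__(self):
        self.issues = {}
        self.dirty = set()

    def changed_issues(self):
        # Report each changed issue once, then forget it.
        changed = sorted(self.dirty)
        self.dirty.clear()
        return changed

    def update_issue(self, issue_id, fields):
        self.issues.setdefault(issue_id, {}).update(fields)
        self.dirty.add(issue_id)
```

Porting to a new defect tracking system would then mean writing another subclass of defect_tracker, leaving the replicator class unchanged.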

2.2. Replicator identifier

Each replicator has a unique identifier. This is a string of up to 32 characters that matches the syntax of an identifier in C (only letters, digits and underscores, must start with a letter or underscore). The replicator identifier can be used as a component of other identifiers where it is necessary to distinguish between different replicators. The replicator identifier makes it possible to support organizations with multiple defect tracking servers and/or multiple Perforce servers (requirements 96, 97 and 98).
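
The syntax rule above can be checked with a regular expression. The following is a minimal sketch, assuming that "letters" means ASCII letters; the function name is_valid_identifier is hypothetical.

```python
import re

# A letter or underscore, followed by up to 31 letters, digits or
# underscores, giving a maximum length of 32 characters.
# Assumption: only ASCII letters and digits are allowed.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_identifier(s):
    """Return True if s is a syntactically valid identifier."""
    return _IDENTIFIER.fullmatch(s) is not None
```

For example, is_valid_identifier("rep0") is true, while "0rep", "my-rep" and the empty string are all rejected.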

When the integration is installed, the administrator must extend the Perforce jobspec with a P4DTI-rid field, which contains the identifier of the replicator that replicates that job (see section 4.3). The integration must extend the defect tracking system's issue table the first time it runs with a field that will contain a replicator identifier. This field will not be filled in until the issue is selected for replication; see section 2.9.

A consequence of this design is that each job is replicated to one and only one issue (and vice versa).

2.3. Perforce server identifier

Each Perforce server has a unique identifier. This is a string of up to 32 characters that matches the syntax of an identifier in C (only letters, digits and underscores, must start with a letter or underscore). The server identifier makes it possible to support organizations with multiple Perforce servers (requirements 97 and 98).

The integration must extend the defect tracking system's issue table the first time it runs with a field that will contain the Perforce server identifier of the server the issue is replicated to. This field will not be filled in until the issue is selected for replication; see section 2.9.

Note that the design of the replicator means that each replicator corresponds to exactly one Perforce server. However, this is an incidental feature of the implementation, not a principle on which you can depend. So make sure you always bear in mind the possibility that a replicator may replicate to multiple Perforce servers.

At initialization time, each defect tracker object will provide to the defect tracking system the Perforce servers it supports replication to (for example, it may put this information in a table in the defect tracking database). This allows the defect tracking system to present the name of the server that each issue is replicated to.

2.4. Mapping between jobs and issues

The replicator needs to find the issue corresponding to a job and the job corresponding to an issue. At installation time, the administrator must extend the Perforce jobspec with a P4DTI-issue-id field which, if the job is being replicated, will contain a string from which the defect tracker object can deduce the identifier of the corresponding issue (see section 4.2). (I expect this to be the issue identifier itself, if it is a string, or a string conversion, if it is a number, but any string representation is allowed.) The integration must extend the defect tracking system's issue table the first time it runs with a field that will contain the name of the corresponding job.

The choice of jobname for new jobs that correspond to issues is up to the defect tracker object.

We don't use the jobname to represent the mapping, because we need to support migration from just using jobs without renaming the existing jobs, to meet requirement 95.

It may not even be a good idea to create jobs with special names because it would look like we're using the name, and we're not. We don't want to confuse users or administrators or developers who won't read this paragraph. On the other hand, for people who use both systems, it would be useful to be able to see at a glance which issue a job corresponds to.

2.5. Storing associated filespecs in Perforce

Associated filespecs are stored in a field in the job.

At installation time, the administrator must create a P4DTI-filespecs field in the job to store the associated filespecs; see section 4.1.

2.6. Identifying changed entities in Perforce

In Perforce, changed entities are identified using the p4 logger command, available in Perforce 2000.1. The logger must be started by setting the logger counter to zero with p4 counter logger 0. It is a bad idea to do this more than once; see [Seiwald 2000-09-11].

The output of p4 logger gives a list of changed changelists and jobs that looks like this:

435 job job000034
436 change 1234
437 job job000016

Changes to the fixes relation show up in the logger output as changes to the associated changelist and job. Changes to the associated filespecs relation show up as changes to jobs.
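
Output of this form can be turned into a list of changed entities as sketched below. The function names are hypothetical, and the sketch assumes that every entry consists of exactly three whitespace-separated tokens (sequence number, entity type, key), as in the example above.

```python
def parse_logger_output(text):
    """Parse logger output into (sequence, type, key) tuples."""
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected sequence/type/key triples")
    entries = []
    for i in range(0, len(tokens), 3):
        sequence, kind, key = tokens[i:i + 3]
        entries.append((int(sequence), kind, key))
    return entries


def changed_jobs(entries):
    """Return the names of changed jobs, in order, without duplicates."""
    jobs = []
    for _, kind, key in entries:
        if kind == "job" and key not in jobs:
            jobs.append(key)
    return jobs
```

Applied to the example output, changed_jobs returns job000034 and job000016; the entry for change 1234 is kept in the parsed list but is not a job.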

It is necessary to distinguish changes made by users of Perforce from changes replicated from the defect tracking system, so that these changes are not replicated back again (this would not necessarily be harmful, but it would double the likelihood of inconsistency, since there would be twice as many replications, and so possibly fail to meet requirement 1). The replicator uses the P4DTI-user field of the job to determine who modified the job most recently (see section 4.5).

If this user is the replicator, then the job need not be replicated again. Note that each replicator must therefore have a unique user id in Perforce. By default, this user id is P4DTI- plus the replicator identifier. I recommend that users of the integration stick to this convention.

However, the scheme outlined above doesn't work, since a job can be changed without actually editing it: its status can be changed by fixing it. In this case it is not possible to tell who last modified the job. As of 2000-10-10 I don't know a way to do this: see job000016.
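
The check on the P4DTI-user field might look like the sketch below. It assumes that a job is available as a dictionary mapping field names to values, which is an illustrative choice, and the function names are hypothetical. As noted above, the check is unreliable when the job's status was changed by a fix rather than by an edit.

```python
def replicator_user(rid):
    """Default Perforce user id for the replicator with identifier rid."""
    return "P4DTI-" + rid


def last_changed_by_replicator(job, rid):
    """Return True if the job's P4DTI-user field names the replicator.

    job is assumed to be a dict of jobspec field names to values.
    Caveat: a status change caused by a fix does not update this
    field, so a True result does not prove the job is unchanged.
    """
    return job.get("P4DTI-user") == replicator_user(rid)
```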

Changes to changelists are replicated from Perforce to the defect tracking system only, so there is no need to make this distinction.

If someone edits the same job twice in Perforce before the replicator can replicate it, then the replicator cannot determine what the intermediate state was. This has consequences when the defect tracker has a workflow model: suppose that a job status changes from A to B (which corresponds to transition T) and then from B to C (which corresponds to transition U). But the replicator sees only the status change from A to C, which doesn't correspond to any transition. So the workflow can't be consistently recorded in the defect tracker. There's not much I can do about this: the intermediate state of the job is not recorded in Perforce, except in the journal.
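
The problem can be seen in a small sketch that looks up a transition from a pair of states. The table contents mirror the example in the paragraph above; the names TRANSITIONS and find_transition are hypothetical.

```python
# Workflow transitions keyed by (old status, new status), using the
# example from the text: A -> B is transition T, B -> C is transition U.
TRANSITIONS = {
    ("A", "B"): "T",
    ("B", "C"): "U",
}


def find_transition(old_status, new_status, transitions=TRANSITIONS):
    """Return the matching transition, or None if there is none."""
    return transitions.get((old_status, new_status))
```

Here find_transition("A", "B") gives "T", but find_transition("A", "C") gives None: the replicator has seen only the combined change and has no transition to record.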

2.7. Identifying deleted entities

The integration does not support deletion of jobs and defect tracking issues. Deletion of jobs and issues is a bad idea anyway, since you lose information about the history of activity in the system.

In Perforce we unfortunately have no way of discovering that a fix has been deleted. See job000013. This is not a fatal problem: if the job is changed next, the replicator will discover that the fix is missing and replicate the deletion. However, if the issue is changed next, then the replicator will restore the deleted fix.

If the defect tracking system permits deletion of fixes, it should mark the fix record as deleted rather than actually deleting it. The replicator should do the deletion in the defect tracker's database when it has replicated the deletion to Perforce. This is not implemented yet.

Because of the possibility of deletion of fixes, the replicator fetches all fix records in both systems when replicating an issue; it computes the differences between the lists and replicates the additions, updates and deletions.
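
The difference computation could be sketched as follows. This assumes, for illustration only, that the fixes for one job are represented as a dictionary from changelist number to fix status; the function name diff_fixes is hypothetical.

```python
def diff_fixes(source, target):
    """Compute what must change in target to make it match source.

    source and target are assumed to be dicts mapping a changelist
    number to the status recorded by the fix.
    Returns (additions, updates, deletions).
    """
    additions = {c: s for c, s in source.items() if c not in target}
    updates = {c: s for c, s in source.items()
               if c in target and target[c] != s}
    deletions = sorted(c for c in target if c not in source)
    return additions, updates, deletions
```

When replicating from Perforce to the defect tracker, source would hold the Perforce fixes and target the defect tracker's fix records, and the other way round for the opposite direction.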

2.8. Recording conflicts

Conflicts occur when a job is modified simultaneously with the corresponding issue, or when a job cannot be replicated to an issue because data is invalid or permissions are lacking.

It is necessary to record that the job and the issue are conflicting, otherwise the replicator might forget the conflict and overwrite one of the modified entities.

In Perforce, the replicator sets the P4DTI-status field to "ok" when there is no (known) conflict, and "conflicting" when there is a conflict (see section 4.4).

In the defect tracking system, the replicator can set a status field in the issue.

The replicator reports conflicting entities when the conflict is discovered. Thereafter it ignores them. The administrator of the integration must resolve the conflict using a procedure that will be documented in the AG.

2.9. Starting replication

The replicator initiates the replication of unreplicated issues by applying a policy function, which is configurable by the administrator of the replication. We want to support organizations which have multiple Perforce servers (requirement 96). It may not be possible to tell which Perforce server an issue should be replicated to until some point in the workflow (perhaps when the issue is assigned to a project or to a developer). So each replicator should examine each unreplicated issue each time that issue changes, and apply the policy function to determine if the issue should be replicated.

Justification for this decision was given in [GDR 2000-10-04] and is repeated here:

There are three solutions to the problem of getting started with replication (that is, deciding which cases to replicate, and which Perforce server to replicate them to, when there are multiple Perforce servers):

1. The replicator identifier and Perforce server fields in the case are editable by the TeamTrack user, who picks the appropriate Perforce server at some point.
2. TeamTrack picks a replicator and Perforce server at some point, by applying a policy function configured by the administrator of the integration.
3. Each replicator identifies cases that are not yet set up for replication and decides whether it should replicate them, by applying a policy function configured by the administrator of the integration.

Solution 1 is the least appropriate. The TeamTrack user may not have the knowledge to make the correct choice. The point of the integration is to make things easier for users, and selection of Perforce server should be automated if possible. By exposing the Perforce server field to the user, we run into other difficulties: should the field be editable? What if the user makes a mistake? Best to avoid these complexities.

Solutions 2 and 3 are similar, but solution 3 keeps the integration configuration in one place (in the replicator), where it is easier to manage than if it is split between the replicator and TeamTrack. It is also the solution that depends least on support from the defect tracking vendor.
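
A policy function of the kind used by the chosen solution might be sketched as below. The issue representation (a dictionary with a "project" key) and all function names are assumptions made for the example; a real policy would use whatever fields the defect tracking system provides.

```python
def make_project_policy(projects):
    """Build a policy that replicates issues assigned to given projects.

    Assumption: an issue is a dict whose "project" key is filled in
    at some point in the workflow.
    """
    def replicate_p(issue):
        return issue.get("project") in projects
    return replicate_p


def select_for_replication(unreplicated_issues, policy):
    """Return the unreplicated issues that this replicator should take on."""
    return [issue for issue in unreplicated_issues if policy(issue)]
```

An issue that has not yet been assigned to a project is not selected; the replicator would examine it again the next time the issue changes.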

2.10. Conflict resolution

The administrator of the integration needs to be able to resolve conflicts. The replicator needs to record information about what to do when jobs and issues are conflicting.

The solution to this is to have an action field in the job and the issue (in the job this is P4DTI-action; see section 4.4). The action field takes four values: "replicate", "wait", "keep" and "discard".

When everything is going correctly, the action field contains "replicate".

When a conflict is discovered (by the replicator) between the job and the issue, the replicator sets both action fields to "wait". This causes the replicator to wait for the conflict to be resolved by the administrator.

The administrator can resolve the conflict by selecting the issue or the job or both and setting the action field to "keep" or "discard".

Note: if the organization implements an automatic policy such as "defect tracker wins", the actions remain "replicate" and "replicate". No manual intervention is required.

Combinations of actions have the following effect:

Issue action	Job action	Action taken by replicator
replicate	replicate	Situation normal; replicate changes. When a conflict is discovered, alert administrator and set actions to "wait".
wait	wait	Do nothing (wait for intervention).
keep	wait	Overwrite job with issue; set actions to "replicate".
keep	discard	ditto
wait	discard	ditto
wait	keep	Overwrite issue with job; set actions to "replicate".
discard	keep	ditto
discard	wait	ditto
Any other combination.		An error. Alert administrator and set actions to "wait".

See [GDR 2000-10-05] for the justification behind this design.
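
The table of action combinations translates directly into a lookup. In this sketch the four action values come from the table; the result strings and the names ACTION_TABLE and resolve are hypothetical labels chosen for the example.

```python
# Keys are (issue action, job action); values name what the
# replicator does. The value strings are illustrative labels only.
ACTION_TABLE = {
    ("replicate", "replicate"): "replicate",
    ("wait", "wait"): "nothing",
    ("keep", "wait"): "overwrite job",
    ("keep", "discard"): "overwrite job",
    ("wait", "discard"): "overwrite job",
    ("wait", "keep"): "overwrite issue",
    ("discard", "keep"): "overwrite issue",
    ("discard", "wait"): "overwrite issue",
}


def resolve(issue_action, job_action):
    """Look up what to do; any other combination is an error."""
    return ACTION_TABLE.get((issue_action, job_action), "error")
```

For example, resolve("keep", "wait") gives "overwrite job", and resolve("keep", "keep") gives "error", in which case the replicator would alert the administrator and set both actions to "wait".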

3. Replication algorithm

Get the set of changed jobs in Perforce and the set of new and changed issues in the defect tracking system. The latter involves looking for new, changed and deleted filespec and fix records as well, and getting the issue with which the record is associated.
For each corresponding pair (job, issue):
1. Decide whether to replicate from Perforce to the defect tracker; replicate from the defect tracker to Perforce; report a conflict; or do nothing.
  
  If either job or issue has its action set to "keep" or "discard", use the table in section 2.10 to decide which way to replicate. Otherwise:
  
  If the job has changed but not the issue, replicate from Perforce to the defect tracker.
  
  If the issue has changed but not the job, replicate from the defect tracker to Perforce.
  
  If neither the job nor the issue has changed, do nothing.
  
  If both have changed, apply a policy function to decide what to do. The administrator might set up a rule here that says, "Perforce is always right" or "the defect tracker is always right", or something more complex. The default rule is to alert the administrator and record that the job and the issue are conflicting.
2. To replicate from Perforce to the defect tracker:
  1. Get all the fixes and filespecs for the job and the issue (the filespecs for the job are in the P4DTI-filespecs field in the job; see section 4.1).
  2. If the defect tracker supports workflow transitions, choose an appropriate transition:
    1. Has the job status changed? If not, the transition is some default "update" transition, as specified in the defect tracking object's configuration.
    2. Otherwise, apply some function to all the data to work out what workflow transition to apply. This will typically be a function of the old state and the new state.
      
      This function may not always be able to get it right, since it may not be able to work out the intention of the user who edited the job in Perforce, or the edits they made may not correspond to a transition, or multiple changes have happened in Perforce before the replicator noticed, and the sum of these changes doesn't correspond to any valid transition; see section 2.6.
  3. Apply the transition to the issue in the defect tracker so that it matches the job in Perforce. If the defect tracker has no transitions, just update the issue.
  4. If the transition or update failed, mark the job and issue as conflicting and report the failure.
  5. If the transition or update succeeded, update the fixes and filespecs in the defect tracker (if necessary) to match those in Perforce. If this fails, mark the job and issue as conflicting and report the failure.
  6. If everything succeeded, mark all entities involved in the replication as being up to date.
3. To replicate from the defect tracker to Perforce:
  1. Get all the fixes and filespecs for the job and the issue.
  2. Update the fixes in Perforce so that they match the fixes in the defect tracker.
  3. Update the job in Perforce so that it matches the issue and its associated filespecs.
4. To report a conflict, mark the issue and the job as being conflicting. Report the conflict to the administrator of the integration in the manner specified in the replicator's configuration.
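
The decision made in step 1 of the algorithm, for the case where neither action is "keep" or "discard", can be sketched as a small function. The return values and the conflict_policy parameter are hypothetical names; the default behaviour follows the default rule given above.

```python
def decide(job_changed, issue_changed, conflict_policy=None):
    """Decide which way to replicate one (job, issue) pair.

    Returns one of the illustrative labels "p4_to_dt", "dt_to_p4",
    "conflict" or "nothing". conflict_policy, if given, is a function
    of no arguments that returns one of those labels; it stands in
    for the administrator's rule for simultaneous changes.
    """
    if job_changed and issue_changed:
        if conflict_policy is None:
            return "conflict"  # default rule: report and record the conflict
        return conflict_policy()
    if job_changed:
        return "p4_to_dt"
    if issue_changed:
        return "dt_to_p4"
    return "nothing"
```

A rule such as "the defect tracker is always right" would be expressed here as decide(True, True, lambda: "dt_to_p4").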

4. Additions to the Perforce jobspec

The field numbers for these added fields are not important. They are presented here for illustration only.

4.1. P4DTI-filespecs

Fields:
    110 P4DTI-filespecs text 0 default

The P4DTI-filespecs field contains a list of filespecs that are associated with the job, one per line.
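
Reading and writing this field could be sketched as below. The function names are hypothetical, and skipping blank lines and surrounding whitespace is an assumption rather than documented behaviour.

```python
def parse_filespecs(field_text):
    """Split the P4DTI-filespecs field into a list of filespecs.

    Assumption: blank lines and surrounding whitespace are ignored.
    """
    return [line.strip() for line in field_text.splitlines() if line.strip()]


def format_filespecs(filespecs):
    """Join filespecs into field text, one filespec per line."""
    return "\n".join(filespecs)
```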

4.2. P4DTI-issue-id

Fields:
    111 P4DTI-issue-id word 0 required
Preset-P4DTI-issue-id: None

The P4DTI-issue-id field contains a string from which the defect tracker object can deduce the identifier of the corresponding issue, or None if the job is not replicated.

4.3. P4DTI-rid

Fields:
    112 P4DTI-rid word 32 required
Preset-P4DTI-rid: None

The P4DTI-rid field contains the identifier of the replicator that replicates this job, or None if the job is not replicated.

4.4. P4DTI-action

Fields:
    113 P4DTI-action select 32 required
Values-P4DTI-action: replicate/wait/keep/discard
Preset-P4DTI-action: replicate

The P4DTI-action field gives the action that the replicator should take for this job. The value replicate means to replicate as normal; wait means that the replicator must do nothing and wait for the action to change (the replicator will set the action to wait when a conflict is detected); keep means that the replicator must keep the job and replicate it to the defect tracker; discard means that the replicator must discard the contents of the job and replace it with the replicated defect tracking issue. See [GDR 2000-10-05].

4.5. P4DTI-user

Fields:
    114 P4DTI-user word 32 always
Preset-P4DTI-user: $user

The P4DTI-user field is the Perforce user who last modified the job.

A. References

[RB 2000-08-10]	"Replication mapping design notes" (e-mail message); Richard Brooksby; Ravenbrook Limited; 2000-08-10 11:27:03 GMT.
[RB 2000-10-05]	"P4DTI Project Design Document Procedure"; Richard Brooksby; Ravenbrook Limited; 2000-10-05.
[RB 2000-08-30]	"Design document structure" (e-mail message); Richard Brooksby; Ravenbrook Limited; 2000-08-30.
[GDR 2000-09-07]	"Replicator design notes 2000-09-07" (e-mail message); Gareth Rees; Ravenbrook Limited; 2000-09-08 15:59:19 GMT.
[GDR 2000-10-04]	"Design decision: starting replication" (e-mail message); Gareth Rees; Ravenbrook Limited; 2000-10-04 16:31:12 GMT.
[GDR 2000-10-05]	"Design decision: conflict resolution"; Gareth Rees; Ravenbrook Limited; 2000-10-05.
[Seiwald 2000-09-11]	"Re: Is 'p4 counter logger 0' idempotent?" (e-mail message); Christopher Seiwald; Perforce Software; 2000-09-11 16:45:04 GMT.

B. Document History

2000-09-13	GDR	Created based on [RB 2000-08-10], [RB 2000-08-18], [RB 2000-08-30] and [GDR 2000-09-07].
2000-09-14	GDR	Improved definition of P4DTI-filespecs and P4DTI-status fields.
2000-09-17	GDR	Added some references to requirements.
2000-10-04	GDR	Added design decision from [GDR 2000-10-04].
2000-10-10	GDR	Made changes identified in review on 2000-10-09.
2000-10-15	GDR	Applied design decision on conflict resolution [GDR 2000-10-05].
2000-12-01	RB	Updated references to the "SAG" to "AG" since it's now called the Administrator's Guide.
2001-03-02	RB	Transferred copyright to Perforce under their license.

Redistribution and use of this document in any form, with or without modification, is permitted provided that redistributions of this document retain the above copyright notice, this condition and the following disclaimer.

This document is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright holders and contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this document, even if advised of the possibility of such damage.

$Id: //info.ravenbrook.com/project/p4dti/version/1.0/design/replicator/index.html#3 $
