Perforce Defect Tracking Integration Project

Perforce Defect Tracking Integration Integrator's Guide

Gareth Rees, Ravenbrook Limited, 2000-10-16

1. Introduction
2. Understanding the integration
3. What you need to do
4. Defect tracker database schema extensions
5. The Python interface to the defect tracker
6. The defect tracker interface module
7. Configuration
8. Providing a defect tracker interface to Perforce relations
9. Adapting the configuration module
10. Adapting the manuals
11. Making your work available to the community
A. References
B. Document History

1. Introduction

This is the Perforce Defect Tracking Integration (P4DTI) 0.5 Integrator's Guide. It explains how a developer could extend the integration kit to work with defect tracking systems that aren't supported by the standard distribution.

You should not be extending the Perforce Defect Tracking Integration 0.5. It is a beta release, not intended for extension. See the project overview for information about planned releases.

2. Understanding the integration

This section gives an overview of the architecture and design of the P4DTI, with references to the documents that provide more detail.

I assume that you're familiar with the jobs subsystem of Perforce, and the relationship between jobs, fixes and changelists. See Chapter 10, "Job Tracking" in the Perforce Command Line User's Guide [Perforce 2000-10-09, chapter 10].

The integration uses a replication architecture; see the Perforce Defect Tracking Integration Architecture. A replicator process repeatedly polls two databases (Perforce and the defect tracker) and copies entities from one to the other to make and keep them consistent.

It replicates three relations:

Defect tracking issues are replicated to and from Perforce jobs. The replication goes in both directions, but the Perforce jobs should be considered as a subsidiary copy of the real data in the defect tracker. This means that when the two databases differ (for example, because they have been changed simultaneously) the defect tracker is considered to be definitive.
Changelist descriptions are replicated from Perforce to the defect tracker.
Fixes (links between changelists and issues) are replicated in both directions. Neither database is definitive.

The replicator is designed to be highly independent of both Perforce and the defect tracker. It runs as a separate process and uses public protocols to access both databases. It doesn't need any special support from either system.

The replicator is written in Python, an interpreted programming language that is portable, stable, readable and open.

The replicator program itself is split into two major components:

A portable module that interfaces to Perforce, runs the replication algorithm and reports failures by e-mail. See Replicator design for the design.
A defect tracker interface module that interfaces to the defect tracker. It is responsible for fetching and updating records in the defect tracker database. See Section 6. For each new integration, you need to write such a module.

3. What you need to do

This section gives an overview of the work required in developing a new integration.

You must provide full implementations of these components:

A documented design for extensions for the defect tracker database schema (Section 4);
A Python interface to the defect tracker (Section 5);
A defect tracker interface module (Section 6);
A configuration generator (Section 7).

You should provide a defect tracker interface to the Perforce relations, if possible (Section 8).

You must adapt or extend these components:

The configuration module config.py (Section 9);
The Administrator's Guide (Section 10);
The User's Guide (Section 10).

All other components are designed to be portable between defect trackers. If your defect tracker cannot be made to work without changing the replicator module, then there is a defect in the replicator module which should be analyzed and fixed. In this case, please contact [who? GDR 2000-12-10].

Once all the work outlined above is completed and tested to your satisfaction, you must make your work available to the community so that others can benefit from your efforts. See Section 11.

We estimate that 8 weeks of effort are required to develop, test, document and release a new integration.

4. Defect tracker database schema extensions

You must extend the database schema to support two new relations: the changelist relation (Section 4.1) and the fixes relation (Section 4.2). You may need to add other relations to the database, to store the replicator state and configuration (Section 4.3). These schema extensions must be documented so that users of your integration can implement database queries and reports that use this data, to meet requirement 5.

These relations should be stored in separate tables if possible, to most easily support queries and reporting using standard database tools. However, some defect trackers may not support this -- for example, TeamTrack 4 doesn't support the addition of tables to its database schema, so the TeamTrack schema extensions squash these relations into a single table, using a type field to distinguish them.

The design must support multiple replicators replicating from a single defect tracker, and support a single replicator replicating to multiple Perforce servers from one defect tracker. To support this, each relation includes a replicator identifier which identifies the replicator which is handling replication for that record, and a Perforce server identifier, which is a short identifier for the Perforce server that the record is replicated with.

For examples, see the TeamTrack database schema extensions for integration with Perforce and the Bugzilla database schema extensions for integration with Perforce.

4.1. Changelists

The changelist relation has these fields:

Field contents	Field type
Replicator identifier.	`char(32)`
Perforce server identifier.	`char(32)`
Change number.	`int`
User who created the change.	A foreign key reference to the defect tracker's user relation giving the user who created or submitted the change.
Change status.	An enumeration with two values: pending or submitted.
Date the change was last modified.	A date and time.
Change description.	Text, unlimited in length.
Client from which the change was submitted.	`char(1024)` or `varchar(1024)` since most client names are short.
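
As an illustration only, the changelist relation could be declared as follows. This is a minimal sketch using Python's standard sqlite3 module; the table name, the column names and the choice of primary key are assumptions made for the example, not part of the P4DTI. A real integration would use the defect tracker's own database and naming conventions.

```python
import sqlite3

# Hypothetical table and column names. Only the list of fields and their
# types comes from the changelist relation described above; the primary key
# is an assumption.
CHANGELIST_SCHEMA = """
CREATE TABLE p4dti_changelists (
    rid           CHAR(32)      NOT NULL,  -- replicator identifier
    sid           CHAR(32)      NOT NULL,  -- Perforce server identifier
    change_number INTEGER       NOT NULL,  -- change number
    user_id       INTEGER       NOT NULL,  -- key into the tracker's user relation
    status        TEXT          NOT NULL
                  CHECK (status IN ('pending', 'submitted')),
    modified      TIMESTAMP     NOT NULL,  -- date the change was last modified
    description   TEXT          NOT NULL,  -- unlimited in length
    client        VARCHAR(1024) NOT NULL,  -- client the change was submitted from
    PRIMARY KEY (rid, sid, change_number)
)
"""


def create_changelist_table(connection):
    """Create the (hypothetical) changelist table on an open connection."""
    connection.execute(CHANGELIST_SCHEMA)


connection = sqlite3.connect(":memory:")
create_changelist_table(connection)
connection.execute(
    "INSERT INTO p4dti_changelists VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("replicator0", "perforce0", 5634, 7, "pending",
     "2000-12-07 10:15:00", "Merging back to master sources.", "gdr-client"),
)
row = connection.execute(
    "SELECT change_number, status FROM p4dti_changelists "
    "WHERE rid = ? AND sid = ?",
    ("replicator0", "perforce0"),
).fetchone()
```

Including the replicator and server identifiers in the key lets several replicators, or several Perforce servers, record the same change number without colliding.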

4.2. Fixes

The fixes relation has these fields:

Field contents	Field type
Replicator identifier.	`char(32)`
Perforce server identifier.	`char(32)`
Issue.	A foreign key reference to the defect tracker's issue relation, giving the issue which is fixed by the change.
Change number	`int`
Date the fix was last modified.	A date and time.
User who created the fix.	A foreign key reference to the defect tracker's user relation, giving the user who last modified the fix.
Status the job was/will be fixed to.	`char(1024)` or `varchar(1024)` since most job statuses are short.
Client from which the fix was made.	`char(1024)` or `varchar(1024)` since most client names are short.
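
A similar sketch for the fixes relation, again using sqlite3 with hypothetical table, column and key names. It also shows a query for the fixes of one issue, restricted to one replicator and one Perforce server.

```python
import sqlite3

# Hypothetical names; only the field list comes from the fixes relation above.
FIXES_SCHEMA = """
CREATE TABLE p4dti_fixes (
    rid           CHAR(32)      NOT NULL,  -- replicator identifier
    sid           CHAR(32)      NOT NULL,  -- Perforce server identifier
    issue_id      INTEGER       NOT NULL,  -- key into the tracker's issue relation
    change_number INTEGER       NOT NULL,  -- change number
    modified      TIMESTAMP     NOT NULL,  -- date the fix was last modified
    user_id       INTEGER       NOT NULL,  -- key into the tracker's user relation
    status        VARCHAR(1024) NOT NULL,  -- status the job was/will be fixed to
    client        VARCHAR(1024) NOT NULL,  -- client the fix was made from
    PRIMARY KEY (rid, sid, issue_id, change_number)
)
"""


def fixes_for_issue(connection, rid, sid, issue_id):
    """Return (change_number, status) pairs for one issue, oldest first."""
    cursor = connection.execute(
        "SELECT change_number, status FROM p4dti_fixes "
        "WHERE rid = ? AND sid = ? AND issue_id = ? "
        "ORDER BY change_number",
        (rid, sid, issue_id),
    )
    return cursor.fetchall()


connection = sqlite3.connect(":memory:")
connection.execute(FIXES_SCHEMA)
connection.executemany(
    "INSERT INTO p4dti_fixes VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("replicator0", "perforce0", 42, 5541, "2000-12-06", 7, "open", "gdr"),
        ("replicator0", "perforce0", 42, 5634, "2000-12-07", 7, "closed", "gdr"),
        # Same issue number, but handled by a different replicator.
        ("replicator1", "perforce1", 42, 17, "2000-12-07", 7, "open", "gdr"),
    ],
)
fixes = fixes_for_issue(connection, "replicator0", "perforce0", 42)
```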

4.3. Replicator configuration and state

By design, the replicator has no internal state, so if you need to store information, such as a record of which changes have been replicated (see Section 4.4) you must store it in the defect tracker's database.

The replicator also needs to pass information to the defect tracker, to support an interface from the defect tracker to Perforce, as described in Section 8. There are two configuration parameters which should be communicated to the defect tracker by storing them in a configuration table: changelist_url and p4_server_description.

4.4. Discovering what's changed

The replicator works by repeatedly polling the databases, so you must provide a way to tell it which issues have changed since the last time it polled. Here are some common strategies:

If the defect tracker has a changes table which records the history of changes to issues, then store a record number in the replicator state that gives the last record in the changes table that has been replicated. (We used this approach in both the TeamTrack and Bugzilla integrations [references needed GDR 2000-12-10].)
If the defect tracker has a last modified date field in the issue table, store the value of this field at the time the replicator last replicated. Then you can fetch the changed issues by looking for issues whose last modified date is later than the last replicated date. This is likely to be less efficient than using a changes table.
Modify the defect tracker so that it supports one of the two strategies above.
If all else fails, you could store a "shadow" table of issues, containing copies of the issue records as they were when last modified. Then you can find changed issues by finding differing corresponding records. This is likely to be very inefficient.
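
The first strategy (a changes table plus a stored record number) can be sketched as below. The changes table is modelled as a list of dictionaries, and the field names `record_id` and `issue_id` are assumptions made for the example; a real integration would query the defect tracker's own table.

```python
def changed_issue_ids(changes, last_replicated):
    """Return the issues changed after record `last_replicated`.

    `changes` is a sequence of records from the tracker's changes table.
    The result is a pair: the identifiers of the changed issues (each listed
    once, in the order first seen) and the record number to store as the
    new replicator state once those issues have been replicated.
    """
    issue_ids = []
    marker = last_replicated
    for record in changes:
        if record["record_id"] > last_replicated:
            if record["issue_id"] not in issue_ids:
                issue_ids.append(record["issue_id"])
            marker = max(marker, record["record_id"])
    return issue_ids, marker


changes_table = [
    {"record_id": 1, "issue_id": 10},
    {"record_id": 2, "issue_id": 11},
    {"record_id": 3, "issue_id": 10},
    {"record_id": 4, "issue_id": 12},
]
ids, marker = changed_issue_ids(changes_table, last_replicated=2)
```

Here records 3 and 4 are new, so issues 10 and 12 need replicating and the stored record number would become 4.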

4.5. Distinguishing replicated changes from other changes

The replicator needs to distinguish the changes it made from changes made by other users of the defect tracker. Otherwise it will attempt to replicate its own changes back to Perforce. This won't actually end up in an infinite loop of replication, since when it replicates back it will discover that there are no changes to be made, and so not actually do anything. However, this double replication gives twice the opportunity for conflicts, and hence annoying e-mail messages for the users of the integration (see job?).

Here are some strategies:

The defect tracker may have separate concepts of "logged in user" and "user who is making the change". In this case, make a special user to represent the replicator and have the replicator log in as that user. The replicator's changes show up with logged in user being the replicator user; all other changes need to be replicated. We use this strategy in the TeamShare API [reference needed GDR 2000-12-10].
You could store a table listing the changes that were made by the replicator. Any other changes need to be replicated. We use this strategy in the Bugzilla integration [reference needed GDR 2000-12-10].
If the defect tracker has a last modified date field in the issue table, store the value of this field for each issue at the time the replicator last replicated it. Then an issue has been changed by someone else if its last modified date differs from the last replicated date.
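
The second strategy (a table listing the changes made by the replicator) amounts to a simple filter. In this sketch the record layout and the name `record_id` are assumptions; the set of replicator change numbers stands in for the extra table.

```python
def changes_by_others(changes, replicator_change_ids):
    """Return the change records that were not made by the replicator.

    `replicator_change_ids` is the set of record numbers that the replicator
    noted down when it made its own updates to the defect tracker.
    """
    return [
        record for record in changes
        if record["record_id"] not in replicator_change_ids
    ]


changes_table = [
    {"record_id": 1, "issue_id": 10},  # made by a defect tracker user
    {"record_id": 2, "issue_id": 10},  # made by the replicator
    {"record_id": 3, "issue_id": 11},  # made by a defect tracker user
]
to_replicate = changes_by_others(changes_table, replicator_change_ids={2})
```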

4.6. Perforce users who don't have licences in the defect tracker

The replicator replicates user fields in issues, changelists and fixes (for example, the owner of an issue or the user who submitted a changelist) by applying a user translation function. When a defect tracker user has no licence in Perforce, the translation function can simply use that user's defect tracker login name, since Perforce doesn't validate user fields in jobs. But if a Perforce user has no licence in the defect tracker, the translator needs to do something with them. Generally it's not possible to create new users in the defect tracker, because of licensing restrictions. For issues your defect tracker interface can simply refuse to replicate when a Perforce user has no licence in the defect tracker. But you have to do something when replicating fixes and changelists. In the TeamTrack integration we map unknown users to the special user 0 (representing "no user").
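
The policy described above might look like this in outline. The function name, the exception class and the dictionary of known users are hypothetical; only the idea of refusing for issues and falling back to user 0 for fixes and changelists comes from the text.

```python
NO_USER = 0  # the special "no user" value, as used in the TeamTrack integration


class UnknownUserError(Exception):
    """Raised when an issue names a Perforce user unknown to the tracker."""


def perforce_user_to_tracker(p4_user, tracker_users, for_issue):
    """Translate a Perforce user name to a defect tracker user identifier.

    `tracker_users` maps Perforce user names to tracker user identifiers
    (an assumption for this sketch). For issues, an unknown user is an
    error, so the issue is not replicated; for fixes and changelists the
    unknown user is mapped to NO_USER instead.
    """
    user_id = tracker_users.get(p4_user)
    if user_id is not None:
        return user_id
    if for_issue:
        raise UnknownUserError(p4_user)
    return NO_USER


tracker_users = {"gdr": 7, "nb": 8}
known = perforce_user_to_tracker("gdr", tracker_users, for_issue=True)
fallback = perforce_user_to_tracker("visitor", tracker_users, for_issue=False)
```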

5. The Python interface to the defect tracker

You'll need a way for Python to read and write defect tracking records. If the defect tracker has an API of some sort, you'll need to use that; if not, you'll have to read and write the database directly, using one of the Python database interfaces. Your defect tracker interface will need to support these kinds of operations:

Get an issue record.
Update an issue record.
Get all the issues needing replication.
Get all the fixes for an issue.
Add/update/delete a fix.
Create a table.
Add a field to a table.
Get a list of the fields that make up the issue relation, together with the field types, lengths, legal values, etc.
Get a list of users, with names, userids, e-mail addresses.

We can't give you a complete or precise list of operations here; you'll have to see what's required as you implement your schema extensions (see Section 4) and defect tracker interface module (see Section 6).

For the TeamTrack integration, we used the TeamShare API to connect to the defect tracker, because this allowed us to apply TeamTrack's privilege system and database validation. We developed a Python extension module that provides an interface to the parts of the TeamShare API that we needed (only a small part of the whole API, as it happened). See the Python interface to TeamTrack code and design.

For the Bugzilla integration, there's no API: you have to understand the Bugzilla database schema and connect directly to the MySQL database. We developed a wrapper module that encapsulates the direct database operations as defect tracker oriented functions like update_bug. See bugzilla.py.

6. The defect tracker interface module

The replicator's interface to the defect tracker takes the form of a module called dt_defect_tracker.py that implements these classes:

The defect tracker interface itself: a subclass of replicator.defect_tracker.
Defect tracker issues: a subclass of replicator.defect_tracker_issue.
Defect tracker fixes: a subclass of replicator.defect_tracker_fix.
A translator between users in the defect tracker and Perforce: a subclass of replicator.translator.
A translator between dates in the defect tracker and Perforce: a subclass of replicator.translator.
Any other translator classes that will be needed to translate fields in the issue relation (for example, enumerated fields or multi-line text fields).

The signatures of these classes are documented (very tersely) in replicator.py. For examples, see the TeamTrack interface, dt_teamtrack.py, and the Bugzilla interface, dt_bugzilla.py.

6.1. The `replicator.defect_tracker` class

[This section needs a lot of work. GDR 2000-12-11]

all_issues(self)

Return a list of all defect tracking issues that have been modified since the "start date", and which are either being replicated by this replicator, or else not being replicated by any replicator. Each element of the list belongs to the defect_tracker_issue class (or a subclass).

changed_entities(self)

Return a triple consisting of (a) a list of the issues that have changed in the defect tracker and which are either replicated by this replicator, or are new issues that are not yet replicated; (b) a list of the changelists that have changed; and (c) a marker. Each element of the first list belongs to the defect_tracker_issue class (or a subclass).

The marker will be passed to the method mark_changes_done when that is called after all the issues have been replicated. For defect trackers other than Perforce, the second list should be empty.

mark_changes_done(self, marker)

Called when all the issues returned by changed_entities have been replicated. The argument is the third element of the triple returned by the call to changed_entities. (The idea behind this is that the defect tracker interface may have some way of recording that it has considered all these issues -- perhaps by recording the last key on a changes table. It is important not to record this until it is true, so that if the replicator crashes between getting the changed issues and replicating them then we'll consider the same set of changed issues the next time round, and hopefully this will give us a chance to either replicate them correctly or else discover that they are conflicting.)

init(self)

Set up the defect tracking database for the integration. Set up issues for replication by this replicator according to the policy in the replicate_p method of the issue.

issue(self, issue_id)

Return the defect tracking issue whose identifier has the string form given by issue_id, or None if there is no such issue.

log(self, format, arguments, priority)

Write a message to the replicator's log.

replicate_changelist(self, change, client, date, description, status, user)

Replicate the changelist to the defect tracking database. Return 1 iff the changelist was changed, 0 if it was already in the database and was unchanged.
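
The relationship between changed_entities and mark_changes_done can be sketched with a small in-memory stand-in. This is not the replicator's actual code: the class below is a stand-alone illustration rather than a subclass of replicator.defect_tracker, it returns issue identifiers instead of issue objects for brevity, and the names `InMemoryTracker` and `poll_once` are hypothetical.

```python
class InMemoryTracker:
    """Stand-in for a defect tracker with a changes table."""

    def __init__(self, changes):
        self.changes = changes  # list of (record number, issue identifier)
        self.last_record = 0    # replicator state: last record replicated

    def changed_entities(self):
        new = [(rec, iid) for rec, iid in self.changes
               if rec > self.last_record]
        issues = []
        for _, iid in new:
            if iid not in issues:
                issues.append(iid)
        marker = max([rec for rec, _ in new], default=self.last_record)
        # The second list (changed changelists) is empty for a defect tracker.
        return issues, [], marker

    def mark_changes_done(self, marker):
        self.last_record = marker


def poll_once(tracker, replicate_issue):
    """Replicate every changed issue, then record that this has been done."""
    issues, _changelists, marker = tracker.changed_entities()
    for issue in issues:
        replicate_issue(issue)  # may raise; the marker is then not recorded
    tracker.mark_changes_done(marker)
    return issues


def failing_replication(issue):
    raise RuntimeError("simulated crash while replicating %s" % issue)


tracker = InMemoryTracker([(1, "BUG1"), (2, "BUG2"), (3, "BUG1")])
try:
    poll_once(tracker, failing_replication)
except RuntimeError:
    pass  # last_record is still 0, so the same issues are seen again

replicated = []
poll_once(tracker, replicated.append)
```

Because the marker is only recorded after every issue has been replicated, the failed first poll leaves the state untouched and the second poll considers the same issues again.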

6.2. The `replicator.defect_tracker_issue` class

A defect tracker issue is conceptually a map from field name to the value for that field.

__getitem__(self, field)

Return the value for the issue field with the name field.

__str__(self)

Return a string describing the issue, suitable for presentation to a user or administrator in a report. Having several lines of the form "field name: value" should be fine.

add_fix(self, change, client, date, status, user)

Add a fix to the issue (a link with a changelist). change (an integer) is the Perforce change number; client (a string) is the Perforce client name from which the fix was made; date is the date the fix was made (in the defect tracker's format — this has been translated by the date translator); status is the status of the fix (a string) — the state the job was changed to when the fix was made; user is the user who made the fix (in the defect tracker's format — this has been translated by the user translator).

fixes(self)

Return a list of the fixes that link to this issue. Each item in the list belongs to the fix class (Section 6.3).

id(self)

Return a string that can be used to uniquely identify the issue and fetch it in future by passing it to the defect_tracker.issue() method. For example, the issue's record identifier in the issue table.

readable_name(self)

Return a human-readable name for the issue, as a string. This must be suitable for use as a Perforce jobname, and should usually be the name used to identify the issue to the users of the defect tracker.

replicate_p(self)

A policy used to set up replication for issues where replication is not yet specified. Return true if the issue should be replicated by this replicator, false if it should not.

rid(self)

Return the replicator identifier of the replicator that is in charge of replicating this issue, or the empty string if the issue is not being replicated.

setup_for_replication(self)

Set up the issue for replication. That is, record that the issue is replicated by this replicator and record any other information in the database that is needed to replicate this issue.

update(self, user, changes)

Update the issue in the defect tracker's database on behalf of user (the user is in the defect tracker's format — this has been translated by the user translator). changes is a dictionary of the changes that must be applied to the issue. The keys of the dictionary are the names of the fields that have changes; the values are the new values for those fields.
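
To make the shape of the issue interface concrete, here is a minimal in-memory sketch of some of the methods above. It is a stand-alone class rather than a subclass of replicator.defect_tracker_issue, it keeps its fields in a dictionary instead of a database, and the class name and constructor arguments are assumptions for the example.

```python
class DictIssue:
    """An issue held in memory: a map from field name to value."""

    def __init__(self, issue_id, name, fields, rid=""):
        self._id = issue_id
        self._name = name
        self._fields = dict(fields)
        self._rid = rid
        self.last_updated_by = None

    def __getitem__(self, field):
        return self._fields[field]

    def __str__(self):
        # Several lines of the form "field name: value".
        return "\n".join(
            "%s: %s" % (field, self._fields[field])
            for field in sorted(self._fields)
        )

    def id(self):
        return str(self._id)

    def readable_name(self):
        return self._name

    def rid(self):
        return self._rid

    def update(self, user, changes):
        # A real implementation would write to the defect tracker's database
        # on behalf of `user`; here we only record who made the change.
        self._fields.update(changes)
        self.last_updated_by = user


issue = DictIssue(42, "BUG00042", {"status": "open", "title": "Crash on save"})
issue.update("gdr", {"status": "closed"})
report = str(issue)
```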

6.3. The `replicator.defect_tracker_fix` class

change(self)

Return the change number for the fix as an integer.

delete(self)

Delete the fix so that the change is no longer linked to the issue.

status(self)

Return the status of the fix as a string.

update(self, change, client, date, status, user)

Update the fix so that it has the given fields. The arguments are the same as for the defect_tracker_issue.add_fix() method.

6.4. The `replicator.translator` class

Each translator class translates values of a particular type between two defect trackers, called 0 and 1. Conventionally, defect tracker 1 is Perforce, but we haven't limited the design by presuming that it is.

Every defect tracker must provide translators for user fields and date fields. The TeamTrack integration also provides translators for multi-line text fields, state fields, single select fields and foreign key fields.

translate_0_to_1(self, value, dt0, dt1, issue0=None, issue1=None)

Translate value from defect tracker 0 to defect tracker 1. dt0 and dt1 are replicator.defect_tracker instances representing the two defect trackers. If the issue0 and issue1 parameters are not None then they are the issues between which the field is being translated.

[Need more explanation here. These methods could do with better names! GDR 2000-12-11]

[The user translator is a little special. It needs to handle the unknown_users() method. See replicator.py and examples in existing integrations. NB 2001-01-23].

translate_1_to_0(self, value, dt0, dt1, issue0=None, issue1=None)

As above, but translates in the other direction.
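
As a hedged illustration of the two translation methods, the sketch below translates dates. It assumes a hypothetical defect tracker that stores dates as seconds since 1970-01-01 UTC, and it assumes that Perforce dates are strings of the form YYYY/MM/DD HH:MM:SS interpreted as UTC. It is a stand-alone class, not a subclass of replicator.translator, and it ignores the defect tracker and issue arguments.

```python
import calendar
import time

# Assumed string form of a Perforce date; time zones are ignored here.
P4_DATE_FORMAT = "%Y/%m/%d %H:%M:%S"


class EpochDateTranslator:
    """Translate between epoch seconds (tracker 0) and date strings (tracker 1)."""

    def translate_0_to_1(self, value, dt0, dt1, issue0=None, issue1=None):
        return time.strftime(P4_DATE_FORMAT, time.gmtime(value))

    def translate_1_to_0(self, value, dt0, dt1, issue0=None, issue1=None):
        return calendar.timegm(time.strptime(value, P4_DATE_FORMAT))


translator = EpochDateTranslator()
p4_date = translator.translate_0_to_1(86400, None, None)
round_trip = translator.translate_1_to_0(p4_date, None, None)
```

A useful property to test in any translator is that translating a value one way and then back again returns the original value.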

6.5. Logging and error handling

[Section not written yet. And anyway the error handling needs to be improved before being documented. See job000030, job000060 and job000065. GDR 2000-12-11]

7. Configuration

7.1. Configuration architecture

The various functions of the integration are executed using a set of Python scripts:

`check.py`	Check that the defect tracker database and the Perforce jobs system are consistent and produce a report stating any inconsistencies.
`refresh_perforce.py`	Delete all Perforce jobs and then replicate all issues from the defect tracking system.
`run.py`	Start the replicator.

Each of these scripts has the same basic pattern: it imports the object r (the replicator object) from the init.py module, and calls a method of that object.

The init.py module has three functions:

To construct an object dt to represent the defect tracker (see Section 6.1);
To construct an object r to represent the replicator; and
To set up the Perforce jobspec so that issues can be replicated.

The init.py module gets the configuration from the config.py module, which is essentially a list of assignments to variables documented in Section 5.2 of the Administrator's Guide. In particular, the dt_name parameter gives the name of the defect tracker. The init.py module uses the dt_name parameter to select a configuration generator and to pass it the appropriate parameters.

Each defect tracker has a configuration generator, called configure_defect_tracker.py. This module provides a function configure that takes as arguments certain configuration parameters and returns a tuple containing four pieces of configuration:

A defect tracker configuration dictionary. This will be passed to the constructor for the dt_defect_tracker class described in Section 6.1 to construct the dt object.
A Perforce interface configuration dictionary. This will be passed to the Perforce interface constructor. See Section 7.2.
A replicator configuration dictionary. This will be passed to the replicator constructor to construct the r object. See Section 7.3.
A Perforce jobspec suitable for passing to p4 -G jobspec -i. See Section 7.4.

Figure 1 shows the dataflow during configuration of the integration.

Figure 1. Dataflow during configuration

Diagram of dataflow during configuration

For examples, see the TeamTrack configuration generator, configure_teamtrack.py, and the Bugzilla configuration generator, configure_bugzilla.py.

The goal for the configuration generator is to ensure that the configurations for the defect tracker, the replicator and Perforce are all consistent. For example, the types, lengths and legal values for the fields in the issue relation in the defect tracker must be compatible with the replicated_fields structure in the replicator configuration and with the fields in the Perforce jobspec.
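
One way to keep the pieces of configuration consistent is to derive all of them from a single description of the fields. The sketch below illustrates that idea only. The argument list, the dictionary keys other than replicated_fields, and the representation of the jobspec as a list of tuples are all assumptions; see the existing configuration generators for the real structures.

```python
def configure(rid, sid, issue_fields):
    """Return the four pieces of configuration as a tuple.

    `issue_fields` is a list of (tracker field, Perforce field, Perforce
    type, length) tuples. Every structure below is built from this one
    list, so the field names cannot drift apart.
    """
    defect_tracker_config = {
        "rid": rid,
        "sid": sid,
        "fields": [tracker for tracker, _, _, _ in issue_fields],
    }
    p4_config = {}  # Perforce interface settings are not derived here
    replicator_config = {
        "rid": rid,
        "replicated_fields": [
            (tracker, p4) for tracker, p4, _, _ in issue_fields
        ],
    }
    jobspec = [
        (p4, p4_type, length) for _, p4, p4_type, length in issue_fields
    ]
    return defect_tracker_config, p4_config, replicator_config, jobspec


configuration = configure(
    "replicator0",
    "perforce0",
    [("TITLE", "Title", "line", 80), ("STATE", "Status", "select", 32)],
)
dt_config, p4_config, r_config, jobspec = configuration
```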

7.2. Perforce interface configuration

[Section not written yet. GDR 2000-12-11]

7.3. Replicator configuration

[Section not written yet. GDR 2000-12-11]

7.4. Perforce jobspecs

[This needs a lot of work. GDR 2000-12-11]

This section describes the fields that need to be added to the Perforce jobspec to support the integration. See also Chapter 5, "Customizing Perforce: Job Specifications", in the Perforce System Administrator's Guide [Perforce 2000-10-11, Chapter 5].

These fields must be added to the Perforce jobspec. It's not essential that the field numbers be as shown, but we recommend that you keep them the same if possible. The numbers are high (the largest legal field number is 199) so that they appear at the end of the job form where they are nicely out of the way.

Fields:
	190 P4DTI-filespecs text 0 default
	191 P4DTI-action select 32 required
	192 P4DTI-rid word 32 required
	193 P4DTI-issue-id word 32 required
	194 P4DTI-user word 32 always

Values:
	P4DTI-action: keep/discard/wait/replicate
	Status: see below

Presets:
	P4DTI-rid: None
	P4DTI-issue-id: None
	P4DTI-user: $user
	P4DTI-action: replicate

Comments:
	# P4DTI-rid: P4DTI replicator identifier. Do not edit!
	# P4DTI-issue-id: TeamTrack issue database identifier. Do not edit!
	# P4DTI-user: Last user to edit this job. You can't edit this!
	# P4DTI-action: Replicator action. See section 11 of the P4DTI administrator guide.

The Status entry in the Values field should list the states that can be replicated from the defect tracker — for example, "open/closed/assigned/deferred/verified".
You can't have a field called "code" in the Perforce jobspec if you're using the integration. This is because Perforce uses the "code" field to pass information about the success or failure of the p4 job -o jobname command.
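
The constraints mentioned in this section lend themselves to a simple check in a configuration generator. The sketch below is hypothetical: the tuple layout and function names are assumptions, and comparing the field name case-insensitively is a cautious guess rather than documented Perforce behaviour.

```python
MAX_FIELD_NUMBER = 199  # the largest legal field number, as noted above

# The P4DTI fields listed above: (number, name, type, length, persistence).
P4DTI_FIELDS = [
    (190, "P4DTI-filespecs", "text", 0, "default"),
    (191, "P4DTI-action", "select", 32, "required"),
    (192, "P4DTI-rid", "word", 32, "required"),
    (193, "P4DTI-issue-id", "word", 32, "required"),
    (194, "P4DTI-user", "word", 32, "always"),
]


def jobspec_problems(fields):
    """Return a list of messages describing problems with the fields."""
    problems = []
    seen = set()
    for number, name, _type, _length, _persistence in fields:
        if number > MAX_FIELD_NUMBER:
            problems.append(
                "field number %d exceeds %d" % (number, MAX_FIELD_NUMBER))
        if number in seen:
            problems.append("field number %d is used twice" % number)
        seen.add(number)
        if name.lower() == "code":
            problems.append(
                'field name "%s" clashes with Perforce\'s own use' % name)
    return problems


def status_values(states):
    """Build the Status entry for the Values field from a list of states."""
    return "/".join(states)


problems = jobspec_problems(P4DTI_FIELDS)
status = status_values(["open", "closed", "assigned", "deferred", "verified"])
```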

7.5. Making your own configurations

Warning: The configuration methods in this section are not supported by Perforce or TeamShare.

When making your own configuration, you must make sure that all elements of that configuration are consistent with each other. In particular, the defect tracker's issue record must be consistent with the replicated_fields parameter which must be consistent with the Perforce jobspec. Inconsistencies will cause you all sorts of headaches — you should think long and hard before going your own way.

I suggest using the following steps when making your own configuration:

Choose a new name that can be used as the value for the dt_name parameter, to distinguish your setup from the supported setup.
Edit init.py so that when the dt_name parameter has your new value, it doesn't call the configuration generator, but instead builds its own appropriate values for the variables defect_tracker_config, p4_config, replicator_config, and jobspec. Alternatively, you could call the configuration generator and then post-process the results it returns.

[This section needs a lot more work before it is usable. An example would be very useful. GDR 2001-01-02]

8. Providing a defect tracker interface to Perforce relations

The defect tracker should display, for each issue that is replicated, a description of the Perforce server to which the issue is replicated. Use the configuration parameter p4_server_description which you should have stored in a table in the defect tracker (see Section 4.3).

The defect tracker should display on each issue description page a table of fixes for that issue (if there are any). The table should look like the table below.

Change	Effect	Date	User	Description
5634 (pending)	closed	2000-12-07	GDR	Merging back to master sources.
5541	open	2000-12-06	GDR	If the owner of a job and the person who last changed it are the same, include them only once in any e-mail sent by the replicator about that job.
5524	open	2000-12-06	GDR	Fixed the replicator's user_email_address method so that it really returns None when there is no such user.
5493	open	2000-12-05	GDR	Added replicator method mail_concerning_job() for e-mailing people about a job.

Points to note about this table:

The fixes are listed with most recent first, because recent changes are likely to be more interesting than old changes.
Pending changelists are distinguished from submitted changelists. This is important because the effect of a pending changelist does not happen until the changelist is submitted. So in the above table the status of the job is still "open" but it is understood that when changelist 5634 is submitted it will become "closed".
The user and date are for the change (not for the fix). Knowing when the change was made and by whom is much more important than knowing when the change was linked with the job.
The user is the defect tracker user who corresponds to the Perforce user who made the change.
The change number is a link to the URL given by the changelist_url configuration parameter, with the change number substituted for the %d.
All the fixes for an issue are replicated by the same replicator and from the same Perforce server as the issue itself. So when building this table you only need to select records with the same replicator identifier and Perforce server identifier as the issue.
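
The points above can be sketched as a function that turns fix and changelist records into table rows. The record layout, the function name and the example URL are assumptions; the fixes passed in are taken to have been selected already by replicator identifier and Perforce server identifier.

```python
def fix_table_rows(fixes, changelists, changelist_url):
    """Return rows for the table of fixes, most recent change first.

    `fixes` is a list of dictionaries with keys "change" and "status";
    `changelists` maps change numbers to dictionaries with keys "status",
    "date" (as YYYY-MM-DD), "user" and "description".
    """
    def recency(fix):
        return (changelists[fix["change"]]["date"], fix["change"])

    rows = []
    for fix in sorted(fixes, key=recency, reverse=True):
        change = changelists[fix["change"]]
        label = str(fix["change"])
        if change["status"] == "pending":
            label += " (pending)"
        rows.append({
            "change": label,
            "url": changelist_url % fix["change"],  # substitutes the %d
            "effect": fix["status"],
            "date": change["date"],        # date of the change, not the fix
            "user": change["user"],        # user who made the change
            "description": change["description"],
        })
    return rows


changelists = {
    5493: {"status": "submitted", "date": "2000-12-05", "user": "GDR",
           "description": "Added replicator method mail_concerning_job()."},
    5541: {"status": "submitted", "date": "2000-12-06", "user": "GDR",
           "description": "Include each user only once in e-mail."},
    5634: {"status": "pending", "date": "2000-12-07", "user": "GDR",
           "description": "Merging back to master sources."},
}
fixes = [
    {"change": 5493, "status": "open"},
    {"change": 5634, "status": "closed"},
    {"change": 5541, "status": "open"},
]
rows = fix_table_rows(fixes, changelists, "http://example.com/changes/%d")
```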

9. Adapting the configuration module

[Section not written yet. GDR 2000-12-10]

10. Adapting the manuals

[Section not written yet. GDR 2000-12-10]

11. Making your work available to the community

[Section not written yet. GDR 2000-12-10]

A. References

[Perforce 2000-10-09]	"Perforce 2000.1 P4 Command Line User's Guide"; Perforce Software; 2000-10-09; <http://www.perforce.com/perforce/doc.001/manuals/p4guide/>, <ftp://ftp.perforce.com//pub/perforce/r00.1/doc/manuals/p4guide/p4guide.pdf>.
[Perforce 2000-10-11]	"Perforce 2000.1 System Administrator's Guide"; Perforce Software; 2000-10-11; <http://www.perforce.com/perforce/doc.001/manuals/p4sag/>, <ftp://ftp.perforce.com//pub/perforce/r00.1/doc/manuals/p4sag/p4sag.pdf>.

B. Document History

2000-10-16	RB	Created placeholder after meeting with LMB.
2000-12-10	GDR	Drafted sections 3 and 4.
2000-12-11	GDR	Drafted sections 2, 5, and 8 and outlined sections 6 and 7.
2000-12-31	GDR	The table of fixes in section 8 now distinguishes pending from submitted changes.
2001-01-02	GDR	Added section 7.1 (configuration architecture), figure 1, section 7.5 (customized configuration). Moved text from appendix D of the Administrator's Guide to section 7.5.
2001-02-04	GDR	Updated definition of `defect_tracker.all_issues` method.

Copyright © 2000 Ravenbrook Limited. This document is provided "as is", without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this document. You may make and distribute copies and derivative works of this document provided that (1) you do not charge a fee for this document or for its distribution, and (2) you retain as they appear all copyright and licence notices and document history entries, and (3) you append descriptions of your modifications to the document history.

$Id: //info.ravenbrook.com/project/p4dti/branch/2001-02-12/start-date-2/manual/ig/index.html#2 $