DCPD: Envelope

For additional DCPD information, see:

  1. Introduction
  2. Framework
  3. Envelope
  4. Extensions
  5. Common Formats

Data Envelope

The envelope for a DCPD collection is:

{
  dcpd-format: {          -- req
    name : "..."          -- req
    version : VERSION     -- req
    specification : URI   -- opt
    extensions : [        -- opt
      { name : "...", version : VERSION, ... }
    ]
  }
  
  metadata : METADATA      -- req

  *
}

The dcpd-format structure is a kind of file header: it describes what this particular file/structure is, and thus also what it is expected to contain. That is done by the name field, which is expected to be unique for a particular file type, together with the version field. There may also be a link to a specification document, which is supposed to contain all the unpleasant details, but the link is optional.

The name field identifies a particular format. This is up to the format designer or implementer to decide, but as it is the primary means of identification it must be unique. (No naming standard is proposed, but something on the lines of the Java package naming convention may be suitable.)

The version field identifies a particular version or release of that format. See Common Formats : VERSION for details.

The specification field is optional: it identifies the document that specifies the format. This document is not expected to be kept inline or in local storage: the field may hold a normal https:// web link or a urn: document identifier. (The link might also need version information and document integrity mechanisms, such as file hashes or digital signatures. All that may be used elsewhere as well: again, a common format for links is desirable. At some point a reference to a schema for a particular format realization may also be useful. For the present, such refinements are left for possible future development.)

extensions: The format structure normally also contains extension information, listing the building blocks that are needed to interpret this particular file. (Note: the term ‘extension’ refers to structures that have been added to the basic envelope, not some kind of run-time extension.) Extensions should be included, even if the format specification says they are optional: this reduces reliance on implicit information, although it also increases the need for consistency checking. However, they are formally optional to avoid expressing a hard-wired policy by the format itself.

metadata: Information about the publishing history of the file. (See Common Formats : METADATA.)

The asterisk is an indication that, at this point, additional structures may be added, smörgåsbord-fashion, to fulfil a particular task. All of those structures will be specified as extensions to the basic envelope format shown above.
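
As a rough illustration of the req/opt markings above, here is a minimal Python sketch of a presence check. It assumes the envelope has already been parsed into plain dictionaries (for instance from JSON, which this post does not mandate) and treats VERSION as an opaque value. The function name check_envelope and the message texts are made up for this example, as is the assumption that every listed extension carries at least a name and a version.

```python
def check_envelope(envelope):
    """Return a list of problems; an empty list means the required parts are present."""
    problems = []

    fmt = envelope.get("dcpd-format")
    if not isinstance(fmt, dict):
        problems.append("missing required structure: dcpd-format")
        fmt = {}
    else:
        for field in ("name", "version"):
            if field not in fmt:
                problems.append(f"dcpd-format: missing required field '{field}'")

    # 'specification' and 'extensions' are optional, so their absence is not an error.
    # Assumption: every listed extension carries at least a name and a version.
    for index, ext in enumerate(fmt.get("extensions", [])):
        for field in ("name", "version"):
            if field not in ext:
                problems.append(f"extensions[{index}]: missing field '{field}'")

    if "metadata" not in envelope:
        problems.append("missing required structure: metadata")

    return problems
```

The check only looks at presence. Whether a name is actually unique, or whether a version is well-formed, is a matter for the format specification and for Common Formats : VERSION.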

It seems desirable to require this and other structures to appear before the data they introduce. This makes it easier to give warning or error messages about extensions or version issues before a possibly large amount of data has been digested.
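
If the envelope is realized as JSON (an assumption, not something this post prescribes), the ordering requirement could be checked along the following lines. The sketch relies on Python's json module keeping object keys in document order; the function name format_header_is_first is hypothetical. It parses the whole text first, so it only demonstrates the order check, not the early-warning behaviour a streaming reader would give.

```python
import json


def format_header_is_first(text):
    """Return True if 'dcpd-format' is the first key of the top-level object."""
    top_level = json.loads(text)  # dicts keep key order (Python 3.7+)
    if not isinstance(top_level, dict) or not top_level:
        return False
    first_key = next(iter(top_level))
    return first_key == "dcpd-format"
```

Note that JSON itself treats object members as unordered, so a format that requires the header to come first adds a rule that generic JSON tools will not enforce and may not preserve when rewriting a file.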

A DCPD envelope file/structure following this format is probably formally released on its own, and so has an identity, a publisher or owner, and a release history. All that will be stuffed into some kind of metadata structure, which is another one of the extensions, and so is described in another post.

Note: The envelope is not necessarily a file on its own. It could be. But it could just as well be part of a file. For example, if publication is taken very seriously, some form of digital signature may be required to ensure that the content of the envelope has not been modified since it left the publisher. This signature may be included in the file, but kept outside the DCPD envelope itself. For now, however, designing such structures is left to the eventual file format designer.

In situations where no formal publication is involved (say, impromptu copy-and-paste of a single problem record), an alternative format declaration might be used, where name identifies a data type defined by a particular extension. Perhaps something like:

{
  dcpd-type : { name : "simple-problem-collection:problem:standalone", version : "1.0" },
  problem : {
    ... actual problem data
  }
}

It will not be obvious that this is an improvement until a simple-problem-collection has been defined, and proves to be too cumbersome to use for a single problem. It seems probable that a type intended for problems in a collection will rely on indirections that probably won’t be desirable to retain in a cut-and-paste situation. Instead a special standalone type is likely to be required, in which all references and indirections in the original data have been replaced by the actual data.
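
To make the idea of replacing indirections concrete, here is a small Python sketch. Everything in it is hypothetical: no reference mechanism has been defined yet, so the example simply assumes that a reference is a structure of the form { ref : "key" } pointing into a shared table held by the collection. The function name make_standalone is likewise invented for the illustration.

```python
def make_standalone(value, shared):
    """Return a copy of value in which every {"ref": key} is replaced by shared[key]."""
    if isinstance(value, dict):
        if set(value) == {"ref"}:
            # Hypothetical reference form: resolve it, then resolve the target too.
            return make_standalone(shared[value["ref"]], shared)
        return {key: make_standalone(item, shared) for key, item in value.items()}
    if isinstance(value, list):
        return [make_standalone(item, shared) for item in value]
    return value


# Usage: a record that points into the collection's shared table.
shared = {"src-1": {"title": "Example source"}}
record = {"source": {"ref": "src-1"}, "notes": ["first", "second"]}
standalone = make_standalone(record, shared)
```

The sketch does not guard against references that point back at themselves. A real standalone type would have to decide whether such cycles are allowed at all.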

Alternatively, a format for only this type of data interchange could be created.

The next post goes into extensions.