Re: [xml-dev] Should information be encoded into identifiers?


From: Peter Hunsberger <peter.hunsberger@gmail.com>
To: "Costello, Roger L." <costello@mitre.org>
Date: Mon, 8 Mar 2010 12:34:12 -0600

From: Peter Hunsberger [peter.hunsberger@gmail.com]
Sent: Monday, March 08, 2010 10:39 AM
To: Costello, Roger L.
Subject: Re: [xml-dev] Should information be encoded into identifiers?

On Fri, Mar 5, 2010 at 3:57 PM, Costello, Roger L. <costello@mitre.org> wrote:
>
> Hi Folks,
>
> Should identifiers be dumb? That is, no meaning can be ascribed to identifiers; they are completely random.
>
> Or, should information be encoded into identifiers? What information should be encoded into them?
>
> There are precedents for encoding information into identifiers:
>
<snip>examples</snip>
>
> I suspect there are other examples of identifiers that have information encoded into them.
>
> What are the advantages of encoding information into an identifier? What are the disadvantages?
>

This has been an interesting thread to follow and I like many of the
comments that have been made so far.  I thought I'd start my reply by
taking a slightly different tack on this question, which is to point
out that in part you are asking "when is it ok to assign semantics or
meaning to an otherwise opaque identifier?"   To me, this version of
the question is significant for two reasons:

1) your examples only have meaning to people that understand the
semantics at hand; the semantics are unlikely to be independently
derived (at least not without significant work);

2) people end up assigning semantics to otherwise opaque identifiers,
even when they are randomly generated;

As an example of the latter, we have a bunch of patient IDs that have
become embedded in the memory of practically every developer who has
touched one of our research systems.  They happen to be good examples
of certain common or notorious use cases and the semantics they end up
with are those of the use case they get associated with. (A wild
tangent is whether this is how all language gets created...)

Now not much of this addresses your question as to whether semantics
should be intentionally assigned to ids, except to point out that
semantics are where you find them.  As a database designer, I find the
answer to your question simple: if any field is a good candidate key then
you use it, regardless of any attached semantics.  If no good
candidate key can be found for your data you create a surrogate key.
If that surrogate key is random, or serially incremented, or something
else really doesn't matter.  For the cases you describe, you do not
have an artificial surrogate key or a pure data based (hah!) candidate
key, but rather some kind of hybrid.  I can see two reasons for using
such a key:

1)  they are the only representation of some bit of data (you have to
decode the key to find it) -- the normal reason for using any data
field as a key;

2) the data can be found otherwise, but using it as a key is inefficient
and some form of compression will result in some overall efficiency
(of storage or transmission or whatever).
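
To make the hybrid case concrete, here is a minimal sketch in Python of
a key that packs several data fields into one identifier and can be
decoded again.  The record layout (a two-letter site code, a four-digit
year and a six-digit serial number) and the names SampleRecord,
encode_key and decode_key are assumptions made up for illustration;
they are not taken from any of the examples in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical record whose fields get packed into the key."""
    site: str    # two-letter site code (assumed layout)
    year: int    # four-digit year (assumed layout)
    serial: int  # per-site running number, up to six digits (assumed)


def encode_key(rec: SampleRecord) -> str:
    """Pack the fields into one fixed-width, 12-character identifier."""
    if len(rec.site) != 2 or not rec.site.isascii() or not rec.site.isalpha():
        raise ValueError("site must be two ASCII letters")
    if not 0 <= rec.year <= 9999:
        raise ValueError("year must fit in four digits")
    if not 0 <= rec.serial <= 999999:
        raise ValueError("serial must fit in six digits")
    return f"{rec.site.upper()}{rec.year:04d}{rec.serial:06d}"


def decode_key(key: str) -> SampleRecord:
    """Recover the fields from a key; this only works for people (or
    code) that already know the layout."""
    well_formed = (
        len(key) == 12
        and key.isascii()
        and key[:2].isalpha()
        and key[2:].isdigit()
    )
    if not well_formed:
        raise ValueError(f"not a well-formed key: {key!r}")
    return SampleRecord(site=key[:2], year=int(key[2:6]), serial=int(key[6:]))


key = encode_key(SampleRecord(site="TN", year=2010, serial=42))
print(key)              # TN2010000042
print(decode_key(key))
```

The fixed-width layout is what makes decoding trivial, and it is also
the weak point: if a field ever outgrows its width, every key already
in circulation is affected.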

As such, these are perfectly good database keys and for me, any good
database key is a reasonable candidate (hah again!) as an id for more
general public consumption.  (There may be reasons of security or
application design for a database key not to be used in public but
that's mostly a different conversation.)  However, personally, when a
key goes floating out into the public I like it to have two characteristics:

1) the id should be traceable, or perhaps more precisely:
idempotent, unique and auditable. It also wouldn't hurt if it's not
too hard for humans to pick out of a list;

2) the syntax and semantics (if any) should be language independent,
to the extent needed by your intended user base, while allowing for
the above.  (If you need language specifics they should be mapped on
top of the non-language specific beast);
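
One possible way to get the first characteristic, sketched in Python
under some loud assumptions: derive the public id deterministically
from the source data with a name-based UUID (uuid.uuid5 from the
standard library), so that issuing an id twice for the same input gives
the same result, and keep a lookup table so each id can be traced back.
The namespace name "ids.example.org", the function issue_id and the
in-memory audit_log are all hypothetical; a real system would presumably
keep the audit trail somewhere durable.

```python
import uuid

# Hypothetical namespace for this illustration only.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ids.example.org")

# Hypothetical audit trail: public id -> (source system, local key).
audit_log = {}


def issue_id(source_system: str, local_key: str) -> str:
    """Return a public id for a record.

    Idempotent: the same inputs always yield the same id.
    Auditable: every issued id is recorded with its origin.
    """
    if "/" in source_system:
        # Keep the joined name unambiguous.
        raise ValueError("source_system must not contain '/'")
    public_id = str(uuid.uuid5(NAMESPACE, f"{source_system}/{local_key}"))
    audit_log.setdefault(public_id, (source_system, local_key))
    return public_id


first = issue_id("research-db", "12345")
again = issue_id("research-db", "12345")
other = issue_id("research-db", "12346")
print(first == again)   # True
print(first == other)   # False
print(audit_log[first])
```

The resulting ids are plain ASCII hex digits and hyphens, so they carry
no natural-language baggage, but they are opaque and not especially easy
for humans to pick out of a list.  That is the trade-off against ids
with embedded semantics.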

So, after all of this, I've got to conclude that your examples are
likely good candidates as public ids, and that yes, ids that have
embedded semantics are probably a good idea, but they are not
necessary.

References:
- Should information be encoded into identifiers?
  - From: "Costello, Roger L." <costello@mitre.org>
