Re: [xml-dev] Abandon the (mistaken) belief that XML attributes provide "metadata" and set yourself free to explore capability-based designs
- From: John Cowan <cowan@mercury.ccil.org>
- To: James Fuller <james.fuller.2007@gmail.com>
- Date: Sat, 19 Mar 2011 13:06:13 -0400
James Fuller scripsit:
> * I see this as a variant of the permathread 'elements vs attributes'
> e.g. 'attributes vs everything else' ... there is no conclusion to be
> made here
Quite so. But here's my take on it anyway (note especially point #2 in
favor of attributes):
http://recycledknowledge.blogspot.com/2008/03/elements-or-attributes.html
General points:
Attributes are more restrictive than elements, and all designs have some
elements, so an all-element design is simplest -- which is not the same
as best.
In a tree-style data model, elements are typically represented internally
as nodes, which use more memory than the strings used to represent
attributes. Sometimes the nodes are instances of different
application-specific classes, and in many languages representing those
classes takes up memory too.
When streaming, elements are processed one at a time (possibly even
piece by piece, depending on the XML parser you are using), whereas
all the attributes of an element and their values are reported at once,
which costs memory, particularly if some attribute values are very long.
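
A minimal sketch of that difference, using Python's standard xml.sax
module (the Tracer class, the trace helper and the sample documents are
made up for illustration): every attribute of an element arrives in a
single startElement call, while child elements arrive as separate events.

```python
import xml.sax


class Tracer(xml.sax.ContentHandler):
    """Records each SAX event in arrival order (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # All attributes of the element are delivered together, here.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Element text may be delivered in several pieces by the parser.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def trace(xml_text):
    handler = Tracer()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


attr_style = trace('<book title="Dune" author="Herbert" year="1965"/>')
elem_style = trace("<book><title>Dune</title><author>Herbert</author></book>")

print(attr_style[0])    # one event carrying every attribute value
print(len(elem_style))  # several smaller events instead
```

With the attribute form, the parser has to hold all three values before
it can report anything; with the element form, a consumer can handle and
discard each child as it goes by.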
Both element content and attribute values need to be escaped, so escaping
should not be a consideration in the design.
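
As a small illustration, assuming Python's standard library, both forms
need an escaping step when written by hand (escape for content, quoteattr
for attribute values), and both round-trip the same awkward string. The
sample string and element names are invented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw = 'AT&T <"quoted">'

# Element content: escape &, < and >.
as_element = "<name>" + escape(raw) + "</name>"

# Attribute value: quoteattr escapes the text and adds suitable quotes.
as_attribute = "<org name=" + quoteattr(raw) + "/>"

print(as_element)
print(as_attribute)

# Either way, a parser hands back the original string.
print(ET.fromstring(as_element).text == raw)
print(ET.fromstring(as_attribute).get("name") == raw)
```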
In some programming languages and libraries, processing elements is
easier; in others, processing attributes is easier. Beware of using ease
of processing as a criterion. In particular, XSLT can handle either with
equal facility.
If a piece of data should usually be shown to the user, use an element;
if not, use an attribute. (This rule is often violated for one reason
or another.)
If you are extending an existing schema, do things by analogy to how
things are done in that schema.
Sensible schema languages, meaning RELAX NG, treat elements and attributes
symmetrically. Older and cruder schema languages tend to have better
support for elements.
Using elements:
If something might appear more than once in a data model, use an
element rather than introducing attributes with names like part1, part2,
part3 ....
If order matters between two pieces of data, use elements for them:
attributes are inherently unordered.
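
A rough sketch of both points with Python's xml.etree.ElementTree (the
assembly/part vocabulary is hypothetical): repeated child elements come
back in document order with no naming convention, whereas the attribute
version forces the consumer to guess the numbering, and simply repeating
an attribute name is not well-formed XML at all.

```python
import xml.etree.ElementTree as ET

nested = ET.fromstring(
    "<assembly>"
    "<part>frame</part><part>wheel</part><part>wheel</part>"
    "</assembly>"
)
# findall returns the children in document order.
parts = [p.text for p in nested.findall("part")]

flat = ET.fromstring('<assembly part1="frame" part2="wheel" part3="wheel"/>')
# The consumer has to know the part1..partN convention and the count.
numbered = [flat.get(f"part{i}") for i in range(1, 4)]


def is_well_formed(xml_text):
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Repeating an attribute name is a well-formedness error.
repeated_ok = is_well_formed('<assembly part="frame" part="wheel"/>')

print(parts, numbered, repeated_ok)
```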
If a piece of data has, or might have, its own substructure, put
it in an element: getting substructure into an attribute is always
messy. Similarly, if the data is a constituent part of some larger piece
of data, put it in an element.
An exception to the previous rule: multiple whitespace-separated tokens
can safely be put in an attribute. In principle, the separator can be
anything, but schema-language validators are currently only able to
handle whitespace, so it's best to stick with that.
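
For instance, a whitespace-separated token list costs the consumer one
split call. This is only a sketch in Python; the div/class names are
borrowed from HTML purely as a familiar example.

```python
import xml.etree.ElementTree as ET

# Two spaces and a tab between tokens: any run of whitespace separates.
elem = ET.fromstring('<div class="note  warning\turgent"/>')

# str.split() with no argument splits on runs of whitespace.
tokens = elem.get("class").split()

print(tokens)
```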
If a piece of data extends across multiple lines, use an element: XML
parsers will change newlines in attribute values into spaces.
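
This is easy to check with Python's xml.etree.ElementTree (the note/body
names are invented for the demonstration). A literal newline in an
attribute value is normalized to a space, while the same text in element
content keeps its line break. A newline written as the character
reference &#10; does survive in an attribute, but that is not something
people type by hand.

```python
import xml.etree.ElementTree as ET

text = "first line\nsecond line"

# Literal newline inside an attribute value: normalized by the parser.
as_attr = ET.fromstring(f'<note body="{text}"/>').get("body")

# Same text as element content: the newline is preserved.
as_elem = ET.fromstring(f"<note>{text}</note>").text

# A character reference is not subject to that normalization.
as_ref = ET.fromstring('<note body="first line&#10;second line"/>').get("body")

print(repr(as_attr))
print(repr(as_elem))
print(repr(as_ref))
```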
If a piece of data is in a natural language, put it in an element so you
can use the xml:lang attribute to label the language being used. Some
kinds of natural-language text, like Japanese, also require annotations
that are conventionally represented using child elements; right-to-left
languages like Hebrew and Arabic may similarly require child elements
to manage bidirectionality properly.
Using attributes:
If the data is a code from an enumeration, code list, or controlled
vocabulary, put it in an attribute if possible. Language tags, currency
codes, and medical diagnostic codes, for example, are best handled as
attributes.
If a piece of data is really metadata on some other piece of data (for
example, representing a class or role that the main data serves, or
specifying a method of processing it), put it in an attribute if possible.
In particular, if a piece of data is an ID (either a label or a reference
to a label elsewhere in the document) for some other piece of data,
put the identifying piece in an attribute. When it's a label, use the
name xml:id for the attribute.
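
A minimal sketch of the label/reference pattern in Python, assuming a
made-up vocabulary (doc, section, see, and a ref attribute for the
reference side). ElementTree exposes xml:id under the XML namespace URI,
so a plain dictionary is enough to resolve references.

```python
import xml.etree.ElementTree as ET

# ElementTree spells xml:id with the XML namespace in braces.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

doc = ET.fromstring(
    "<doc>"
    '<section xml:id="intro"><title>Introduction</title></section>'
    '<section xml:id="usage"><title>Usage</title></section>'
    '<see ref="usage"/>'
    "</doc>"
)

# Index every labelled element by its xml:id.
by_id = {e.get(XML_ID): e for e in doc.iter() if e.get(XML_ID) is not None}

# Resolve the reference carried by the (hypothetical) ref attribute.
target = by_id[doc.find("see").get("ref")]

print(target.find("title").text)
```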
Hypertext references (hrefs) are conventionally put in attributes.
If a piece of data is applicable to an element and any descendant elements
unless it is overridden in some of them, it is conventional to put it
in an attribute. Well-known examples are xml:lang, xml:space, xml:base,
and namespace declarations.
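
To illustrate the inherit-unless-overridden convention, here is a sketch
in Python that computes the effective xml:lang of every element. The
effective_langs function and the sample document are assumptions for the
example, not part of any library.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(elem, inherited=None, out=None):
    """Collect (tag, language) pairs, letting xml:lang flow to descendants."""
    if out is None:
        out = []
    # An element's own xml:lang overrides whatever it inherited.
    lang = elem.get(XML_LANG, inherited)
    out.append((elem.tag, lang))
    for child in elem:
        effective_langs(child, lang, out)
    return out


doc = ET.fromstring(
    '<book xml:lang="en"><title>Greetings</title>'
    '<quote xml:lang="fr"><p>Bonjour</p></quote>'
    "<p>Hello</p></book>"
)

langs = effective_langs(doc)
print(langs)
```

The value is stated once on book and once on quote; everything else
picks it up from the nearest ancestor that declares it.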
If terseness is really the most important thing, use attributes, but
consider gzip compression instead -- it works very well on documents
with highly repetitive structures.
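
A quick way to see this for your own data, sketched in Python with the
standard gzip module. The rows/row/id/qty vocabulary and the synthetic
data are invented, so the exact numbers mean nothing beyond this toy
case.

```python
import gzip

rows = range(1000)

# The same synthetic records in element style and in attribute style.
elements = "<rows>" + "".join(
    f"<row><id>{i}</id><qty>{i % 7}</qty></row>" for i in rows
) + "</rows>"
attributes = "<rows>" + "".join(
    f'<row id="{i}" qty="{i % 7}"/>' for i in rows
) + "</rows>"


def sizes(text):
    """Return (raw byte count, gzip-compressed byte count)."""
    raw = text.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


e_raw, e_gz = sizes(elements)
a_raw, a_gz = sizes(attributes)

print("elements:  ", e_raw, "raw,", e_gz, "gzipped")
print("attributes:", a_raw, "raw,", a_gz, "gzipped")
```

The repeated tag names are exactly the kind of redundancy that gzip
removes, so the size penalty of the element form shrinks a great deal
once the document is compressed.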
Michael Kay says:
Beginners always ask this question.
Those with a little experience express their opinions passionately.
Experts tell you there is no right answer.
I say:
Newbies always ask:
"Elements or attributes?
Which will serve me best?"
Those who know roar like lions;
Wise hackers smile like tigers.
--a tanka, or extended haiku
Final words:
Break any or all of these rules rather than create a crude, arbitrary,
disgusting mess of a design if that's what following them slavishly
would give you. In particular, random mixtures of attributes and child
elements are hard to follow and hard to use, though it often makes good
sense to use both when the data clearly fall into two different groups
such as simple/complex or metadata/data.
--
John Cowan  cowan@ccil.org
Don't be so humble. You're not that great.  --Golda Meir