RE: Abbreviated Tag Names
- From: Gavin Thomas Nicol <gtn@ebt.com>
- To: Don Park <donpark@docuverse.com>, Xml-Dev <xml-dev@lists.xml.org>
- Date: Sat, 10 Mar 2001 10:50:13 -0500
> So, I have been thinking about abbreviated tag names and wanted your
> thoughts on the subject. There are many aspects to this issue:
>
> 1) should schemas be expanded or an alternate version be used?
> 2) should a new namespace be defined or old namespace be reused?
> 3) what role does RDDL play?
> 4) should there be a dynamic abbreviation mechanism? [no, imho]
I don't know. If you take a document
<purchase.order xmlns:po="...">
<po:sku>123456789</po:sku>
<num.items>23</num.items>
...
</purchase.order>
and then run it through a processor, to produce
<a>
<b>123456789</b>
<c>23</c>
...
</a>
have the semantics changed? I would argue that to the naive application,
they have, *but* to the receiver, they are probably equivalent.
Just as a WAP gateway will encode and compress WML, what's wrong with
having an application-specific tag compression (it might even compress
text sequences as well)? FWIW, I'd be stripping out all the namespaces,
because they add nothing to the data (in terms of semantics)... the
receiver *knows* the namespace(s).
So I guess the whole point here is that *if* you control the receiver,
and you can constrain the possible set of tags that it can interpret,
then you can always perform a bidirectional transformation to
shorten the tags.
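
As a rough illustration of that bidirectional transformation, here is a minimal Python sketch. It assumes sender and receiver share a fixed tag table in advance. The namespace URI `urn:example:po`, the table contents, and the function names `shorten` and `expand` are all hypothetical, chosen only to mirror the example document above; this is not any particular tool's method.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the example document above elides the real one.
PO_NS = "urn:example:po"

# Table that sender and receiver must agree on ahead of time (assumed).
LONG_TO_SHORT = {
    "purchase.order": "a",
    f"{{{PO_NS}}}sku": "b",
    "num.items": "c",
}
SHORT_TO_LONG = {short: long for long, short in LONG_TO_SHORT.items()}


def rename_tags(xml_text, table):
    """Rename every element via table; an unknown tag raises KeyError."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = table[elem.tag]
    return ET.tostring(root, encoding="unicode")


def shorten(xml_text):
    """Sender side: long, namespaced tags become single letters."""
    return rename_tags(xml_text, LONG_TO_SHORT)


def expand(xml_text):
    """Receiver side: restore the long tags, reusing the 'po' prefix."""
    ET.register_namespace("po", PO_NS)
    return rename_tags(xml_text, SHORT_TO_LONG)
```

Raising on an unknown tag is a deliberate choice in this sketch: the mapping is only reversible when the set of tags is constrained, so anything outside the table is treated as an error rather than passed through. Attributes and attribute names are left untouched here.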
I should note that a long time ago now, I performed an analysis of
the benefits of such minimization *after* compression. Typically,
it saves between 2.5% and 15% (using gzip), depending on the tagging
structures and the size of the document (information redundancy increases as
a function of document size, so the advantage decreases). I would
expect that with the block-sorting compression algorithms (bzip2),
the overall effect would be even less.
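
For anyone who wants to repeat this kind of measurement, a minimal Python sketch follows. It builds a synthetic document (the record layout, the values, and the helper names `make_doc`, `saving`, and `report` are made up for illustration), then compares compressed sizes with and without abbreviated tags using the standard `gzip` and `bz2` modules. The results depend entirely on the input, so it should not be expected to reproduce the figures quoted above.

```python
import bz2
import gzip


def make_doc(n_items, tags):
    """Build a synthetic purchase-order-like document with n_items records."""
    order, sku, count = tags
    rows = "".join(
        f"<{sku}>{100000000 + i * 7919}</{sku}><{count}>{i % 50}</{count}>"
        for i in range(n_items)
    )
    return f"<{order}>{rows}</{order}>".encode("utf-8")


def saving(long_bytes, short_bytes, compress):
    """Fraction of the compressed size saved by abbreviating tags first."""
    long_size = len(compress(long_bytes))
    short_size = len(compress(short_bytes))
    return (long_size - short_size) / long_size


def report(n_items):
    """Relative savings for raw, gzip, and bzip2 at a given document size."""
    long_doc = make_doc(n_items, ("purchase.order", "sku", "num.items"))
    short_doc = make_doc(n_items, ("a", "b", "c"))
    return {
        "raw": (len(long_doc) - len(short_doc)) / len(long_doc),
        "gzip": saving(long_doc, short_doc, gzip.compress),
        "bzip2": saving(long_doc, short_doc, bz2.compress),
    }


if __name__ == "__main__":
    # Compare how the savings change as the document grows.
    for n in (10, 100, 1000, 10000):
        print(n, report(n))
```

Running it over several document sizes is the point: it lets you see how the post-compression saving changes as the document grows, which is the effect described above.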