- From: Tyler Baker <firstname.lastname@example.org>
- Date: Mon, 10 Aug 1998 14:56:16 -0400
David Megginson wrote:
> Sam Gentile writes:
> > > Also, we have been hearing rumors of a "short" XML notation. Is
> > > there one? We have a need to reduce the size of our buffers.
> No, there is no such thing. XML's parent, SGML, included extensive
> facilities for markup minimisation and has suffered badly for it,
> since SGML tools are far too difficult to write (there is still not a
> single Java-based SGML parser, beside probably more than a dozen
> Java-based XML parsers).
> There are, however, alternatives: for example, you could compile the
> XML to a compact binary format for internal storage then decompile it
> back to a verbose format for export -- there's no requirement to store
> it internally as text.
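A minimal sketch of that round trip, assuming Python's standard zlib module as a stand-in for a real compact binary XML format (a structured binary encoding would look different, but the pattern of storing compactly and exporting verbose text is the same). The function names are hypothetical.

```python
import zlib

def compile_xml(text: str) -> bytes:
    """Store XML internally as DEFLATE-compressed bytes rather than as text."""
    return zlib.compress(text.encode("utf-8"), 9)

def decompile_xml(blob: bytes) -> str:
    """Recover the verbose XML text for export."""
    return zlib.decompress(blob).decode("utf-8")

# Hypothetical document with lots of repeated markup.
doc = "<catalog>" + "<item><name>Widget</name><price>9.99</price></item>" * 50 + "</catalog>"
blob = compile_xml(doc)
restored = decompile_xml(blob)
print(len(doc), "chars as text,", len(blob), "bytes stored; round trip ok:", restored == doc)
```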
Some very simple compression algorithms, Huffman encoding for instance, do very
well with XML documents, since each Name (the production used for identifying
tags, among other things) is converted to a short binary symbol that serves as an
index for looking up the actual name. In fact, you could do all of this with
entities: simply take all of the Names specified in the DTD, put them into a list,
and then declare an entity for each one.
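To make the Huffman idea concrete, here is a small sketch that treats each element-type name as one symbol and gives frequent names shorter bit codes. It is an illustration under assumptions, not a real XML compressor: names are pulled out with a simplistic regular expression (not a conforming XML tokenizer), and only tag names are coded, not character data.

```python
import heapq
import re
from collections import Counter

def huffman_codes(freqs):
    """Build a prefix-free Huffman code {symbol: bit string} from {symbol: count}."""
    # Heap entries are (weight, tie_breaker, partial code table); the unique
    # tie_breaker keeps heapq from ever comparing two dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        # Degenerate case: a lone symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)
        w2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def decode(bits, codes):
    """Walk the bit string, emitting a name whenever a complete code is read."""
    inverse = {c: s for s, c in codes.items()}
    names, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            names.append(inverse[current])
            current = ""
    return names

doc = "<Foo>" + "<Bar/>" * 6 + "<Baz/>" * 2 + "<Qux/>" + "</Foo>"
# Element-type names from start, end and empty-element tags.
names = re.findall(r"</?([A-Za-z_:][\w.:-]*)", doc)
codes = huffman_codes(Counter(names))
bits = "".join(codes[n] for n in names)
print(codes, len(bits), "bits for", len(names), "names")
```

Because "Bar" is the most frequent name it ends up with a one-bit code, while the rarer names get longer ones.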
You could index the list using base-10 digits, or else use something as high as
base 64 to encode the array references more compactly.
<!ENTITY % 0 "Foo">
<!ENTITY % 1 "Bar">
Then, for a document with element types named "Foo" and "Bar", occurrences
would be converted to:
For small documents, such as CDF files, these sorts of techniques may turn out
to be counter-productive, since the cost of the name table can outweigh what it
saves.
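A rough sketch of the entity-table scheme and of the small-document trade-off. Assumptions, all hypothetical: the 64-character digit alphabet, the "n" prefix on each key (XML 1.0 Names may not begin with a digit, so a bare index such as "0" is not a legal entity name), and the cost of each replaced occurrence, taken as the key plus two delimiter characters since the post does not show the converted form.

```python
import string

# Hypothetical digit alphabet for bases up to 64; any 64 distinct characters would do.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase + "_-"

def encode_index(n, base=64):
    """Write a non-negative list index as a string of base-`base` digits."""
    if not 2 <= base <= len(ALPHABET):
        raise ValueError("base must be between 2 and 64")
    digits = ""
    while True:
        n, r = divmod(n, base)
        digits = ALPHABET[r] + digits
        if n == 0:
            return digits

def build_table(names, base=64):
    """Map each distinct name (in first-seen order) to a short entity key."""
    unique = list(dict.fromkeys(names))
    # "n" prefix: XML Names cannot start with a digit.
    return {name: "n" + encode_index(i, base) for i, name in enumerate(unique)}

def declarations(table):
    """One parameter-entity declaration per name, in the style of the post."""
    return "\n".join(f'<!ENTITY % {key} "{name}">' for name, key in table.items())

def net_saving(names, base=64):
    """Characters saved in the body minus characters spent on the declarations.

    Assumes each occurrence becomes the key plus two delimiter characters.
    """
    table = build_table(names, base)
    saved = sum(len(n) - (len(table[n]) + 2) for n in names)
    return saved - len(declarations(table))

print(declarations(build_table(["Foo", "Bar"])))
small = ["Foo", "Bar", "Foo"]
large = ["PurchaseOrderLineItem", "Quantity"] * 500
print(net_saving(small), net_saving(large))
```

With short names and few occurrences the saving is negative; with long, heavily repeated names it is clearly positive. Raising the base from 10 to 64 shortens the keys once the list is large (index 999 needs three base-10 digits but only two base-64 digits).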
BTW, on a side note, I am having trouble understanding whether the external
subset or the internal subset should be parsed first. I would assume that the
external subset goes first, but in that case INCLUDE and IGNORE sections would be
pretty useless, since the internal subset could not then set the parameter
entities that switch them on or off. As far as I can tell, this is not clarified
in the 1.0 spec, so if someone could explain how a parser should handle this, I
would greatly appreciate it.
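For reference, the XML 1.0 Recommendation states that when both subsets are used, the internal subset is considered to occur before the external subset (section 2.8), and that if an entity is declared more than once, the first declaration encountered is binding (section 4.2). The toy model here (a sketch, not a DTD parser; the function names and the "draft" entity are hypothetical) shows why that ordering keeps conditional sections useful: a document's internal subset can override the parameter entity that the shared external subset uses to select INCLUDE or IGNORE.

```python
def resolve_entities(internal, external):
    """Toy model of parameter-entity binding across the two DTD subsets.

    Each subset is a list of (name, replacement_text) pairs in document
    order. The internal subset is processed first, and the first
    declaration of a name is binding; later ones are ignored.
    """
    bound = {}
    for name, value in internal + external:
        bound.setdefault(name, value)
    return bound

def conditional_keyword(entities, pe_name):
    """Keyword that a section like <![%draft;[ ... ]]> would resolve to."""
    keyword = entities[pe_name]
    if keyword not in ("INCLUDE", "IGNORE"):
        raise ValueError(f"%{pe_name}; must expand to INCLUDE or IGNORE")
    return keyword

external = [("draft", "IGNORE")]   # default in the shared external DTD
internal = [("draft", "INCLUDE")]  # per-document override in the internal subset
print(conditional_keyword(resolve_entities(internal, external), "draft"))
print(conditional_keyword(resolve_entities([], external), "draft"))
```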
Thanx in advance...
xml-dev: A list for W3C XML Developers. To post, mailto:email@example.com
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:firstname.lastname@example.org the following message;
To subscribe to the digests, mailto:email@example.com the following message;
List coordinator, Henry Rzepa (mailto:firstname.lastname@example.org)