- To: xml-dev@lists.xml.org
- Subject: Re: [xml-dev] Suggestions for a slightly less verbose (and easier to author) XML
- From: Sean McGrath <sean.mcgrath@propylon.com>
- Date: Mon, 24 Jun 2002 11:42:00 +0100
- In-reply-to: <1024877339.13461.ezmlm@lists.xml.org>
[Paul Prescod]:
>>If the instances are generated under your control by a machine, then by
>>definition they won't use the short-tag feature if your regexps don't
>>support it. The complexity argument also does not wash: entities and
>>CDATA sections easily add the most complexity to XML of any feature.
[Tim Bray]
>Machine-generated XML usually doesn't do entities or CDATA. It does do
><someTag>
> ..stuff..
> ..stuff..
></someTag>
>and perl is just the ticket.
The problem, of course, is that there is no way to tell in advance whether
or not the 1 Gig XML instance you are about to process contains any entities,
CDATA sections, etc.
So you need to make assumptions about the processing environment in your
code. Such assumptions make me nervous and make Walter Perry very
nervous indeed (they are tantamount to XML vocabulary semantics assumptions).
I see three possibilities to make this work reliably:
1) An XML-Lint type utility that would flag the presence of such things
so that assumption-laden Perl is protected from making erroneous
processing decisions. Such lint-like utilities would make excellent
components in XPipe or Schemamachine or Ant or Cocoon or DSDL
pipelines.
2) A canonical XML representation guaranteed to have resolved away
all the funnies e.g. canonical XML or PYX.
3) A manifest mechanism in XML to allow a human/machine to declare
what features the XML instance uses e.g. XFM. This would be of the
hint variety - subject to formal confirmation by an XML-Lint type
utility - but very useful in stopping "grep" and Perl etc. in their tracks
if the manifest asserts something that contradicts the processing
assumptions.
4) A PSVI that .... (only joking!!!!!!)
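To make option 1 concrete, here is a rough sketch of what such a lint pass
might look like in Python. It is only an illustration, not an existing tool:
the names PATTERNS, lint_lines and lint_file are made up for this example, it
assumes an ASCII-compatible encoding, and it treats the five predefined
references (&amp; &lt; &gt; &quot; &apos;) and numeric character references as
acceptable. It is a purely lexical scan, so it errs on the side of flagging
(a marker inside a comment still counts), which seems the right bias for a
guard in front of assumption-laden regexps.

```python
import re

# None of these markers can contain a line break, so scanning line by
# line is enough and a 1 Gig instance never has to be held in memory.
PATTERNS = {
    "cdata-section": re.compile(r"<!\[CDATA\["),
    "doctype-declaration": re.compile(r"<!DOCTYPE"),
    "entity-declaration": re.compile(r"<!ENTITY"),
    # Named references other than the five predefined ones.
    "entity-reference": re.compile(
        r"&(?!(?:amp|lt|gt|quot|apos);)[A-Za-z_:][\w:.\-]*;"
    ),
}


def lint_lines(lines):
    """Return {feature: line number of its first occurrence}."""
    found = {}
    for number, line in enumerate(lines, start=1):
        for feature, pattern in PATTERNS.items():
            if feature not in found and pattern.search(line):
                found[feature] = number
    return found


def lint_file(path):
    """Scan a file lazily; an empty result means none of the markers were seen."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return lint_lines(handle)
```

A pipeline stage would call lint_file() first and hand the instance to the
regexp-based step only if the result is empty; anything else goes to a real
parser or gets rejected.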
Personally (surprise, surprise) I think the lint utility in a *pipeline*
is the way to go. That way, people can re-invent all of SGML's tag
minimization features in a layered way without heaping them
all into a monolithic morass with trickle-down complexity
to all XML tools. This trickle-down effect is what made
SGML such an exasperatingly powerful pain in the ass.
Let's not re-invent it.
Sean