>[From: "W. E. Perry" <email@example.com>]
>We can take either a top-down or a bottom-up view of 'natural' element
>processing. In the top-down view, data, consisting of element structure plus
>content, is created to the expectations expressed (hard-coded!) in a
But isn't there one canonical element-structure-plus-content view
dictated by XML 1.0 itself, i.e. that the syntactic form can be
mechanically morphed into a hierarchy view? This jejune hierarchy view
does not need to be hardcoded. However, the stuff that is and is not *in*
this hierarchy view dictates what can be in the process view.
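To make that concrete, here is a minimal sketch using Python's standard
xml.etree.ElementTree as a stand-in for "a processor". The sample documents
and the outline() helper are my own illustration, not anything from the
thread. Two inputs that differ lexically (quote style, a comment) collapse
to the same hierarchy view, so a process built on that view cannot see the
difference.

```python
import xml.etree.ElementTree as ET

def outline(elem, depth=0):
    """Flatten the tree under elem into (depth, tag, attributes, text) rows."""
    text = (elem.text or "").strip()
    rows = [(depth, elem.tag, sorted(elem.attrib.items()), text)]
    for child in elem:
        rows.extend(outline(child, depth + 1))
    return rows

# Hypothetical sample inputs: same structure, different syntax.
doc_a = "<memo id='7'><!-- draft --><to>Bob</to></memo>"
doc_b = '<memo id="7"><to>Bob</to></memo>'

view_a = outline(ET.fromstring(doc_a))
view_b = outline(ET.fromstring(doc_b))

# The comment and the quote style are not in this hierarchy view.
print(view_a == view_b)
```

Note that dropping the comment is this particular parser's default, not
something every processor does.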
>In the bottom-up view, the structure of the instance data must be
>adjusted at each manipulation to the expectations of the process.
This approach is, I think, where Charles Goldfarb et al. were going with
"structure controlled" versus "markup aware" SGML processors.
The former took the element-structure-plus-content view as
their point of departure. (A generation of these things became
known as ESIS processors).
The latter looked at the syntax, editing tools being the
most common example. These are processors that cannot work at a
hierarchical structure level because the very building blocks of
that structure are still under construction by the processor.
The problems start when you try to take either view of the data
as a point of departure for a process that produces XML as
well as consumes it. The XML view used dictates what
you can and cannot do.
Pure syntax view (lexical):
Pro: Can produce output identical in syntactic form to input.
Con: No tokenization to build on. No well-formedness (WF) check without a separate WF parse pass.
You can end up building parts of XML parsers by stealth. Ugh!
Hierarchy view (infoset):
Pro: Nice tree, we likes trees, yessss, Gollum!
Con: Cannot reproduce input on output without a fiendishly complex infoset
view (a full SGML property set Grove).
The lossy nature of the transformation is processor specific and creates
nasty interop problems.
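A rough sketch of both trade-offs, again in Python with the standard
library only. The TOKEN regexp is a deliberately naive tokenizer of my own
(it mishandles a '>' inside comments or attribute values, which is exactly
the parser-by-stealth trap), and ElementTree stands in for one particular
infoset-style processor. The sample strings are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Naive lexical view: tag-like chunks, runs of text, or a stray '<'.
# Every input character lands in some token, so joining is lossless.
TOKEN = re.compile(r"<[^>]*>|[^<]+|<")

def lexical_roundtrip(text):
    """Tokenize and re-emit; nothing is checked, nothing is lost."""
    return "".join(TOKEN.findall(text))

def tree_roundtrip(text):
    """Parse to a tree and serialize it again (this processor's choices)."""
    return ET.tostring(ET.fromstring(text), encoding="unicode")

def is_well_formed(text):
    """The separate WF pass that the lexical view does not give you."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

source = "<doc  a='1'><empty></empty></doc>"   # well-formed, odd spacing
broken = "<doc><p></doc>"                      # not well-formed

print(lexical_roundtrip(source) == source)     # syntax preserved
print(lexical_roundtrip(broken) == broken)     # garbage preserved too
print(is_well_formed(broken))
print(tree_roundtrip(source) == source)        # lexical detail is gone
```

The tree round trip still yields an equivalent document; it just is not the
same string, and a different processor would lose different details.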
This I think is the central conundrum. Both views have pros and cons. Neither
is correct. We need both.
<Rant>Most importantly we need a freakin' pipeline
processing model so that we can chain lexical- and infoset-based processing.
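For what it's worth, the kind of chaining I have in mind could be sketched
like this. It assumes the simplest possible contract, namely that every
stage takes XML text and returns XML text; the stage names and the
"checked" attribute are hypothetical, and a real pipeline model would need
far more than this.

```python
import re
import xml.etree.ElementTree as ET
from functools import reduce

def lexical_stage(text):
    """Lexical view: strip trailing spaces/tabs per line, no parsing."""
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)

def infoset_stage(text):
    """Hierarchy view: parse, mark the root element, serialize again."""
    root = ET.fromstring(text)
    root.set("checked", "yes")   # hypothetical attribute for illustration
    return ET.tostring(root, encoding="unicode")

def chain(*stages):
    """Compose text-to-text stages left to right into one callable."""
    return lambda text: reduce(lambda acc, stage: stage(acc), stages, text)

process = chain(lexical_stage, infoset_stage)
result = process("<doc>   \n<p>hi</p>\n</doc>")
print(result)
```

Each stage picks the view that suits its job, and the text passed between
stages is the only thing they have to agree on.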
Taking either route to its extreme leads to madness. Lexical madness is
to build partial XML processors from chunks of regexp code, leading
to all sorts of development/maintenance/conformance problems.
Infoset madness is the PSVI. 'Nuff said.