




   Re: [xml-dev] Penance for misspent attributes


"Bullard, Claude L (Len)" <clbullar@ingr.com> wrote:

|> [1] http://www.sgmlsource.com/history/AnnexA.htm

The other stuff in the SGML History Niche is worth reading too:


| One might be called a Markup Luddite for adhering to what it describes, 
| but that is indeed the thinking that spawned the seminal work in markup 

And, IMHO, right on the money.  There's also a difference between being a
Markup Luddite and an ISO8879 Luddite.  ISO8879 gets a number of niggling
details wrong (e.g. data content notations for elements and the primary
status of ID attributes), but for the most part there are a number of
powerful concepts in SGML that the wholesale (NIH-driven?) impulse to
reinvent in the XML world is unnecessarily neglecting.

Much of that, I think, has to do with the generally underacknowledged
impact of existing software on ways of thinking.  If the software doesn't
support something, the tendency is to ignore it or to find workarounds,
and in any case, not to investigate too closely what it fails to support.
The downstream effect is an implicitly constrained approach to design
issues, where everything is organized only around what the software
happens to support.  For instance, Simon wrote:

: A lot of people have been storing data in attributes rather than in
: element content.  There are a lot of reasons for this, ranging from a more
: compact form to simpler processing in SAX.  (Attributes are presented as
: a convenient group, while you have to wait for child elements.)
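To make Simon's point concrete, here is a minimal sketch using only the JDK's built-in SAX classes (javax.xml.parsers, org.xml.sax). The person/email vocabulary is made up for illustration. The attribute value is at hand inside startElement, while the child element's text has to be buffered across characters() calls and only becomes usable at that child's endElement.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttrVsChild {
    // Returns one "name/email" entry per <person>: name is an attribute,
    // email is a child element (a hypothetical vocabulary).
    static List<String> read(String xml) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String name;          // taken from the attribute group
            private StringBuilder email;  // buffer for the child's character data

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("person")) {
                    name = atts.getValue("name");  // all attributes arrive together, right now
                } else if (qName.equals("email")) {
                    email = new StringBuilder();   // the child's content is still to come
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (email != null) email.append(ch, start, length);  // may arrive in chunks
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("email")) {
                    out.add(name + "/" + email.toString().trim());  // only now is the child usable
                    email = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(read("<people><person name=\"Ann\"><email>ann@example.org</email></person></people>"));
    }
}
```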

Hidden here is (IMHO) a deep critique of SAX as a useful API.  Using my
parsing theory parallel again, SAX is basically a (f)lex, a generator of
"interesting" tokens.  There is no yacc/bison analogue to *complete* the
parsing process, in particular to realize this statement from Annex A:

>   If, as postulated, descriptive markup like this suffices for all 
>   processing, it must follow that the processing of a document is a 
>   function of its attributes.

The missing piece is the use of attributes to drive context-sensitive
processing.  The GI (generic identifier, i.e. the element type name) is not
the only relevant attribute!  (On the contrary, it's the one
that has *already* been used to drive the order of "events", seen in the
fact that SAX guarantees element-start and element-end notifications in
proper stack order.)  Using my parsing techniques parallel once again,
there are two aspects to this - roughly corresponding to the needs of
bottom-up and top-down processing.

First, bottom-up: as in a parser, actions are normally associated with
reductions.  The
element-end event in SAX actually conflates two separate semantic needs:
an element handler completing its own processing, and assimilation of the
completed element in the context of its parent - where often the parent
needs a *semantic summation* of the child ("synthesized" attributes).
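A minimal Java sketch of keeping those two needs apart, layered over a plain SAX DefaultHandler with a hand-maintained stack. The Frame class and the choice of word counts as the "semantic summation" are illustrative assumptions, not anything SAX provides: each element first finishes its own processing, then its parent folds the result into its own state.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WordCount {
    // One frame per open element: its name, its own text, and the totals
    // already synthesized by its finished children.
    private static final class Frame {
        final String name;
        final StringBuilder ownText = new StringBuilder();
        long childWords = 0;
        Frame(String name) { this.name = name; }
    }

    // Returns "name=total" for each element in end-tag order, where total is the
    // element's own word count plus its children's totals. Text on either side of
    // a child is simply concatenated, so mixed content should be space-separated.
    static List<String> count(String xml) {
        List<String> out = new ArrayList<>();
        Deque<Frame> stack = new ArrayDeque<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                stack.push(new Frame(qName));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                stack.peek().ownText.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                Frame f = stack.pop();
                // 1. The element completes its own processing.
                String t = f.ownText.toString().trim();
                long total = (t.isEmpty() ? 0 : t.split("\\s+").length) + f.childWords;
                out.add(f.name + "=" + total);
                // 2. The parent assimilates the child's summary (a synthesized attribute).
                if (!stack.isEmpty()) stack.peek().childWords += total;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(count("<doc><title>Markup matters</title>"
                + "<p>Attributes drive <em>context sensitive</em> processing</p></doc>"));
    }
}
```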

Second, top-down: context-sensitive processing often means invoking child
handlers according to inherited state.  If you're going to get useful
semantic information on the way back up, you'll need to pass parameters on
the way down.  This, I submit, is the essential function of SGML/XML
attributes: parametrization of how child element processing is invoked. 
Generally, this requires inspection of attributes in an element-start
handler to dispatch appropriate processing, and also points to the wisdom
of SGML elaborating on *types of names* in the concept of declared value,
because names are *the* way to signify or denote requirements.

[Strongly typed languages like Java are somewhat at a disadvantage here.
The element-start and element-end callbacks have 'void' signatures because
only generic (and thus useless) Objects can round-trip through the parser
back to the application.  Otherwise, one could imagine dispensing with the
single handler registered via setContentHandler(), and having something like
this (in pseudo-code):

  interface  ContentHandler {
       ContentHandler  elementStart ( <start-tag-stuff> ) ;
       UsefulObject    elementEnd ( ) ;
       ContentHandler  childFinished ( <childref>, usefulObject ) ;
       ContentHandler  text ( data ) ;
  }

where each element gets to set the handlers for its children (which it
will decide based on - what else - attributes) and also receive a "child
is done" event, on par with character data notifications, where it gets
another chance to direct things - as arguably it should.]
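The pseudo-code can be approximated in Java as an adapter over an ordinary SAX DefaultHandler. This is one possible reading, not the author's design: the ElementHandler interface, its method names and the add/mul/num vocabulary are hypothetical; a parent's elementStart receives each child's start tag and returns that child's handler; childFinished and text are simplified to return nothing; and summaries still travel as plain Object, so the parent has to cast, much as the paragraph above predicts.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class HandlerStack {
    // Hypothetical per-element handler, loosely modelled on the pseudo-code.
    interface ElementHandler {
        ElementHandler elementStart(String childName, Attributes childAtts); // pick the child's handler
        Object elementEnd();                                   // this element's summary
        void childFinished(String childName, Object summary);  // the "child is done" event
        void text(String data);
    }

    // Adapts SAX's flat callbacks to a stack of per-element handlers.
    private static final class Adapter extends DefaultHandler {
        private final Deque<ElementHandler> stack = new ArrayDeque<>();
        Adapter(ElementHandler documentHandler) { stack.push(documentHandler); }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            stack.push(stack.peek().elementStart(qName, atts));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            stack.peek().text(new String(ch, start, length));
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            Object summary = stack.pop().elementEnd();   // the child finishes itself
            stack.peek().childFinished(qName, summary);  // then the parent assimilates it
        }
    }

    // Example handler: evaluates <add>, <mul> and <num v="..."/>; text is ignored.
    private static final class Eval implements ElementHandler {
        private final String op;
        private final String v;
        private final List<Long> args = new ArrayList<>();
        Eval(String op, String v) { this.op = op; this.v = v; }

        @Override
        public ElementHandler elementStart(String name, Attributes atts) {
            return new Eval(name, atts.getValue("v"));  // copy now: atts is only valid during the callback
        }

        @Override
        public Object elementEnd() {
            switch (op) {
                case "num": return Long.parseLong(v);
                case "add": return args.stream().mapToLong(Long::longValue).sum();
                case "mul": return args.stream().mapToLong(Long::longValue).reduce(1, (a, b) -> a * b);
                default: throw new IllegalArgumentException("unknown element: " + op);
            }
        }

        @Override
        public void childFinished(String childName, Object summary) {
            args.add((Long) summary);  // the cast forced by the untyped Object round trip
        }

        @Override
        public void text(String data) { }
    }

    static long evaluate(String xml) {
        Eval document = new Eval("add", null);  // document-level handler: sums its single child
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new Adapter(document));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return (Long) document.elementEnd();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("<add><num v=\"2\"/><mul><num v=\"3\"/><num v=\"4\"/></mul></add>"));
    }
}
```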

But critiquing SAX is not to single it out for blame (if any).  The fault
is systemic, extending to things like the Infoset.  I think it was you,
Len, who had reservations way back, when the Infoset didn't have lists
(arrays).   And lo, even in XSLT getting the list of tokens in a NAMES or
NMTOKENS or ENTITIES attribute needs an extension function!  It ain't
there, so people won't use it, don't use it, and never find out better.
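The SAX side of the same gap looks similar: a list-valued attribute arrives as one string, and the application has to tokenize it itself. A minimal sketch with a hypothetical roles attribute. Attributes.getType is the standard SAX hook for the declared value; per the SAX documentation it returns "CDATA" when the parser has read no declaration for the attribute (or does not report types), and may report a list type such as NMTOKENS when a DTD declares one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ListAttributes {
    // Splits a list-valued attribute (NMTOKENS, IDREFS, ENTITIES) into tokens.
    // Trimming first also covers values the parser did not collapse because
    // no DTD declared the attribute as a list type.
    static List<String> tokens(String value) {
        List<String> result = new ArrayList<>();
        String v = value == null ? "" : value.trim();
        if (!v.isEmpty()) result.addAll(Arrays.asList(v.split("\\s+")));
        return result;
    }

    // For every element carrying the named attribute, returns the type SAX
    // reports for it followed by the attribute's tokens, e.g. "CDATA [a, b]".
    static List<String> collect(String xml, String attrName) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                String value = atts.getValue(attrName);
                if (value != null) out.add(atts.getType(attrName) + " " + tokens(value));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Undeclared attribute: SAX reports it as CDATA.
        System.out.println(collect("<doc><p roles=\"author  editor\"/></doc>", "roles"));
        // Declared in the internal subset: the parser may report NMTOKENS.
        System.out.println(collect("<!DOCTYPE doc [<!ATTLIST p roles NMTOKENS #IMPLIED>]>"
                + "<doc><p roles=\"author editor\"/></doc>", "roles"));
    }
}
```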

Attribute-based processing has yet to happen in the XML world, and without
it, much of the point of generalized markup is lost. 



Copyright 2001 XML.org. This site is hosted by OASIS