OASIS Mailing List Archives



RE: The relentless march of abstraction

2/26/01 8:56:01 PM, Don Park <donpark@docuverse.com> wrote:

>I think infoset spec confuses people because there is no
>obvious use for it, like the way a history major would feel
>in a linear algebra class.  IMHO, 99% of XML users will find
>no use for it.  I found "Appendix D: What is not in the
>Information Set" useful as a list of what not to depend on
>when designing systems, but the rest is just elevator music to me.

I think it confuses people because it's a description of an abstract model,
not a concrete programming API.  The way I see it (yeah, I was a math major),
the infoset model specifies which aspects of an XML document are invariant
under certain (editing) transformations of that document.  For example, '$',
'&#36;', and '&#x24;' make exactly the same contribution to the infoset, so
any application that relies on the infoset should not change its behavior
just because the source document switches from one way of writing a dollar
sign to another.  (I picked the dollar sign because in WML a literal dollar
sign has different semantics from a dollar sign written as a numeric character
reference.  That sort of dependency severely constrains parser APIs, since the
parser then has to report how each character was written, not just what it is.)
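To make the invariance concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree (my choice of tool for illustration; any conforming parser should behave the same way). The three spellings of a dollar sign all reach the application as the same character, so code working from the parsed data has no way, and no need, to tell them apart. The <price> element is just a made-up example.

```python
import xml.etree.ElementTree as ET

# Three spellings of a dollar sign: literal, decimal NCR, hex NCR.
# The <price> element is a hypothetical example document.
SOURCES = [
    "<price>$5</price>",
    "<price>&#36;5</price>",
    "<price>&#x24;5</price>",
]

def character_data(doc: str) -> str:
    """Return the element's text exactly as the parser reports it."""
    return ET.fromstring(doc).text

values = [character_data(s) for s in SOURCES]
for src, val in zip(SOURCES, values):
    print(f"{src!r:30} -> {val!r}")

# All three collapse to the same character data; the lexical form is gone.
assert len(set(values)) == 1
```

An application like the WML case, which cares how the character was written, cannot be built on top of this kind of interface at all; it needs a parser that reports lexical detail.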

Note that there will always be some applications that do require purely
lexical information about documents.  If you have a book in which each
chapter is physically represented as an external parsed entity, and you run
it through a filter that looks up and inserts details for bibliographic
references, you really do want the filter's output to have the same physical
entity structure as its input, rather than collapsing the entire book into a
single parsed entity.  But many applications don't need those lexical details,
and an abstract model like the infoset tells implementers which details they
need to handle themselves and which they can delegate to the parser.
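Here is a rough sketch of what goes wrong when such a filter is built on an infoset-level parser, again using Python's xml.etree.ElementTree. As a simplifying assumption, an internal entity stands in for the external parsed entity (one file per chapter) so the example needs no files on disk; the entity name chapter1 and its text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical book whose chapter text lives in an entity. An internal
# entity stands in for an external parsed entity so nothing is read from disk.
SOURCE = """<!DOCTYPE book [
  <!ENTITY chapter1 "Text of chapter one.">
]>
<book><chapter>&chapter1;</chapter></book>"""

root = ET.fromstring(SOURCE)

# The parser has already expanded the entity reference into plain text.
print(root.find("chapter").text)  # Text of chapter one.

# Re-serializing the tree loses the physical structure: neither the
# &chapter1; reference nor the DOCTYPE that declared it survives.
output = ET.tostring(root, encoding="unicode")
print(output)
```

That is fine for most consumers, but a bibliographic filter that must write each chapter back to its own entity needs a tool that keeps the lexical layer, not just the infoset.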