RE: The relentless march of abstraction
- From: Eric Bohlman <ebohlman@earthlink.net>
- To: Don Park <donpark@docuverse.com>, XML-Dev <xml-dev@lists.xml.org> (E-mail)
- Date: Tue, 27 Feb 2001 01:07:30 -0600
2/26/01 8:56:01 PM, Don Park <donpark@docuverse.com> wrote:
>I think infoset spec confuses people because there is no
>obvious use for it, like the way a history major would feel
>in a linear algebra class. IMHO, 99% of XML users will find
>no use for it. I found "Appendix D: What is not in the
>Information Set" useful as a list of what not to depend on
>when designing systems, but rest is just elevator music to me.
I think it confuses people because it's a description of an abstract model,
not a concrete programming API. The way I see it (yeah, I was a math major),
the infoset model specifies which aspects of an XML document are invariant
under certain (editing) transformations of that document. For example, '$',
'&#36;', and '&#x24;' make exactly the same contribution to the infoset, and
therefore any application that relies on the infoset should not change its
behavior if the particular method of writing a dollar sign in the source
document changes. (I picked the dollar sign because in WML, a literal dollar
sign has different semantics from a dollar sign written as a numeric character
reference; this sort of dependency severely constrains parser APIs.)
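To make the invariance concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element name "price" is made up for illustration, and ElementTree is only a stand-in for "an application that relies on the infoset"; it is not an infoset implementation as such. The point is just that a literal dollar sign and its decimal and hexadecimal character references all reach the application as the same character.

```python
import xml.etree.ElementTree as ET

# Three lexically different ways of writing a dollar sign in the source.
# The element name "price" is hypothetical.
sources = [
    "<price>$</price>",       # literal character
    "<price>&#36;</price>",   # decimal numeric character reference
    "<price>&#x24;</price>",  # hexadecimal numeric character reference
]

# The parser resolves the references; the application only sees characters.
texts = [ET.fromstring(s).text for s in sources]

# All three documents deliver identical character data.
all_same = len(set(texts)) == 1
print(texts, all_same)
```

An application built on this kind of API cannot tell which spelling the author used, which is exactly why a vocabulary that assigns the spellings different meanings needs something lower-level than the usual parser interface.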
Note that there are always going to be some applications that do require
purely lexical information about documents. If you have a book in which each
chapter is physically represented as an external parsed entity and you want to
run it through a filter that looks up and inserts details for bibliographic
references, you really do want its output to have the same physical entity
structure as its input, rather than condensing the entire book into a single
parsed entity. But many applications don't require those lexical details, and
an abstract model like the infoset tells implementers what details they can
handle themselves and what details they can delegate to the parser.
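The loss of physical entity structure can be sketched the same way. This example assumes an internal entity declared in the DOCTYPE as a stand-in for the external parsed entities in the book scenario (ElementTree does not fetch external entities by default), and the names "book", "chapter", and "ch1" are invented. After parsing, the tree built from the entity reference is indistinguishable from one where the chapter was typed inline, so a filter built on this API could not write the entity structure back out.

```python
import xml.etree.ElementTree as ET

# A document whose chapter lives in a (here internal) parsed entity.
# Entity and element names are hypothetical.
with_entity = """<!DOCTYPE book [
  <!ENTITY ch1 "<chapter>One</chapter>">
]>
<book>&ch1;</book>"""

# The same content written directly, with no entity at all.
inline = "<book><chapter>One</chapter></book>"

tree_a = ET.fromstring(with_entity)
tree_b = ET.fromstring(inline)

# Once parsed, nothing records that the chapter came from an entity.
same_tree = ET.tostring(tree_a) == ET.tostring(tree_b)
print(same_tree)
```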