- From: Jeff Greif <email@example.com>
- To: firstname.lastname@example.org
- Date: Thu, 03 Aug 2000 14:17:00 -0700
It seems there is an incontrovertible answer to the question
1. "What is the information content of a well-formed XML document?"
From this description of the information content, we can extract subsets
that prove useful in a majority of the contexts in which XML documents are
used, to answer such questions as
2. "What is the useful information content of a well-formed XML document?"
(Question 2 is the one addressed by the XML Infoset spec that is the subject
of this controversy.) Other such questions are
3. "What is the information content of a well-formed XML document that is
required to be preserved or accessible from a non-validating parser's
representation of an XML document?" or
4. "What is the information content of a well-formed XML document that is
required to be preserved or accessible from a validating parser's
representation of an XML document?"
The answer to question 1, probably expressible as a grove, includes the
logical content of the document in the form of elements, attributes, and
text, annotated with additional information that enables exact replication
of the original document (e.g. whitespace layout, namespace prefixes,
presence of closing tags). It also contains complete descriptive references
to any pertinent external resources (DTDs, namespaces, schemas).
Presumably each logical item would be annotated with its origin: whether it
was specified in a DTD (and in the external or internal subset), appeared in
place of an ANY element content marker, resulted from the resolution of
entities (and which entities), etc. The information content also contains a
set of constraints (from DTDs or schemas) to which the document purports to
conform. There is clearly adequate information to determine whether the
document does conform, but this might not be determinable without running a
validating parser, so conformance is perhaps best considered derived
information, left to the application to determine.
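
To make the distinction between logical content and the extra information
needed for exact replication concrete, here is a minimal sketch in Python
(3.8 or later, for ET.canonicalize). The two sample documents and the helper
name logical_form are invented for illustration, and canonical XML is only a
stand-in for "logical content", not the grove described above. The two
documents differ in quoting, whitespace inside a tag, and the use of an
empty-element tag, yet they reduce to the same canonical form.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: same elements, attributes and text, but
# different quoting, in-tag whitespace, and empty-element syntax.
doc_a = "<a x='1'><b/></a>"
doc_b = '<a  x="1" ><b></b></a>'


def logical_form(xml_text):
    """Reduce a document to canonical XML (C14N 2.0) as a rough proxy
    for its logical content. Lexical details such as quote style and
    whitespace inside tags do not survive this step."""
    return ET.canonicalize(xml_data=xml_text)


same_logical_content = logical_form(doc_a) == logical_form(doc_b)
same_lexical_form = doc_a == doc_b

print("same logical content:", same_logical_content)
print("same lexical form:   ", same_lexical_form)
```

Anything that separates doc_a from doc_b here is exactly the kind of
annotation the full answer to question 1 would have to carry in order to
reproduce the original document.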
It seems that all the questions above, other than #1, are subject to
dispute; practical considerations, some more important than others to
different disputants, would allow reasonable people to reach different
conclusions about the desired answers to these questions. But I think it
makes sense to be formal and explicit about the answer to #1 in order to
frame the discussions about the others. Whatever formal representation is
used, the same formalism would help to highlight the differences between the
information content subsets, even if the spec does not use it for
presentation and only refers to it non-normatively.
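
As a rough illustration of how one shared formalism can highlight such
differences, the sketch below models information content as a set of named
properties. The property names paraphrase items listed earlier in this
message; the membership of subset_a and subset_b is purely hypothetical and
does not describe the Infoset spec or any parser.

```python
# Full information content (question 1), as a set of property names.
# The names paraphrase this message; they are not taken from any spec.
FULL = frozenset({
    "elements", "attributes", "text",
    "whitespace_layout", "namespace_prefixes", "closing_tag_presence",
    "external_references", "provenance", "constraints",
})

# Two hypothetical subsets, standing in for answers to questions 2-4.
subset_a = frozenset({"elements", "attributes", "text", "namespace_prefixes"})
subset_b = frozenset({"elements", "attributes", "text",
                      "external_references", "constraints"})


def differences(first, second):
    """Describe how two subsets of the full content differ."""
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
        "shared": sorted(first & second),
    }


# Both subsets must be drawn from the same full description.
assert subset_a <= FULL and subset_b <= FULL

report = differences(subset_a, subset_b)
for label, names in report.items():
    print(label, names)
```

Because every subset is expressed against the same full description, a
dispute about questions 2 through 4 becomes a dispute about which named
properties to keep.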