XML.org: OASIS Mailing List Archives

 


Re: [xml-dev] SGML default attributes.

General entities are evil because they offer a false solution that leads
to failure and pain.

Note that in XML the 5 special escapes are defined as escapes, not as
more-general text entities. They are built into the language. They are
necessary, as you say, to be able to escape literal markup characters
without resorting to numeric character references.
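A quick illustration in Python (standard library only) of the point that the five predefined references are part of XML itself. The parser below resolves them with no DOCTYPE or entity declarations at all, and `xml.sax.saxutils.escape` produces them when serializing text. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# No DOCTYPE, no entity declarations: &lt; &gt; &amp; &quot; &apos; are
# recognized by every conforming XML parser.
doc = "<p>if (a &lt; b &amp;&amp; c &gt; d) say(&quot;hi&apos;&quot;)</p>"
root = ET.fromstring(doc)
print(root.text)    # if (a < b && c > d) say("hi'")

# The reverse direction: escape literal markup characters in text content.
raw = "a < b && c > d"
escaped = escape(raw)   # handles &, < and > by default
print(escaped)          # a &lt; b &amp;&amp; c &gt; d
assert unescape(escaped) == raw
```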

In the context of general text entities there are two classes: internal
general entities and external general entities. Internal entities are less
bad only because their declaration must be part of the DOCTYPE declaration
and thus it's more obvious that they are string macros and their value is
clearer to the author. External general entities are the more insidious
feature because they look and feel like real re-use when in fact they are
not.

The issue is entirely one of identity: in normal SGML and XML processing
contexts (that is, any processor that reflects the results of parsing and
that does not go to extremes to preserve in some way knowledge of the
original entity boundaries), not only do entities not have identity,
they do not exist at all.
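A minimal Python sketch of entity boundaries disappearing during parsing. The document and the entity name `prodname` are made up for illustration; the point is that ElementTree (via expat) expands the internal entity, and nothing in the resulting tree records that the text came from an entity.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using an internal general entity as a string macro.
doc = """<!DOCTYPE manual [
  <!ENTITY prodname "Widget Pro">
]>
<manual><para>Install &prodname; before use.</para></manual>"""

root = ET.fromstring(doc)
para = root.find("para")
print(para.text)   # Install Widget Pro before use.

# Serializing the parsed tree shows no trace of the entity reference: for the
# application, the entity never existed.
print(ET.tostring(root, encoding="unicode"))
# <manual><para>Install Widget Pro before use.</para></manual>
```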

Parameter entities have the same problem: they do not have identity. The
difference is the processing context: parameter entities are only of
interest to the parser itself for the task of composing the grammar used to
validate the instance. [And I'll observe that efforts to add more
application-level reuse features to XSD and, to a lesser degree, RELAX NG,
have led to serious problems, such as the hideous redefine feature in XSD
1.0. So to the degree that DTDs avoided those problems by limiting
themselves to simple string macros with a basic positional configuration
mechanism, they did the right thing.]

That is, from the point of view of the processor operating on the parsed
documents, the entities *never existed*. That means the processor has no
opportunity to do the things that are always necessary when implementing
re-use, such as validating the correctness of the reference, rewriting IDs
and addresses to reflect the re-use, etc.
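To make the contrast concrete, here is a minimal Python sketch of reuse done at the application level instead. The `ref="file#id"` attribute, the in-memory `LIBRARY`, and the ID-prefixing policy are all hypothetical (loosely inspired by DITA's conref, not its actual syntax); the point is that the processor sees the reference, can check it, and can rewrite IDs for each use.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical store of reusable source documents, keyed by file name.
LIBRARY = {
    "warnings.xml": ET.fromstring(
        '<lib><note id="hot">Surface is <b id="em1">hot</b>.</note></lib>'
    ),
}

def resolve_refs(doc: ET.Element, use_context: str) -> ET.Element:
    """Replace each element that has a hypothetical ref="file#id" attribute
    with a copy of the referenced element, rewriting IDs for this use."""
    # Collect first so the tree is not modified while it is being iterated.
    refs = [(parent, i, child)
            for parent in doc.iter()
            for i, child in enumerate(parent)
            if child.get("ref") is not None]
    for parent, i, child in refs:
        file_name, _, target_id = child.get("ref").partition("#")
        source = LIBRARY.get(file_name)
        target = None if source is None else source.find(f".//*[@id='{target_id}']")
        if target is None:
            # The application can detect a broken reference; an entity
            # reference gives the processor no such opportunity.
            raise ValueError(f"unresolved reference: {child.get('ref')}")
        clone = copy.deepcopy(target)
        clone.tail = child.tail
        # Rewrite every ID so that multiple uses stay unique in the result.
        for el in clone.iter():
            if "id" in el.attrib:
                el.set("id", f"{use_context}-{el.get('id')}")
        parent[i] = clone
    return doc

page = ET.fromstring('<page><p>Before</p><note ref="warnings.xml#hot"/></page>')
resolve_refs(page, "oven")
print(ET.tostring(page, encoding="unicode"))
# <page><p>Before</p><note id="oven-hot">Surface is <b id="oven-em1">hot</b>.</note></page>
```

The source element in `LIBRARY` is left untouched because each use works on a deep copy.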

The problem with things like public IDs and URNs for grammars is that we
*want* them to have some meaning but in fact they do not have any reliable
meaning because they are, fundamentally, just pointers to storage objects
("files"). As I showed in my response to Roger Costello, with DTDs you can
completely lie. With a catalog file you can completely lie. With a
modified copy of a file you can completely lie. With a parameter entity
redefined in the internal subset you can completely lie. Of course the lie
can be detected by doing other validation but the point is that the
ability to lie is inherent in the mechanism and cannot be detected by DTD
validation itself.
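A small Python sketch (standard library `xml.parsers.expat`) showing how little a DOCTYPE guarantees. It recasts the "notdocbook" example quoted further down in this thread into XML syntax, which requires a system identifier after the public identifier; the DocBook 4.5 URL is included only for that reason and is never fetched. The `inspect_doctype` helper is hypothetical.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE notdocbook PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ELEMENT notdocbook ANY>
  <!ELEMENT bogus ANY>
]>
<notdocbook><bogus><para>This is not a DocBook document</para></bogus></notdocbook>"""

def inspect_doctype(text: str) -> dict:
    """Report what the DOCTYPE claims and what the root element actually is."""
    info = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info.update(doctype_name=name, public_id=public_id,
                    system_id=system_id,
                    internal_subset=bool(has_internal_subset))

    def on_start(tag, attrs):
        info.setdefault("root", tag)   # first start tag is the root element

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    # Expat does no I/O of its own and no handler here asks it to load the
    # external DTD: the public ID is just a string the document asserts.
    parser.Parse(text, True)
    return info

info = inspect_doctype(DOC)
print(info["public_id"])        # -//OASIS//DTD DocBook XML V4.5//EN
print(info["root"])             # notdocbook
print(info["internal_subset"])  # True
```

Any tool that keys off `public_id` alone would treat this as DocBook.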

The solution is to decouple the identifier of the abstract document type
from any references to any implementation expressions of the document
type. 

DITA does this decoupling by saying "There is a (potentially unbounded but
finite) set of uniquely-named vocabulary modules that are, for a given
version in time, invariant. For any given DITA document there is a set of
modules that constitute that document's 'DITA document type'. By
definition any two documents with the same set of modules have the same
DITA document type."

Because the DITA document type is defined in terms of the module *names*,
which are simply names for the modules *as abstractions*, the definition
is entirely in terms of the abstract document type.
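A minimal Python sketch of "same set of modules means same DITA document type". It assumes DITA 1.x attribute syntax: structural modules are read from the root element's @class, and domain and constraint modules from @domains, taking the last name in each parenthesized group as the module name. This is a simplification for illustration, not a complete @domains parser, and the function name is hypothetical.

```python
import re

# One @domains group, e.g. "(topic hi-d)" or "a(props deliveryTarget)".
_GROUP = re.compile(r"a?\(\s*([^()]*?)\s*\)")

def dita_doctype(root_class: str, domains: str) -> frozenset:
    """Return the set of module names that make up a document's DITA type."""
    # "- topic/topic concept/concept " -> {"topic", "concept"}
    structural = {tok.split("/")[0] for tok in root_class.split()[1:]}
    # The last name in each group is the module being declared.
    declared = {grp.split()[-1] for grp in _GROUP.findall(domains) if grp.strip()}
    return frozenset(structural | declared)

a = dita_doctype("- topic/topic concept/concept ", "(topic hi-d) (topic ut-d)")
b = dita_doctype("- topic/topic concept/concept ", "(topic ut-d)  (topic hi-d)")
c = dita_doctype("- topic/topic concept/concept ",
                 "(topic hi-d) (topic ut-d) (topic indexing-d)")
print(a == b)   # True: same modules, order is irrelevant
print(a == c)   # False: c also uses indexing-d
```

Nothing here consults a DTD or schema: the comparison is purely over module names.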

The modules must be defined somewhere--there has to be some
definition--prose, DTD, XSD, Schematron, RNG, running code, whatever--but
it doesn't matter *for conformance* what form it takes. The practical
requirement is that the agents operating on the documents be able to make
sense of the module definition *as needed*. But since every DITA element
ultimately maps back to a base type defined in the DITA standard, it's
not even necessary to understand anything about the modules. Even when
there is no formal grammar defined for a module (and thus nothing
available to validate instances that use that module), the document can
still be validated against the grammars for the base types, because those
are known (provided as part of the DITA standard). So you can always know,
for any DITA document, that it at least conforms to the grammar rules for
the base element types. Because relaxation of constraints is not allowed,
you also know that specializations don't add anything you didn't expect
(because addition is not allowed).
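A sketch of how a processor can fall back to base types using only the @class value, with no grammar involved. The handler tables and the `mydomain/my-para` name are hypothetical; the @class syntax ("-" or "+" followed by module/element pairs, most general first) follows the `class="- topic/p mydomain/my-para "` example quoted later in this thread.

```python
def class_chain(class_value: str) -> list:
    """Split a DITA @class value such as "- topic/p mydomain/my-para " into
    (module, element) pairs, most general first."""
    prefix, *tokens = class_value.split()
    if prefix not in ("-", "+") or not tokens:
        raise ValueError(f"not a DITA @class value: {class_value!r}")
    return [tuple(tok.split("/", 1)) for tok in tokens]

def base_type(class_value: str) -> str:
    module, element = class_chain(class_value)[0]
    return f"{module}/{element}"

def pick_handler(class_value: str, handlers: dict) -> str:
    """Use the most specific type the processor knows about, falling back
    toward the base type."""
    for module, element in reversed(class_chain(class_value)):
        key = f"{module}/{element}"
        if key in handlers:
            return key
    raise LookupError(f"no handler for base type {base_type(class_value)}")

generic = {"topic/p": "render as paragraph"}
extended = {**generic, "mydomain/my-para": "render with custom styling"}
cls = "- topic/p mydomain/my-para "
print(base_type(cls))               # topic/p
print(pick_handler(cls, generic))   # topic/p
print(pick_handler(cls, extended))  # mydomain/my-para
```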

Note the constraint: specializations must not relax constraints. This is
the big DITA constraint, but it's necessary to make the mechanism work,
because otherwise you'd have unconstrained madness (that is, you'd have
DocBook, JATS, TEI, and every other standard XML application that allows
unconstrained extension). The solution is to ensure that the base types
allow all reasonable options. One of the things you can see in the
evolution of the DITA standard is the removal of inappropriate constraints
in the base types. It's certainly not perfect but it's close enough for
DITA 1.x. XML applications in other domains could, of course, choose
different sets of starting constraints--"reasonable" is of course context
dependent. 
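A deliberately simplified sketch of the "never relax" rule. It assumes content models flattened to sets of allowed child types (real DITA content models are grammars with ordering and occurrence rules), and the `BASE_MODELS` table and `mydomain` names are hypothetical. A specialization passes if every child type it allows generalizes to something the base already allows.

```python
# Hypothetical flattened base content models: element type -> allowed children.
BASE_MODELS = {
    "topic/section": {"topic/title", "topic/p", "topic/ul", "topic/note"},
}

def generalize(class_value: str) -> str:
    """Map a @class value to its base type, e.g. "- topic/p x/y " -> "topic/p"."""
    return class_value.split()[1]

def relaxations(base_type: str, spec_children: set) -> set:
    """Base types the specialization allows that the base does not; any
    non-empty result means the specialization relaxes a constraint."""
    return {generalize(c) for c in spec_children} - BASE_MODELS[base_type]

ok = {"- topic/title mydomain/heading ", "- topic/p mydomain/my-para "}
bad = ok | {"- topic/fig mydomain/diagram "}
print(relaxations("topic/section", ok))    # set()
print(relaxations("topic/section", bad))   # {'topic/fig'}
```

Allowing fewer children than the base (as `ok` does) is fine: that is a constraint, not a relaxation.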

For the purposes of imposing DITA's content reference constraints, which
are defined in terms of "document type compatibility", it is sufficient to
know the DITA document types of the two documents involved and the @class
values of the elements involved. There is no need to know anything about
the actual grammar rules *because those rules are already defined in the
DITA standard*. That is, because no specialization can be less constrained
than the base, if you know about the base you know the minimum you need to
know. Because constraints are defined through separate modules, you also
know whether a given document type is more constrained than another if they
otherwise use the same modules. You don't necessarily know *how* it's more
constrained, just that it is. That is sufficient to know that you should
not reuse elements from the less-constrained document in the
more-constrained document if you do not want to risk including something
the more-constrained document has chosen to disallow.
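A sketch of the reuse check just described, assuming each document's DITA type has already been split into vocabulary-module names and constraint-module names (the `DitaDocType` class and the `strictConcept-c` module name are hypothetical). Consistent with the text, it detects only *that* the target is more constrained, not *how*.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DitaDocType:
    vocabulary: frozenset    # vocabulary (structural and domain) module names
    constraints: frozenset   # constraint module names

def safe_to_reuse(source: DitaDocType, target: DitaDocType) -> bool:
    """May elements from a `source` document be reused in a `target` document?
    Simplified rule: the target must have every vocabulary module the source
    uses, and must not apply any constraint module the source lacks."""
    return (source.vocabulary <= target.vocabulary
            and target.constraints <= source.constraints)

mods = frozenset({"topic", "concept", "hi-d"})
loose = DitaDocType(mods, frozenset())
strict = DitaDocType(mods, frozenset({"strictConcept-c"}))
print(safe_to_reuse(strict, loose))   # True
print(safe_to_reuse(loose, strict))   # False: target is more constrained
```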

The DITA standard says conforming DITA documents do not need to have any
reference to any grammar because DITA doesn't depend on a particular
grammar file or form of reference to determine that a given document is or
is not a DITA document. DITA is an XML standard and XML does not require
the use of grammar references from document instances. (Remember that
there are many ways to associate validation with documents other than
pointing from the document to the grammar--if you know the document's
abstract type you can always provide a way to do the appropriate
validation, whatever form it might take. Conversely, if you don't know the
document type there's not much you can reliably do other than check that
it's well formed. And remember that the DOCTYPE declaration (or schema
reference or RELAX NG association) does not reliably tell you the
document's document type, for the reasons I've given.)
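A sketch of associating validation with a document's abstract type rather than with anything the instance points to. The registry, its module-name keys, and the `requires_title` rule are hypothetical stand-ins for real grammars or Schematron; the point is the direction of the lookup. When the type is unknown, the sketch can only confirm well-formedness.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, FrozenSet, List, Optional

Validator = Callable[[ET.Element], List[str]]

def requires_title(root: ET.Element) -> List[str]:
    # Hypothetical rule standing in for a grammar or Schematron check.
    return [] if root.find("title") is not None else ["missing <title>"]

# Hypothetical registry: abstract document type (set of module names) -> checks.
REGISTRY: Dict[FrozenSet[str], List[Validator]] = {
    frozenset({"topic", "concept"}): [requires_title],
}

def validate(xml_text: str, doctype: FrozenSet[str]) -> Optional[List[str]]:
    """Return error messages, or None if only well-formedness could be checked."""
    root = ET.fromstring(xml_text)   # raises ET.ParseError if not well formed
    validators = REGISTRY.get(doctype)
    if validators is None:
        return None
    return [msg for check in validators for msg in check(root)]

concept = frozenset({"topic", "concept"})
print(validate("<concept><title>T</title></concept>", concept))   # []
print(validate("<concept/>", concept))                            # ['missing <title>']
print(validate("<concept/>", frozenset({"unknown"})))             # None
```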

DITA depends on three things in document instances, independent of any
grammar used, to determine (A) whether a document is a DITA document and
(B) what its DITA document type is:

1. The presence of the @DITAArchVersion attribute, which is in a
DITA-defined namespace (and is the only use of namespaces in DITA other
than for non-DITA elements).
2. The presence of the @domains attribute on map and topic elements.
3. The presence of the @class attribute with the DITA-defined syntax on at
least the map or topic element (but ideally on all elements not within a
DITA "foreign" element).

If all three conditions are met the document is almost certainly a DITA
document and can be reasonably validated against DITA requirements and
processed as a DITA document. Other vocabularies could have attributes
named @domains or @class and even have the same syntax, but none should
ever have the DITAArchVersion attribute.
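A minimal check for the three conditions, using ElementTree. The namespace URI is the one DITA binds to the `ditaarch` prefix; the `looks_like_dita` name and the sample documents are hypothetical. One assumption matters for this thread's subject: DITA DTDs normally supply these attributes as defaults, and a parser that does not read the DTD (ElementTree does not) will only see them if they are literally present in the instance.

```python
import xml.etree.ElementTree as ET

DITA_ARCH_NS = "http://dita.oasis-open.org/architecture/2005/"

def looks_like_dita(xml_text: str) -> bool:
    """Apply the three instance-level tests to the root element."""
    root = ET.fromstring(xml_text)
    has_arch_version = f"{{{DITA_ARCH_NS}}}DITAArchVersion" in root.attrib
    has_domains = "domains" in root.attrib
    tokens = root.get("class", "").split()
    has_class = (len(tokens) >= 2 and tokens[0] in ("-", "+")
                 and all("/" in tok for tok in tokens[1:]))
    return has_arch_version and has_domains and has_class

topic = ('<concept xmlns:ditaarch="http://dita.oasis-open.org/architecture/2005/" '
         'ditaarch:DITAArchVersion="1.3" domains="(topic hi-d)" '
         'class="- topic/topic concept/concept " id="c1"><title>T</title></concept>')
impostor = '<concept domains="(topic hi-d)" class="- topic/topic concept/concept "/>'
print(looks_like_dita(topic))      # True
print(looks_like_dita(impostor))   # False: no DITAArchVersion
```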

My main issue with DTDs, in particular, is not that they weren't quite
valuable--obviously they were a very important innovation and have
tremendous practical value even today (the vast majority of DITA documents
are DTD-based, for various reasons)--but that they were misunderstood as
being THE primary or only definition of "document type" when they never
were. This led to a lot of misplaced effort, inappropriate and unrealistic
expectations, etc. 

Cheers,

E.


----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com




On 5/4/16, 2:08 PM, "Steve Newcomb" <srn@coolheads.com> wrote:

>Eliot,
>
>Like you, I'm not really wedded to the notion of parser-mediated
>transclusion.   On the other hand, I'm not really convinced we can 100%
>jettison it, either, or preach that the very concept is somehow evil.
>It's a hack, that's all.  (Frankly, hacks are what get us through the
>day.)
>
>What you've said is packed, as usual, with terrific insights.  I guess I
>just have trouble with the rhetoric.  It wouldn't bother me so much if I
>didn't think your words are (quite deservedly) influential.  If I didn't
>already know you so well, I might gather that it is your opinion that
>either:
>
>(1) entities have no identity
>
>or
>
>(2) entities may have identity but it doesn't make any difference,
>
>because
>
>...entities have no purpose other than content-level reuse in the
>context of parsing operations.
>
>Assuming I'm right, then I'm going to guess that it's your opinion that
>the *only* reason why we have the "lt" (less-than) general entity is to
>bypass the parser's natural inclination to recognize a STAGO (start tag
>open character).  As a purely practical matter, I must admit that I
>don't think I've *ever* used the "lt" general entity name for any other
>purpose.  And, truly, that purpose is a hack, pure and simple!  BUT:
>there's a vital principle here, and I don't want it to be trampled and
>lost.
>
>The principle at work here, at least for me, is that when I'm invoking
>an entity by name, I'm using a defined name to refer to an abstract
>thing, namely that abstraction which is shared by all "less-than"
>characters in all character sets, fonts, encodings, and whatnot.  The
>fact that I'm invoking the notion of "the less-than character" in the
>context of parsed character data is irrelevant. I might instead use the
>entity name "lt" as the value of an ENTITY attribute, for example, or in
>any of the many ways that HyTime, for example, exploited the notion of
>entity identity.
>
>In SGML, every aspect of the use of names to identify things is founded
>on the notions inherent in DTD syntax.  And DTDs can be fully or
>partially shared among many documents that invoke those element-type
>names, attribute names, and entity names, so that they are all
>(presumably, *cough*) invoking the same things whenever they utter the
>same names.  And the DTDs themselves can also have "universal" names by
>invoking the universes (somehow) identified in PUBLIC identifiers.  So I
>would argue that you err in portraying entities and document types as
>different things.  Instead, I think they are in fact best understood as
>different perspectives on one and the same organic whole, a single
>"grounding tree", if you like.
>
>Entity identity is the invisible root of the grounding tree.  In my
>view, the names declared in document types and invoked in document
>instances are merely the visible, above-ground parts of it.  Now, one
>may claim that we don't need entity identity for that purpose, just as
>we don't need gold to back up the U.S. dollar.  Hmmmm.  But there's
>still identity, even there, and in the case of U.S. dollars -- even the
>huge majority of them that don't have individual identity -- their
>root-existence and root-nature is arguably testable in the form of U.S.
>military power.
>
>Where's the power of URIs, if there's no testable "there" there, and they
>don't even necessarily resolve?  Where's the identity of a document
>type, if not in an entity of some kind that is testably somewhere and
>ideally has properties that are useful for testing instances that claim
>to be of the type?
>
>I don't see how your explanation of DITA's approach resolves the
>problem.  When you say:
>
>> ...stop caring about the grammar as an artifact and care only
>> about the set of (abstract) vocabulary modules the document says it
>> (may) use. That is, actually declare the abstract document type in an
>> unambiguous way and worry about validation details separately.
>
>...you don't say how to resolve the problem, other than, implicitly,
>anyway, via entity identity: the identities of the DITA modules,
>wherever they are.  Right?  You just don't admit it up front in ENTITY
>declarations.  It's just understood by everybody, more or less
>intuitively, I guess.
>
>How is that better?
>
>Steve
>
>On 05/04/2016 01:12 PM, Eliot Kimber wrote:
>> These are really two different subject domains: entities (content-level
>> reuse) and document types (defining and determining correctness of
>> instances against some understood set of rules).
>>
>> On general entities:
>>
>> General entities are absolute evil. They should never be used under any
>> circumstances. Fortunately, the practical reality of XML is that they
>> almost never are used. I only see them in XML applications that reflect
>> recent migration from legacy SGML systems.
>>
>> The alternative is link-based reuse, that is, reuse at the application
>> processing level, not at the serialization parser level. Or more
>> precisely: reuse is an application concern, not a serialization concern.
>> Entities in SGML and XML are string macros. To the degree that string
>> macros are useful then they have value and in the context of DTD
>> declarations parameter entities have obvious value and utility.
>> Parameter entities are not evil.
>>
>> But in the context of content, that is, the domain of the elements
>> themselves, string macros are a big problem, not because they aren't
>> useful, but because people think they do something they don't, namely
>> provide a way to do reliable reuse. The use cases where string macros
>> are useful are so few, relative to the use cases where they are actively
>> dangerous, that their value is not at all worth the cost of their certain
>> misuse.
>>
>> Even for apparently-simple use cases like string value parameterization
>> in content (e.g., product names or whatever), string macros fail because
>> they cannot be related to specific use contexts. When you push on the
>> requirements for reuse you quickly realize that only application-level
>> processing gives you the flexibility and opportunities required to
>> properly implement re-use requirements, in particular, providing the
>> correct resolution for a given use in a given use context.
>>
>> The solution was in HyTime, namely the content reference link type,
>> which was a link with the base semantic of use by reference. Because it
>> is a link it is handled in the application domain, not the parsing
>> domain. This is transclusion as envisioned by Ted Nelson.
>>
>> You see this in DITA through DITA's content reference facility and the
>> map-and-topic architecture, both of which use hyperlinks to establish
>> reuse relationships. With DITA 1.3 the addressing mechanism is
>> sufficiently complete to satisfy most of the requirements (the only
>> missing feature is indirection for references to elements within topics,
>> but I defined a potential solution that does not require any
>> architectural changes to DITA, just additional processing applied to
>> specific specializations).
>>
>> I'm not aware of any other documentation XML application that has the
>> equivalent use-by-reference features, but DITA is somewhat unique in
>> being driven primarily by re-use requirements, which is not the case for
>> older specifications like DocBook, NLM/JATS, and TEI. Of course, there's
>> no barrier to adding similar features to any application. However, there
>> are complications and policy considerations that have to be carefully
>> worked out, such as what are the rules for consistency between
>> referencing and
>> referenced elements? DITA has one policy, but it may not be the best
>> policy for all use cases.
>>
>> On DTDs and grammars in general:
>>
>> I do not say that DTDs (or grammars in general) are evil.
>>
>> I only say that the way people applied them was (and is) misguided
>> because they misunderstood (or willfully ignored in the face of no better
>> alternative) their limitations as a way to associate documents with
>> their abstract document types. Of course DTDs and grammars in general
>> have great value as a way of imposing some order on data as it flows
>> through its
>> communication channels and goes through its life cycle.
>>
>> But grammars do not define document types.
>>
>> At the time namespaces were being defined I tried to suggest some
>> standard way to identify abstract document types separate from any
>> particular implementation of them: basically a formal document that says
>> "This is what I mean by abstract document type 'X'". You give it a URI so
>> it can be referred to unambiguously and you can connect whatever
>> additional governing or augmenting artifacts to it you want. By such a
>> mechanism you could have as complete a definition of a given abstract
>> document type as
>> you wanted, including prose definitions as well as any number of
>> implementing artifacts (grammars, Schematrons, validation applications,
>> phone numbers to call for usage advice, etc.).
>>
>> But of course that was too heavy for the time (or for now). Either
>> people simply didn't need that level of definitional precision or they
>> used the workaround of pointing in the other direction, that is, by
>> having specifications that say "I define what abstract document type 'X'
>> is."
>>
>> This was in the context of the problem that namespace names don't
>> point to anything: people had the idea that namespace names told you
>> something but we were always clear that they did not--they were simply
>> magic strings that used the mechanics of URIs to ensure that you have a
>> universally-unique name.
>>
>> But the namespace tells you nothing about the names in the space (that
>> is, what is the set of allowed names, where are their semantics and rules
>> defined, etc.). The namespace spec specifically says "You should not
>> expect to find anything at the end of the namespace URI and you should
>> not try to resolve it".
>>
>> So if the namespace name is not the name of the document type, what is?
>> I wanted there to be one because I like definitional completeness.
>>
>> But in fact it's clear now that that level of completeness is either not
>> practical or is not sufficiently desired to make it worth trying to
>> implement it.
>>
>> So we're where we were 30 years ago: we have grammar definitions for
>> documents but we don't have a general way to talk about abstract
>> document types as distinct from their implementing artifacts (grammars,
>> validation processors, output processors, prose definitions, etc.).
>>
>> But experience has shown that it's not that big of a deal in practice.
>> In practice, having standards or standards-like documents is sufficient
>> for those cases where it is important.
>>
>> As for the problem that the reference from a document instance to a
>> grammar in fact tells you nothing reliable, a solution is what DITA does:
>> stop caring about the grammar as an artifact and care only about the set
>> of (abstract) vocabulary modules the document says it (may) use. That is,
>> actually declare the abstract document type in an
>> unambiguous way and worry about validation details separately.
>>
>> DITA does this as follows:
>>
>> 1. Defines an architecture for layered vocabulary.
>>
>> The DITA standard defines an invariant and mandatory set of base element
>> types and a mechanism for the definition of new element types in terms
>> of the base types. All conforming DITA element types and attributes MUST
>> be based on one of the base types (directly or indirectly) and must be at
>> least as constrained as the base type (that is, you can't relax
>> constraints). This is DITA specialization. It ensures that all DITA
>> documents are minimally processable in terms of the base types (or any
>> known intermediate types). It allows for reliable interoperation and
>> interchange of all conforming DITA documents. Because the definitional
>> mechanism uses attributes it is not dependent on any particular grammar
>> feature in the way that HyTime is. Any normal XML processor (including
>> CSS selectors) can get access to the definitional base of any element
>> and thus do what it can with it. The definitional details of an element
>> are specified on the required @class attribute, e.g. class="- topic/p
>> mydomain/my-para ", which reflects a specialization of the base type "p"
>> in the module "topic" by the module "mydomain" with the name "my-para".
>> Any general DITA-aware processor can thus process "my-para" elements
>> using the rules for "p" or, through extension, can have "mydomain/my-para"
>> processing, which might be different. But in either case you'll get
>> something reasonable as a result.
>>
>> 2. Defines a modular architecture for vocabulary such that each kind of
>> vocabulary definition (map types, topic types, or mix-in "domains")
>> follows a regular pattern. There is no sense of "a" DITA DTD, only
>> collections of modules that can be combined into document types (both in
>> the abstract sense of "DITA document type" and in the implementation
>> sense of "a working grammar file that governs document instances that use
>> a given set of modules").
>>
>> DITA requires that a given version in time of a module is invariant,
>> meaning that every copy of the module should be identical to every other
>> (basically, you never directly modify a vocabulary module's grammar
>> implementation). Each module is given a name that should be globally
>> unique, or at least unique within its expected scope of use. Experience
>> has shown us that it's actually pretty easy to ensure practical
>> uniqueness just by judicious use of name prefixes and general respect for
>> people's namespaces. No need to step up to full GUID-style
>> uniqueification à la XML namespaces.
>>
>> In addition to vocabulary modules, which define element types or
>> attributes, you can have "constraint modules", which impose constraints
>> on vocabulary defined in other modules. Constraint modules let you further
>> constrain the vocabulary without the need to directly modify a given
>> module's grammar definition. Again, the rule is that you can only
>> constrain, you can't relax.
>>
>> 3. Defines a "DITA document type" as a unique set of modules, identified
>> by module name. If two DITA documents declare the use of the same set of
>> modules then by definition they have the same DITA document type. This
>> works because of rule (2): all copies of a given module must be
>> identical. So it is sufficient to simply identify the modules. In theory
>> one could go from the module names to some set of implementations of the
>> modules although I don't know of any tools that do that because in
>> practice most DITA documents have associated DTDs that already integrate
>> the grammars for the modules being used. But it is possible. The DITA
>> document type is declared on the @domains attribute, which is required
>> on DITA root
>> elements (maps and topics).
>>
>> Note that you could have a conforming DITA vocabulary module that is
>> only ever defined in prose. As long as documents reflected the types
>> correctly in the @class attributes and reflected the module name in the
>> @domains attribute, the DITA definitional requirements are met. It would
>> be up to tool implementors to do whatever was appropriate for your domain
>> (which might be nothing if your vocabulary exists only to provide
>> distinguishing names and doesn't require any processing different from
>> the base). Nobody would do this *but they could*.
>>
>> Thus DITA completely divorces the notion of "document type" from any
>> implementation details of grammar, validation, or processing, with the
>> clear implication that there had better be clear documentation of what a
>> given vocabulary module is.
>>
>> Cheers,
>>
>> E.
>> ----
>> Eliot Kimber, Owner
>> Contrext, LLC
>> http://contrext.com
>>
>>
>>
>>
>> On 5/4/16, 11:06 AM, "Steve Newcomb" <srn@coolheads.com> wrote:
>>
>>> Eliot,
>>>
>>> In order to avoid potential misunderstandings, I think it might be
>>> worth clarifying your position on the following points:
>>>
>>> (1) Resolved: the whole idea of entity identity was a mistake, is
>>> worthless, and is evil.
>>>
>>> (2) Resolved: the whole idea of document type identity was a mistake,
>>> is worthless, and is evil.
>>>
>>> I have deliberately made these statements extreme and obviously silly
>>> in order to dramatize the fact that, even though there are problems with
>>> SGML's and/or XML's operational approaches to them, we cannot discard
>>> these ideas altogether.  The ideas themselves remain profound and
>>> necessary.  They will always be needed.  The usefulness of their
>>> various operational prostheses will always be limited to certain
>>> cultural
>>> contexts.  Even within their specific contexts, those prostheses will
>>> always be imperfect.  They will always require occasional repair and
>>> replacement, in order that they remain available for use even as that
>>> context's notions of "entity", "document", and "identity" continue to
>>> evolve and diversify.
>>>
>>> The operational prostheses with which these ideas were fitted at SGML's
>>> birth are things of their time.  That was then, this is now, and "time
>>> makes ancient good uncouth".  Their goodness in their earlier context
>>> is a matter of record; they were used, a lot, for a lot of reasons and
>>> in a lot of ways.  At the time, it was not stupid or evil to make the
>>> notion of document type identity depend on the notion of entity
>>> identity, nor was it stupid or evil to make the notion of entity
>>> identity dependent on PUBLIC identifiers.  And in many ways, it still
>>> isn't.  What is your
>>> proposed alternative, and why is it better?
>>>
>>> Steve
>>>
>>> On 05/04/2016 11:23 AM, Eliot Kimber wrote:
>>>> SGML requires the use of a DTD--there was no notion of a "default"
>>>> DTD. This requirement was, I'll argue, the result of a fundamental
>>>> conceptual mistake--understandable at the time but a mistake
>>>> nevertheless.
>>>>
>>>> The conceptual mistake that SGML made was conflating the notion of an
>>>> abstract "document type" with the grammar definition for (partially)
>>>> validating documents against that document type. That is, SGML saw the
>>>> DTD as being equal to the definition of the "document type" as an
>>>> abstraction. But of course that is nonsense. There was (and remains
>>>> today) the misguided notion that a reference to an external DTD subset
>>>> somehow told you something actionable about the document you had. But
>>>> of course it tells you nothing reliable because the document could
>>>> define its "real" DTD in the internal subset or the local environment
>>>> could put whatever it wants at the end of the public ID the document is
>>>> referencing.
>>>>
>>>> Consider this SGML document:
>>>>
>>>> <!DOCTYPE notdocbook PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" [
>>>>     <!ELEMENT notdocbook ANY >
>>>>     <!ELEMENT bogus ANY >
>>>> ]>
>>>> <notdocbook>
>>>>     <bogus><para>This is not a DocBook document</para></bogus>
>>>> </notdocbook>
>>>>
>>>> This document will be taken as a DocBook document by any tool that
>>>> thinks the public ID means something. But obviously it is not a DocBook
>>>> document. It is, however, 100% DTD valid. QED: DTDs are useless as tools
>>>> of document type definition. The only reason the SGML (and now XML)
>>>> world didn't collapse under this fact is that the vast majority of SGML
>>>> and XML authoring and management tools simply refused to preserve
>>>> internal subsets (going back to the discussion about DynaBase's problems
>>>> with entity preservation).
>>>>
>>>> Standoff grammars like XSD and RELAX NG at least avoid the problem of
>>>> internal DTD subsets but they still fail to serve as reliable
>>>> definitions of document types in the abstract because they are still
>>>> only defining the grammar rules for a subset of all possible conforming
>>>> documents in a document type.
>>>>
>>>> Because of features like tag omission, inclusion exceptions, and short
>>>> references, it was simply impossible to parse an SGML document without
>>>> having both its DTD and its SGML declaration (which defined the
>>>> lexical syntax details). There is a default SGML declaration, but not a
>>>> default DTD.
>>>>
>>>> A lot of what we did in XML was remove this dependency by having a
>>>> fixed syntax and removing all markup minimization except attribute
>>>> defaults.
>>>>
>>>> XML does retain one markup minimization feature, attribute defaults.
>>>> Fortunately, both XSD and RELAX NG provide alternatives to DTDs for
>>>> getting default attribute values.
>>>>
>>>> Cheers,
>>>>
>>>> Eliot
>>>> ----
>>>> Eliot Kimber, Owner
>>>> Contrext, LLC
>>>> http://contrext.com
>>>>
>>>>
>>>>
>>>>
>>>> On 5/4/16, 6:16 AM, "Norman Gray" <norman@astro.gla.ac.uk> wrote:
>>>>
>>>>> Greetings.
>>>>>
>>>>> (catching up ...)
>>>>>
>>>>> On 29 Apr 2016, at 17:58, John Cowan wrote:
>>>>>
>>>>>> On Fri, Apr 29, 2016 at 8:54 AM, Norman Gray
>>>>>> <norman@astro.gla.ac.uk> wrote:
>>>>>>
>>>>>> In the XML world, the DTD is just for validation
>>>>>>
>>>>>>
>>>>>> That turns out not to be the case.  There are a number of XML DTD
>>>>>> features which affect the infoset returned by a compliant parser.  If
>>>>>> they are in the internal subset, the parser MUST respect them;
>>>>> I stand corrected; I was sloppy.  I think this doesn't change my
>>>>> original point, however, which was that in SGML the DTD was integral
>>>>> to the document, and to the parse of the document, and that it's easy
>>>>> to forget this after one has got used to two decades of XML[1].  I
>>>>> can't remember if there was a trivial or default DTD which was assumed
>>>>> in the absence of a declared one, in the same way that there was a
>>>>> default SGML Declaration, but taking advantage of that would probably
>>>>> have been regarded as a curiosity, rather than normal practice.
>>>>>
>>>>> In XML, in contrast, the DTD has a more auxiliary role, and at a
>>>>> first conceptual look, that role is validation (even though -- footnote!
>>>>> -- it may change other things about the parse as well).  Thus
>>>>> _omitting_ an XML DTD (or XSchema) is neither perverse nor curious.
>>>>>
>>>>> Practical aspect: When I'm writing XML, I use a DTD (in whatever
>>>>> syntax) to help Emacs tell me if the document is valid, but I don't
>>>>> even know whether the XML parsers I use are capable of using a DTD
>>>>> external subset.  That careless ignorance would be impossible with SGML.
>>>>>
>>>>> The rational extension of that attitude, of course, is MicroXML,
>>>>> which (as you of course know) doesn't use any external resources at
>>>>> all, and doesn't care about validation.
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> Norman
>>>>>
>>>>>
>>>>> [1] Hang on, _two_ decades?!  I've just checked and ... 1996 doesn't
>>>>> seem that long ago.
>>>>>
>>>>>
>>>>> -- 
>>>>> Norman Gray  :  https://nxg.me.uk
>>>>> SUPA School of Physics and Astronomy, University of Glasgow, UK
>>>>>
>>>>> 
>>>>> _______________________________________________________________________
>>>>>
>>>>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>>>>> to support XML implementation and development. To minimize
>>>>> spam in the archives, you must subscribe before posting.
>>>>>
>>>>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>>>>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>>>>> subscribe: xml-dev-subscribe@lists.xml.org
>>>>> List archive: http://lists.xml.org/archives/xml-dev/
>>>>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>>>>>
>>>>
>>>
>>>
>>
>>
>
>
>





Copyright 1993-2007 XML.org. This site is hosted by OASIS