Re: [xml-dev] The Goals of XML at 25, and the one thing that XML now nee

Arjun wrote:

Implicit conventions are quite common
in internal pipelines. The conversions from text to other data types
has to happen somewhere; I'm not seeing why it's easier overall for
this to be in the parser. Or maybe I'm not getting the point here?

Automatic data binding. For a datatype to be attached to the parse tree (DOM etc) as a primitive type (a la C), something has to be told to take the text value and convert it: it could be a program, it could be a schema, or it could be from instance syntax (i.e. delimiters and lexical patterns).

So let's say I have an XSLT script which decorates an incoming document of a standard format with an ISO 8601 date in an attribute @D. The produced XML is then sent through the pipeline, eventually reaching another process written in C or Kotlin which, say, reads the data into some kind of DOM. (And I am not someone who understands XML Schemas or which API to use for them, and anyway our system architect has banned variants of standard schemas, in case someone suggests using schemas.) Now, when I want to read that date, I have to write code which checks that the date is in a correct lexical form, parses it, and puts it somewhere on the DOM for me to use. Contrast to where the syntax rules for un

Contrast this with a richer syntax where the parser (or is transducer the right CS term?) can do those steps automatically, with no configuration or coding for them on the server side.
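To make the manual steps concrete, here is a minimal sketch in Python (chosen for brevity; the scenario above mentions C or Kotlin) of the code a downstream process ends up writing today. The helper name `read_date_attribute` and the sample document are hypothetical, and a plain dictionary stands in for "somewhere on the DOM".

```python
import xml.etree.ElementTree as ET
from datetime import date


def read_date_attribute(element, name="D"):
    """Fetch an attribute and convert it to a datetime.date.

    Raises ValueError if the attribute is missing or is not a valid date.
    """
    raw = element.get(name)
    if raw is None:
        raise ValueError(f"missing attribute {name!r}")
    # date.fromisoformat accepts the YYYY-MM-DD form (Python 3.11+ also
    # accepts some other ISO 8601 date forms) and rejects impossible
    # calendar dates such as month 13.
    return date.fromisoformat(raw.strip())


# Hypothetical sample document carrying the decorated @D attribute.
doc = ET.fromstring('<report D="2021-07-21"><item/></report>')

# The XML attribute itself can only hold a string, so the typed value is
# kept in a side table next to the tree.
typed_values = {"D": read_date_attribute(doc)}
print(typed_values["D"].year)
```

Every consumer in the pipeline that wants the date as a date has to repeat some version of this, which is the cost the richer syntax is meant to remove.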

The thing is, it is ridiculous (IMHO) to claim that an ISO 8601 date is something that we really need freedom to allow clients to interpret differently, and therefore to leave it up to the client's developers to determine that it is a date: it will only ever be either parsed as a date or used for string-comparison-based collation (that is why the year and month come before the day, after all). To put this another way, you cannot say what must be done with a symbol or name or string @X="red", but markup like @Y=2021-07-21 is always going to have one thing done to it first: to be parsed as a date (even if just for validity).
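As a rough illustration of the idea, the following Python sketch treats quoted attribute values as plain strings and gives unquoted values a type based purely on their lexical form. Everything here is an assumption for the sake of the example: unquoted values are not well-formed XML today, the thread does not fix a grammar, and the patterns, the function names `type_unquoted` and `parse_attributes`, and the choice to reject unrecognised bare tokens are all hypothetical.

```python
import re
from datetime import date

# Hypothetical attribute syntax: name="quoted string" or name=bare-token.
_ATTR = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|([^\s"<>=]+))')
_DATE = re.compile(r'\d{4}-\d{2}-\d{2}')
_NUMBER = re.compile(r'-?\d+(?:\.\d+)?')


def type_unquoted(token):
    """Convert a bare token to a typed value based on its lexical pattern."""
    if _DATE.fullmatch(token):
        # Also checks calendar validity; raises ValueError for e.g. 2021-13-40.
        return date.fromisoformat(token)
    if _NUMBER.fullmatch(token):
        return float(token) if "." in token else int(token)
    if token in ("true", "false"):
        return token == "true"
    # Assumption: a bare token with no recognised pattern is an error.
    raise ValueError(f"unrecognised unquoted value: {token!r}")


def parse_attributes(text):
    """Parse an attribute list; quoted values stay strings, bare ones are typed."""
    result = {}
    for match in _ATTR.finditer(text):
        name, quoted, bare = match.groups()
        result[name] = quoted if bare is None else type_unquoted(bare)
    return result


attrs = parse_attributes('X="red" Y=2021-07-21 N=42 OK=true')
print(attrs)
```

In this sketch @X stays an uninterpreted string, while @Y arrives in the application already parsed and validated as a date, with no schema or client-side conversion code involved.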

Instead of datatype, it might be good for SGML-ers to consider it in terms of NOTATION. SGML did not leave it to the client to figure out what the notation of some text or reference or external entity was, it allowed NOTATION to be selected in the instance. (Of course, in XML on WWW, the notation is the MIME type and carried along as metadata to the resource, instead.) But SGML did not provide a way to declare the NOTATION of an attribute value, doubly not providing it for DTD-less documents. But that is a gap. (Does my memory tell me that HyTime tried to provide facilities that could be used for this kind of thing?) I think it is entirely reasonable and SGML-ish to want to specify the notation used for some attributes. I have no doubt that had ISO 8601 been around and well-established in 1986 for the initial SGML standard, it would have been considered for an attribute type (not saying it would have been adopted.)

Cheers

Rick

On Tue, Jul 20, 2021 at 11:06 PM Arjun Ray <arayq2@gmail.com> wrote:

On Tue, 20 Jul 2021 13:04:34 +1000, Rick Jelliffe
<rjelliffe@allette.com.au> wrote:

| May I argue that keeping data content untyped strings (i.e. you need an XML
| Schema or casting to determine its type) but allowing limited basic typing
| of attribute values in no way compromises any theory of what tagging should
| be used for what purposes?

Sure. My "rule" about attributes was meant as advisory only! Further
along in the thread I cited (but which needs the thread index to find,
thanks to the bogotic handling of references in mail agents back then)
is a somewhat fuller explanation:

http://lists.xml.org/archives/xml-dev/200205/msg01043.html

The schema folks drove everything off the rails by introducing the
notion of "data typing" for attributes. This also instantly mystified
the older declared value typology. But it had the (possibly intended)
effect of solidifying the "use case" of attributes for ordinary data
values. Never mind the untold legions of Microserfs who learned the
"right way to do it" from gems of cluelessness such as this, graced
with the imprimatur of a W3C Note:

http://www.w3.org/TandS/QL/QL98/pp/microsoft-serializing.html

Bolting barn doors and all that. If (limited) data type recognition -
true numbers and booleans - is to be pushed into the parsing layer,
then we probably need a proper set of syntactic signals. I don't find
the "low hanging fruit" argument particularly persuasive.

| I like this syntax idea (unquoted attribute values have defined lexical
| types) not because it would compete with JSON more, but because it would
| take a clue from JSON and make traditional SGML-style publishing systems
| easier: particularly in internal pipelines which are inevitably done with
| no formal DTD or schema (i.e. normalized data.)

I'm not sure I understand this. Implicit conventions are quite common
in internal pipelines. The conversions from text to other data types
has to happen somewhere; I'm not seeing why it's easier overall for
this to be in the parser. Or maybe I'm not getting the point here?
