Arjun wrote:
Implicit conventions are quite common
in internal pipelines. The conversions from text to other data types
have to happen somewhere; I'm not seeing why it's easier overall for
this to be in the parser. Or maybe I'm not getting the point here?
Automatic data binding. For a datatype to be attached to the parse tree (DOM etc) as a primitive type (a la C), something has to be told to take the text value and convert it: it could be a program, it could be a schema, or it could be from instance syntax (i.e. delimiters and lexical patterns).
So let's say I have an XSLT script which decorates an incoming document of a standard format with an ISO 8601 date in an attribute @D. The produced XML is then sent through the pipeline, eventually reaching another process written in C or Kotlin which, say, reads the data into some kind of DOM. (And I am not someone who understands XML Schemas or which API to use for them, and anyway our system architect has banned variants of standard schemas, in case someone suggests using schemas.) Now, when I want to read that date, I have to produce code which checks that the date is in a correct lexical form, parses it, and puts it somewhere on the DOM for me to use.
Contrast this with a richer syntax where the parser (or is transducer the right CS term?) can do those steps automatically, with no configuration or coding for that on the server side.
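To make the hand-coded steps concrete, here is a minimal sketch in Python of what the receiving process ends up writing for itself. Only the attribute @D comes from the example above; the element name `order`, the helper `read_date_attr`, and the restriction to the plain YYYY-MM-DD calendar form are assumptions made for illustration.

```python
import re
from datetime import date
from xml.dom.minidom import parseString

# Lexical form of a plain ISO 8601 calendar date (YYYY-MM-DD) only;
# other ISO 8601 forms are deliberately out of scope for this sketch.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def read_date_attr(element, name="D"):
    """Check the lexical form of an attribute, then parse it as a date."""
    text = element.getAttribute(name)
    if not ISO_DATE.fullmatch(text):
        raise ValueError(f"@{name} is not in ISO 8601 date form: {text!r}")
    # fromisoformat also rejects well-formed but impossible dates
    # such as 2021-02-30 by raising ValueError.
    return date.fromisoformat(text)

# Hypothetical incoming document carrying the decorated attribute.
doc = parseString('<order D="2021-07-21"/>')
when = read_date_attr(doc.documentElement)
```

Every consumer in the pipeline that cares about @D has to carry some version of this, and has to be told (by a programmer) that @D is the attribute to treat this way.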
The thing is, it is ridiculous (IMHO) to claim that an ISO 8601 date is something that we really need freedom to allow clients to interpret differently, and therefore to leave it up to the client's developers to determine that it is a date: it will only ever be parsed as a date or used for string-comparison-based collation (that is why the year and month come before the day, after all.) To put this another way, you cannot say what must be done with a symbol or name or string @X="red", but markup like @Y=2021-07-21 is always going to have one thing done to it first: to be parsed as a date (even if just for validity).
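As a rough illustration of typing from the lexical pattern alone, here is a sketch in Python of a generic attribute reader that is told nothing about which attributes are dates. The function name `bind_attributes` and the element name `item` are hypothetical, the values are quoted because the sketch sits on top of an ordinary XML parser, and falling back to plain text for a date-shaped value that is not a real date is just one possible policy.

```python
import re
from datetime import date
from xml.dom.minidom import parseString

# Lexical pattern for a plain ISO 8601 calendar date (YYYY-MM-DD).
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def bind_attributes(element):
    """Return {name: value}, binding date-shaped values as date objects."""
    bound = {}
    for name, value in element.attributes.items():
        if ISO_DATE.fullmatch(value):
            try:
                value = date.fromisoformat(value)
            except ValueError:
                pass  # date-shaped but not a real date: keep it as text
        bound[name] = value
    return bound

doc = parseString('<item X="red" Y="2021-07-21"/>')
typed = bind_attributes(doc.documentElement)
# typed["X"] stays a string; typed["Y"] is a datetime.date
```

The point of the sketch is only that nothing outside the instance (no schema, no per-attribute code) had to say that @Y is a date.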
Instead of datatype, it might be good for SGML-ers to consider it in terms of NOTATION. SGML did not leave it to the client to figure out what the notation of some text or reference or external entity was: it allowed the NOTATION to be selected in the instance. (Of course, in XML on the WWW, the notation is the MIME type and is carried along as metadata to the resource, instead.) But SGML did not provide a way to declare the NOTATION of an attribute value, doubly not providing it for DTD-less documents. That is a gap. (Does my memory tell me that HyTime tried to provide facilities that could be used for this kind of thing?) I think it is entirely reasonable and SGML-ish to want to specify the notation used for some attributes. I have no doubt that had ISO 8601 been around and well-established in 1986 for the initial SGML standard, it would have been considered for an attribute type (not saying it would have been adopted.)
Cheers
Rick