Re: [xml-dev] The Goals of XML at 25, and the one thing that XML now needs
FWIW, the general trend is away from ETL toward ELT or LET, with type projection being part of the 'T', certainly in a lot of the 'looser' integrations. This is a use case where JSON is actually very convenient (vs. XML, for example): a common pattern is to stream small chunks of JSON into a 'database' and then do ad-hoc extraction/transformation as part of report generation.
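A minimal sketch of that load-first, transform-later pattern, in Python. The in-memory list standing in for the 'database', the field names `when` and `amount`, and the function names are all assumptions made for illustration; the point is only that no types are projected until a report asks for them.

```python
import json
from datetime import date

# Stand-in for a document 'database': chunks are stored as parsed, untyped JSON.
raw_store = []

def load(chunk: str) -> None:
    """The 'L' step: keep the record as delivered, with no type conversion."""
    raw_store.append(json.loads(chunk))

def report_total_by_day(store):
    """The 'T' step, done at report time: project types only where needed."""
    totals = {}
    for record in store:
        day = date.fromisoformat(record["when"])   # string -> date, here and not earlier
        amount = float(record["amount"])           # tolerate "12.50" as well as 3
        totals[day] = totals.get(day, 0.0) + amount
    return totals

# Hypothetical incoming chunks; note the inconsistent typing of "amount".
load('{"when": "2021-07-20", "amount": "12.50"}')
load('{"when": "2021-07-20", "amount": 3}')
load('{"when": "2021-07-21", "amount": "1.25"}')

totals = report_total_by_day(raw_store)
```

A different report is free to read the same stored records with a different projection, which is what makes the loose typing convenient here.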
On Wed, 21 Jul 2021 00:45:31 +1000, Rick Jelliffe
| Arjun wrote:
| *Or maybe I'm not getting the point here?*
|
| Automatic data binding. For a datatype to be attached to the parse tree
| (DOM etc) as a primitive type (a la C), something has to be told to take
| the text value and convert it: it could be a program, it could be a schema,
| or it could be from instance syntax (i.e. delimiters and lexical patterns).
Isn't that the job of the ETL subsystem responsible for loading a DOM
that is - or should be! - fit for purpose? (But yes, working with
generic DOMs would place that burden on the application.)
| So let's say I have an XSLT script which decorates an incoming document of a
| standard format with an ISO8601 date in an attribute @D.
Or a LPD? (Nah. Too bad ISO8879 bollixed the definition of LINK.)
| Contrast this with a richer syntax where the parser (or transducer is the
| right CS term?) can do those steps automatically with no configuration or
| coding of that on the server side.
The trouble with that is the universe of such useful auto-conversions
is unbounded. Why stop at ISO8601 dates? (How about Roverdates[*],
or DbaseIIdates, which are YYYYMMDD, 32-bit ints in C parlance?)
Customizing the ETL layer seems wiser, from a system design POV.
[*] After Rover, Salomon Brothers' hoary database (back then, some
well-known databases on Wall Street had names like Spot and Fido...)
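To make the "customize the ETL layer" point concrete, here is a rough Python sketch using the standard `xml.etree.ElementTree` module. The element names `item` and `legacy`, the `CONVERTERS` table, and the function names are hypothetical; the idea is that each application registers only the conversions it cares about, rather than the parser trying to anticipate all of them.

```python
import xml.etree.ElementTree as ET
from datetime import date

def from_iso8601(text):
    """Convert an ISO 8601 calendar date such as '2021-07-20'."""
    return date.fromisoformat(text)

def from_yyyymmdd_int(text):
    """Convert a date packed into an integer as YYYYMMDD, e.g. 20210720."""
    n = int(text)
    return date(n // 10000, (n // 100) % 100, n % 100)

# Hypothetical per-application table: (element name, attribute name) -> converter.
CONVERTERS = {
    ("item", "D"): from_iso8601,
    ("legacy", "D"): from_yyyymmdd_int,
}

def load_typed(xml_text):
    """Parse the document and convert only the attributes the table names."""
    root = ET.fromstring(xml_text)
    typed = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            convert = CONVERTERS.get((elem.tag, name))
            if convert is not None:
                typed.append((elem.tag, name, convert(value)))
    return typed

doc = '<doc><item D="2021-07-20"/><legacy D="20210720"/></doc>'
result = load_typed(doc)
```

Supporting yet another date scheme means adding one function and one table entry in the loader; the markup syntax stays as it is.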
| The thing is, it is ridiculous (IMHO) to claim that an ISO 8601 date is
| something that we really need freedom to allow clients to interpret
| differently
I always thought the argument was to leave to the client the decision
to use ISO8601 at all, as opposed to some other scheme.
| Instead of datatype, it might be good for SGML-ers to consider it in
| terms of NOTATION.
Now there's an idea! (And bring in data attributes as well?)
| But SGML did not provide a way to declare the NOTATION of an
| attribute value, doubly not providing it for DTD-less documents.
Actually, Annex K in the WebSGML TC did, but they could have added
that for elements as well.
(That went nowhere, of course, and now that Google has fubar-ed its
Dejanews takeover, the CTS archive is inaccessible altogether.)
| I think it is entirely reasonable and SGML-ish to want to specify the
| notation used for some attributes.
The case for DTD-less documents is harder, and I think intractable
within the confines of the spartan syntax of XML. There are other
punctuation characters that could be put to good use. And simple
elements, with #PCDATA content models, are very suitable candidates
for content notations and shorthand based on the old NET style:
<myItem someAtt="someVal" /data content here/>
This can be extended with other content delimiters:
<myItem format="iso8601" /@2021-07-20/>
And so on. I don't think we can get by with just the current syntax.
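For what it's worth, a recognizer for the two shorthand examples above can be sketched in a few lines of Python. Everything here is an assumption on my part: the grammar is guessed from the two examples only, and reading a leading `@` as "ISO 8601 date follows" is one possible interpretation of the delimiter, not a defined notation.

```python
import re
from datetime import date

# Guessed grammar for the shorthand: <name attr="value"... /content/>
SHORTHAND = re.compile(
    r'<(?P<name>[A-Za-z_][\w.-]*)'
    r'(?P<attrs>(?:\s+[\w:.-]+="[^"]*")*)'
    r'\s*/(?P<content>[^/<>]*)/>'
)
ATTR = re.compile(r'([\w:.-]+)="([^"]*)"')

def parse_shorthand(text):
    """Return (name, attributes, value) for one shorthand element."""
    m = SHORTHAND.fullmatch(text.strip())
    if m is None:
        raise ValueError("not a shorthand element: %r" % text)
    attrs = dict(ATTR.findall(m.group("attrs")))
    content = m.group("content")
    if content.startswith("@"):
        # Assumption: '@' marks the content as an ISO 8601 calendar date.
        value = date.fromisoformat(content[1:])
    else:
        value = content
    return m.group("name"), attrs, value

plain = parse_shorthand('<myItem someAtt="someVal" /data content here/>')
dated = parse_shorthand('<myItem format="iso8601" /@2021-07-20/>')
```

Note that an ordinary empty-element tag such as `<a/>` does not match this pattern, since the shorthand needs both an opening and a closing slash around the content.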
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.