OASIS Mailing List Archives

   Re: [xml-dev] Some random noise on rational type systems for XML




"Amelia A.Lewis" wrote:
> 
> On Tue, 6 May 2003 21:28:38 -0400
> John Cowan <cowan@mercury.ccil.org> wrote:
> > Amelia A. Lewis scripsit:
> >

[dissonant exchange about types, as was to be expected, since no distinction
was made between lexical and value domains]

> 
> I'm going to let Joe English respond here ...
> Joe English wrote:
> > But the _XML_ ur-type is string.  From the application
> > point of view, you might have dates, integers, IEEE double
> > precision floating point numbers, et cetera, but as far
> > as XML is concerned everything is a string.
> 
> That's terser than I know how to write without an editor.

that does not make it correct.
there are some would-be editors who would make the claim even shorter.

(1. all xml applications manifest abstract value domains)

until 19990114T222457 (+/- a few hours, given my uncertainty as to the time
zone) an application could well have gotten away with the misconception that
xml is about strings only.

i suspect that, whether it admitted it or not, said application would have
been written at least in terms of the types

scalar-value (unicode)
string == scalar-value*
attribute == (string x string)
element == (string x attribute* x (element + string)*)

otherwise, there is no way it could have been concerned with the sequencing
and dominance relations which differentiate an xml-encoded unicode sequence
from a unicode sequence.

note that these are not the lexical domains described in the xml
recommendations. they are value domains, and the recommendations specify how
values from them are encoded as, and decoded from, xml documents.
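a minimal sketch of the point about sequencing and dominance, in common lisp like the method further on in this message. the list representation (name attributes . children) and the function names are assumptions made for illustration. two element values with the same character content still differ as values, which is exactly what a "pure-string" view cannot see.

```lisp
;;; pre-namespace value domains, sketched with lists (an assumed representation):
;;;   string    == a lisp string
;;;   attribute == (name . value), both strings
;;;   element   == (name attributes . children), children being elements or strings

(defun make-element-value (name attributes &rest children)
  "Build an element value from a name, an attribute alist and children."
  (list* name attributes children))

(defun element-value-children (element)
  (cddr element))

(defun text-content (node)
  "Concatenate all character data below NODE, discarding the element structure."
  (if (stringp node)
      node
      (apply #'concatenate 'string
             (mapcar #'text-content (element-value-children node)))))

;; <a><b>x</b>y</a>  versus  <a><b>xy</b></a>
(defparameter *doc-1*
  (make-element-value "a" '() (make-element-value "b" '() "x") "y"))
(defparameter *doc-2*
  (make-element-value "a" '() (make-element-value "b" '() "xy")))

;; (text-content *doc-1*) and (text-content *doc-2*) both yield "xy",
;; but (equal *doc-1* *doc-2*) is false: the values differ in dominance.
```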

[i realize that the original message proposed to limit discussion to atomic
types, but i suggest that it is advisable always to keep "minimal xml" in mind.]

a "pure-string" application would have had to include at least an xml
front-end which translated the values decoded from an xml document into
strings for eventual "pure-string" processing. but then the "application"
would not have been processing xml.

(2. namespace declarations add to the minimal abstract value domains)

as of the appearance of

  <!-- http://www.w3.org is bound to n1 and n2 -->
  <x xmlns:n1="http://www.w3.org"
     xmlns:n2="http://www.w3.org" >
    <bad a="1"     a="2" />
    <bad n1:a="1"  n2:a="2" />
  </x>

in REC-xml-names-19990114, any such application is limited to processing a
subset of xml: both bad elements are errors, the second because "n1:a" and
"n2:a", although distinct as strings, denote the same universal name. the
claim that a conforming xml application can operate on strings only is
incorrect. there will necessarily be some operations which produce different
results for one well-defined subset of operand combinations than for another.
it is not material that the effective types are not reflected in the
application's first-class atomic data.
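to make the second bad element concrete, here is a sketch, again in common lisp, of the check the namespaces recommendation requires and a string comparison cannot perform. attribute names are resolved to (namespace . local-name) pairs against an alist of prefix bindings; the representation and the function names are assumptions for illustration, not any particular parser's api.

```lisp
;;; attribute qnames resolved to universal names, (namespace . local-name),
;;; against BINDINGS, an alist of (prefix . namespace). the representation
;;; and function names are assumptions made for this sketch.

(defun split-qname (qname)
  "Return the prefix (or nil) and the local part of QNAME as two values."
  (let ((colon (position #\: qname)))
    (if colon
        (values (subseq qname 0 colon) (subseq qname (1+ colon)))
        (values nil qname))))

(defun attribute-uname (qname bindings)
  "Translate an attribute QNAME; unprefixed attributes are in no namespace."
  (multiple-value-bind (prefix local) (split-qname qname)
    (if prefix
        (let ((binding (assoc prefix bindings :test #'string=)))
          (unless binding
            (error "undeclared prefix ~s" prefix))
          (cons (cdr binding) local))
        (cons nil local))))

(defun duplicate-attributes-p (qnames bindings)
  "True when two of the attribute names QNAMES denote the same universal name."
  (let ((unames (mapcar (lambda (qname) (attribute-uname qname bindings))
                        qnames)))
    (/= (length unames)
        (length (remove-duplicates unames :test #'equal)))))

;; the bindings in effect on <x> in the example above
(defparameter *bindings*
  '(("n1" . "http://www.w3.org")
    ("n2" . "http://www.w3.org")))

;; as strings "n1:a" and "n2:a" differ, but
;; (duplicate-attributes-p '("n1:a" "n2:a") *bindings*) => t
```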

if an application does not express its intent more clearly, that is
unfortunate. if a standard does not provide the terms for an application to
express a necessary intent clearly, the standard is incomplete.


(3. there is a coherent set of minimal abstract domains)

by that standard, a minimal type system for xml would (ignoring constraints
on name characters, and allowing that ncnames are a special case of uname) be

scalar-value (unicode)
string == scalar-value*
uname == (string x string)
attribute == (uname x string)
element == (uname x attribute* x (element + string)*)

note that there is no such thing as a "qualified name" value domain. the
domain is also not called "expanded qname", as that would serve no purpose
other than to confuse the reader.
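the minimal domains above can be transcribed almost literally into common lisp structures. this is a sketch under the assumption that a nil namespace stands for "no namespace"; the structure and slot names are hypothetical. note that nothing in it records a prefix, so there is no place for a "qualified name" value to live.

```lisp
;;; the minimal value domains as lisp structures (names are hypothetical):
;;;   uname     == (string x string)
;;;   attribute == (uname x string)
;;;   element   == (uname x attribute* x (element + string)*)

(defstruct (uname (:constructor make-uname (namespace local-name)))
  (namespace nil :type (or null string))  ; nil stands for "no namespace"
  (local-name "" :type string))

(defstruct (attribute (:constructor make-attribute (name value)))
  (name (make-uname nil "") :type uname)
  (value "" :type string))

(defstruct (element (:constructor make-element (name attributes children)))
  (name (make-uname nil "") :type uname)
  (attributes '() :type list)   ; a list of attribute structures
  (children '() :type list))    ; a list of element structures and strings

(defun uname= (a b)
  "Universal names are equal when their namespaces and local names are equal."
  (and (equal (uname-namespace a) (uname-namespace b))
       (string= (uname-local-name a) (uname-local-name b))))

(defparameter *example*
  (make-element (make-uname "http://www.w3.org" "x")
                (list (make-attribute (make-uname nil "a") "1"))
                (list "some text")))
```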

where the standard intends to support xml-encoding for a wider range of
application data, it could well specify relations to concrete syntax for a
limited number of additional value domains:

boolean == (true + false)
binary-sequence == number*
number
date

with everything else handled by rules for translating between combinations of
values from these domains and sequences of lexical components in the concrete
syntax already specified for the primitive domains.
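as a sketch of one primitive domain and one combination rule, assuming lisp t and nil as the boolean values: the lexical forms are those xml schema gives for boolean ("true", "false", "1", "0", with "true" and "false" canonical), while the space-separated list rule and all function names are illustrative assumptions. whitespace normalization beyond splitting at spaces is omitted.

```lisp
;;; primitive domain: boolean values (t / nil) and their lexical forms
(defun parse-boolean-literal (lexical)
  "Map a lexical form to a boolean value; reject anything outside the lexical domain."
  (cond ((member lexical '("true" "1") :test #'string=) t)
        ((member lexical '("false" "0") :test #'string=) nil)
        (t (error "~s is not a boolean lexical form" lexical))))

(defun unparse-boolean-value (value)
  "Map a boolean value to its canonical lexical form."
  (if value "true" "false"))

;;; combination rule: a list value is written as its items' lexical forms
;;; separated by spaces
(defun split-on-spaces (string)
  "Split STRING at runs of spaces, dropping empty tokens."
  (loop with start = 0
        for pos = (position #\Space string :start start)
        for token = (subseq string start pos)
        unless (string= token "") collect token
        while pos
        do (setf start (1+ pos))))

(defun parse-list-literal (lexical parse-item)
  "Map a space-separated lexical form to a list value, item by item."
  (mapcar parse-item (split-on-spaces lexical)))

(defun unparse-list-value (values unparse-item)
  "Map a list value back to space-separated canonical lexical forms."
  (format nil "~{~a~^ ~}" (mapcar unparse-item values)))

;; (parse-list-literal "true  0 1" #'parse-boolean-literal)  => (t nil t)
;; (unparse-list-value '(t nil t) #'unparse-boolean-value)   => "true false true"
```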

> 
> > After all, every date and duration can be represented as a number.
> 
> Irrelevant.  XML doesn't store numbers.

(4. schema "types" specify the relation between value and lexical domains)

how a thing is encoded in an xml document and how a process treats that thing
which is so encoded are two separate things. as soon as one is concerned with
a value domain, one is concerned with the latter. as such, the nature of
numbers cannot be irrelevant. that numbers are encoded in xml as a sequence of
code values may have no direct relevance to the significance of "number" as a
value domain, but the purpose of a schema "type" is to establish a relation
between the two.
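a sketch of that relation for integers, in common lisp: the lexical forms "007" and "+7" are different strings, yet they map to the same value, and the value maps back to the single canonical form "7". the schema-type structure and its slot names are hypothetical; parse-integer is standard common lisp and accepts a leading sign and leading zeros, as the xml schema integer lexical space does.

```lisp
;;; a schema "type" as a relation between a lexical domain and a value domain
(defstruct schema-type
  name      ; a label only
  parse     ; function: lexical form -> value
  unparse   ; function: value -> canonical lexical form
  same-p)   ; function: equality within the value domain

(defparameter *integer-type*
  (make-schema-type
   :name "integer"
   :parse (lambda (lexical) (values (parse-integer lexical)))
   :unparse (lambda (value) (format nil "~d" value))
   :same-p #'=))

(defun same-value-p (type lexical-a lexical-b)
  "Compare two lexical forms by the values TYPE relates them to."
  (funcall (schema-type-same-p type)
           (funcall (schema-type-parse type) lexical-a)
           (funcall (schema-type-parse type) lexical-b)))

(defun canonical-form (type lexical)
  "Round-trip LEXICAL through the value domain."
  (funcall (schema-type-unparse type)
           (funcall (schema-type-parse type) lexical)))

;; (string= "007" "+7")                       => nil
;; (same-value-p *integer-type* "007" "+7")   => t
;; (canonical-form *integer-type* "007")      => "7"
```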

> 
> > For that matter, every string can be represented as a number by some
> > trick such as making each character a digit in base 2^20+2^16
> > notation.
> 
> Irrelevant.  XML doesn't store numbers to base 2^20+2^16 (unless you
> mean to suggest Unicode, in which case this is just another way of
> saying that everything is a string, in XML).
> 
> > That doesn't make you say that dates are numbers or that strings are
> > numbers.
> 
> When I'm dealing with Java, Dates are long integers (signed 64-bit
> two's-complement integers) measuring milliseconds since the epoch.  In
> Perl, I can certainly treat a string as a number or as a bitfield.
> 
> The point here is what I really badly want to label "the Bray ploy" as
> applied to value types.  Tim is famous for the slogan "XML is syntax".
> Applying that to value types, XML values are strings.

if you insist on that approach, you can process only a subset of xml.

> 
> > Nor are strings or numbers octet-sequences, either, although
> > of course they have several well-known representations as such.
> > Representation is a red herring.
> 
> There is a circumstance for which that is true: the type system permits
> multiple roots.  If, and only if, the type system permits multiple
> roots, then it is reasonable to restrict the notion of "string" to
> linguistic elements (words in English or Russian, for example).
> 
> W3C XML Schema wants to be singly-rooted.  So there's an ur-type.  The
> ur-type *is* a string, even though its *name* is "anySimpleType".

i suggest that this is ill-considered.

when i reflect on what i code, i try to ensure that methods which look like

  (defmethod do-something ((on-datum-x xsd:|anySimpleType|))
    ...
    )

do not apply operators which are defined for string arguments only.
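one way to honour that, sketched in common lisp: operations on simple values are generic functions with methods per value domain, and the unspecialized method refuses to fall back to comparing lexical forms as strings. value-lessp is a hypothetical operation, not part of any xml library; the pair 9 and 10 orders differently as numbers and as strings.

```lisp
;;; operations on simple values dispatch on the value domain; VALUE-LESSP is a
;;; hypothetical operation used only to illustrate the idea.

(defgeneric value-lessp (a b)
  (:documentation "Order two simple values within their common value domain."))

(defmethod value-lessp ((a real) (b real))
  (< a b))

(defmethod value-lessp ((a string) (b string))
  ;; a genuinely string-valued domain: compare as strings
  (and (string< a b) t))

(defmethod value-lessp (a b)
  ;; no silent fallback to comparing lexical forms as strings
  (error "no ordering defined between ~s and ~s" a b))

;; (value-lessp 9 10)     => t    (numbers)
;; (value-lessp "9" "10") => nil  (strings: #\9 sorts after #\1)
```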


>      When
> you munge something and don't know its schema type, so that all you can
> do is munge it as an anySimpleType, then you munge it as a string of (a
> subset of) Unicode characters.

i don't think one is well advised to do this.

> 

[useful discussion of the utility of constructing types in both the value and
lexical domains by combination]

> 
> > > Hmm.  We're missing one.  Ah, that's it: QName.  Question: does XML
> > > need a pointer type?  Which would, of course, be represented as a
> > > string.  If so, it might include, for instance, QName, XPath
> > > expressions, and URIs. Let's say that there's an abstract pointer,
> > > maybe.
> >
> > The difficulty is that QNames are really different from URIs, because
> > their interpretation is extremely context-sensitive, and you can't
> > tell just by looking at the representation of one whether it actually
> > refers to anything or not.
> >
> > QName is an irritating datatype, but if we have to have it, it needs
> > to be a seventh equal partner.  IRIs, OTOH, really are a subtype of
> > strings: their definition is purely syntactic.
> 
> Here's the quibble, though.  If you include QNames,

qnames are a lexical domain only. they do not exist as a value domain. unames
(universal names) are the corresponding value domain, and they are
context-free. the "namespaces in xml" recommendation specifies how to
translate qname expressions into universal names. once they are translated,
there is no context-sensitivity.

>    should you not also
> include XPath expressions, which are used for much the same purpose, and
> which have the same context sensitivity?
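a sketch, in common lisp, of the qname-to-uname translation described in the reply above, assuming scopes are alists of (prefix . namespace) with the innermost declarations first and the default namespace keyed by the empty string. the same qname "p:x" yields different unames in different scopes; the unames themselves can then be compared with no reference to any scope. the namespace uris and function names are placeholders.

```lisp
;;; scopes are alists of (prefix . namespace), innermost declarations first;
;;; the default namespace is keyed by "" (an assumption of this sketch).

(defun element-uname (qname scope)
  "Return (namespace . local-name) for the element name QNAME within SCOPE."
  (let* ((colon (position #\: qname))
         (prefix (if colon (subseq qname 0 colon) ""))
         (local (if colon (subseq qname (1+ colon)) qname))
         (binding (assoc prefix scope :test #'string=)))
    (cond (binding (cons (cdr binding) local))
          ((string= prefix "") (cons nil local))   ; no default namespace declared
          (t (error "undeclared prefix ~s" prefix)))))

(defun enter-scope (declarations scope)
  "Put the namespace declarations of a start tag in front of the enclosing SCOPE."
  (append declarations scope))

;; <p:x xmlns:p="urn:example:one"> ... <p:x xmlns:p="urn:example:two"/> ...
(defparameter *outer* (enter-scope '(("p" . "urn:example:one")) '()))
(defparameter *inner* (enter-scope '(("p" . "urn:example:two")) *outer*))

;; the same qname "p:x" yields ("urn:example:one" . "x") in *outer* and
;; ("urn:example:two" . "x") in *inner*; once computed, the unames compare
;; with EQUAL and need no scope at all.
```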

(5. if one conflates the value and lexical domains, all is lost.)

it is possible to translate things which are encoded as xpaths into values
which are combinations of strings, numbers, and unames. these values exhibit
no context dependency. (maybe it would help if the respective standards took
the requisite care and named them upaths instead.) there is no reason to
specify this value domain as a primitive. it suffices to specify it as a
combination of its constituents and to specify the translation between such
values and combinations of concrete expressions in lexical domains.

this is no different from the relation between a list-of-numbers lexical type
for attribute values and a value domain specified to be a list of numbers.
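to illustrate the "upath" idea for the simplest case only, here is a common lisp sketch that translates a path of child steps such as "p:a/q:b" into a list of (namespace . local-name) pairs. axes, predicates and wildcards are ignored, the function names and namespace uri are hypothetical, and unprefixed steps are taken to be in no namespace, as in xpath 1.0. two expressions that differ only in their prefixes yield equal values.

```lisp
;;; a hypothetical "upath": child steps only, each qname translated to
;;; (namespace . local-name) against BINDINGS, an alist of (prefix . namespace).

(defun split-on-char (string char)
  "Split STRING at every occurrence of CHAR, keeping empty fields."
  (loop for start = 0 then (1+ pos)
        for pos = (position char string :start start)
        collect (subseq string start pos)
        until (null pos)))

(defun step-uname (qname bindings)
  "Translate one step's QNAME; unprefixed names are in no namespace."
  (let ((colon (position #\: qname)))
    (if colon
        (let ((binding (assoc (subseq qname 0 colon) bindings :test #'string=)))
          (unless binding
            (error "undeclared prefix in ~s" qname))
          (cons (cdr binding) (subseq qname (1+ colon))))
        (cons nil qname))))

(defun path->upath (path bindings)
  "Translate a child-step PATH into a context-free list of universal names."
  (mapcar (lambda (step) (step-uname step bindings))
          (split-on-char path #\/)))

(defparameter *b1* '(("p" . "urn:example:ns")))
(defparameter *b2* '(("q" . "urn:example:ns")))

;; (path->upath "p:a/p:b" *b1*) and (path->upath "q:a/q:b" *b2*) both yield
;; (("urn:example:ns" . "a") ("urn:example:ns" . "b"))
```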

>        Does that mean two datatypes?
> Or one base one, with a means of deriving QName and XPath expression
> from that base one?

the latter.

>        What if some working group manages to come up with
> an XPointer equivalent that people can actually agree upon and use; is
> that not also likely to have context sensitivity, and to have a clear
> relation to the existing pointer-like types?

if they manage to keep the value and lexical domains straight, there should be
no problem.

> 
> [laudable plea for comprehensibility and extensibility]

...




 


Copyright 2001 XML.org. This site is hosted by OASIS