
Re: Are we losing out because of grammars?

From: James Clark <jjc@jclark.com>
To: "K.Kawaguchi" <k-kawa@bigfoot.com>
Date: Thu, 01 Feb 2001 15:50:44 +0700

"K.Kawaguchi" wrote:

> > The lesson I draw from this is that it's better to keep these things as
> > well separated as possible.
> 
> I see.
> 
> However, "type-assignment" is a very similar task to validation. In
> fact, a validator can easily report the type information if it wants to
> do so.

It's not in general easy, unless you restrict the grammar.  For example,
consider the following TREX pattern:

<element name="x">
  <zeroOrMore>
    <element name="y">
      <attribute name="z">
        <data type="xsd:string"/>
      </attribute>
    </element>
  </zeroOrMore>
  <element name="y">
    <attribute name="z">
      <data type="xsd:integer"/>
    </attribute>
  </element>
</element>

If I'm in an "x" element and I get a "y" element with a "z" attribute
that is a legal lexical representation of an integer, I can't tell
whether to type that attribute as an "xsd:integer" or an "xsd:string"
unless I look ahead and see whether it's the last "y" element in
the "x".   The TREX implementation works on a stream of SAX events, so
this is a big complication.
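To make the lookahead problem concrete, here is a minimal Python sketch (not the TREX implementation) that assigns types to the "z" attributes of the example pattern from a flat event stream. Because the type of "z" depends on whether its "y" is the last child of "x", each "y" has to be held pending until either another "y" start tag or the "x" end tag arrives. The event tuple format and the name `assign_z_types` are hypothetical, and the sketch assumes the element structure has already been checked against the pattern.

```python
import re

# xsd:integer lexical form: optional sign then digits
# (leading/trailing whitespace tolerated, as after whitespace collapsing).
INTEGER_RE = re.compile(r"^\s*[+-]?[0-9]+\s*$")

def assign_z_types(events):
    """Assign a datatype to each y's z attribute, in document order.

    Hypothetical event format: ("start", name, attrs_dict) or ("end", name).
    Models the pattern: x contains zero or more y (z is xsd:string)
    followed by exactly one y (z is xsd:integer).
    """
    result = []
    pending = None  # z value of the most recent y; its type is not yet known
    for event in events:
        kind, name = event[0], event[1]
        if kind == "start" and name == "y":
            if pending is not None:
                # Another y follows, so the pending y was not the last one.
                result.append((pending, "xsd:string"))
            pending = event[2].get("z")
        elif kind == "end" and name == "x":
            # Only now do we know the pending y was the last one.
            if pending is None:
                raise ValueError("x must contain at least one y")
            if not INTEGER_RE.match(pending):
                raise ValueError("last y must have an integer z: %r" % pending)
            result.append((pending, "xsd:integer"))
            pending = None
    return result

events = [("start", "x", {}),
          ("start", "y", {"z": "42"}), ("end", "y"),
          ("start", "y", {"z": "42"}), ("end", "y"),
          ("end", "x")]
print(assign_z_types(events))
```

For the events of `<x><y z="42"/><y z="42"/></x>` it types the first "z" as xsd:string and the second as xsd:integer, although the two values are identical; nothing available at the first "y" start tag distinguishes them.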

> Or, in other words, if one wants to implement a "type-reporter", he/she
> is essentially implementing a validator.

It depends how you restrict the grammar.  If you restrict the grammar as
much as W3C's schemas, type assignment is significantly simpler than
validation (since I believe I am correct in saying that for W3C schemas
the type of an element depends only on its name and the names of its
parents).
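Taking the restriction above at face value (an element's type is a function of its own name and the names of the elements above it), type assignment becomes a table lookup done as each start tag arrives, with no buffering. A minimal sketch using Python's standard `xml.sax` module; the table `TYPE_TABLE`, its element names and its type names are hypothetical.

```python
import xml.sax

# Hypothetical table: tuple of element names from the root -> assigned type.
TYPE_TABLE = {
    ("order",): "OrderType",
    ("order", "item"): "ItemType",
    ("order", "item", "qty"): "xsd:integer",
}

class PathTypeAssigner(xml.sax.ContentHandler):
    """Assigns a type at each start tag using only the names of the element
    and its ancestors, so no lookahead is ever needed."""

    def __init__(self):
        super().__init__()
        self.stack = []      # names of currently open elements
        self.assigned = []   # (path, type) pairs in document order

    def startElement(self, name, attrs):
        self.stack.append(name)
        path = tuple(self.stack)
        # Unknown paths get None rather than an error; this is not a validator.
        self.assigned.append((path, TYPE_TABLE.get(path)))

    def endElement(self, name):
        self.stack.pop()

def assign_types(xml_bytes):
    handler = PathTypeAssigner()
    xml.sax.parseString(xml_bytes, handler)
    return handler.assigned

print(assign_types(b"<order><item><qty>3</qty></item></order>"))
```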

> In yet other words,
> 
> > are separate functions and that mushing the two together is a bad idea:
> > I may want to validate without augmenting the infoset and I may want to
> > augment the infoset without validating.
> 
> "Validation without type-assignment" is possible,

We agree on that.

> but "type-assignment
> without validation" is not possible.

As I indicated above, it depends.

> Therefore, at the implementation level, a validator can (and I think 'should') incorporate
> a type-reporter.

I would agree with 'can', but not with 'should'. There are many
applications for which type-assignment is not necessary; I think
dispatching on the "FQGI" (i.e., on the name of the element and the names
of its ancestor elements) is sufficient for many applications.  Type
assignment may require quite different implementation techniques from
validation.
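Dispatching on the FQGI needs no schema at all: the application keeps a stack of open element names and looks up a handler by that path. A minimal sketch using `xml.etree.ElementTree.iterparse` from the Python standard library; the function name `dispatch_by_fqgi`, the handler table and the element names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def dispatch_by_fqgi(xml_text, handlers):
    """Call handlers[fqgi](elem) at each end tag, where fqgi is the tuple of
    the element's ancestors' names (root first) followed by its own name."""
    stack = []
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            handler = handlers.get(tuple(stack))
            if handler is not None:
                handler(elem)  # elem is complete at its end tag
            stack.pop()

# Hypothetical handlers: the same element name "title" is handled
# differently depending on its ancestors.
collected = []
handlers = {
    ("book", "title"): lambda e: collected.append(("book title", e.text)),
    ("book", "chapter", "title"): lambda e: collected.append(("chapter title", e.text)),
}
dispatch_by_fqgi("<book><title>T</title><chapter><title>C1</title></chapter></book>", handlers)
print(collected)
```

No type names are involved: the two "title" elements are told apart purely by their ancestors.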

> I asked this question because your implementation doesn't incorporate
> type-reporting capability.

Correct.  It's just not something I've ever felt a great need for.  I
also think there's a huge potential for abuse (as Eric van der Vlist
pointed out). I also feel very uneasy about the whole idea of reporting
complex (in the W3C XML Schema sense) type names to applications: it
feels a bit like in XML exposing the names of parameter entities to the
application and I've never heard of anybody asking for that (unless they
are writing a DTD editor).  Exposing simple types makes a lot more sense
to me: that's like asking for the type of an attribute.

Now it's my turn to ask you some questions.

- You seem to think type-assignment is very important.  Why?

- Your ambiguity detection algorithm for RELAX detects whether it is
possible to assign labels to elements in more than one way. I would find
it more interesting to know whether it is possible to assign datatypes
(as specified by the RELAX "type" attribute) to leaf elements and
attributes in more than one way.  Is it possible/easy to detect this
kind of ambiguity?

James
