- From: "James Tauber" <jtauber@jtauber.com>
- To: "XML-Dev Mailing list" <xml-dev@ic.ac.uk>
- Date: Mon, 20 Sep 1999 21:18:10 +0800
----- Original Message -----
From: Simon St.Laurent <simonstl@simonstl.com>
> >You don't actually need the "vocabulary". The alphabet of a formal
> >language is part of the grammar.
>
> In XML-based languages that rely on DTDs or schemas, yes. But in all
> formal languages?
Yes. The grammar includes the symbols it uses.
> Seems that it wouldn't be hard to create a formal
> language that had classes of vocabulary (like noun, verb, adjective) and
> fit them into patterns (subject[noun]-verb[verb]-object[noun]) that were
> separate.
This separation is merely partitioning the grammar into the productions that
take penultimate symbols to terminal symbols and all the other productions.
E.g.
[1] Sentence -> NP VP
[2] VP -> V NP
[3] NP -> Simon
[4] NP -> XML
[5] V -> likes
What you are talking about is splitting productions 3-5 from 1-2. This is
often done in natural language processing and many theories of (natural)
language make a distinction between the lexicon and the syntactic rules. But
we are talking about formal languages, not natural languages.
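As a rough illustration of the point above, here is a minimal Python sketch of productions [1]-[5]. The names (PRODUCTIONS, LEXICON, SYNTAX, expand) are invented for this sketch and are not from any library. It shows that the alphabet can be read straight off the grammar, and that "lexicon" versus "syntactic rules" is just a partition of one production set.

```python
from itertools import product

# Productions [1]-[5] from the message, as (left-hand side, right-hand side) pairs.
PRODUCTIONS = [
    ("Sentence", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("Simon",)),
    ("NP", ("XML",)),
    ("V", ("likes",)),
]

NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}
# The alphabet falls out of the grammar: every symbol that is never a left-hand side.
TERMINALS = {s for _, rhs in PRODUCTIONS for s in rhs} - NONTERMINALS


def is_lexical(production):
    """True if the production rewrites a symbol directly to terminals."""
    _, rhs = production
    return all(s in TERMINALS for s in rhs)


# Splitting productions 3-5 from 1-2 is only a partition of the same grammar.
LEXICON = [p for p in PRODUCTIONS if is_lexical(p)]
SYNTAX = [p for p in PRODUCTIONS if not is_lexical(p)]


def expand(symbol, productions):
    """Return every terminal string derivable from symbol.

    Only suitable for non-recursive grammars such as this one.
    """
    if symbol not in {lhs for lhs, _ in productions}:
        return [(symbol,)]
    results = []
    for lhs, rhs in productions:
        if lhs == symbol:
            for parts in product(*(expand(s, productions) for s in rhs)):
                results.append(tuple(w for part in parts for w in part))
    return results


# Recombining the two partitions gives back the same language.
SENTENCES = sorted(" ".join(ws) for ws in expand("Sentence", SYNTAX + LEXICON))
```

Whether the two lists are kept in one place or two, the language they define together is the same set of four sentences.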
[..]
> It's that, but it's also worse. Suppose you have a nice modular DTD that
> expresses most of the vocabulary a user will need to create documents of a
> certain type, but has ANY sections so that users can organize it any way
> they like. Users build sets of DTDs to see what exactly it is they're
> getting or producing, but all of the possibilities are actually open. Is
> the language described by the 'master' DTD, which doesn't get you very far?
> Or is the language described by the particular DTDs? Or do we measure
> interoperability? A 'master DTD' containing all possibilities will quickly
> grow obese.
I'm not sure I understand what you are saying here. When a user pieces
together bits of different DTDs, they end up with a *single* DTD. This is a
single grammar defining a single set of valid instances.
> Then there's the simpler case of well-formed documents, for which we can
> _derive_ grammars, but can't make definitive statements above the level of
> XML 1.0 conformance.
Pardon? A grammar for well-formed documents doesn't need to be derived
because it is in the XML 1.0 REC. It is a BNF augmented by WFCs and the odd
bit of prose.
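To make this concrete, a small sketch using Python's standard xml.etree.ElementTree module, whose underlying parser is non-validating: it checks a document against the well-formedness rules alone, with no DTD and no derived grammar. The helper name is_well_formed is invented for this sketch.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document.

    No DTD is consulted; only well-formedness is checked.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Balanced tags: well-formed, whatever the element names happen to be.
ok = is_well_formed("<doc><anything/></doc>")
# Mismatched end tag: violates a well-formedness constraint.
bad = is_well_formed("<doc><anything></doc>")
```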
[...]
> I think 'formal language' in that sense is not especially useful except for
> limited situations, and should probably be reserved for the few cases where
> XML development is limited to representations of older legacy systems that
> relied on formal languages based on that sense. XML itself, it seems, can
> do better than that.
It can. But formal languages are part of the picture because sometimes there
are syntactic constraints. They might be loose, but they are still a
grammar.
> It depends on what kind of 'formalizing' you want to do. In many cases,
> I'd suggest that we focus on 'relaxing', producing more flexible models
> that aren't so concerned about locking everything down into a single
> grammar and a single vocabulary. It requires a change of mindset.
A formal grammar is still a formal grammar even if it permits any of the
terminal symbols in any order. A more flexible model is still a model. The
moment you model the syntax, you have a formal grammar.
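A minimal sketch of such a maximally loose grammar, assuming a hypothetical three-symbol alphabet (the element names below are made up for illustration). The grammar is Any -> (empty) | t Any for each terminal t, and the recognizer simply follows one production per token.

```python
# Hypothetical terminal symbols; any finite alphabet would do.
ALPHABET = {"para", "note", "list"}

# Any -> (empty)  |  t Any   for every terminal t in ALPHABET
LOOSE_GRAMMAR = [("Any", ())] + [("Any", (t, "Any")) for t in sorted(ALPHABET)]


def accepts(tokens, grammar=LOOSE_GRAMMAR, start="Any"):
    """Recognize tokens with a right-linear grammar.

    Each production is either (A, ()) or (A, (terminal, B)); the sketch
    assumes at most one matching production per (state, terminal) pair.
    """
    state = start
    for tok in tokens:
        next_states = [
            rhs[1]
            for lhs, rhs in grammar
            if lhs == state and len(rhs) == 2 and rhs[0] == tok
        ]
        if not next_states:
            return False  # symbol not permitted here: a syntactic constraint
        state = next_states[0]
    # The derivation may stop only where an empty production exists.
    return (state, ()) in grammar
```

Even this grammar rejects something, namely any sequence containing a symbol outside the alphabet, which is why it still counts as a model of the syntax.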
> Why is it that only one validating Java parser allows the application to
> continue after a validity constraint (not a well-formedness constraint) has
> been violated?
Because the others are wrong.
> I suspect it's because a lot of folks are taking the 'formal grammar' of
> DTDs more seriously than the XML 1.0 spec itself does...
But that has nothing to do with the value of formal grammars. If I present
you with a CFG modelling English and refuse to listen to you unless your
sentences parse to my CFG, that isn't a problem with my CFG *or* the notion
of CFGs in general.
> I don't think we're incompatibly far apart
I actually agree with you completely in pretty much everything but
terminology.
> I just would like folks to look at 'formal languages' a bit more closely
> and a bit more critically. Rick Jelliffe's made excellent arguments in
> other postings on this thread, for example, regarding the ways formal
> languages can obscure as well as illuminate. Right now, I think we need to
> contemplate whether 'formal grammars' sufficiently distinguish 'languages'
> in practice before putting extra work for programmers and authors
> (namespaces) on every formal grammar that comes our way.
I think the XML community would generally agree that:
1. certain classes of formal grammar are not sufficient for the syntactic
constraints people wish to express
2. syntax isn't all there is
Linguists worked these out well before you and I were born, Simon :-)
I think SGMLers did too, which is one of the reasons that a Document Type
Definition in SGML includes semantics as well as syntax (see another post
where I follow on from Rick's comments relating to this).
As far as I can tell, no one is arguing that formal grammars are all we
need. I am merely trying to clarify what formal grammars are so that people
understand what is meant when someone says that a language has a grammar or
that a DTD is a grammar.
James