xml-dev - Re: Internal subset equivalent in new schema proposals?


From: Ketil Z Malde <ketil@ii.uib.no>
To: Joel Bender <joel@spooky.emcs.cornell.edu>
Date: 02 Dec 1998 09:48:53 +0100

Joel Bender <joel@spooky.emcs.cornell.edu> writes:

> I was thinking along similar lines.  I've been adding something like this
> to my XML documents:

> 	<prop name="state" xml:regexp="[A-Z]+">NY</prop>

It's a neat way of doing it, since checking is optional and
transparent to non-checking applications.
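A minimal sketch of what such an optional check might look like, in Python with the standard library. The function name check_content is hypothetical, Python's re syntax stands in for whatever regular expression dialect the attribute would really use, and whole-value matching is an assumption. An application that never calls the function sees an ordinary XML document.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree reports the predeclared xml: prefix under this namespace URI,
# so xml:regexp shows up as "{http://www.w3.org/XML/1998/namespace}regexp".
XML_NS = "http://www.w3.org/XML/1998/namespace"
REGEXP_ATTR = f"{{{XML_NS}}}regexp"


def check_content(xml_text):
    """Return (tag, text, pattern) for each element whose text fails its xml:regexp.

    Assumption: the pattern must match the whole (whitespace-stripped) text.
    """
    failures = []
    for elem in ET.fromstring(xml_text).iter():
        pattern = elem.get(REGEXP_ATTR)
        if pattern is None:
            continue  # unconstrained element: nothing to check
        text = (elem.text or "").strip()
        if re.fullmatch(pattern, text) is None:
            failures.append((elem.tag, text, pattern))
    return failures


if __name__ == "__main__":
    print(check_content('<prop name="state" xml:regexp="[A-Z]+">NY</prop>'))
    print(check_content('<prop name="state" xml:regexp="[A-Z]+">ny</prop>'))
```

The first call reports no failures; the second reports the prop element, because "ny" is not upper case.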

> So the parser can verify that the CDATA matches the regular expression.
> Works OK for content, but I don't see how I can add this meta-meta-data for
> attributes.

The dividing line between attributes and elements is a fine one,
anyway.  Is it a real restriction to have the user embed constrained
information content in elements and not attributes?  E.g.

	<prop>
	  <name xml:regexp="(state|county|city)">state</name>
	  <prop-content...> </..>
	</prop>

or perhaps rather

	<prop>
	   <!-- one of state, county, city -->
	   <state xml:regexp="[A-Z]+">NY</state>
	</prop>

>  That is to say, how can I tell the parser that the 'name'
> attribute value for the 'prop' entity must be of the form
> "[a-zA-Z_][0-9a-zA-Z_]*"?

Not to mention the form of the xml:regexp attribute, eh? :-)

Actually, that *is* a problem: as a DTD designer, I want to
express the lexical data formats my applications handle.  I wouldn't
want to leave this to document authors, who probably know more about
technical writing and less about the technical limitations of the
application software.

By the way, you *can* check attributes by doing

	<prop name="state" name.regexp="[a-zA-Z_][0-9a-zA-Z_]*"...>

or something, can't you?
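A rough sketch of how a checker could honour such companion attributes, again in Python. The ".regexp" suffix convention is taken from the example above; the function name check_attributes, whole-value matching, and treating a missing target attribute as a failure are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

SUFFIX = ".regexp"  # companion-attribute convention, e.g. name.regexp constrains name


def check_attributes(xml_text):
    """Return (tag, attribute, value) for each attribute that fails its companion pattern."""
    failures = []
    for elem in ET.fromstring(xml_text).iter():
        for attr, pattern in elem.attrib.items():
            if not attr.endswith(SUFFIX):
                continue  # an ordinary attribute, not a constraint
            target = attr[: -len(SUFFIX)]
            value = elem.get(target)
            # Assumption: a constraint on an absent attribute counts as a failure.
            if value is None or re.fullmatch(pattern, value) is None:
                failures.append((elem.tag, target, value))
    return failures


if __name__ == "__main__":
    good = '<prop name="state" name.regexp="[a-zA-Z_][0-9a-zA-Z_]*">NY</prop>'
    bad = '<prop name="9state" name.regexp="[a-zA-Z_][0-9a-zA-Z_]*">NY</prop>'
    print(check_attributes(good))
    print(check_attributes(bad))
```

Note that the constraint still travels with each document instance, so this does not address the DTD designer's concern above; it only shows that attributes can be checked the same way as content.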

> Of course this also brings up the murky waters of grep syntax, which I've
> been avoiding.

Well, looking back, I realize I consider regular expressions a simple
solution.  Looking further back, I realize that this is because of a
long and shady past of juggling Unix shell scripts.

On the other hand, regular expressions are very powerful, and you
don't really need to know all the ins and outs to write simple ones,
like "[A-Z]" or "(one|two|three)".  And many of the special characters
(+, *, ?) are already used in DTD content models.
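To make the simple cases concrete, here is a small sketch using Python's re module, on the assumption that a pattern has to match the entire value. The helper name conforms is hypothetical. One thing it shows is that "[A-Z]" on its own accepts exactly one letter, so a state code needs the "+" from the earlier example.

```python
import re


def conforms(pattern, value):
    """True if the whole value matches the pattern (whole-value matching is an assumption)."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    print(conforms("[A-Z]", "N"))              # True: a single upper-case letter
    print(conforms("[A-Z]", "NY"))             # False: two letters, the pattern allows one
    print(conforms("[A-Z]+", "NY"))            # True: + means one or more, as in a DTD
    print(conforms("(one|two|three)", "two"))  # True: | separates alternatives, as in a DTD
    print(conforms("(one|two|three)", "four")) # False: not one of the listed choices
```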

~kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

Follow-Ups:
- Re: Internal subset equivalent in new schema proposals?
  - From: James Robertson <jamesr@steptwo.com.au>

References:
- Re: Internal subset equivalent in new schema proposals?
  - From: Joel Bender <joel@spooky.emcs.cornell.edu>
