- From: Paul Prescod <papresco@technologist.com>
- To: xml-dev@ic.ac.uk
- Date: Sat, 16 May 1998 15:59:32 -0400
Jon Bosak wrote:
>
> A perl regexp is the *upper bound* of sophistication for this
> constituency. Please try, if you can, to imagine being faced with the
> job of doing an element-specific mass search-and-replace over two
> years' worth of company reports when all you know about XML is what
> you can see by looking at the source, you've never heard of the
> concept of a normalizer, and the only scripting tool you know how to
> use is the Word or WordPerfect macro language.
I do not believe that a person with the knowledge level you have described
is going to succeed at the task you have set for him or her.
Entities are going to kill them.
Whitespace in end-tags is going to toast them.
CDATA sections are going to confuse them.
Elements (and tags!) broken across lines are going to destroy them.
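
As a rough illustration of these failure modes (not part of the original message), here is a minimal Python sketch of the kind of literal search-and-replace a macro user might attempt. The function name `naive_rename` and the sample document are hypothetical, chosen only to exercise three of the cases listed above: a start-tag broken across lines, whitespace in an end-tag, and a CDATA section.

```python
def naive_rename(text, old="FOO", new="BAR"):
    """Literal replace of <FOO> and </FOO>, as a macro user might write it."""
    text = text.replace("<" + old + ">", "<" + new + ">")
    text = text.replace("</" + old + ">", "</" + new + ">")
    return text


# Hypothetical sample: well-formed XML that is not normalized.
doc = (
    '<FOO\n  id="a">text</FOO >'   # start-tag split across lines, space in end-tag
    '<![CDATA[<FOO>]]>'            # literal text that only looks like a tag
)

result = naive_rename(doc)

# The real start-tag and end-tag are both missed...
assert '<FOO\n  id="a">' in result
assert '</FOO >' in result
# ...while the character data inside the CDATA section is wrongly changed.
assert '<![CDATA[<BAR>]]>' in result
```

The sketch does not cover entities at all; a replacement text declared in the internal subset could contain further FOO elements that never appear in the document body as searchable text.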
This person can only succeed if either
a) the data is already normalized, probably due to a corporate standard
such as the one you mention, or
b) they download a normalizer.
If I am wrong, that would be easy to prove. All someone has to do is
provide a regular expression that can (for instance) change all
occurrences of the GI "FOO" into "BAR" in any XML document conforming
to a DTD of their choice (but which I can extend in the internal subset).
On the other hand, I can do this *trivially* in a regular expression on
data that has been normalized.
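
A minimal sketch of what such a regular expression might look like, written in Python rather than Perl. The message does not define "normalized", so the sketch assumes a hypothetical normal form: entities already expanded, CDATA sections converted to escaped text, no comments or processing instructions containing tag-like text, and no whitespace between `<` or `</` and the element name. The function name `rename_gi` is an illustration, not an existing tool.

```python
import re


def rename_gi(text, old="FOO", new="BAR"):
    """Rename element type `old` to `new` in already-normalized XML text.

    Assumes the normal form described above; on arbitrary XML this
    pattern is NOT safe.
    """
    # Match "<FOO" or "</FOO" only when the name ends there, so that
    # a longer name such as FOOT is left alone.
    pattern = re.compile(r"<(/?)" + re.escape(old) + r"(?=[\s/>])")
    return pattern.sub(lambda m: "<" + m.group(1) + new, text)


normalized = '<FOO id="a">text</FOO><FOOT>x</FOOT><FOO/>'
renamed = rename_gi(normalized)
assert renamed == '<BAR id="a">text</BAR><FOOT>x</FOOT><BAR/>'
```

The lookahead is the only subtlety here: without it, the pattern would also rewrite the first three letters of any element name that merely begins with FOO.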
> SGML gives you the option of using empty end tags, and the
> historical fact is that most large users, given this option and a
> sufficient amount of experience with it, choose not to use it.
These "large users" have expensive SGML editors that they have paid
someone thousands of dollars to customize to perfection. Under those
conditions, I would legislate redundancy also -- not just fully expanded
end-tags, but probably redundant IDs in comments of end-tags, public
identifiers on all entity declarations, perhaps even unique identifiers on
all elements.
But XML is about a different world than that.
Paul Prescod - http://itrc.uwaterloo.ca/~papresco
"A writer is also a citizen, a political animal, whether he likes it or
not. But I do not accept that a writer has a greater obligation
to society than a musician or a mason or a teacher. Everyone has
a citizen's commitment." - Wole Soyinka, Africa's first Nobel Laureate
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)