
Re: [xml-dev] Interoperability [long]



At 09:16 14/11/2001 -0800, Tim Bray wrote:
>[...]
>What kind of problems do you run across?

Gosh, where to start! I run into problems flowing XML from
content creation, through editing, through processing, through
content management, through delivery processing.

I would identify these as the most irksome:

(Warning: free format attempt at documenting the
problems I've been through in the last two weeks
just getting some simple XML into a browser, validated
and in/out of some simple filtering programs, follows:)

1) Round-tripping problems
Most of my XML processing is XML to XML processing.
A variety of nasty things happen to things like entity refs,
encodings, comments, CDATA sections etc. The usual stuff
I get in a fluff about on this list.
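
A minimal sketch of the kind of loss described above, assuming Python's
standard xml.sax module and a made-up sample document (neither is from
the original post). It pushes a document through a plain SAX
ContentHandler pipeline and writes it straight back out:

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

# Hypothetical sample: an internal entity, a comment and a CDATA section.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY co "Example Corp">
]>
<doc><!-- editorial note --><p>&co; <![CDATA[a < b]]></p></doc>
"""

def round_trip(data: bytes) -> str:
    """Parse with SAX and re-serialize using only ContentHandler events."""
    out = io.StringIO()
    # XMLGenerator is a ContentHandler that writes the events it receives.
    xml.sax.parseString(data, XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()

if __name__ == "__main__":
    print(round_trip(SAMPLE))
```

With this handler the DOCTYPE and comment do not reach the output, the
entity reference arrives already expanded, and the CDATA section comes
back as ordinary escaped text.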

2) Display problems
It's amazingly hard to get a good result rendering
XML with CSS2. It's not that CSS2 isn't up to it, it is that
things like attribute defaults, entity expansions etc. that you
want to keep external to the instance go unnoticed by
XML browsers that don't read the external DTD.
This stuff is real important for things like qualified
styles. You end up adding things to your instance that you
would prefer to leave external just to get the content
to display right.

3) Namespace problems
Back in the SGML days with things like Panorama (based
on Synex Viewport) it was possible to get tabular display
of arbitrary markup. In Opera 5 for example, you get
tabular display by using the table model from the
"http://www.w3.org/TR/REC-html40" namespace.

But Opera don't read no external DTD, so I cannot do this:
<!ATTLIST table
              xmlns   CDATA               #FIXED "http://www.w3.org/TR/REC-html40">

I must add the attribute to *every* instance of the table in my documents.
Then my authors complain saying "what the f&*k is this polluting
my table markup".
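
To make concrete what "adding the attribute to every table" amounts to,
here is a rough sketch of a SAX filter that stamps it on at delivery
time, so authored instances could stay clean. This is only an
illustration under assumptions: it uses Python's xml.sax, the class and
function names are hypothetical, and it has not been tried against
Opera.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl

HTML_NS = "http://www.w3.org/TR/REC-html40"

class TableNamespaceFilter(XMLFilterBase):
    """Hypothetical filter: add the fixed xmlns attribute to <table>."""

    def startElement(self, name, attrs):
        # Non-namespace SAX mode: xmlns is reported as a plain attribute.
        if name == "table" and "xmlns" not in attrs:
            merged = dict(attrs.items())
            merged["xmlns"] = HTML_NS
            attrs = AttributesImpl(merged)
        super().startElement(name, attrs)

def add_table_namespace(data: bytes) -> str:
    """Run the filter over a document and return the serialized result."""
    out = io.StringIO()
    reader = TableNamespaceFilter(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(data))
    return out.getvalue()

if __name__ == "__main__":
    print(add_table_namespace(b"<doc><table><tr><td>x</td></tr></table></doc>"))
```

Like any filter built only on content events, this one still drops
comments and the DOCTYPE on the way through.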

Now although I want to get tables for editing/browsing I don't
want to throw away DTD validation. DTDs don't support
namespaces. Bummer. One solution is to fix the prefix
in the instance like this:
         xmlns:x="http://www.w3.org/TR/REC-html40"

and in the DTD like this:
           xmlns:x   CDATA               #FIXED "http://www.w3.org/TR/REC-html40"

Now I can validate but have wired the prefix. Bummer. Could use
parameter entities to avoid that but then I scare my para-techs  with
a DTD that looks rather complicated with all those percents
%allovertheplace; (I told them XML would be easy!)

I could just abandon validation. Don't like that option. Would end
up coding too much data-validation in business logic. Could
jump for a complete namespace aware schema language.
Don't like the sound of that. People way smarter than me
are not even sure that XML Schema is implementable!

BTW, have you guys on xml-dev seen the collateral damage
in terms of  complexity/readability to the SAX/DOM/XSLT
programs that have become "namespace aware"?

Hey! I could add the FIXED attributes into the internal subset.
Cannot find any documentation on what Opera might
do with such an approach... I know for sure that my filter developers
writing SAX filters that have handlers for startElement(), endElement(), pis()
and characters() will be unhappy if I tell them they need
to round-trip the stuff in the internal subset. In fact I can tell
you now for sure that that stuff will just get lost. I can hear
the screams from the content manager now...

4) Locating DTDs

I want to put DTDs somewhere central. I don't want to lug them around
each directory I have XML files in so that:
<!DOCTYPE foo SYSTEM "foo.dtd">
works.

I could use a full URI but then I need HTTP running locally or live with
the hit of pulling this stuff across an unreliable network. Not good.

Could use SOCAT but patchy support on the ground for this. So
much for freely interchangeable tools.

Could separate the prolog completely from the instance so that
entity A says <!DOCTYPE foo [ ... ]> and entity B says
<foo>...</foo>.

This worked great in the SGML days. You could pass multiple
system identifiers to NSGMLS for example and it would concatenate
them prior to parsing. Not workable with XML because although nothing
in the XML 1.0 spec prohibits it, parsing tools don't like it. If I have
to customize a parser to do this management will look at me
funny given all they have read about freely available parser tools
out there!
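
For tools that expose a SAX entity resolver, one possible way to keep
DTDs in a central place is to redirect system identifiers in code. The
sketch below assumes Python's xml.sax; the directory /opt/dtds and the
class and function names are hypothetical, and it redirects every
external entity by file name, not just DTDs.

```python
import os
import urllib.parse
import xml.sax
from xml.sax.handler import EntityResolver, feature_external_ges

DTD_HOME = "/opt/dtds"  # hypothetical central DTD directory

class CentralDTDResolver(EntityResolver):
    """Map any system identifier to a same-named file under dtd_home."""

    def __init__(self, dtd_home=DTD_HOME):
        self.dtd_home = dtd_home

    def resolveEntity(self, publicId, systemId):
        # Keep only the file name, whether systemId is relative or a URL.
        name = os.path.basename(urllib.parse.urlparse(systemId).path)
        return os.path.join(self.dtd_home, name)

def make_parser_with_central_dtds(dtd_home=DTD_HOME):
    """Build a SAX parser that asks the resolver for external entities."""
    parser = xml.sax.make_parser()
    # Recent Python versions do not load external entities by default.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(CentralDTDResolver(dtd_home))
    return parser

if __name__ == "__main__":
    resolver = CentralDTDResolver()
    print(resolver.resolveEntity(None, "foo.dtd"))
```

This only helps inside programs that let you install a resolver, which
is exactly the interchange problem: a browser or editor that ignores
the external subset never asks.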

5) Creating simple hypertext effects

The ball has been dropped on linking for years. This is not XML's fault
but it sure doesn't help creating simple viewers for XML, which
then reflects badly on XML. I have made
it work in Opera 5 but it required proprietary CSS2 extensions
and an incantation in the instance from a long lost XLink WD:

         <xref xml:link="simple" show="replace" href="#preamble">

Again, because the browser ignores external subsets, I have
to carry these attributes around *every* xref in my instances
even though they never change value. Bummer.

I can't put them in the internal subset because the filter-writers
will lose them in XML -> XML processing. Bummer.

6) Character encodings

I want to ensure that my documents do not use characters
outside the ISO-8859-1 range. But I don't want to
use an iso-8859-1 encoding declaration because parsers
are not required to support it.

But I also don't want to say UTF-8 (or leave it out and
have it default to UTF-8) because then my authors can
slip in some Kanji that will blow my downstream processes
to pieces.

What I want to say is "this document uses Unicode but
just uses Unicode characters in such-and-such a range". How
do I do that?

Oh, BTW, Opera and lots of other tools out there that
call themselves XML compliant, don't do Unicode. Worse, they
silently don't do Unicode. You find these things out
the hard way.
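
The XML declaration offers no way to state a character range, so a
check like this ends up in a separate step. A minimal sketch, assuming
Python's xml.sax and hypothetical names: it parses the document as
Unicode and reports every character above U+00FF found in element
names, attributes and character data. Going through a parser means
numeric character references are caught too; comments and processing
instructions are not examined.

```python
import xml.sax

class Latin1Checker(xml.sax.ContentHandler):
    """Collect characters that fall outside the ISO-8859-1 range."""

    def __init__(self):
        super().__init__()
        self.offenders = set()

    def _check(self, text):
        # ISO-8859-1 maps to the first 256 Unicode code points.
        self.offenders.update(ch for ch in text if ord(ch) > 0xFF)

    def startElement(self, name, attrs):
        self._check(name)
        for key, value in attrs.items():
            self._check(key)
            self._check(value)

    def characters(self, content):
        self._check(content)

def find_non_latin1(data: bytes):
    """Return the offending characters, sorted by code point."""
    checker = Latin1Checker()
    xml.sax.parseString(data, checker)
    return sorted(checker.offenders)

if __name__ == "__main__":
    doc = "<p>caf\u00e9 \u65e5&#x672C;</p>".encode("utf-8")
    print([hex(ord(ch)) for ch in find_non_latin1(doc)])
```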

7) Browser/Editor styling

After two weeks I now have a CSS2 system for
rendering some XML. Opera looks okay but I have
had to add kludgy stuff to my instances to make
the links work and proprietary stuff to my stylesheet.

Even taking the proprietary stuff out of the
stylesheet, I get varying degrees of crud in
every CSS compatible editing system.
Again, not XML's fault per se but being able
to view the stuff easily is kinda important!

=====
That lot is just for starters. Gotta get back to work.
Any suggestions on these issues greatly appreciated.

Call me a fuddy-duddy but simple stuff like this
was simpler with the *complex* SGML standard
than it is with the *simple* XML standard.

To return to the original spark of this, I believe that a significant
part of the problem is that XML's definition is just syntax
and compliance with the syntax doesn't tell you a lot
when it comes to tying components together into complete
systems.

This is because it does not give you any guidance as to what
XML you will get *out* of a system - it just tells you
that you can get your stuff *in* to the system. Great
for vendors and consultants. Not so great for John Q.
Webhacker.

It doesn't give you any feel for what
the XML-compliant system will do at layers above
the syntax, which is where the rubber hits the
road for interoperability.


Sean