   Re: Parser Behaviour (serious)

  • From: Peter Murray-Rust <peter@ursus.demon.co.uk>
  • To: xml-dev@xml.org
  • Date: Mon, 03 Apr 2000 01:00:54 +0100

At 11:02 AM 4/2/00 -0700, Tim Bray wrote:
>At 10:22 AM 4/2/00 +0100, Peter Murray-Rust wrote:
>>What's wrong? Ah! The parser is trying to resolve the URL for the DTD and
>>since I'm offline (connections cost money over here) it can't. So the file
>>I have created can only be processed as XML if:
>>	(a) I am connected online
>>	(b) the W3C maintain *** for all time *** a means of dereferencing either
>>the FPI or the URL
>>
>>I can't believe this is what the community wants. It fooled me, and I've
>>been working with XML for some time.
>
>I think it's simpler than you make it out to be.  You have to decide
>whether, for what you need to do, you need the DTD or not. If you need the
>DTD, then either you have to have a net connection to where it is, or you
>need to maintain a local copy and use a "file://" URL.  

I agree with this, though it is non-trivial to maintain a local copy. *I*
don't produce the files - tidy/DaveR does, and links them to a remote URL.
I would have to edit every file and replace the SYSID. Non-trivial and
extremely error prone.

>Another option is
>you could implement a Public Identifier resolver, which is pretty easy
>even though there's not yet a standardized interoperable scheme for this.

"Pretty easy" to write my own hacked code, but extremely difficult to make
sure that my interpretation of everyone else's FPIs is the same as theirs.
I thought the general consensus was that FPIs were too difficult to maintain
unless there was a central (free) repository and we'd more or less given up
on them for XML.
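
For illustration only, here is a minimal sketch of the kind of hand-rolled
public identifier resolver under discussion. Python and its xml.sax
EntityResolver interface are assumptions (the thread names no language), and
the local path in the catalog is hypothetical. It also shows the weakness
described above: the mapping is purely local, so nothing guarantees that
anyone else resolves the same FPI to the same DTD.

```python
from xml.sax.handler import EntityResolver

# Hypothetical catalog: the local path is invented for illustration.
LOCAL_CATALOG = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "dtds/xhtml1-strict.dtd",
}


class CatalogResolver(EntityResolver):
    """Map known public identifiers (FPIs) to local copies of their DTDs."""

    def __init__(self, catalog):
        self.catalog = dict(catalog)

    def resolveEntity(self, publicId, systemId):
        # Fall back to the declared system identifier when the FPI is unknown.
        return self.catalog.get(publicId, systemId)
```

A SAX parser that honours entity resolvers would be handed an instance via
setEntityResolver(); whether a given tool exposes that hook is exactly the
inconsistency at issue.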

>If you *don't* need to read the DTD for your task at hand, then you don't
>have a problem.  You say the file "can only be processed as XML if" and
>I just don't buy this; it is explicitly OK to skip the external subset
>and in fact a common practice in many processing models (in particular
>with XHTML the DTD is going to be an order or two of magnitude bigger
>than the average instance, so the fetching/processing cost is nontrivial).
>
>Clearly, this presupposes that the software you're using has some sort of
>switch that allows you to tell it whether or not to read the DTD; which 
>seems like a basic must-have and one that exists in every XML tool I've
>worked with.

I am not sure it does, and anyway there is no consistency about the exact
behaviour. But the worse problem is that there are an increasing number of
tools with parsers buried in them and we don't have any control over them.
The two main problems that worry me are (a) external DTDs and (b) external
entities.
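
As a hedged sketch of what such a "switch" can look like where the parser is
under your control, the following uses Python's xml.sax (an assumption; other
toolkits name and scope the option differently, which is the consistency
problem). The ElementCounter handler and the count_elements_without_dtd
function are hypothetical names introduced for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ElementCounter(ContentHandler):
    """Trivial handler: counts start tags to show that parsing completed."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements_without_dtd(xml_bytes):
    parser = xml.sax.make_parser()
    # Ask the parser not to fetch external entities. Assumption: in the
    # expat-based reader this also covers the external DTD subset.
    parser.setFeature(feature_external_ges, False)
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.count
```

This only helps when the parser is directly accessible; a parser buried
inside another tool may offer no equivalent setting.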

>
>What am I missing? -Tim

Probably that newcomers to XML are going to find this surprising and
difficult :-). What worries me is that I spend more time than I would like
*editing* files like this to get them to behave. I wanted to do a simple
job - take a large number of HTML files and analyse the hyperlinks. tidy
converts them to XHTML - I simply run them into a DOM and traverse to find
a@href, frame@src and so on. I assumed it would be a simple job before I
hit this problem.
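
The job itself can be sketched as follows, assuming Python's xml.dom.minidom,
which as far as I know does not fetch the external DTD by default. The
LINK_ATTRS table covers only the a@href and frame@src cases named above;
extract_links is a hypothetical helper name.

```python
from xml.dom import minidom

# Element name -> attribute that carries the link target.
LINK_ATTRS = {"a": "href", "frame": "src"}


def extract_links(xhtml_text):
    """Return (element, target) pairs for every link-bearing element."""
    doc = minidom.parseString(xhtml_text)
    links = []
    for tag, attr in LINK_ATTRS.items():
        for node in doc.getElementsByTagName(tag):
            if node.hasAttribute(attr):
                links.append((tag, node.getAttribute(attr)))
    return links
```

One caveat with this approach: with the DTD skipped, entity references that
are declared only in the DTD (such as &nbsp;) may be rejected or dropped,
depending on the parser.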

I agree with DavidM that SAX2 will go some way to help - though it's a
lowish level approach - read the file to see what it contains, and tweak
bits that allow it to be parsed. I also agree with Lee Quin that we are at
the limit of acceptable complexity for many people - I know lots of you
don't agree, but I have to sell this to communities which are going to have
problems.

In the current case, I am prepared to accept without reservation that Dave
Raggett has created a file which is:
	(a) well-formed and
	(b) conforms to whatever DTD he has specified in the DOCTYPE.
The FPI/SYSID is simply a way for Dave to stamp the file as conforming to
whatever - I'll take his word.


	P.





***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************
