OASIS Mailing List Archives

Re: [xml-dev] Suggestions for a slightly less verbose (and easier to a

Sean McGrath wrote:

> [Tim Bray]
>  >Machine-generated XML usually doesn't do entities or CDATA.  It does do
>  ><someTag>
>  > ..stuff..
>  > ..stuff..
>  ></someTag>
>  >and perl is just the ticket.
> The problem of course is that there is no way to tell whether or not
> the 1 Gig XML instance you are about to process contains any entities,
> CDATA sections etc.

Right, if there's no way to tell, you might as well go ahead and use a 
real XML processor.  On the other hand if you're processing a database 
dump that gets generated by the same batch job every week and it's in 
XML, go ahead and use regexps or whatever turns your crank.
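The trade-off in the exchange above can be sketched concretely. This is a minimal illustration under stated assumptions, not something from the original thread: it uses Python's `re` module in place of the perl regexps mentioned, and `xml.etree.ElementTree` as the "real XML processor". The two sample dumps are hypothetical, and the tag name `someTag` is borrowed from the quote. On the fixed-format dump both approaches agree. Once an entity or CDATA section appears, the regex pass does not fail; it silently returns the escaped markup instead of the character data.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical weekly batch-job output: one element per line,
# no entities, no CDATA sections.
FIXED_DUMP = """<dump>
<someTag>alpha</someTag>
<someTag>beta</someTag>
</dump>"""

# Hypothetical dump with the same shape, but one value uses a predefined
# entity and another uses a CDATA section.
SURPRISE_DUMP = """<dump>
<someTag>AT&amp;T</someTag>
<someTag><![CDATA[x < y]]></someTag>
</dump>"""

# Line-oriented pattern: assumes each <someTag> opens and closes on a
# single line and that its content is plain character data.
LINE_RE = re.compile(r"^<someTag>(.*)</someTag>$", re.MULTILINE)


def regex_values(text):
    """Extract <someTag> contents with a line-oriented regular expression."""
    return LINE_RE.findall(text)


def parser_values(text):
    """Extract <someTag> contents with a conforming XML parser."""
    root = ET.fromstring(text)
    return [el.text for el in root.iter("someTag")]


if __name__ == "__main__":
    for name, doc in [("fixed", FIXED_DUMP), ("surprise", SURPRISE_DUMP)]:
        print(name, regex_values(doc), parser_values(doc))
```

For the surprise dump the regex version yields `AT&amp;T` and the raw `<![CDATA[...]]>` wrapper, while the parser yields `AT&T` and `x < y`.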

Mind you, when the morons in IT change the batch job without telling 
you, your code will break.  In this case, you *might* have done better 
using the XML processor, because if you were lucky enough that the 
morons broke well-formedness you're going to get a helpful error 
message; but if they just changed <foo> elements to <FindOuterOtter>, 
you're hosed whether you were using an XML processor or not.
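Both failure modes in the paragraph above can be shown with a similar hypothetical setup. Python's `xml.etree.ElementTree` again stands in for the XML processor, and the sample documents and helper names are invented for illustration. A dump that is no longer well-formed makes the parser raise an error that carries a position, whereas a naive regex quietly returns partial results. A well-formed dump whose `<foo>` elements were renamed yields nothing from either approach.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical outputs after an unannounced change to the batch job.
BROKEN = "<dump>\n<foo>1</foo>\n<foo>2</dump>"  # second <foo> never closed
RENAMED = "<dump>\n<FindOuterOtter>1</FindOuterOtter>\n</dump>"  # well-formed

FOO_RE = re.compile(r"<foo>(.*?)</foo>")


def regex_foo_values(text):
    """Naive extraction: silently skips anything that does not match."""
    return FOO_RE.findall(text)


def foo_values(text):
    """Extraction via a real XML parser; raises ET.ParseError if malformed."""
    root = ET.fromstring(text)
    return [el.text for el in root.iter("foo")]


def describe(text):
    """Summarize what the parser-based approach makes of a document."""
    try:
        values = foo_values(text)
    except ET.ParseError as err:
        line, column = err.position  # (line, column) of the error
        return f"parse error at line {line}, column {column}: {err}"
    if not values:
        return "no <foo> elements found"
    return f"values: {values}"


if __name__ == "__main__":
    for name, doc in [("broken", BROKEN), ("renamed", RENAMED)]:
        print(name, "| regex:", regex_foo_values(doc), "| parser:", describe(doc))
```

On the broken dump the regex reports a single value and drops the unclosed element without complaint, while the parser points at the problem. On the renamed dump both report nothing, so only a check on the expected element count would catch it.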

> I see three possibilities to make this work reliably:

I don't think any of them are cost-effective.  If you're not 100% sure 
what you're getting, use a real XML processor and the problems go away.

-Tim

Copyright 2001 XML.org. This site is hosted by OASIS