Sean McGrath wrote:
> [Tim Bray]
> >Machine-generated XML usually doesn't do entities or CDATA. It does do
>
> ><someTag>
> > ..stuff..
> > ..stuff..
> ></someTag>
>
> >and perl is just the ticket.
>
> The problem of course is that there is no way to tell whether or not
> the 1 Gig XML instance you are about to process contains any entities,
> CDATA sections etc.
Right, if there's no way to tell, you might as well go ahead and use a
real XML processor. On the other hand if you're processing a database
dump that gets generated by the same batch job every week and it's in
XML, go ahead and use regexps or whatever turns your crank.
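
To make the trade-off concrete, here is a minimal sketch in Python (not the Perl mentioned above) comparing a regexp with a real XML parser. The sample documents, the <dump> wrapper element, and the function names are hypothetical, made up purely for illustration. On the plain machine-generated shape the two approaches agree; once an entity or a CDATA section shows up, only the parser returns the actual text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data. PLAIN is the simple machine-generated shape;
# TRICKY carries the same kind of content via an entity and a CDATA section.
PLAIN = "<dump><someTag>alpha</someTag><someTag>beta</someTag></dump>"
TRICKY = (
    "<dump><someTag>a &amp; b</someTag>"
    "<someTag><![CDATA[c < d]]></someTag></dump>"
)


def regex_extract(doc):
    """Pull element content with a regexp; entities and CDATA stay raw."""
    return re.findall(r"<someTag>(.*?)</someTag>", doc, flags=re.S)


def parser_extract(doc):
    """Pull element content with a real XML parser, which resolves both."""
    return [el.text for el in ET.fromstring(doc).iter("someTag")]


if __name__ == "__main__":
    for name, doc in (("plain", PLAIN), ("tricky", TRICKY)):
        print(name, "regexp:", regex_extract(doc))
        print(name, "parser:", parser_extract(doc))
```

On the tricky input the regexp hands back the raw markup (`a &amp; b` and the literal CDATA wrapper), while the parser hands back `a & b` and `c < d`. If the weekly dump is known never to contain such constructs, the difference never surfaces.
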
Mind you, when the morons in IT change the batch job without telling
you, your code will break. In this case, you *might* have done better
using the XML processor, because if you were lucky enough that the
morons broke well-formedness you're going to get a helpful error
message; but if they just changed <foo> elements to <FindOuterOtter>,
you're hosed whether you were using an XML processor or not.
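
The two failure modes can be sketched as follows, again in Python with hypothetical sample data. A break in well-formedness makes the parser raise an error with a position in it; a rename from <foo> to <FindOuterOtter> parses cleanly and simply matches nothing. The element-count check at the end is an assumption added for illustration, not something proposed in this thread; it is the kind of guard that would be needed with or without a parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical dumps: the expected shape, a well-formedness break
# (unclosed <foo>), and a silent rename of the element.
GOOD = "<dump><foo>1</foo><foo>2</foo></dump>"
BROKEN = "<dump><foo>1</foo><foo>2</dump>"
RENAMED = (
    "<dump><FindOuterOtter>1</FindOuterOtter>"
    "<FindOuterOtter>2</FindOuterOtter></dump>"
)


def check(doc, expected_min=1):
    """Report what a parser-based job would see for a given dump."""
    try:
        count = len(list(ET.fromstring(doc).iter("foo")))
    except ET.ParseError as err:
        # The parser refuses the document and says where it went wrong.
        return f"not well-formed: {err}"
    if count < expected_min:
        # Assumed extra guard: the parser itself raises nothing here.
        return "well-formed, but no <foo> elements found"
    return f"ok: {count} <foo> elements"


if __name__ == "__main__":
    for name, doc in (("good", GOOD), ("broken", BROKEN), ("renamed", RENAMED)):
        print(name, "->", check(doc))
```

Without the explicit count check, the renamed dump would be processed as an empty result set with no complaint from the parser.
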
> I see three possibilities to make this work reliably:

I don't think any of them are cost-effective. If you're not 100% sure
what you're getting, use a real XML processor and the problems go away.

-Tim