> I'll second that. It is a basic approach but works fairly well,
> and you have a convenient place to correct bugs that arise from yet
> another unanticipated input.
> The thing I don't like about it is that you can't hide the regexp
> calls in element(), or you'll escape significant delimiters.
Sure you can. It depends on how many of the tricky things you want to
handle. Some versions of my code even detect the best quote to use for
attributes (single or double) and escape where necessary. Of course you can
just escape all of the quotes in attribute values, but this way makes the
result more readable, which is good when debugging or showing the result to
someone else. This gets done within the element() call (which may of course
delegate the work).
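
As a rough illustration of choosing the quote inside element(), here is a minimal Python sketch. The names quote_attr and element, and the text/children parameters, are assumptions made for this example; this is not the code referred to above. Text content is escaped, while children are taken to be fragments that were already built by element() and so are inserted as they are.

```python
from xml.sax.saxutils import escape  # escapes &, < and >


def quote_attr(value):
    """Wrap an attribute value in whichever quote needs the least escaping."""
    value = escape(value)
    if '"' not in value:
        return '"' + value + '"'
    if "'" not in value:
        return "'" + value + "'"
    # Both kinds of quote occur: use double quotes and escape the inner ones.
    return '"' + value.replace('"', "&quot;") + '"'


def element(name, text="", children=(), **attrs):
    """Build one well-formed element as a string.

    text is escaped here; children are assumed to be fragments already
    produced by element(), so they are concatenated without escaping.
    """
    attr_text = "".join(
        " %s=%s" % (key, quote_attr(val)) for key, val in attrs.items()
    )
    body = escape(text) + "".join(children)
    return "<%s%s>%s</%s>" % (name, attr_text, body, name)


inner = element("b", "x < y")
print(element("a", "R&D: ", [inner], title='say "hi"'))
```

Because the escaping happens only on text and attribute values, the delimiters of nested fragments survive, and each call still returns a complete element. The standard library's xml.sax.saxutils.quoteattr applies a similar quote-choosing rule, if writing your own is not needed.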
> A related approach is to use a few more functions, say,
> starttag(), endtag(), attribute() and escapecontent().
> More verbose, but you can encapsulate the regexp handling.
But now you do not automatically get a well-formed xml fragment out of any
one operation. That is one of my goals. It is like an atomic transaction -
all or nothing. If the fragments are well-formed, the whole thing ends up
well-formed.
> The other technique that has proven useful for some situations is
> to use a magic string as an escape for the real delimiters, building up
> a block of XML as needed, then applying a (single but complicated)
> global regexp to replace everything necessary in one fell swoop.
Yes, I have done the magic string escapes too, although I tend to do it on
the little bits at a time rather than all at once. They are really useful
when you might already have some escaped ampersands in character content.
That is, you do not want to escape "&amp;" a second time.
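
A small Python sketch of the magic-string idea applied to one piece of character content at a time. The placeholder MAGIC and the function name escape_amp_once are hypothetical, and the sketch assumes the placeholder never occurs in real input. It only protects an already-escaped ampersand; other existing references would need the same treatment.

```python
# Hypothetical placeholder; assumed never to appear in the input text.
MAGIC = "\x00AMP\x00"


def escape_amp_once(text):
    """Escape bare ampersands without touching ones already escaped."""
    protected = text.replace("&amp;", MAGIC)    # hide already-escaped ampersands
    escaped = protected.replace("&", "&amp;")   # escape the remaining bare ones
    return escaped.replace(MAGIC, "&amp;")      # put the hidden ones back


print(escape_amp_once("fish & chips &amp; peas"))
```

Running the function a second time on its own output leaves the text unchanged, which is the property wanted here.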