On Thursday 20 December 2001 04:31 am, Michael Kay wrote:
> This is unrealistic. They aren't your character sequences, they are
> something that someone else put in a database donkey's years ago, and it's
> your job to ship them somewhere else without asking any questions.
That's not necessarily textual data though.
I understand the issue here, and also understand why people might want this,
but I can't see a good cost/benefit ratio. One way or another these things will
have to be interpreted/decoded, so why not use some other mechanism? We've
already heard PIs and entities (not numeric character references, as they're a
lexical structure) proposed.
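
To make the "some other mechanism" idea concrete, here is a minimal sketch in Python of the PI variant. It assumes a made-up processing instruction target, `char`, carrying the code point in hex; nobody in this thread has defined such a PI, and the function name `to_xml_content` is hypothetical too. The only part taken from a specification is the character class, which is the complement of the XML 1.0 `Char` production.

```python
import re
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML_CHAR = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def to_xml_content(text):
    """Return element content for `text`, replacing each forbidden
    character with a hypothetical <?char XXXX?> processing instruction."""
    parts = []
    pos = 0
    for match in _NOT_XML_CHAR.finditer(text):
        # Legal text between forbidden characters gets ordinary escaping.
        parts.append(escape(text[pos:match.start()]))
        # "char" is an invented PI target, used here for illustration only.
        parts.append("<?char %04X?>" % ord(match.group()))
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)

print(to_xml_content("ping\x07!"))  # ping<?char 0007?>!
```

The receiver still has to know what `<?char 0007?>` means and turn it back into a BEL, which is the interpret/decode step mentioned above. The document itself stays well-formed either way.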
> There's a purist approach, which says that once you move data using XML,
> you have to label it properly, which means you either have to make sure
> it's 100% clean before transmitting it, or you have to label it as binary.
"Be conservative in what you produce, and liberal in what you consume" is the
Internet mantra.
> And there's a pragmatic approach, which says that the data is supposed to
> be text, and I want the benefits of treating it as text, but if for any
> reason a control character has crept in (perhaps due to inadequate input
> validation when the data was first entered), then that's part of the data
> I've been given and it's not my job to fix it, I'm only the messenger.
To me, that's either a bug or a kludge.
> PS: And of course there are some situations where control characters can
> legitimately appear in text. Am I the only one who remembers putting a BEL
> character in error messages and delighting as the teletype went "ping!"?
> The joy went out of that when the bell was emulated by electronic beeps,
> but one day, your database of 1960s classic software error messages is
> going to find a BEL in it.
You should see some of my animated ASCII Christmas cards ;-)
That said, even in these cases, I think control characters are not text.