> In what situations do naked control characters appear in text? Control
> characters are by definition control sequences.... I would generally
> consider it a BUG if my character sequences contained control characters.
This is unrealistic. They aren't your character sequences; they are
something that someone else put in a database donkey's years ago, and it's
your job to ship them somewhere else without asking any questions.
There's a purist approach, which says that once you move data using XML, you
have to label it properly, which means you either have to make sure it's
100% clean before transmitting it, or you have to label it as binary.
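As a rough illustration of what the purist approach asks of the sender, here is a minimal Python sketch. It assumes "clean" means every character matches the XML 1.0 Char production, and that "labelling as binary" means base64-encoding the value. The function names and the "text"/"base64" labels are hypothetical, invented for this example rather than taken from any standard or library.

```python
import base64
import re
from typing import Tuple

# Anything outside the XML 1.0 "Char" production. Tab, LF and CR are the
# only C0 control characters that production allows.
_XML10_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_clean_xml10_text(value: str) -> bool:
    """True if every character may appear literally in an XML 1.0 document."""
    return _XML10_ILLEGAL.search(value) is None


def label_field(value: str) -> Tuple[str, str]:
    """Return a (label, payload) pair for one database field.

    Clean text is passed through unchanged; anything else is shipped
    as base64 so the receiver knows it is not guaranteed to be text.
    The label names are made up for this sketch.
    """
    if is_clean_xml10_text(value):
        return ("text", value)
    raw = value.encode("utf-8", "surrogatepass")
    return ("base64", base64.b64encode(raw).decode("ascii"))


if __name__ == "__main__":
    print(label_field("File not found"))
    print(label_field("File not found\x07"))  # trailing BEL forces base64
```

Note the cost: a single stray control character turns the whole field into an opaque blob, so the receiver can no longer search it or display it as text.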
And there's a pragmatic approach, which says that the data is supposed to be
text, and I want the benefits of treating it as text, but if for any reason
a control character has crept in (perhaps due to inadequate input validation
when the data was first entered), then that's part of the data I've been
given and it's not my job to fix it: I'm only the messenger.
PS: And of course there are some situations where control characters can
legitimately appear in text. Am I the only one who remembers putting a BEL
character in error messages and delighting as the teletype went "ping!"? The
joy went out of that when the bell was emulated by electronic beeps, but one
day, your database of 1960s classic software error messages is going to find
a BEL in it.