Hi Folks,

I find it totally fascinating that XML parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters. Applications that operate (reason) on the post-parsed input know exactly what they are working on. Wicked neat!

Do other data format specifications require their parsers to perform similar conversions?
Do JSON parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?
Do CSV (comma-separated values) parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?
Do YAML (YAML Ain't Markup Language) parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?
Do Protocol Buffer parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?

Or does XML stand apart from other text data formats in this regard?

/Roger

From: Roger L Costello <costello@mitre.org>

Hi Folks,

An XML parser does two hugely significant conversions. Suppose we provide input to an XML parser. Here are the conversions that the parser performs on the input:

1. The parser converts the characters in the input to Unicode.
2. The parser converts line endings in the input to a linefeed character (hex 0A).

What are the consequences of these conversions? Answer: your applications can operate on the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.
I like the term that Amy used: your applications can _reason_ about the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.

/Roger
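
For anyone who wants to see the two conversions in action, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document is made up for illustration: it is declared as ISO-8859-1 and mixes CRLF and bare CR line endings. The variable names (raw, root, text) are mine, not part of any specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: ISO-8859-1 bytes, with a CRLF and a bare CR inside the element.
raw = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\r\n'
    b'<note>caf\xe9\r\nsecond line\rthird line</note>'
)

# Parse the raw bytes; the parser reads the encoding declaration itself.
root = ET.fromstring(raw)
text = root.text

# The application sees a Unicode string whose lines end with linefeeds only.
print(type(text).__name__)  # str
print(repr(text))           # 'café\nsecond line\nthird line'
```

The byte 0xE9 comes out as the Unicode character é, and both the CRLF and the bare CR come out as a single linefeed, so the application never has to ask which encoding or line ending convention the original file used.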