What exactly does this mean: an XML document may not contain the NUL character
- From: Roger L Costello <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Mon, 10 Jan 2022 14:31:37 +0000
Hi Folks,
Suppose I create an XML document:
<Person-Name>John Doe</Person-Name>
Section 2.2 [1] of the XML specification lists the characters that are permitted in XML documents. The NUL character is not present in the list, i.e., the NUL character is not allowed in XML documents.
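For reference, the Char production in Section 2.2 can be transcribed into a small predicate. This is a sketch in Python; the function name `is_xml_char` is hypothetical and not part of any library.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # rest of the BMP below the surrogates
        or 0xE000 <= cp <= 0xFFFD       # BMP above the surrogates, minus FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )


print(is_xml_char(ord("J")))  # True
print(is_xml_char(0x0))       # False: NUL falls outside every range
```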
That means I cannot directly copy (from somewhere) a NUL character and paste it into the XML document. Nor can I include it indirectly via a character reference such as &#0;. So, the "surface syntax" cannot contain the NUL character, either directly or indirectly. [I hope that I am using the phrase "surface syntax" correctly.]
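To see both prohibitions enforced by a real parser, here is a sketch using Python's standard `xml.etree.ElementTree` (backed by expat). ElementTree is an assumption chosen for illustration, and the helper `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML with ElementTree (expat)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<Person-Name>John Doe</Person-Name>"))      # True
print(is_well_formed("<Person-Name>John Doe\x00</Person-Name>"))  # False: literal NUL
print(is_well_formed("<Person-Name>John Doe&#0;</Person-Name>"))  # False: reference to NUL
```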
I save the above XML document to a file: person.xml
I run an XML parser on person.xml
The parser builds an in-memory parse tree.
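The three steps above (save, parse, build a tree) might look like this in Python, again assuming `xml.etree.ElementTree` as the parser; the temporary directory is only there to keep the sketch self-contained.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

with tempfile.TemporaryDirectory() as tmpdir:
    # Save the example document as person.xml.
    path = os.path.join(tmpdir, "person.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<Person-Name>John Doe</Person-Name>")

    # Parse the file; ElementTree builds an in-memory tree of Element objects.
    root = ET.parse(path).getroot()

print(root.tag, repr(root.text))  # Person-Name 'John Doe'
```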
Next, an application modifies the node in the parse tree that contains the string "John Doe", appending a NUL character. Seems strange to do such a thing? Not at all; DFDL processors do this routinely. (DFDL = Data Format Description Language)
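A minimal sketch of that modification, using ElementTree as a stand-in for whatever tree an application or DFDL processor actually maintains (an assumption; this is not how any DFDL implementation is written). ElementTree does not check assigned text against the Char production, so the append succeeds; other libraries may refuse such an assignment.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<Person-Name>John Doe</Person-Name>")

# Append a NUL to the element's text. No validation happens here,
# so the in-memory tree now holds a character the syntax forbids.
root.text += "\x00"

print(repr(root.text))  # 'John Doe\x00'
```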
person.xml doesn't contain the NUL character. Its in-memory parse tree contains the NUL character.
Is person.xml still XML?
Is the in-memory parse tree no longer XML since it contains the NUL character?
The "surface syntax" cannot contain the NUL character, but can the parse tree contain the NUL character?
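One way to make the question concrete is to write such a tree back out. In this sketch (ElementTree assumed, as before), serialization escapes `&`, `<` and `>` but passes the NUL through unchanged, so the resulting text is no longer well-formed and a conforming parser rejects it.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<Person-Name>John Doe</Person-Name>")
root.text += "\x00"  # the in-memory tree now holds a NUL

# Serialize the tree back to text; the NUL is emitted as-is.
serialized = ET.tostring(root, encoding="unicode")
print(repr(serialized))  # '<Person-Name>John Doe\x00</Person-Name>'

# Re-parsing the serialized form fails because it is not well-formed XML.
try:
    ET.fromstring(serialized)
    reparsed = True
except ET.ParseError as err:
    reparsed = False
    print("ParseError:", err)
```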
/Roger
[1] https://www.w3.org/TR/xml/#charsets