Re: [xml-dev] What exactly does this mean: an XML document may notcontai

On Mon, 10 Jan 2022 at 14:31, Roger L Costello <costello@mitre.org> wrote:

Hi Folks,

Suppose I create an XML document:

<Person-Name>John Doe</Person-Name>

Section 2.2 [1] of the XML specification lists the characters that are permitted in XML documents. The NUL character is not present in the list, i.e., the NUL character is not allowed in XML documents.

That means I cannot directly copy (from somewhere) a NUL character and paste it into the XML document. Nor can I indirectly use the NUL character via a character reference such as &#0;. So, the "surface syntax" cannot contain the NUL character, either directly or indirectly. [I hope that I am using the phrase "surface syntax" correctly.]
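As a rough illustration of the point above, here is a minimal sketch using Python's standard xml.etree.ElementTree (my choice of tool for illustration; it is not mentioned in the thread). The helper name parses is hypothetical. It feeds the parser the original document, the same document with a literal NUL byte, and the same document with the character reference &#0;.

```python
import xml.etree.ElementTree as ET

def parses(data: bytes) -> bool:
    """Return True if the stdlib (expat-based) parser accepts the input."""
    try:
        ET.fromstring(data)
        return True
    except (ET.ParseError, ValueError):
        # ParseError is the normal "not well-formed" signal; ValueError is
        # caught defensively in case an embedded NUL is rejected before parsing.
        return False

ok = parses(b"<Person-Name>John Doe</Person-Name>")
literal_nul = parses(b"<Person-Name>John Doe\x00</Person-Name>")
nul_reference = parses(b"<Person-Name>John Doe&#0;</Person-Name>")
print(ok, literal_nul, nul_reference)
```

With CPython and its bundled expat I would expect this to print True False False, i.e. both routes to NUL are rejected at the syntax level.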

I save the above XML document to a file: person.xml

I run an XML parser on person.xml

The parser builds an in-memory parse tree.

Next, an application modifies the node in the parse tree that contains the string "John Doe", appending a NUL character. Seem strange to do such a thing? Not at all, DFDL processors do this routinely. (DFDL = Data Format Description Language)

person.xml doesn't contain the NUL character. Its in-memory parse tree contains the NUL character.

Is person.xml still XML?

Yes, you haven't changed it.

Is the in-memory parse tree no longer XML since it contains the NUL character?

An in-memory parse tree is never XML. But the in-memory object you have now cannot really be called a parse tree, as it wasn't generated by parsing anything.

The "surface syntax" cannot contain the NUL character, but can the parse tree contain the NUL character?

The XML spec does not specify any form for the result of parsing. The natural interpretation of "parse tree" would be something obtained by parsing the XML, so that cannot contain NUL. However, the in-memory structure may then be manipulated in arbitrary ways (or at least in ways specified by other, non-XML specifications such as the DOM for (X)HTML).
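The distinction between the parsed document and the later-modified in-memory object can be sketched as follows, again assuming Python's xml.etree.ElementTree as a stand-in for whatever processor is actually in use (a DFDL processor will differ). The helper is_well_formed is a hypothetical name. The tree API happily accepts the NUL, but what it serializes is no longer something an XML parser will read back.

```python
import xml.etree.ElementTree as ET

def is_well_formed(data: bytes) -> bool:
    """Return True if the stdlib parser accepts the serialized bytes."""
    try:
        ET.fromstring(data)
        return True
    except (ET.ParseError, ValueError):
        return False

# Parse the original, NUL-free document.
root = ET.fromstring(b"<Person-Name>John Doe</Person-Name>")

# Modify the in-memory object; ElementTree does not check the text
# against the XML character rules at this point.
root.text += "\x00"
tree_has_nul = "\x00" in root.text

# Serialize the modified object and try to parse the result again.
serialized = ET.tostring(root)
round_trips = is_well_formed(serialized)
print(tree_has_nul, round_trips)
```

The in-memory object holds the NUL without complaint; it is only when the result is treated as XML text again that the character rules apply.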

You see this all the time with JavaScript manipulation of DOM trees: the tree is originally generated by parsing (X)HTML, but by manipulating the DOM object from JavaScript you can then generate elements whose names are (more or less) arbitrary strings and whose content includes non-XML characters such as NULs.
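The same effect can be sketched outside a browser. The example below uses Python's xml.dom.minidom, which is my substitution for the browser DOM described above (browser implementations apply their own, different checks). It creates an element whose name contains spaces, which no parser could ever have produced, and shows that the serialized result does not parse.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Start from a DOM that really was produced by parsing.
doc = parseString("<p>hi</p>")

# minidom does not validate the name passed to createElement.
bogus = doc.createElement("not a name")
bogus.appendChild(doc.createTextNode("x"))
doc.documentElement.appendChild(bogus)

serialized = doc.toxml()

try:
    parseString(serialized)
    reparses = True
except ExpatError:
    reparses = False

print(serialized)
print(reparses)
```

So the DOM object is a perfectly usable data structure, but it no longer corresponds to any XML document.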

/Roger

[1] https://www.w3.org/TR/xml/#charsets

David
