Re: [xml-dev] What exactly does this mean: an XML document may not contain the NUL character

You might consider that control characters are, in a sense, not characters at all. They exist so that higher-level protocols can be implemented on top of the text layer. They are not "this is" codes but "do this" codes, IYSWIM.

For example, the standard C library treats NUL as a string terminator, BS backspaces a print head, EOT ends a transmission, and so on.
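
To make the terminator point concrete, here is a minimal sketch in Python that uses the standard ctypes module as a stand-in for C string handling. The helper name c_view is made up for illustration; the point is only that anything after the first NUL is invisible to code that treats the buffer as a C string.

```python
import ctypes


def c_view(data: bytes) -> bytes:
    """Return the part of `data` that NUL-terminated string handling would see.

    c_view is a hypothetical helper name, not part of any library.
    """
    # create_string_buffer copies the bytes and appends a trailing NUL,
    # which is how the data would sit in a C char array.
    buf = ctypes.create_string_buffer(data)
    # .value reads up to the first NUL, as C string functions do;
    # .raw would return the whole buffer instead.
    return buf.value


payload = b"John\x00Doe"
print(len(payload))     # length as Python stores it, NUL included
print(c_view(payload))  # what survives when read as a C string
```

The bytes after the embedded NUL are still in memory, but C-style string routines never reach them.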

So to allow NUL directly in transmitted text is a layer violation.  

Would it actually hurt much to allow XML documents to contain &#0;? Not at all, if you are willing to make life more difficult for developers working in languages that use null-terminated strings. (In effect, you would be penalizing the traditional C Open Source ecosystem, such as the GNU family, relative to more recent languages and platforms such as Rust, .NET, Java, etc.)

In the past, it seemed that the people who wanted NUL often wanted to put binary data into XML: fragments of non-Unicode data. Base64 notation provides a different way to do that.
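
A rough sketch of that alternative, assuming the encoding meant is standard Base64 and using Python's base64 and xml.etree.ElementTree modules. The element name Payload and the helper names wrap and unwrap are invented for the example.

```python
import base64
import xml.etree.ElementTree as ET


def wrap(raw: bytes) -> bytes:
    """Serialize arbitrary bytes as Base64 text inside an XML element."""
    elem = ET.Element("Payload")  # hypothetical element name
    elem.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(elem)


def unwrap(doc: bytes) -> bytes:
    """Parse the document and recover the original bytes."""
    return base64.b64decode(ET.fromstring(doc).text)


raw = b"John\x00Doe"   # binary data containing NUL
doc = wrap(raw)        # the serialized XML holds only Base64 characters
print(doc)
print(unwrap(doc) == raw)
```

The NUL byte travels inside the encoded text, so the document itself never contains a character outside the permitted set.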

Cheers
Rick

On Tue, 11 Jan. 2022, 01:31 Roger L Costello, <costello@mitre.org> wrote:
Hi Folks,

Suppose I create an XML document:

<Person-Name>John Doe</Person-Name>

Section 2.2 [1] of the XML specification lists the characters that are permitted in XML documents. The NUL character is not present in the list, i.e., the NUL character is not allowed in XML documents.
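
For reference, the list in Section 2.2 is the Char production of XML 1.0, and it can be written out as a small check. This is a sketch in Python; the function names is_xml_char and first_illegal are made up for illustration.

```python
from typing import Optional


def is_xml_char(ch: str) -> bool:
    """True if `ch` matches the XML 1.0 Char production (Section 2.2)."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)         # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF       # most of the Basic Multilingual Plane
        or 0xE000 <= cp <= 0xFFFD     # skips surrogates, U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF  # supplementary planes
    )


def first_illegal(text: str) -> Optional[int]:
    """Index of the first character not allowed in XML 1.0, or None."""
    for i, ch in enumerate(text):
        if not is_xml_char(ch):
            return i
    return None


print(is_xml_char("\x00"))          # NUL (U+0000) is outside every range
print(first_illegal("John Doe\x00"))
```

U+0000 falls below every range in the production, which is why NUL is excluded.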

That means I cannot directly copy (from somewhere) a NUL character and paste it into the XML document. Nor can I indirectly use the NUL character via the character reference mechanism (for example, &#0;). So, the "surface syntax" cannot contain the NUL character, either directly or indirectly. [I hope that I am using the phrase "surface syntax" correctly.]
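
Both cases can be checked against a real parser. The sketch below uses Python's xml.etree.ElementTree (backed by Expat) as one example of a conforming parser; other parsers may word their errors differently, and is_well_formed is a helper name invented here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: bytes) -> bool:
    """Report whether ElementTree's parser accepts `doc`."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


plain = b"<Person-Name>John Doe</Person-Name>"
literal_nul = b"<Person-Name>John Doe\x00</Person-Name>"    # NUL pasted in directly
referenced_nul = b"<Person-Name>John Doe&#0;</Person-Name>"  # NUL via character reference

for doc in (plain, literal_nul, referenced_nul):
    print(is_well_formed(doc))
```

Only the first document should be accepted; the other two are rejected at parse time.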

I save the above XML document to a file: person.xml

I run an XML parser on person.xml

The parser builds an in-memory parse tree.

Next, an application modifies the node in the parse tree that contains the string "John Doe", appending a NUL character. Seem strange to do such a thing? Not at all, DFDL processors do this routinely. (DFDL = Data Format Description Language)

person.xml doesn't contain the NUL character. Its in-memory parse tree contains the NUL character.
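
The scenario can be reproduced in a few lines. This is a sketch with Python's xml.etree.ElementTree, not with a DFDL processor, and it assumes a tree API that performs no character checking when text is assigned or serialized (which is how ElementTree behaves). The helper name survives_round_trip is invented for the example.

```python
import xml.etree.ElementTree as ET


def survives_round_trip(elem: ET.Element) -> bool:
    """Serialize `elem` and report whether the output parses as XML again."""
    try:
        ET.fromstring(ET.tostring(elem))
        return True
    except ET.ParseError:
        return False


# Stands in for parsing person.xml.
tree = ET.fromstring(b"<Person-Name>John Doe</Person-Name>")
before = survives_round_trip(tree)

# The application appends NUL to the in-memory node; nothing objects.
tree.text += "\x00"
after = survives_round_trip(tree)

print(before, after)
```

The in-memory tree accepts the NUL without complaint, but once it is written back out the result is no longer something an XML parser will read.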

Is person.xml still XML?

Is the in-memory parse tree no longer XML since it contains the NUL character?

The "surface syntax" cannot contain the NUL character, but can the parse tree contain the NUL character?

/Roger

[1] https://www.w3.org/TR/xml/#charsets

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php


