Re: [xml-dev] Proposed requirements on solutions that convert XML-illegal characters into XML



On 25 April 2017 at 19:49, Costello, Roger L. <costello@mitre.org> wrote:

Hi Folks,

XML 1.0 has a limited set of characters. Some other data formats have a superset of characters – the other data formats may have characters that would be illegal in XML.

Suppose the other data format is to be converted to XML. How will the illegal characters be handled?

Other data format -> convert -> XML

Example: the JSON data format has a superset of characters. Suppose you want to convert the following JSON to XML:

{
  "key": "\u0000"
}

 

\u0000 is a JSON encoding of the NUL (hex 0) character. Recall that the NUL character is not allowed in XML.
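
To make "not allowed in XML" concrete, here is a minimal Python sketch that checks characters against the Char production of XML 1.0 (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The helper names is_xml10_char and illegal_xml_chars are made up for illustration; they are not part of any library.

```python
import json


def is_xml10_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def illegal_xml_chars(text: str) -> list:
    """List (index, 'U+XXXX') for every character XML 1.0 does not allow."""
    return [
        (i, f"U+{ord(c):04X}")
        for i, c in enumerate(text)
        if not is_xml10_char(c)
    ]


# The JSON parser happily decodes \u0000 into a real NUL character.
value = json.loads('{"key": "\\u0000"}')["key"]
print(illegal_xml_chars(value))  # [(0, 'U+0000')]
```

Note that a numeric character reference does not help here: the character it refers to must itself match Char.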

I am collecting requirements on the process of converting other data formats into XML. Below is my list thus far. Do you agree with the list?


I agree it's a list.


 

Are there requirements that you would add/delete?

1. The conversion must result in legal XML. Thus, conversion of the above JSON must not produce this:

<key>&#x0;</key>

That is not legal (well-formed) XML.

This should go without saying: it is implied by "conversion to XML". There is no such thing as XML that is not well-formed; it's just not XML.
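
This is easy to confirm with an off-the-shelf parser. A minimal Python sketch using the standard library's xml.etree.ElementTree (the helper name is_well_formed is hypothetical):

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc, False if it raises ParseError."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<key>&#x0;</key>"))  # False: reference to an illegal character
print(is_well_formed("<key/>"))            # True: well-formed, but the data is gone
```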

2. The conversion must be round-trippable. The operation must be lossless. Thus, it is not acceptable to convert the above JSON to this:

<key/>

Data has been lost. That is a lossy operation and is not round-trippable.


A good requirement to have.

3. The conversion must output standard XML. The XML must not contain syntax/encoding that is specific to the other data format. The XML must be processable using standard XML tools. Thus, it is not acceptable to convert the above JSON to this:

<key>\u0000</key>

That has a JSON-specific encoding embedded within XML. If we wanted, say, to do a string comparison on the value of <key>, the application would need to understand the JSON syntax.


Without a definition of "standard XML" I don't think this requirement means anything.
The content of an XML element is always in some format specified outside of XML. If you have <p>Hello World</p>, you need to understand English to make sense of the content, which is no different from understanding that \u0000 means null (if that is what it means in this context).
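
To illustrate both sides of this point, a small Python sketch using only the standard library. An XML parser accepts the document without complaint, but sees six ordinary characters; getting back to the NUL takes a JSON-aware step. Wrapping the text in quotes before calling json.loads is a shortcut that assumes the content contains no unescaped quotes.

```python
import json
import xml.etree.ElementTree as ET

# Raw string, so the backslash reaches the XML parser literally.
text = ET.fromstring(r"<key>\u0000</key>").text

# To XML tools this is six ordinary characters: \ u 0 0 0 0
print(len(text))        # 6
print(text == "\x00")   # False

# Recovering the NUL requires interpreting the content as a JSON string.
decoded = json.loads('"' + text + '"')
print(decoded == "\x00")  # True
```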


4. The conversion must output readable text. No hexadecimal text output. Thus, it is not acceptable to convert this:

{
  "message": "Hello \u000C World"
}

 

to this:

 

<message>48656c6c6f200c20576f726c64</message>


This doesn't seem a useful restriction.
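
For reference, the hexadecimal form above is the UTF-8 bytes of the string written as hex digits. A short Python sketch (to_hex and from_hex are hypothetical helper names) shows that this encoding is lossless, so it fails requirement 4 only on readability:

```python
def to_hex(text: str) -> str:
    """Encode text as the hex digits of its UTF-8 bytes."""
    return text.encode("utf-8").hex()


def from_hex(hex_text: str) -> str:
    """Invert to_hex."""
    return bytes.fromhex(hex_text).decode("utf-8")


original = "Hello \u000C World"
encoded = to_hex(original)
print(encoded)  # 48656c6c6f200c20576f726c64
print(from_hex(encoded) == original)  # True
```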

 

Well, that’s a start. What are the other requirements for converting illegal characters to XML?

 

Have these requirements boxed me into a situation where no solution is possible?


Impossible to say. If, for example, you use

<message>hello <char>0</char> World</message>

does that meet all four of your requirements? I can't tell.
(That is, the content model of message is character data or char elements, and the content of char is a decimal number representing the Unicode character with that number.)
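
A rough Python sketch of that content model, to show it can round-trip. The function names encode_message and decode_message and the helper is_xml10_char are made up for illustration, and the decoder assumes every child element is a char element holding a decimal code point.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def encode_message(text: str, tag: str = "message") -> ET.Element:
    """Build an element whose illegal characters become <char>N</char> children."""
    root = ET.Element(tag)
    root.text = ""
    last = None  # most recent <char> child, if any
    for ch in text:
        if is_xml10_char(ch):
            if last is None:
                root.text += ch
            else:
                last.tail = (last.tail or "") + ch
        else:
            last = ET.SubElement(root, "char")
            last.text = str(ord(ch))  # decimal code point
    return root


def decode_message(root: ET.Element) -> str:
    """Invert encode_message."""
    parts = [root.text or ""]
    for child in root:
        parts.append(chr(int(child.text)))
        parts.append(child.tail or "")
    return "".join(parts)


original = "Hello \u000C World"
xml_text = ET.tostring(encode_message(original), encoding="unicode")
print(xml_text)  # <message>Hello <char>12</char> World</message>
print(decode_message(ET.fromstring(xml_text)) == original)  # True
```

One caveat: carriage returns are legal XML characters but are subject to line-end normalization on parsing, so a stricter round-trip would probably need to wrap them in char elements as well.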


 

/Roger

 


David



