Hi Roger, parsing is always curious. As you know, there are many XML parsers available. Here are 2 levers for XML parsing questions like these.

References

[1] Canonical XML Version 1.1, W3C Recommendation 2 May 2008, https://www.w3.org/TR/xml-c14n11
[2] Extensible 3D (X3D) encodings, Part 3: Compressed binary encoding, 4 Concepts, 4.2.3 X3D canonical form, https://www.web3d.org/documents/specifications/19776-3/V3.3/Part03/concepts.html#X3DCanonicalForm

X3D is highly numeric/geometric. Here are those details on C14N additions we found useful. (Although written with Fast Infoset compression, our next version will reference EXI).

4.2.3 X3D canonical form
Conceptually, the X3D scene input to the Fast infoset encoder is an XML-encoded document with certain restrictions. X3D canonical form eliminates file ambiguities that have no impact
on the 3D content but which otherwise would negatively impact security issues, compression or parsing performance.
X3D canonical form is based on Canonical XML (see 2.[XML-Canonicalization]) which specifically allows modification to the default XML canonicalization rules. This provides the ability
to establish equivalence between differently formatted (but functionally identical) XML documents. This capability is required for the application of XML Encryption (see 2.[XML-Encryption]) or XML Signature (see 2.[XML-Signature]) syntax and processing techniques.
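As a rough illustration of establishing equivalence between differently formatted documents, here is a minimal sketch using `canonicalize` from Python's standard library (Python 3.8 or later). Note the assumptions: that function implements Canonical XML 2.0 rather than the Version 1.1 cited in [1], it applies none of the X3D-specific restrictions, and the two `Shape` strings are made-up inputs.

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical serializations of the same element. Attribute order,
# quote style, spacing, and empty-element syntax differ; the content does not.
doc_a = "<Shape DEF=\"box\" containerField='children'/>"
doc_b = '<Shape   containerField="children"   DEF="box"></Shape>'

# canonicalize() returns the canonical form as a string when no output
# file is given (standard-library C14N 2.0, not the X3D canonical form).
canonical_a = canonicalize(doc_a)
canonical_b = canonicalize(doc_b)

print(canonical_a)
print(canonical_a == canonical_b)
```

Once both inputs reduce to the same canonical string, a signature or digest computed over that string no longer depends on incidental formatting.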
The following X3D canonicalization restrictions are applied to an X3D scene (or scene fragment) prior to encryption, signature or compression:
EXAMPLE 1
EXAMPLE 1
“Hello, quotation marks”
NOTE 1 The default DTD is not included in the final Compressed binary encoding, only substitute DTD values are compressed.
NOTE 2 The default X3D Schema attributes are not included in the final Compressed binary encoding, only substitute X3D schema attribute values are compressed.
3. Floating point values are not converted to or from scientific notation; instead they retain their original form.
EXAMPLE 2
EXAMPLE 3 The following code exhibits X3D not in this form:

<Collision>
  <Shape containerField="children" />
  <Shape containerField="proxy" />
  <Shape containerField="children" />
</Collision>

The proper child-element grouping for canonical form is:

<Collision>
  <Shape containerField="proxy" />
  <Shape containerField="children" />
  <Shape containerField="children" />
</Collision>
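The regrouping in EXAMPLE 3 can be sketched in a few lines of Python. This is only an illustration: the ordering table `ASSUMED_FIELD_ORDER` and the helper `group_by_container_field` are hypothetical names, and the order itself (proxy before children) is read off the example above rather than taken from the specification text.

```python
import xml.etree.ElementTree as ET

# Assumption: ordering inferred only from EXAMPLE 3, not from the spec.
ASSUMED_FIELD_ORDER = {"proxy": 0, "children": 1}

def group_by_container_field(parent):
    """Reorder child elements so equal containerField values sit together."""
    def rank(child):
        # Unknown containerField values sort after the known ones.
        return ASSUMED_FIELD_ORDER.get(child.get("containerField"),
                                       len(ASSUMED_FIELD_ORDER))
    # sorted() is stable, so children sharing a containerField keep
    # their original relative order.
    parent[:] = sorted(parent, key=rank)
    return parent

collision = ET.fromstring("""<Collision>
<Shape containerField="children" />
<Shape containerField="proxy" />
<Shape containerField="children" />
</Collision>""")

group_by_container_field(collision)
print([child.get("containerField") for child in collision])
# prints ['proxy', 'children', 'children']
```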
EXAMPLE 4 The construct:
all the best, Don
--
Don Brutzman Naval Postgraduate School, Code USW/Br brutzman@nps.edu
Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA +1.831.656.2149
X3D graphics, virtual worlds, Navy robotics https://faculty.nps.edu/brutzman

From: Roger L Costello <costello@mitre.org>
Assume the XML document has no CDATA sections, PIs, comments, or DOCTYPE.

1. You are shown just a slice of an XML document:

    > some text (possibly whitespace) not containing the less than symbol </

That is, you see a greater-than symbol, some text, and then a less-than symbol followed by a forward slash. You are not shown the stuff before > nor the stuff after </

What is it? Does the slice signify an element: the part before > is the start tag, the part after </ is its end tag, and text is the content of the element?

2. You are shown another slice of an XML document:

    > whitespace <C

C = letter of the alphabet, colon, or underscore.

Does that slice signify the end of one element and the start of another element: the part before > is an end tag, the C in <C is the first character of a start tag, and whitespace separates the end tag from the start tag?

3. Is an end tag always followed by a less-than symbol (possibly with whitespace separating them)?

Scroll down to see the answers …

1. You are shown just a slice of an XML document:

    > some text (possibly whitespace) not containing the less than symbol </

That is, you see a greater-than symbol, some text, and then a less-than symbol followed by a forward slash. You are not shown the stuff before > nor the stuff after </

What is it? Does the slice signify an element: the part before > is the start tag, the part after </ is its end tag, and text is the content of the element?

Answer: It might signify an element (start tag, content, end tag), e.g.,

    <greeting>Hello, world</greeting>

But it might not. It might signify an end tag followed by another end tag, e.g.,

    </D> </A>

2. You are shown another slice of an XML document:

    > whitespace <C

C = letter of the alphabet, colon, or underscore.

Does that slice signify the end of one element and the start of another element: the part before > is an end tag, the C in <C is the first character of a start tag, and whitespace separates the end tag from the start tag?

Answer: It might signify the end of one element and the start of another element (with some whitespace between them), e.g.,

    </book> <magazine>

But it might not. It might signify an element embedded in another element (with some whitespace between them), e.g.,

    <document> <paragraph>

3. Is an end tag always followed by a less-than symbol (possibly with whitespace separating them)?

Answer: Yes, with one exception: the end tag of the root element is not followed by a less-than symbol.
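
The answers to questions 1 and 2 can be checked mechanically. Below is a minimal sketch, not a real XML parser: it assumes the same restrictions as the quiz (no CDATA sections, PIs, comments, or DOCTYPE) plus one extra simplification, that no attribute value contains ">". The names `TAG`, `tag_kinds`, and `kinds_before`, and the sample document, are made up for illustration.

```python
import re

# Matches a start tag, an end tag, or an empty-element tag.
# Group 1 is "/" for an end tag; group 2 is "/" for an empty-element tag.
TAG = re.compile(r"<(/?)[A-Za-z_:][^\s/>]*[^>]*?(/?)>")

def tag_kinds(xml_text):
    """Return 'start', 'end', or 'empty' for each tag, in document order."""
    kinds = []
    for match in TAG.finditer(xml_text):
        if match.group(1):
            kinds.append("end")
        elif match.group(2):
            kinds.append("empty")
        else:
            kinds.append("start")
    return kinds

def kinds_before(xml_text, next_kind):
    """Kinds of tag that are immediately followed by a tag of next_kind."""
    kinds = tag_kinds(xml_text)
    return {prev for prev, nxt in zip(kinds, kinds[1:]) if nxt == next_kind}

doc = "<A> <D>Hello, world</D> <E>more</E> </A>"

# Slice 1: which kinds of tag can end just before the text that precedes "</"?
print(sorted(kinds_before(doc, "end")))
# Slice 2: which kinds of tag can end just before the whitespace that precedes "<C"?
print(sorted(kinds_before(doc, "start")))
```

For this sample document both calls report a start tag and an end tag as possible predecessors, which matches the two answers above. The same helper also shows that an empty-element tag such as `<a/>` can be the part before `>` in the first slice.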