Excellent example Roger, thanks for sharing.

I expect that use of XML attributes will certainly reduce the size of the XML, and possibly of the EXI conversions too. Presumably you are using schema-aware EXI compression. Defining enumerations for common strings and judicious default values can help. Of course schema validation also helps identify (and isolate) bad-data problems precisely, avoiding Garbage In Garbage Out (GIGO) syndromes.

Thesis work by past NPS graduates was consistently able to meet or beat file sizes for zip/gzip of text data, and a few cases of arbitrary binary data, with corresponding EXI. This was true in every case we pursued and remains a worthy goal in general.

We are thinking about how we might put together a utility for such computations/comparisons. Might be a worthy feature for DFDL implementations someday, helping implementers employ best practices for validatable/compressible XML. When doing so, one also achieves correspondingly higher performance (and reduced computational/energy cost) for data decompression.

all the best, Don
--
Don Brutzman  Naval Postgraduate School, Code USW/Br  brutzman@nps.edu
Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA  +1.831.656.2149
X3D graphics, virtual worlds, navy robotics  https://faculty.nps.edu/brutzman

From: Roger L Costello <costello@mitre.org>

Hi Folks,

I created a parser to parse air navigation data (air nav data is the data that is loaded into the aircraft’s computer to enable it to fly the aircraft).

The parser is completely declarative, i.e., it has no code. The parser is specified using the DFDL language, which is a simple extension of XML Schema. Here is an excerpt of the DFDL schema that I wrote:

    <xs:element name="GLS_Channel" type="validString" dfdl:lengthKind="explicit" dfdl:length="5" />

Notice how declarative it is. It specifies “what” data fields are in the air nav document, not “how” to parse its data fields. That, in my opinion, is a huge benefit.
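
For readers unfamiliar with DFDL, here is a rough Python sketch of what an explicit-length field declaration amounts to when written as code. It is only an illustration, not how Daffodil works internally: the `FIELDS` table, the `Record` root element name, and the two function names are hypothetical, and only the `GLS_Channel` name and its length of 5 come from the schema excerpt above. The `validString` type check is not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical field table: (element name, length in characters).
# Only GLS_Channel with length 5 is taken from the schema excerpt.
FIELDS = [("GLS_Channel", 5)]


def parse_record(text, fields=FIELDS, root_name="Record"):
    """Slice fixed-length fields out of text and wrap each in an XML element."""
    root = ET.Element(root_name)  # "Record" is an assumed wrapper name
    pos = 0
    for name, length in fields:
        value = text[pos:pos + length]
        if len(value) != length:
            raise ValueError(
                f"{name}: expected {length} characters, got {len(value)}"
            )
        ET.SubElement(root, name).text = value
        pos += length
    return root


def unparse_record(root, fields=FIELDS):
    """Reverse direction: rebuild the fixed-length text from the XML element."""
    parts = []
    for name, length in fields:
        value = root.findtext(name)
        if value is None or len(value) != length:
            raise ValueError(f"{name}: missing or not {length} characters long")
        parts.append(value)
    return "".join(parts)


record = parse_record("20abc")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
print(unparse_record(record))
```

The point of the comparison is that the hand-written version needs separate parse and unparse logic that must be kept in sync, whereas the DFDL declaration states the field once and leaves both directions to the processor.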

I fed the air nav file into the Apache open-source DFDL processor (Daffodil), along with my DFDL schema. I instructed the DFDL processor to serialize the parsed air nav data to XML:

I could have instructed the DFDL processor to serialize the parsed air nav data to JSON or to EXI (binary XML, i.e., super-compact XML) or to a number of other formats.

The XML document that parsing produced is highly readable. Here is an excerpt of the XML output:

    <GLS_Channel>20abc</GLS_Channel>

Notice how readable it is. And, of course, the XML document is highly processable. The entire XML suite of technologies may be used to process it.

In addition to the advantages just listed, DFDL also has the advantage of unparsing. Thus, I was also able to unparse the XML:

Here are some interesting statistics:

/Roger