RE: [xml-dev] Convert any data format to XML … very cool
- From: "Costello, Roger L." <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Tue, 24 Mar 2015 15:35:41 +0000
Hello Graham,
I forwarded your message to the editor of the DFDL specification and he kindly responded:
Mr. Hannington is correct that DFDL does not support offsets. It was something we (on the DFDL committee) knew existed out there, but it didn't make the cut for DFDL v1.0 as it raises a whole bunch of issues and it was much more important to get the v1.0 standard done than to cover every case. Admittedly though, the IBM representatives on the DFDL workgroup recently reported a customer of theirs having a similar issue with a format containing offsets instead of lengths. I am maintaining a list of requested features for DFDL v2.0 on the Daffodil project Wiki at: https://opensource.ncsa.illinois.edu/confluence/display/DFDL/DFDL+2.0+Wishlist
Now, sometimes, with clever use of expressions, the dfdl:valueLength and/or dfdl:contentLength functions, and elements with calculated values, one can convert offsets into lengths. For example, suppose you have an array of offsets, and it is known that the object at index $i has a length given by the delta between two adjacent offsets, say offsets[$i] and offsets[$i+1]. Then the lengths can be computed by subtraction. So I would be interested in seeing this format to see what could be done along these lines.
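To make the offset-to-length arithmetic concrete, here is a minimal sketch in Python, not DFDL. It only illustrates the subtraction described above. A real DFDL schema would express the same idea with DFDL expressions and calculated elements, which are not shown here. The function names `offsets_to_lengths` and `split_by_offsets` are hypothetical. The sketch also assumes the offsets are sorted and that the last object ends at the end of the data. A real format might instead record the final length separately.

```python
from typing import List


def offsets_to_lengths(offsets: List[int], total_length: int) -> List[int]:
    """Turn a table of start offsets into per-object lengths.

    Hypothetical assumptions: offsets are non-decreasing, object i spans
    offsets[i] up to offsets[i+1], and the last object ends at total_length.
    """
    if any(b < a for a, b in zip(offsets, offsets[1:])):
        raise ValueError("offsets must be non-decreasing")
    # Each object's end is the next object's start; the last one ends at total_length.
    ends = offsets[1:] + [total_length]
    lengths = [end - start for start, end in zip(offsets, ends)]
    if lengths and lengths[-1] < 0:
        raise ValueError("total_length is smaller than the last offset")
    return lengths


def split_by_offsets(data: bytes, offsets: List[int]) -> List[bytes]:
    """Slice data into objects using lengths derived from the offsets."""
    lengths = offsets_to_lengths(offsets, len(data))
    return [data[start:start + n] for start, n in zip(offsets, lengths)]


if __name__ == "__main__":
    # A 6-byte header followed by three objects starting at offsets 6, 8 and 11.
    print(offsets_to_lengths([6, 8, 11], 14))                  # [2, 3, 3]
    print(split_by_offsets(b"HEADERabcdefgh", [6, 8, 11]))     # [b'ab', b'cde', b'fgh']
```

Once each length is known, the format can be read as an ordinary sequence of length-prefixed objects. That is the sense in which offsets can sometimes be recast as lengths. The approach breaks down when objects may overlap, appear out of order, or leave gaps, because then adjacent offsets no longer determine the lengths.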
With respect to "Converting any data format": it is true that this is an overstatement, depending on how broadly you interpret the word "data". For example, DFDL v1.0 is intentionally not Turing-complete by itself, so anything that requires general recursion or iteration cannot be expressed. DFDL is limited to describing data that, roughly speaking, can be described as a one-pass instance of the data grammar in the specification, and such data is inherently a tree-like nest.
The notion of covering "any data format", including things that aren't typically thought of as data, was explicitly not our goal. For example, the lack of general recursion means DFDL cannot describe most recursively nested document formats, such as RTF or MS-Word. But those things, while interesting challenges, are "document" formats, and are absolutely not what most people think of when they talk of a data set. Similarly, a core dump of memory from some process is an interesting puzzle. But nobody talks of a core dump as a data set to be loaded into some data store (except possibly as one giant blob, probably for some subsequent forensic purpose). Now, we think DFDL can be helpful here, in that it can describe many of the structures one would encounter in order to process such a file. But DFDL's job is not to "solve" the problem of parsing such a file down to the objects in the heap.
The goal of DFDL is more like "stop engineers from having to reinvent the data-intake wheel that has been reinvented time and time again in every product that takes in variously formatted data." That is, stop them from having to create yet another ad-hoc way of describing data formats that ends up being the Achilles heel of the product. That was the goal.
DFDL v1.0 does quite well at this. It has features that are the union of every capability of every data-handling tool the committee members had any experience of. It can handle dense bit-packed data formats like MIL-STD-2045, and rich text-oriented formats like HL7. We've got a start at a growing library of these at https://github.com/DFDLSchemas.
You can ask general questions about DFDL by way of the daffodil-users@oss.tresys.com list (you have to sign up for that list), or by email to dfdl-wg@ogf.org.