This is interesting and relevant to the discussion of binary payloads of
scalar data.
In my opinion, the first level for "binary XML" (i.e., a new more
efficient XML-like format) is to improve the efficiency of the
structure. The second level can involve exactly the ideas apparent
below in the description of DFDL. It is debatable whether a format spec
would include definition of the binary data, standard types, and
built-in type notation in a self-contained way, but if it's not in the
spec for "binary XML", then it would be layered on top just as the
schema specs are now layered.
A format could contain all of the standard scalar formats and provide an
efficient MIME-like way to note their types, but any schema language
should be a separate specification. A format could support text,
labeled scalar, and opaque binary formatted data, with the choice and
placement of metadata controlled by the application. The DFDL work
could support both labeled self-described and opaque methods, with the
former supporting self-contained instances and the latter normally
being more efficient because its metadata is kept out of band.
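
To make the labeled versus opaque distinction concrete, here is a minimal sketch in Python. The one-byte type tag, the 4-byte count, and all function names are hypothetical choices for illustration only; they are not taken from DFDL or from any binary XML proposal.

```python
import struct

# Hypothetical type tag for "little-endian IEEE double"; not from any spec.
TAG_F64_LE = 0x01
HEADER = struct.Struct("<BI")  # 1-byte tag + 4-byte element count, no padding


def encode_labeled(values):
    """Self-described: the type tag and count travel with the data."""
    return HEADER.pack(TAG_F64_LE, len(values)) + struct.pack(
        f"<{len(values)}d", *values
    )


def decode_labeled(buf):
    """Read the in-band header, then the doubles that follow it."""
    tag, count = HEADER.unpack_from(buf, 0)
    if tag != TAG_F64_LE:
        raise ValueError(f"unknown type tag {tag}")
    return list(struct.unpack_from(f"<{count}d", buf, HEADER.size))


def encode_opaque(values):
    """Opaque: raw doubles only; type and count are known out of band."""
    return struct.pack(f"<{len(values)}d", *values)


def decode_opaque(buf, count):
    """The caller supplies the metadata (here just the count)."""
    return list(struct.unpack(f"<{count}d", buf))
```

The labeled form costs a few extra bytes per array but can be decoded with no outside knowledge; the opaque form is smaller and depends on a schema or descriptor delivered separately.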
sdw
mike.beckerle@ascentialsoftware.com wrote:
> I do believe that GGF DFDL is relevant to the discussion here.
> https://forge.gridforum.org/projects/dfdl-wg/ is the site, and
> https://forge.gridforum.org/docman2/ViewProperties.php?group_id=113&category_id=803&document_content_id=2973 (or
> http://tinyurl.com/435j7 in case email clobbered the long URL) is the
> most recent presentation. Around slide 7 is where you'll find content.
>
> Here's a snippet to give you the "DFDL" idea:
>
> E.g., a description of a million element array of little endian double
> floats would be this XSD:
>
> <sequence>
>   <element name="data" type="double"
>            minOccurs="1000000" maxOccurs="1000000">
>     <annotation><appinfo source="http://dfdl.org">
>       <representation repType="binary" byteOrder="littleEndian"/>
>     </appinfo></annotation>
>   </element>
> </sequence>
>
> Several people and companies have been exploring this notion, so we're
> trying to standardize it.
>
> I feel DFDL differs from binaryXML in being descriptive of format
> rather than prescriptive of format. Why does this matter?
>
> 1) Legacy data formats - Much of the complexity of DFDL comes from the
> need to handle quite complex legacy formats which are tricky to describe.
>
> 2) New data formats, but you need random-access I/O capabilities or
> the ability to memory map the files into some exact memory layout with
> all the alignments and inter-item offsets exactly specified.
>
> 3) You don't want to have to use any particular XML-oriented
> library to write out your data. So long as the data format is
> describable in DFDL you can do your I/O with ordinary I/O operations.
> In other words minimal investment has to be made up front in worrying
> about data format and data interchange issues. DFDL lets you "just get
> on with it".
>
> If none of these 3 apply, then either XML or binaryXML *should* be the
> right thing depending on your data size and performance needs. If you
> are just after efficiency and density then DFDL may be less effective
> for you since a DFDL-described data file isn't necessarily nicely
> self-contained like an XML or binaryXML file should be. (Though we do
> have a placeholder on the issue of how to associate DFDL descriptors
> tightly to binary data so they can't get separated.)
>
>
> ...mikeb
> Mike Beckerle
> co-chair DFDL WG, GGF
>
>
>
>
>
> ------------------------------------------------------------------------
> *From:* Cutler, Roger (RogerCutler)
> [mailto:RogerCutler@chevrontexaco.com]
> *Sent:* Monday, November 22, 2004 5:54 PM
> *To:* Aleksander Slominski; Stephen D. Williams
> *Cc:* Wolfgang Hoschek; xml-dev@lists.xml.org;
> public-xml-binary@w3.org; Kenneth Chiu; Madhusudhan Govindaraju
> *Subject:* RE: use cases: binary XML for scientifc computing
>
> If you are going to be looking at how this stuff fits in with grid
> computing, perhaps it would be worthwhile also to make some
> comments about DFDL? I posted this suggestion previously (11/1)
> and nobody seems to have picked up on it, so maybe the thought is
> not appropriate for some reason, but at first glance DFDL does
> seem related to me.
>
> -----Original Message-----
> *From:* public-xml-binary-request@w3.org
> [mailto:public-xml-binary-request@w3.org] *On Behalf Of
> *Aleksander Slominski
> *Sent:* Monday, November 22, 2004 4:41 PM
> *To:* Stephen D. Williams
> *Cc:* Wolfgang Hoschek; xml-dev@lists.xml.org;
> public-xml-binary@w3.org; Kenneth Chiu; Madhusudhan Govindaraju
> *Subject:* use cases: binary XML for scientifc computing
>
> Stephen D. Williams wrote:
>
>>
>>
>>>
>>> what are use cases for nux: what do you plan to use it for?
>>>
>>> are use cases related to XML Binary Characterization
>>> <http://www.w3.org/TR/xbc-use-cases/>?
>>>
>>> i am a bit disappointed that scientific requirements are
>>> completely omitted from XBC use cases - the closest i could find
>>> is http://www.w3.org/TR/xbc-use-cases/#FPenergy but it skips
>>> over the whole issue of how to transfer an array of doubles without
>>> changing endianness ...
>>
>>
>> I have proposed to the group recently that I create one or more
>> use cases that cover supercomputing, grid processing, and sensor
>> networks.
>
> great to hear this. i think we worked in all those areas - it seems
> XML became very popular and now with convergence on Grid Web
> Services having an efficient binary XML format that can be used
> between "optimized" peers seems to be very important ...
>
>> Your observation seems to validate that point. I would be happy
>> to incorporate anything you could provide. My company builds and
>> maintains Linux supercomputers and I have present and past
>> experience with grid-like processing, so I have some resources
>> and contacts.
>>
>>> we did a lot of work in the past related to XML performance (at
>>> Indiana University and Binghamton) and are very concerned that
>>> whatever binary XML is characterized/standardized in W3C
>>> will not be of much use for scientific computing and grids ...
>>
>>
>> Could you provide links or details to any of this work?
>
> we worked on SOAP parsing and optimization for scientific computing:
>
> Madhusudhan Govindaraju, Aleksander Slominski, Venkatesh
> Choppella, Randall Bramley, and Dennis Gannon. Requirements for
> and evaluation of RMI protocols for scientific computing
> <http://www.extreme.indiana.edu/xgws/papers/sc00_paper/>. In
> Proceedings of SC00 Conference, Dallas TX, Nov 2000. Available on
> CD-ROM from IEEE.
>
> Kenneth Chiu, Madhusudhan Govindaraju, and Randall Bramley.
> Investigating the limits of SOAP performance for scientific
> computing
> <http://www.computer.org/proceedings/hpdc/1686/16860246abs.htm>.
> In The 11th IEEE International Symposium on High Performance
> Distributed Computing HPDC-11 2002 (HPDC'02), Jul 2002.
>
> Madhusudhan Govindaraju, Aleksander Slominski, Kenneth Chiu,
> Pu Liu, Robert van Engelen, and Michael J. Lewis. Toward
> Characterizing the Performance of SOAP Toolkits
> <http://www.extreme.indiana.edu/xgws/papers/soap_perf_char_grid2004.pdf>.
> In 5th IEEE/ACM International Workshop on Grid Computing, November
> 2004.
>
> Kenneth Chiu and Wei Lu. A Compiler-Based Approach to
> Schema-Specific XML Parsing
> <http://wam.inrialpes.fr/www-workshop2004/ChiuLu.pdf>. In First
> International Workshop on High Performance XML Processing (Satellite
> of WWW2004), May 2004.
>
> Kenneth Chiu. XBS: A Streaming Binary Serializer for High
> Performance Computing. In Proceedings of the High Performance
> Computing Symposium 2004. Society for Computer Simulation
> International, 2004.
>
> however we never got enough forward momentum to come up with a
> proposal for binary XML but still we are willing to work to get
> use cases described.
>
>> How do you think that XML, especially a binary characterized XML,
>> should relate to HDF5?
>
> HDF5 looks to me like a separate problem as it defines its own
> schema for its own representation, so mapping HDF5 to the XML
> Infoset is a big task.
>
> we are more interested in how to transfer scientific data and make
> it consistent with XML messaging (such as SOAP). that data is mostly
> arrays of primitive types or simple structs with primitive types,
> which can be expressed perfectly well in XML Infoset but only
> extremely inefficiently, including the dreaded IEEE float conversion
> to string and back.
>
> thanks,
>
> alek
>
>--
>The best way to predict the future is to invent it - Alan Kay
>
>
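
For readers who have not run into the two costs raised in the quoted thread (IEEE float conversion to string and back, and byte order when moving arrays of doubles), the following Python sketch shows both side by side. The function names are illustrative assumptions, and the text form is simply space-separated decimal, not any particular XML or SOAP encoding.

```python
import struct


def to_text(values):
    """Text form: each double is printed as decimal and must be re-parsed."""
    # repr() of a Python float yields a string that parses back to the same value.
    return " ".join(repr(v) for v in values)


def from_text(text):
    """Parse the space-separated decimal form back into doubles."""
    return [float(token) for token in text.split()]


def to_binary(values, byte_order="<"):
    """Binary form: 8 bytes per double; '<' is little endian, '>' is big endian."""
    return struct.pack(f"{byte_order}{len(values)}d", *values)


def from_binary(buf, byte_order="<"):
    """The reader must know the byte order; it is not recorded in the bytes."""
    return list(struct.unpack(f"{byte_order}{len(buf) // 8}d", buf))
```

The binary form has a fixed size and needs no parsing, but both ends must agree on byte order, which is exactly the kind of metadata a DFDL-style description or an in-band label would carry.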
--
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw