   Re: [xml-dev] Xqueeze: Compact XML Alternative


Hi all!

Let me first state on the record that I am not a professional
developer using XML applications day-in and day-out so you might find
the breadth of my knowledge a little limited. Xqueeze is a project I
am pursuing purely out of interest - I'm not even paid for it. That
said, I'm convinced that there is demand for a uniform and compact
XML-like notation purely for machines to use. I doubt the prophecy
that any attempt to solve this "problem" is doomed. Here's what I
note:

* Need to support arbitrary BE (Binary Encoded) documents w/o
  reference to schema: With a little modification in the grammar, xqML
  can support construction of trees for arbitrary documents. It would
  be able to supply all associated data, but it wouldn't tell which
  elements, attributes, etc. are present. Even as of now, xqML
  documents can carry their data dictionary inline, thus freeing the
  consumers of the burden of managing schema for everything.

* Debugging Tools: It is quite trivial to write an "inspector" tool
  once arbitrary documents are supported. This is closely related to
  the above. However, I would like to point out one thing. As I
  intend it, Xqueeze would be a very transparent layer sitting at a
  low level in your application.
  
  You just substitute XML generators with xqML generators and XML
  parsers with xqML Parsers. The middleware may just provide an option
  to switch between XML and xqML modes of operation. You can develop
  your application entirely in XML mode and when all debugging has
  been done, switch to xqML mode for deployment. For debugging the
  xqML tools, Emacs and a chart of ASCII control characters offered
  all the necessary visual cues for me. hexdump -C and diff came in
  handy too.

* Support for Data Types: There are plenty of unused symbols in xqML
  that can be used to signify data types. Not only that, currently no
  semantic information is captured from the specifications. Doing this
  can enable greater optimization of the BE and tighter validation of
  the data.

* Random Access: This is not supported by XML, so it can't be
  supported by xqML. However, indexes on complete documents can be
  created by the serving applications and xqML can provide for indexes
  to be included in the beginning of the document (eww! this is
  getting messy)

* Multiple encoding formats: While designing the specifications for
  xqML, I had several choices. After initially settling on
  binary symbols, I eliminated choices on the criterion of ease of
  generation and parsing. That choices exist implies that there can be
  several encoding formats a la ASN.1 to suit various needs.

* Competition with compression: xqML in its current format is as
  structured as XML so it too compresses well. In an experiment[1] a
  12 kB HTML document zipped to 2 kB. The (handwritten) BE for the
  same document took 3 kB and when zipped, it took less than 1000
  bytes.

* Backward Compatibility: As ABS points out, this is a limitation of
  XML, hence is not directly addressable. However, if the schema is
  semantically backward compatible (e.g. just adds another child
  element somewhere so that older docs comply with the new schema),
  this backward compatibility can be captured in XML and hence
  xqML. Regarding changes in the XML syntax itself - well, ask the
  experts.
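
To make the "switch between XML and xqML modes" idea from the
Debugging Tools item a little more concrete, here is a minimal
sketch. The byte values, the `names` list and the function names are
all hypothetical and are NOT the real xqML symbols or API; the toy
encoder also ignores attributes and tail text. It only illustrates
how an application could keep one call site and flip the output
format with a flag:

```python
import xml.etree.ElementTree as ET

# Hypothetical control bytes for a toy encoding (not the xqML spec).
OPEN, CLOSE, TEXT = b"\x01", b"\x02", b"\x03"


def to_toy_binary(elem, names):
    """Encode an element tree, replacing tag names with one-byte
    indexes into `names` (an assumed, pre-agreed data dictionary)."""
    out = OPEN + bytes([names.index(elem.tag)])
    if elem.text:
        data = elem.text.encode("utf-8")
        # Text is copied untouched, prefixed with a 2-byte length.
        out += TEXT + len(data).to_bytes(2, "big") + data
    for child in elem:
        out += to_toy_binary(child, names)
    return out + CLOSE


def serialize(elem, names, mode="xml"):
    """Single entry point: develop in "xml" mode, deploy in "binary"."""
    if mode == "xml":
        return ET.tostring(elem)
    if mode == "binary":
        return to_toy_binary(elem, names)
    raise ValueError("unknown mode: " + mode)


root = ET.fromstring("<order><item>pen</item></order>")
names = ["order", "item"]
print(serialize(root, names, mode="xml"))
print(serialize(root, names, mode="binary"))
```

The binary output is exactly the kind of thing that hexdump -C and a
chart of control characters make readable during debugging.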

Comments are welcome, as always :-)

[1] This was done very early in the project's lifetime and the
handwritten encoding was still more verbose than present day xqML.

I haven't carried out proper tests yet for two reasons. First, the
API is still too elementary to allow writing large test cases without
much effort :p Second, since data is not touched in xqML, how much
compaction is achieved depends a lot on the percentage of markup, so
no single figure can be claimed as a real-world expectation. Even for
100% markup, compaction depends on the length of
identifiers in the specs. I may specify
extremely_large_element_names_like_this in the spec and jump around
with claims of 99% size-reduction.
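
The dependence on identifier length can be shown with a small
calculation. This is only a sketch under assumptions: the "compact"
form below is a made-up stand-in that spends one byte on each opening
tag and one on each closing tag, which is not the actual xqML
encoding, and the function names are mine:

```python
def sizes(tag, payload, count):
    """Return (xml_bytes, compact_bytes) for `count` repeated elements.

    The compact form is a hypothetical encoding: one byte replaces the
    opening tag, one byte replaces the closing tag, data is untouched.
    """
    xml = "".join(
        "<%s>%s</%s>" % (tag, payload, tag) for _ in range(count)
    ).encode("utf-8")
    compact = b"".join(
        b"\x01" + payload.encode("utf-8") + b"\x02" for _ in range(count)
    )
    return len(xml), len(compact)


def reduction(tag, payload, count):
    """Fraction of bytes saved by the compact form."""
    xml_len, compact_len = sizes(tag, payload, count)
    return 1 - compact_len / xml_len


long_tag = "extremely_large_element_names_like_this"
for tag, payload in [("a", "x" * 20), (long_tag, "x" * 20), (long_tag, "")]:
    print(len(tag), len(payload), round(reduction(tag, payload, 10), 3))
```

With a one-letter tag and 20 bytes of data the saving is modest; with
the long tag and no data at all it approaches the kind of figure one
could "jump around" with, which is why a single headline number says
little.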


-- 
Tahir Hashmi (VSE, NCST)
http://staff.ncst.ernet.in/tahir
tahir AT ncst DOT ernet DOT in

We, the rest of humanity, wish GNU luck and Godspeed





 


Copyright 2001 XML.org. This site is hosted by OASIS