Hi all!
Let me first state on the record that I am not a professional
developer using XML applications day-in and day-out, so you might find
the breadth of my knowledge a little limited. Xqueeze is a project I
am pursuing purely out of interest; I'm not even paid for it. That
said, I'm convinced that there is demand for a uniform and compact
XML-like notation purely for machines to use, and I doubt the
prophecy that any attempt to solve this "problem" is doomed. Here's
what I note:
* Need to support arbitrary BE (Binary Encoded) documents without
reference to a schema: with a little modification to the grammar, xqML
can support the construction of trees for arbitrary documents. It would
be able to supply all the associated data, but it wouldn't tell what
elements, attributes, etc. are present. Even now, xqML documents can
carry their data dictionary inline, thus freeing consumers from the
burden of managing schemas for everything.
* Debugging Tools: It is quite trivial to write an "inspector" tool
once arbitrary documents are supported, so this is closely related to
the above. However, I would like to point out one thing. As I intend
it, Xqueeze would be a very transparent layer sitting at a low level
in your application.
You just substitute XML generators with xqML generators and XML
parsers with xqML parsers. The middleware may simply provide an option
to switch between XML and xqML modes of operation. You can develop
your application entirely in XML mode and, when all debugging has
been done, switch to xqML mode for deployment. For debugging the xqML
tools themselves, Emacs and a chart of ASCII control characters
offered all the visual cues I needed; hexdump -C and diff came in
handy too.
* Support for Data Types: There are plenty of unused symbols in xqML
that could be used to signify data types. Moreover, no semantic
information is currently captured from the specifications; capturing
it could enable greater optimization of the BE and tighter validation
of the data.
* Random Access: This is not supported by XML, so it can't be
supported by xqML. However, the serving applications can create
indexes on complete documents, and xqML could provide for such
indexes to be included at the beginning of the document (eww! this
is getting messy).
* Multiple encoding formats: While designing the specifications for
xqML, I had several options. After initially settling on binary
symbols, I eliminated the alternatives on the criterion of ease of
generation and parsing. The fact that such choices exist implies that
there can be several encoding formats, a la ASN.1, to suit various
needs.
* Competition with compression: xqML in its current format is as
structured as XML, so it too compresses well. In an experiment[1], a
12 kB HTML document zipped to 2 kB. The (handwritten) BE for the
same document took 3 kB, and when zipped it took less than 1000
bytes.
* Backward Compatibility: As ABS points out, this is a limitation of
XML and hence is not directly addressable. However, if the schema is
semantically backward compatible (e.g. it just adds another child
element somewhere, so that older docs still comply with the new
schema), this backward compatibility can be captured in XML and hence
in xqML. Regarding changes to the XML syntax itself - well, ask the
experts.
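To make the first two points (an inline data dictionary, and an inspector that needs no schema) a little more concrete, here is a minimal sketch of a toy binary encoding. It assumes a made-up layout and is NOT the actual xqML grammar: the control byte values, and the names `encode` and `inspect_doc`, are hypothetical. The document starts with a dictionary mapping one-byte codes to element names, followed by the tree as START/TEXT/END tokens. Attributes and tail text are ignored, and there are limits of 256 distinct names and 255-byte names.

```python
import xml.etree.ElementTree as ET

# Hypothetical control bytes for this toy format; they are NOT xqML's symbols.
DICT_ENTRY = 0x01  # followed by: element code, name length, name bytes
DICT_END = 0x02    # ends the inline data dictionary
START = 0x0E       # followed by a one-byte element code
END = 0x0F         # closes the most recently opened element
TEXT = 0x10        # followed by a 2-byte big-endian length, then UTF-8 text


def encode(xml_text: str) -> bytes:
    """Encode elements and their text; attributes and tail text are ignored."""
    root = ET.fromstring(xml_text)
    codes: dict[str, int] = {}
    for el in root.iter():  # document order, root first
        codes.setdefault(el.tag, len(codes))
    out = bytearray()
    # The dictionary travels at the head of the document, so no schema is needed.
    for name, code in codes.items():
        raw = name.encode("utf-8")
        out.extend(bytes([DICT_ENTRY, code, len(raw)]) + raw)
    out.append(DICT_END)

    def walk(el: ET.Element) -> None:
        out.extend([START, codes[el.tag]])
        text = (el.text or "").strip()
        if text:
            data = text.encode("utf-8")
            out.append(TEXT)
            out.extend(len(data).to_bytes(2, "big"))
            out.extend(data)
        for child in el:
            walk(child)
        out.append(END)

    walk(root)
    return bytes(out)


def inspect_doc(doc: bytes) -> list[str]:
    """Rebuild an indented outline using only what the document itself carries."""
    names: dict[int, str] = {}
    i = 0
    while doc[i] != DICT_END:  # read the inline data dictionary
        if doc[i] != DICT_ENTRY:
            raise ValueError(f"bad dictionary byte at offset {i}")
        code, length = doc[i + 1], doc[i + 2]
        names[code] = doc[i + 3:i + 3 + length].decode("utf-8")
        i += 3 + length
    i += 1  # skip DICT_END
    lines: list[str] = []
    depth = 0
    while i < len(doc):  # walk the element tree
        op = doc[i]
        if op == START:
            lines.append("  " * depth + names[doc[i + 1]])
            depth += 1
            i += 2
        elif op == TEXT:
            n = int.from_bytes(doc[i + 1:i + 3], "big")
            lines.append("  " * depth + repr(doc[i + 3:i + 3 + n].decode("utf-8")))
            i += 3 + n
        elif op == END:
            depth -= 1
            i += 1
        else:
            raise ValueError(f"unknown byte 0x{op:02x} at offset {i}")
    return lines


doc = encode("<order><item>pen</item><item>ink</item></order>")
print("\n".join(inspect_doc(doc)))
# order
#   item
#     'pen'
#   item
#     'ink'
```

Length-prefixing the text means data bytes can never be mistaken for control bytes, which is one (but not the only) way to keep such an inspector trivially simple.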
Comments are welcome, as always :-)
[1] This was done very early in the project's lifetime, when the
handwritten encoding was still more verbose than present-day xqML.
I haven't carried out proper tests yet, for two reasons. First, the
API is still too elementary to allow writing large test cases without
much effort :p Second, since data is not touched in xqML, how much
compaction is achieved depends a lot on the percentage of markup,
which invalidates any claim about real-world expectations. Even for
100% markup, compaction depends on the length of identifiers in the
specs: I could specify extremely_large_element_names_like_this in the
spec and jump around with claims of 99% size reduction.
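The footnote's caveat can be put into numbers with a back-of-the-envelope model. This is a sketch under stated assumptions, not a measurement of xqML: the function name `reduction` and the 3-byte cost per element (start byte, one-byte code, end byte) are hypothetical, character data is assumed to pass through unchanged, and the one-off dictionary is ignored.

```python
def reduction(name_len: int, elements: int, data_bytes: int) -> float:
    """Fraction of bytes saved by swapping tags for tokens (toy cost model).

    Assumed model: each XML element costs len("<n>") + len("</n>") = 5 + 2 * name_len
    bytes of markup, the binary form costs 3 bytes (start byte, one-byte code,
    end byte), character data is copied unchanged, and the dictionary is ignored.
    """
    xml_size = elements * (5 + 2 * name_len) + data_bytes
    binary_size = elements * 3 + data_bytes
    return 1 - binary_size / xml_size


# 100% markup: the saving grows with the length of the element names alone.
for name_len in (1, 4, 40):
    print(name_len, f"{reduction(name_len, elements=100, data_bytes=0):.0%}")
# 1 57%
# 4 77%
# 40 96%

# Same short names, but most of the document is character data.
print(f"{reduction(4, elements=100, data_bytes=5000):.0%}")  # 16%
```

Under this model the headline figure is driven by the spec's naming style and the markup-to-data ratio rather than by the encoding itself.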
--
Tahir Hashmi (VSE, NCST)
http://staff.ncst.ernet.in/tahir
tahir AT ncst DOT ernet DOT in
We, the rest of humanity, wish GNU luck and Godspeed