- From: "Paul R. Brown" <prb@fivesight.com>
- To: "Simon St.Laurent" <simonstl@simonstl.com>, "XML-Dev Mailing list" <xml-dev@xml.org>
- Date: Thu, 16 Mar 2000 17:03:05 -0600
Although it is counter to XML principles, how about a DTD/document processor
that is effectively a code obfuscator? (Call it "obfuscompression".)
The algorithm would go like this:
1) Create a list of elements, attributes, etc., from the DTD.
2) Build a frequency histogram (as for a Huffman code) of the element and
attribute names used in the file -- or across multiple files.
3) Build a Huffman-like code: assign legal single-letter names to the 52
most common elements and attributes (the best we can do is a single
character instead of a single bit), two-letter names to the next 52^2 most
common, and so on.
4) Create a new DTD and a processor in your favorite XML-friendly language
(Python, Java, Perl, XMLScript, etc.; I would probably choose Perl for this
one) to obfuscompress XML files that conform to the original DTD.
No decompression required...
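Steps 2 through 4 might look roughly like this, sketched in Python purely for illustration. It is a minimal sketch under several assumptions: names are collected from the documents themselves rather than parsed out of the DTD (step 1), documents use no namespaces, elements and attributes share one code table, and writing the new DTD is left out. Ranking names by frequency into tiers of one-letter, two-letter, ... names is a simplified fixed-width-per-tier scheme, not a true Huffman tree. Names starting with "xml" (any case) are reserved by XML 1.0, so they are skipped. The helper names (`short_names`, `count_names`, `build_code_table`, `obfuscompress`) are hypothetical.

```python
import itertools
import string
import xml.etree.ElementTree as ET
from collections import Counter


def short_names():
    """Yield legal XML names shortest-first: the 52 one-letter names,
    then the 52**2 two-letter names, and so on."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_letters, repeat=length):
            name = "".join(letters)
            # XML 1.0 reserves names starting with "xml" (any case).
            if name.lower().startswith("xml"):
                continue
            yield name


def count_names(roots):
    """Step 2: frequency histogram of element and attribute names
    across one or more parsed documents (assumes no namespaces)."""
    counts = Counter()
    for root in roots:
        for elem in root.iter():
            counts[elem.tag] += 1
            counts.update(elem.attrib.keys())
    return counts


def build_code_table(counts):
    """Step 3: the most frequent names get the shortest codes.
    Ties keep first-seen order (Counter.most_common is stable)."""
    ranked = [name for name, _ in counts.most_common()]
    return dict(zip(ranked, short_names()))


def obfuscompress(root, table):
    """Step 4 (hypothetical processor): rename every element and attribute
    in place. Text content is untouched; a name missing from the table
    raises KeyError."""
    for elem in root.iter():
        elem.tag = table[elem.tag]
        renamed = {table[k]: v for k, v in elem.attrib.items()}
        elem.attrib.clear()
        elem.attrib.update(renamed)
    return root


if __name__ == "__main__":
    doc = ET.fromstring(
        '<catalog><book id="1"><title>XML</title></book>'
        '<book id="2"><title>DTDs</title></book></catalog>'
    )
    table = build_code_table(count_names([doc]))
    print(table)  # {'book': 'a', 'id': 'b', 'title': 'c', 'catalog': 'd'}
    print(ET.tostring(obfuscompress(doc, table), encoding="unicode"))
    # <d><a b="1"><c>XML</c></a><a b="2"><c>DTDs</c></a></d>
```

Because the output is still ordinary XML, any parser reads it directly; getting the original names back only requires inverting the table.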
- Paul
> -----Original Message-----
> From: owner-xml-dev@xml.org [mailto:owner-xml-dev@xml.org]On Behalf Of
> Simon St.Laurent
> Sent: Thursday, March 16, 2000 11:34 AM
> To: XML-Dev Mailing list
> Subject: standard compressed XML format?
>
>
> Is anyone doing any work on a standard compression format for XML
> documents?
>
> I'm starting to get concerned about the volume of complaints I'm getting
> from readers and folks in Web development forums who are starting to argue
> that XML's verbosity is a problem, especially for things like transmitting
> vector graphics information. There are a lot of wasted bits in XML
> documents - and of course in HTML and other text documents as well.
>
> I'm not happy about the prospect of sending documents to browsers as .zip
> or some other compressed format and making users go through multiple steps
> to decompress and view the content. I'd like to think that we could come
> up with a compression/decompression algorithm for markup (maybe just XML,
> maybe all text) that we can use transparently. Ideally, it would be an
> algorithm explicitly placed in the public domain, avoiding licensing and
> legal battles.
>
> Some folks have argued that this belongs in transfer protocols, while
> others have argued that it should be a third-party function, like .zip and
> .sit are today. I'm not convinced by the first because so many competing
> formats (GIF, JPEG, Flash, etc.) already include compression, and I'm not
> convinced by the second because I don't think users are willing to
> micromanage such a process.
>
> It also has an impact on some of the discussions on the IETF-XML-MIME list
> (see http://www.imc.org/ietf-xml-mime/ for archives and information)
> because we're already discussing how best to mark information as XML for
> possible generic processing. If a compression standard emerged, it might
> well have an impact on MIME types - and I'd like to see that discussion
> start before we settle the debate over MIME types for XML.
>
> Any thoughts? I like the fact that XML is verbose when I'm editing and
> processing, but it's not so good in transmission. I'd like to think that
> there's a good _general_ solution that will let us have the best of both
> worlds.
>
> Simon St.Laurent
> XML Elements of Style / XML: A Primer, 2nd Ed.
> Building XML Applications
> Inside XML DTDs: Scientific and Technical
> Cookies / Sharing Bandwidth
> http://www.simonstl.com
>
> ***************************************************************************
> This is xml-dev, the mailing list for XML developers.
> To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
> List archives are available at http://xml.org/archives/xml-dev/
> ***************************************************************************
>