Re: Compression



We basically have a mechanism that takes an instance document and a schema and
does the following:

1. Looks in the schema to see if the element can be omitted (minOccurs
attribute = 0).
1a. If it can, it reads 1 bit to see if it is there or not.
1b. If it cannot be omitted, it does not bother to read the bit; it 'knows'
that the element must be there, otherwise the XML would not have conformed to
the schema to which it purports to belong.

2. Looks in the schema again to see if the element is repeatable (maxOccurs
attribute > 1, or "unbounded").
2a. If it is not, then it just expects there to be one element to read out.
2b. If it is, it reads a bit after it has decompressed each element to see if
there is another one on its way.

In our case there was no requirement to know the number of repetitions
beforehand, so a simple bit scheme like this suffices quite well.
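
To make the steps above concrete, here is a rough Python sketch of how such a
scheme might look. It is not our actual implementation. The names ElementSpec,
encode and decode are made up for illustration, the schema is reduced to a flat
sequence of text-only elements, and the content encoding is left out entirely
(values are simply kept in a separate list next to the flag bits).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementSpec:
    """Hypothetical stand-in for one element declaration in the schema."""
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for "unbounded"

    @property
    def optional(self) -> bool:
        return self.min_occurs == 0

    @property
    def repeatable(self) -> bool:
        return self.max_occurs is None or self.max_occurs > 1


def encode(schema, doc):
    """doc maps an element name to the list of its text values."""
    bits, values = [], []
    for spec in schema:
        items = doc.get(spec.name, [])
        too_many = spec.max_occurs is not None and len(items) > spec.max_occurs
        if len(items) < spec.min_occurs or too_many:
            raise ValueError(f"{spec.name}: document does not conform to schema")
        if spec.optional:
            bits.append(1 if items else 0)  # step 1a: presence bit
            if not items:
                continue
        for i, text in enumerate(items):
            values.append(text)
            if spec.repeatable:
                # step 2b: one bit after each element, 1 = another one follows
                bits.append(1 if i + 1 < len(items) else 0)
    return bits, values


def decode(schema, bits, values):
    """Rebuild the document by stepping through the schema."""
    bit_iter, value_iter = iter(bits), iter(values)
    doc = {}
    for spec in schema:
        # step 1: only optional elements cost a presence bit
        if spec.optional and next(bit_iter) == 0:
            continue
        items = [next(value_iter)]
        # step 2: only repeatable elements cost a continuation bit
        while spec.repeatable and next(bit_iter) == 1:
            items.append(next(value_iter))
        doc[spec.name] = items
    return doc


schema = [
    ElementSpec("title"),
    ElementSpec("subtitle", min_occurs=0),
    ElementSpec("author", max_occurs=None),
    ElementSpec("note", min_occurs=0, max_occurs=None),
]
doc = {"title": ["Compression"], "author": ["Caroline", "Jonathan"]}

bits, values = encode(schema, doc)
print(bits)  # [0, 1, 0, 0]
print(decode(schema, bits, values) == doc)  # True
```

In this example the four element declarations cost four flag bits in total and
no tag names are transmitted. The mandatory, non-repeatable "title" costs no
bits at all.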

n.b. There are other methods we used to reduce the overhead from one bit per
repetition to theoretically fractional numbers of bits but that is going a
little off-topic.

Cheers,

Jonathan

SNedunuri@pav.com wrote:

> Hmm, interesting. I want to be sure I understand your scheme, so for
> example, given a DTD spec like
>         <!ELEMENT SomeTag (Foo|Bar|Lah)*>
> an XML file could be
>         <SomeTag> <Foo.../> <Bar.../> </SomeTag>
> or
>         <SomeTag> <Bar.../> <Lah.../> </SomeTag>
> I don't see how you could do this with just one bit, if you're omitting
> tags.
>
> Perhaps a bit vector for the right hand side, which indicates the presence
> or absence of that element. If it's present (1) then it's followed by its
> content. So the first one would be
>         1:<Foo's content>00, 01:<Bar's content>0
> and the second
>         01:<Bar's content>0, 001:<Lah's content>
>
> But isn't the repetition still a problem? How do you encode the number of
> times it's repeated with just a single bit?
>
> - sn
>
> > -----Original Message-----
> > From: Caroline Clewlow [mailto:cclewlow@eris.dera.gov.uk]
> > Sent: Thursday, February 15, 2001 10:32 AM
> > To: Jeff Rafter
> > Cc: Danny Ayers; xml-dev@lists.xml.org
> > Subject: Re: Compression
> >
> >
> > As regards compression - one option that *we* have looked at is that of
> > the schema being known at both the receiving and transmitting end of the
> > communication.  In that case a method can be used whereby the
> > transmission of the tags themselves is not required.  A single bit can
> > be used to indicate the presence or otherwise of an optional item;
> > repeated items can also be indicated by 0 or 1 depending on whether they
> > are repeated or not.  Then it's just a case of encoding the actual
> > content between the tags.
> >
> > The XML document can be rebuilt at the receiving end by stepping through
> > the schema checking the data for the content that is present.
> >
> > Hope the above description makes some sense !
> >
> > Regards
> >
> > Caroline
> >
> >
> > Jeff Rafter wrote:
> >
> > > > A harder (but quite interesting) alternative would be to have a
> > > > pointer on http://zip  to a document specifying the (de)compression
> > > > algorithm, from which B could build its own native converter.
> > >
> > > I don't know if it would be recreating the wheel-- or rather if this
> > > compression system would be re-using the wheel-- but it seems that
> > > encryption (public key/private key) fits very well in this model.
> > > This is plain in scenario three:
> > >
> > > Scenario 3 :
> > >
> > >  A tells B (in straight XML) that it has some data for it, and it wants
> > > to encrypt it in the format found at URI http://encrypt with the
> > > decryption key found at http://a/decrypt
> > > B replies - please wait
> > > B goes to the URL pointed to at http://encrypt, downloads and installs
> > > the decryption algorithm (if not already present) and then downloads the
> > > public key from http://a/decrypt
> > > B tells A, I'm ready
> > > A sends encrypted binary...
> > >
> > > Jeff Rafter
> > >
> > > ------------------------------------------------------------------
> > > To unsubscribe from this elist send a message with the single word
> > > "unsubscribe" in the body to: xml-dev-request@lists.xml.org