
RE: Compression



Forgive me if I'm being dense, but what happens if, as in my example, you
have two elements in the DTD/schema, *both* of which can optionally be absent
(e.g. (x|y)*)? How can you tell from the XML instance which of the two you
are looking at without some way of distinguishing them? I don't see how you
would be able to distinguish n choices with 1 bit.
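
To make the question concrete, here is a minimal sketch of one way a choice
such as (Foo|Bar|Lah)* might be encoded. It is an assumption for illustration
only, not something anyone in this thread has described: each item is preceded
by a "another item follows" bit, then a fixed-width index naming the
alternative. The names index_width, encode_choices and decode_choices are
hypothetical, and element content is left out entirely.

```python
from typing import List, Sequence


def index_width(n_alternatives: int) -> int:
    """Bits needed for a fixed-width index over n alternatives (0 when n == 1)."""
    return (n_alternatives - 1).bit_length()


def encode_choices(alternatives: Sequence[str], items: Sequence[str]) -> str:
    """Encode the tag names of a (a|b|...)* group as a string of '0'/'1'."""
    width = index_width(len(alternatives))
    bits = []
    for name in items:
        bits.append("1")  # continuation bit: another item follows
        if width:
            # fixed-width index of the chosen alternative
            bits.append(format(alternatives.index(name), f"0{width}b"))
    bits.append("0")  # continuation bit: no more items
    return "".join(bits)


def decode_choices(alternatives: Sequence[str], bits: str) -> List[str]:
    """Recover the tag names from the bit string produced by encode_choices."""
    width = index_width(len(alternatives))
    items = []
    pos = 0
    while bits[pos] == "1":
        pos += 1
        idx = int(bits[pos:pos + width], 2) if width else 0
        items.append(alternatives[idx])
        pos += width
    return items


alts = ["Foo", "Bar", "Lah"]
encoded = encode_choices(alts, ["Foo", "Bar"])
assert decode_choices(alts, encoded) == ["Foo", "Bar"]
```

Under these assumptions, three alternatives cost three bits per item (one
continuation bit plus a two-bit index), so a single bit per item is only
enough when there is exactly one alternative to choose from.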

- sn

> -----Original Message-----
> From: Jon Cleaver [mailto:j.cleaver@eris.dera.gov.uk]
> Sent: Thursday, February 15, 2001 12:25 PM
> To: SNedunuri@pav.com
> Cc: xml-dev@lists.xml.org
> Subject: Re: Compression
> 
> 
> We basically have a mechanism that takes an instance document and a schema
> and does the following:
> 
> 1. Looks in the schema to see if the element can be omitted (minOccurs
> attribute = 0)
> 1a. If it can, it reads 1 bit to see if it is there or not.
> 1b. If it cannot be omitted, it does not bother to read the bit; it 'knows'
> that the element must be there, otherwise the XML would not have conformed
> to the schema which it purports to belong to.
> 
> 2. Looks in the schema again to see if the element is repeatable (maxOccurs
> attribute > 1)
> 2a. If it is not, then it just expects there to be one element to read out.
> 2b. If it is, it reads a bit after it has decompressed each element to see
> if there is another one on its way.
> 
> In our case there was no requirement to know the number of repetitions
> beforehand, therefore a simple bit scheme like this suffices quite well.
> 
> n.b. There are other methods we used to reduce the overhead from one bit per
> repetition to theoretically fractional numbers of bits but that is going a
> little off-topic.
> 
> Cheers,
> 
> Jonathan
> 
> SNedunuri@pav.com wrote:
> 
> > Hmm, interesting. I want to be sure I understand your scheme, so for
> > example, given a DTD spec like
> >         <!ELEMENT SomeTag (Foo|Bar|Lah)*>
> > an XML file could be
> >         <SomeTag> <Foo.../> <Bar.../> </SomeTag>
> > or
> >         <SomeTag> <Bar.../> <Lah.../> </SomeTag>
> > I don't see how you could do this with just one bit, if you're omitting
> > tags.
> >
> > Perhaps use a bit vector for the right-hand side, indicating the presence
> > or absence of each element. If it's present (1) then it's followed by its
> > content. So the first one would be
> >         1:<Foo's content>00, 01:<Bar's content>0
> > and the second
> >         01:<Bar's content>0, 001:<Lah's content>
> >
> > But isn't the repetition still a problem? How do you encode the number of
> > times it's repeated with just a single bit?
> >
> > - sn
> >
> > > -----Original Message-----
> > > From: Caroline Clewlow [mailto:cclewlow@eris.dera.gov.uk]
> > > Sent: Thursday, February 15, 2001 10:32 AM
> > > To: Jeff Rafter
> > > Cc: Danny Ayers; xml-dev@lists.xml.org
> > > Subject: Re: Compression
> > >
> > >
> > > As regards compression - one option that *we* have looked at is that of
> > > the schema being known at both the receiving and transmitting ends of the
> > > communication.  In that case a method can be used whereby the transmission
> > > of the tags themselves is not required.  A single bit can be used to
> > > indicate the presence or otherwise of an optional item; repeated items can
> > > also be indicated by 0 or 1 depending on whether they are repeated or not.
> > > Then it's just a case of encoding the actual content between the tags.
> > >
> > > The XML document can be rebuilt at the receiving end by stepping through
> > > the schema checking the data for the content that is present.
> > >
> > > Hope the above description makes some sense !
> > >
> > > Regards
> > >
> > > Caroline
> > >
> > >
> > > Jeff Rafter wrote:
> > >
> > > > > A harder (but quite interesting) alternative would be to have a
> > > > > pointer on http://zip to a document specifying the (de)compression
> > > > > algorithm, from which B could build its own native converter.
> > > >
> > > > I don't know if it would be reinventing the wheel-- or rather if this
> > > > compression system would be re-using the wheel-- but it seems that
> > > > encryption (public key/private key) fits very well in this model.  This
> > > > is plain in scenario three:
> > > >
> > > > Scenario 3 :
> > > >
> > > >  A tells B (in straight XML) that it has some data for it, and it wants
> > > > to encrypt it in the format found at URI http://encrypt with the
> > > > decryption key found at http://a/decrypt
> > > > B replies - please wait
> > > > B goes to the URL pointed to at http://encrypt, downloads and installs
> > > > the decryption algorithm (if not already present) and then downloads the
> > > > public key from http://a/decrypt
> > > > B tells A, I'm ready
> > > > A sends encrypted binary...
> > > >
> > > > Jeff Rafter
> > > >
> > > > ------------------------------------------------------------------
> > > > To unsubscribe from this elist send a message with the single word
> > > > "unsubscribe" in the body to: xml-dev-request@lists.xml.org
> > >
> > >
> > >
> >
>
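
The bit scheme Jonathan describes in the quoted message can be sketched
roughly as follows. This is a minimal illustration under stated assumptions,
not the DERA implementation: the schema is flattened to an ordered list of
element declarations, "can be omitted" is taken to mean minOccurs of 0,
element content is ignored, and the names ElementDecl, encode_flags and
decode_flags are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class ElementDecl:
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for "unbounded"

    @property
    def optional(self) -> bool:
        return self.min_occurs == 0

    @property
    def repeatable(self) -> bool:
        return self.max_occurs is None or self.max_occurs > 1


def encode_flags(schema: Sequence[ElementDecl], counts: Dict[str, int]) -> str:
    """Emit only the presence/continuation bits; tags themselves are never sent."""
    bits: List[str] = []
    for decl in schema:
        n = counts.get(decl.name, 0)
        if decl.optional:
            bits.append("1" if n else "0")  # step 1a: presence bit
            if n == 0:
                continue
        elif n == 0:
            raise ValueError(f"{decl.name} is required by the schema")
        if decl.repeatable:
            # step 2b: one bit after each occurrence, '1' = another follows
            bits.extend(["1"] * (n - 1) + ["0"])
        elif n != 1:
            raise ValueError(f"{decl.name} may not repeat")
    return "".join(bits)


def decode_flags(schema: Sequence[ElementDecl], bits: str) -> List[str]:
    """Step through the schema and rebuild the sequence of element names."""
    stream = iter(bits)
    names: List[str] = []
    for decl in schema:
        if decl.optional and next(stream) == "0":
            continue  # step 1a: optional element is absent
        names.append(decl.name)  # step 1b / 2a: element is known to be there
        if decl.repeatable:
            while next(stream) == "1":  # step 2b
                names.append(decl.name)
    return names


schema = [
    ElementDecl("title"),                      # required, single: costs 0 bits
    ElementDecl("author", min_occurs=0),       # optional: 1 presence bit
    ElementDecl("chapter", max_occurs=None),   # repeatable: 1 bit per occurrence
]
flags = encode_flags(schema, {"title": 1, "chapter": 3})
assert decode_flags(schema, flags) == ["title", "chapter", "chapter", "chapter"]
```

Note that this sketch handles a fixed sequence of declarations only. A choice
group such as (Foo|Bar|Lah)* needs some extra information to say which
alternative each item is, which is exactly the question raised at the top of
this message.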