OASIS Mailing List Archives

   Re: [xml-dev] Fun With Schemas

On Tue, 15 Mar 2005 4:13 pm, Rick Marshall wrote:
> i must be missing something here. every day i do battle with
> translations from one vocabulary to another. flat files to csv to edi to
> xml to printer codes to postscript etc. actually i'm a bit over it all
> at the moment.

Yes me too. I thought you were one of our respected leaders and
teachers for the W3C in Australia. Helping and inspiring us mere mortals 
that aren't on the W3C to go forward and do useful and interesting things.

It doesn't sound very inspiring or rewarding... nothing much in there 
for any Uni graduate learning xml on the list and wanting to hit the work 
force and do something useful.

No wonder the job market for xml is so bad... if this is your idea
of W3C XML fun then I think I am going to be sick....

> to do what len has suggested you need a dictionary - (not a data
> dictionary, but a dictionary) that says an attribute, element, whatever
> in one vocabulary is <something /> in another. possibly rdf is a good
> way to express this. then you need a translator that can read an output
> schema (and produce valid output). then it needs a schema to describe
> the input stream.
> putting it all together:
>
> 1. decide on and maintain internal schema representation for data
> 2. maintain translator tables from internal schema to output schema
>    vocabulary (rdf?)
> 3. maintain output schema
>
> all 3 should be able to be maintained with some independence (actually
> changing 1 or 3 only requires a change to 2 - which is the point)
>
> then to use it:
>
> 1. convert arbitrary input stream to internal xml schema
> 2. use schema aware tool (you might have to write this) to load items 1
>    to 3 above and translate from input to output stream.
>
> i haven't built such a tool yet, but i've done enough of it by hand to
> know that this is the correct broad direction for such things.
> there are some other problems in the "real world" that linguists know
> about only too well. syntax and semantics. let's say you can translate
> the vocabulary - does the output go together in the same order as the
> input. worse are attributes in the input still attributes in the output
> or are they now elements? consider translating the location of
> adjectives in english and french or verbs in german etc (and that's just
> western languages). then there's the problem of semantics - is this the
> correct vocabulary choice in this setting? in australia (queensland
> actually) xxxx (4x) is a beer, my understanding is that it's a sex aid
> in america....
> it would be really interesting to know how those auto translator things
> (like google translator) work because they must have tackled many of
> these problems.
> as an aside. it would be good if there was a sort of xslt that worked
> like this. as the xsl gets bigger, it gets harder to know if you're
> producing valid output and harder to change the model.
> rick
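
For anyone following along, here is a rough sketch of what the "translator
tables" step in Rick's outline might look like. It is not anyone's actual
tool. The element and attribute names (customer, Buyer, id, BuyerRef) and
the two table layouts are invented for illustration, and the tables are
plain Python dicts rather than RDF. It also shows the attribute-becomes-
element case he raises. It does not validate the output against a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical translator tables (the "item 2" of the outline above).
# Internal element name -> output element name.
ELEMENTS = {"customer": "Buyer", "name": "FullName"}
# Internal attribute name -> (kind in the output, output name).
ATTRIBUTES = {"id": ("element", "BuyerRef")}


def translate(node):
    """Rename one element and its subtree according to the tables."""
    out = ET.Element(ELEMENTS[node.tag])  # KeyError if the name is unmapped
    for attr, value in node.attrib.items():
        kind, target = ATTRIBUTES[attr]
        if kind == "element":
            # An attribute in the input becomes a child element in the output.
            ET.SubElement(out, target).text = value
        else:
            out.set(target, value)
    out.text = node.text
    for child in node:
        out.append(translate(child))
    return out


source = ET.fromstring('<customer id="42"><name>Rick</name></customer>')
result = ET.tostring(translate(source), encoding="unicode")
print(result)
```

Changing the output vocabulary only means editing the two tables, which is
the independence Rick is after. Ordering and semantics, the harder problems
he lists, are not addressed here.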
> Peter Hunsberger wrote:
> >On Mon, 14 Mar 2005 15:02:24 -0800, Bob Foster <bob@objfac.com> wrote:
> >>Generating instances from schemas usually just produces one of the
> >>infinite number of instances restricted by certain trivial parameters. I
> >>don't know of an example where meaningful instances are generated.
> >>
> >>If a generated document changes automatically depending on the schema it
> >>finds at the time of generation yet somehow contains the same
> >>"information", there must be a model of the document that is independent
> >>of the schema, e.g., something like an ER model. Then the model must be
> >>populated: this concrete entity has that relationship to these other
> >>concrete entities, etc. Then there must be a mapping from the abstract
> >>document model to the elements and attributes used in the schema. When
> >>the schema changes, the mapping must change in concert (and there must
> >>be a way to prevent changes to the schema that violate the abstract
> >>document model, e.g., changing an unbounded relationship to a bounded
> >> one).
> >>
> >>After that, piece of cake. ;-}
> >
> >Instance traversal is something I didn't touch on but of course is the
> >real issue here: what's the data source?  I had assumed the
> >application would be traversing some form of relational DB or similar
> >and that there was already some natural key structure and
> >relationship metadata/data around.  Not necessarily a good
> >assumption...
> >
> >If not, you need some source of control over the data source or
> >complete metadata.  If the data is simple and you control it you can
> >just add id/idref pairs to it to get simple hierarchical descent
> >traversal.  But if the data's that easy to walk then I'm not sure why
> >you're doing this.
> >
> >Beyond that you can make some simplifying assumptions.  The easiest is
> >something like assuming every element contains an attribute with the
> >same name plus something like "Id" appended and that every referring
> >element will include an identically named attribute. That will get you
> >lattice like graph traversal and many to many relationships. However,
> >unless this is also enforced on the data population side it sounds
> >rather fragile...  Then again, we do know you have a Schema that can
> >be checked at data population time :-)
> >
> ><snip/>
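
A small sketch of the naming convention Peter describes, under my own
assumptions: every element carries an attribute named after itself plus
"Id", and a referring element repeats that attribute name. The sample
document, the build_index and references helpers, and the author/book
names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data following the "<name>Id" convention.
DOC = """
<data>
  <author authorId="a1" name="Bob"/>
  <author authorId="a2" name="Peter"/>
  <book bookId="b1" authorId="a1"/>
  <book bookId="b2" authorId="a1"/>
  <book bookId="b3" authorId="a2"/>
</data>
"""


def build_index(root):
    """Map (element name, id value) to the element that owns that id."""
    index = {}
    for el in root.iter():
        key = el.get(el.tag + "Id")
        if key is not None:
            index[(el.tag, key)] = el
    return index


def references(el, index):
    """Yield the elements that el points to through foreign '...Id' attributes."""
    own = el.tag + "Id"
    for attr, value in el.attrib.items():
        if attr.endswith("Id") and attr != own:
            target = index.get((attr[:-2], value))
            if target is not None:
                yield target


root = ET.fromstring(DOC)
index = build_index(root)
book = index[("book", "b1")]
authors = [a.get("name") for a in references(book, index)]
print(authors)
```

Nothing here checks that a referenced id actually exists; a dangling
reference is silently skipped, which is the fragility Peter mentions
unless the convention is enforced when the data is populated.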

Computergrid : The ones with the most connections win.


