i must be missing something here. every day i do battle with
translations from one vocabulary to another - flat files to csv to edi
to xml to printer codes to postscript, etc. actually i'm a bit over it
all at the moment.
to do what len has suggested you need a dictionary (not a data
dictionary, but a dictionary) that says an attribute, element, or
whatever in one vocabulary is <something /> in another. possibly rdf is
a good way to express this. then you need a translator that can read an
output schema (and produce valid output), and a schema to describe the
input stream.
putting it all together:
1. decide on and maintain internal schema representation for data
2. maintain translator tables from internal schema to output schema
vocabulary (rdf?)
3. maintain output schema
all 3 should be maintainable with some independence (changing 1 or 3
only requires a change to 2 - which is the point)
then to use it:
1. convert the arbitrary input stream to the internal xml schema
2. use a schema-aware tool (you might have to write this) to load items
1 to 3 above and translate from input to output stream.
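the three-part setup above can be sketched in a few lines. this is only
a toy (the vocabularies and names are invented, and a plain dict stands
in for the rdf translator table the thread suggests):

```python
import xml.etree.ElementTree as ET

# item 2: the translator table - internal name -> output vocabulary name
# (invented example vocabularies; rdf could carry this instead)
INTERNAL_TO_OUTPUT = {
    "person": "contact",
    "givenName": "first-name",
    "familyName": "last-name",
}

def translate(elem, table):
    """Rename every element per the table, recursing through children.

    Names not in the table pass through unchanged; attributes and text
    are copied as-is (tail text is ignored in this sketch).
    """
    out = ET.Element(table.get(elem.tag, elem.tag), elem.attrib)
    out.text = elem.text
    for child in elem:
        out.append(translate(child, table))
    return out

src = ET.fromstring(
    "<person><givenName>Rick</givenName><familyName>Marshall</familyName></person>"
)
print(ET.tostring(translate(src, INTERNAL_TO_OUTPUT), encoding="unicode"))
# -> <contact><first-name>Rick</first-name><last-name>Marshall</last-name></contact>
```

the point of the separation shows up here: swapping the output
vocabulary means editing only the table, not the translator.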
i haven't built such a tool yet, but i've done enough of it by hand to
know that this is the correct broad direction for such things.
there are some other problems in the "real world" that linguists know
only too well: syntax and semantics. let's say you can translate the
vocabulary - does the output go together in the same order as the
input? worse, are attributes in the input still attributes in the
output, or are they now elements? consider translating the position of
adjectives in english and french, or verbs in german, etc (and that's
just western languages). then there's the problem of semantics - is
this the correct vocabulary choice in this setting? in australia
(queensland actually) xxxx (4x) is a beer; my understanding is that
it's a sex aid in america....
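the attribute-vs-element half of that problem is mechanical enough to
sketch. a toy rule (invented element and attribute names) that moves
one attribute of an element into a child element:

```python
import xml.etree.ElementTree as ET

def attr_to_element(elem, attr_name, child_name):
    """Move one attribute of elem into a new child element."""
    value = elem.attrib.pop(attr_name)        # remove the attribute...
    child = ET.SubElement(elem, child_name)   # ...and re-emit it as a child
    child.text = value
    return elem

src = ET.fromstring('<beer name="XXXX" origin="Queensland"/>')
attr_to_element(src, "origin", "origin")
print(ET.tostring(src, encoding="unicode"))
# -> <beer name="XXXX"><origin>Queensland</origin></beer>
```

the hard part isn't the mechanics - it's knowing *which* rule to apply,
which is exactly the syntax/word-order problem above.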
it would be really interesting to know how those auto-translator things
(like google's translator) work, because they must have tackled many of
these problems.
as an aside, it would be good if there were a sort of xslt that worked
like this. as the xsl gets bigger, it gets harder to know whether
you're producing valid output, and harder to change the model.
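checking the output against something schema-like doesn't need much.
the python stdlib has no real schema validator, so this is only a
hand-rolled stand-in: an invented toy content model as a dict of
allowed child names, checked after the transform runs:

```python
import xml.etree.ElementTree as ET

# toy content model (invented): element name -> allowed child names
ALLOWED_CHILDREN = {
    "contact": {"first-name", "last-name"},
    "first-name": set(),
    "last-name": set(),
}

def check(elem, model):
    """Return a list of (parent, bad-child) pairs violating the model."""
    errors = []
    allowed = model.get(elem.tag, set())
    for child in elem:
        if child.tag not in allowed:
            errors.append((elem.tag, child.tag))
        errors.extend(check(child, model))
    return errors

doc = ET.fromstring(
    "<contact><first-name>Rick</first-name><nickname>rjm</nickname></contact>"
)
print(check(doc, ALLOWED_CHILDREN))
# -> [('contact', 'nickname')]
```

a real version would validate against the actual output schema (a
schema-aware processor, or a library with xml schema support), but even
this much catches the "big stylesheet drifts out of validity" problem.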
rick
Peter Hunsberger wrote:
>On Mon, 14 Mar 2005 15:02:24 -0800, Bob Foster <bob@objfac.com> wrote:
>
>
>>Generating instances from schemas usually just produces one of the
>>infinite number of instances restricted by certain trivial parameters. I
>>don't know of an example where meaningful instances are generated.
>>
>>If a generated document changes automatically depending on the schema it
>>finds at the time of generation yet somehow contains the same
>>"information", there must be a model of the document that is independent
>>of the schema, e.g., something like an ER model. Then the model must be
>>populated: this concrete entity has that relationship to these other
>>concrete entities, etc. Then there must be a mapping from the abstract
>>document model to the elements and attributes used in the schema. When
>>the schema changes, the mapping must change in concert (and there must
>>be a way to prevent changes to the schema that violate the abstract
>>document model, e.g., changing an unbounded relationship to a bounded one).
>>
>>After that, piece of cake. ;-}
>>
>>
>
>Instance traversal is something I didn't touch on but of course is the
>real issue here: what's the data source? I had assumed the
>application would be traversing some form of relational DB or similar
>and that there was already some natural key structure and
>relationship metadata/data around. Not necessarily a good
>assumption...
>
>If not, you need some source of control over the data source or
>complete metadata. If the data is simple and you control it you can
>just add id/idref pairs to it to get simple hierarchical descent
>traversal. But if the data's that easy to walk then I'm not sure why
>you're doing this.
>
>Beyond that you can make some simplifying assumptions. The easiest is
>something like assuming every element contains an attribute with the
>same name plus something like "Id" appended and that every referring
>element will include an identically named attribute. That will get you
>lattice like graph traversal and many to many relationships. However,
>unless this is also enforced on the data population side it sounds
>rather fragile... Then again, we do know you have a Schema that can
>be checked at data population time :-)
>
><snip/>
>
>
>