i must be missing something here. every day i do battle with
translations from one vocabulary to another - flat files to csv to edi
to xml to printer codes to postscript, etc. actually i'm a bit over it
all at the moment.
to do what len has suggested you need a dictionary (not a data
dictionary, but a dictionary) that says an attribute, element, or
whatever in one vocabulary is <something /> in another. possibly rdf is
a good way to express this. then you need a translator that can read an
output schema (and produce valid output), and a schema to describe the
input stream.
putting it all together:
1. decide on and maintain internal schema representation for data
2. maintain translator tables from internal schema to output schema
vocabulary (rdf?)
3. maintain output schema
all 3 should be maintainable with some independence (changing 1 or 3
only requires a change to 2 - which is the point)
then to use it:
1. convert the arbitrary input stream to the internal xml schema
2. use a schema-aware tool (you might have to write this) to load items
1 to 3 above and translate from input to output stream.
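the three-part setup above can be sketched in a few lines. this is only
a toy (the vocabularies and names are invented, and a plain dict stands
in for the rdf translator table the thread suggests):

```python
import xml.etree.ElementTree as ET

# item 2: the translator table - internal name -> output vocabulary name
# (invented example vocabularies; rdf could carry this instead)
INTERNAL_TO_OUTPUT = {
    "person": "contact",
    "givenName": "first-name",
    "familyName": "last-name",
}

def translate(elem, table):
    """Rename every element per the table, recursing through children.

    Names not in the table pass through unchanged; attributes and text
    are copied as-is (tail text is ignored in this sketch).
    """
    out = ET.Element(table.get(elem.tag, elem.tag), elem.attrib)
    out.text = elem.text
    for child in elem:
        out.append(translate(child, table))
    return out

src = ET.fromstring(
    "<person><givenName>Rick</givenName><familyName>Marshall</familyName></person>"
)
print(ET.tostring(translate(src, INTERNAL_TO_OUTPUT), encoding="unicode"))
# -> <contact><first-name>Rick</first-name><last-name>Marshall</last-name></contact>
```

the point of the separation shows up here: swapping the output
vocabulary means editing only the table, not the translator.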
i haven't built such a tool yet, but i've done enough of it by hand to
know that this is the correct broad direction for such things.
there are some other problems in the "real world" that linguists know
only too well: syntax and semantics. let's say you can translate the
vocabulary - does the output go together in the same order as the
input? worse, are attributes in the input still attributes in the
output, or are they now elements? consider translating the position of
adjectives in english and french, or verbs in german, etc (and that's
just western languages). then there's the problem of semantics - is
this the correct vocabulary choice in this setting? in australia
(queensland actually) xxxx (4x) is a beer; my understanding is that
it's a sex aid in america....
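the attribute-vs-element half of that problem is mechanical enough to
sketch. a toy rule (invented element and attribute names) that moves
one attribute of an element into a child element:

```python
import xml.etree.ElementTree as ET

def attr_to_element(elem, attr_name, child_name):
    """Move one attribute of elem into a new child element."""
    value = elem.attrib.pop(attr_name)        # remove the attribute...
    child = ET.SubElement(elem, child_name)   # ...and re-emit it as a child
    child.text = value
    return elem

src = ET.fromstring('<beer name="XXXX" origin="Queensland"/>')
attr_to_element(src, "origin", "origin")
print(ET.tostring(src, encoding="unicode"))
# -> <beer name="XXXX"><origin>Queensland</origin></beer>
```

the hard part isn't the mechanics - it's knowing *which* rule to apply,
which is exactly the syntax/word-order problem above.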
it would be really interesting to know how those auto-translator things
(like google's translator) work, because they must have tackled many of
these problems.
as an aside, it would be good if there were a sort of xslt that worked
like this. as the xsl gets bigger, it gets harder to know whether
you're producing valid output, and harder to change the model.
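checking the output against something schema-like doesn't need much.
the python stdlib has no real schema validator, so this is only a
hand-rolled stand-in: an invented toy content model as a dict of
allowed child names, checked after the transform runs:

```python
import xml.etree.ElementTree as ET

# toy content model (invented): element name -> allowed child names
ALLOWED_CHILDREN = {
    "contact": {"first-name", "last-name"},
    "first-name": set(),
    "last-name": set(),
}

def check(elem, model):
    """Return a list of (parent, bad-child) pairs violating the model."""
    errors = []
    allowed = model.get(elem.tag, set())
    for child in elem:
        if child.tag not in allowed:
            errors.append((elem.tag, child.tag))
        errors.extend(check(child, model))
    return errors

doc = ET.fromstring(
    "<contact><first-name>Rick</first-name><nickname>rjm</nickname></contact>"
)
print(check(doc, ALLOWED_CHILDREN))
# -> [('contact', 'nickname')]
```

a real version would validate against the actual output schema (a
schema-aware processor, or a library with xml schema support), but even
this much catches the "big stylesheet drifts out of validity" problem.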
rick
Peter Hunsberger wrote:
>On Mon, 14 Mar 2005 15:02:24 -0800, Bob Foster <bob@objfac.com> wrote:
>
>
>>Generating instances from schemas usually just produces one of the
>>infinite number of instances restricted by certain trivial parameters. I
>>don't know of an example where meaningful instances are generated.
>>
>>If a generated document changes automatically depending on the schema it
>>finds at the time of generation yet somehow contains the same
>>"information", there must be a model of the document that is independent
>>of the schema, e.g., something like an ER model. Then the model must be
>>populated: this concrete entity has that relationship to these other
>>concrete entities, etc. Then there must be a mapping from the abstract
>>document model to the elements and attributes used in the schema. When
>>the schema changes, the mapping must change in concert (and there must
>>be a way to prevent changes to the schema that violate the abstract
>>document model, e.g., changing an unbounded relationship to a bounded one).
>>
>>After that, piece of cake. ;-}
>>
>>
>
>Instance traversal is something I didn't touch on but of course is the
>real issue here: what's the data source? I had assumed the
>application would be traversing some form of relational DB or similar
>and that there was already some natural key structure and
>relationship metadata/data around. Not necessarily a good
>assumption...
>
>If not, you need some source of control over the data source or
>complete metadata. If the data is simple and you control it you can
>just add id/idref pairs to it to get simple hierarchical descent
>traversal. But if the data's that easy to walk then I'm not sure why
>you're doing this.
>
>Beyond that you can make some simplifying assumptions. The easiest is
>something like assuming every element contains an attribute with the
>same name plus something like "Id" appended and that every referring
>element will include an identically named attribute. That will get you
>lattice like graph traversal and many to many relationships. However,
>unless this is also enforced on the data population side it sounds
>rather fragile... Then again, we do know you have a Schema that can
>be checked at data population time :-)
>
><snip/>
>
>
>