   RE: Multi-lingual experiment - a call for action

  • From: "Laurent Bossavit" <laurent@mmania.com>
  • To: Xml-Dev <xml-dev@xml.org>
  • Date: Mon, 17 Apr 2000 17:44:47 +0200

Didier wrote :

> It makes sense to have a DTD in each language so that people can experiment
> translating from one language to another. Do not forget that the experiment
> is about the result of a database.

I suspect that we have a slight disagreement though about what we 
mean by 'translation'. It seems perfectly natural to me to want to 
translate a document's character content, that content being a 
necessary and crucial part of its meaning.

But as long as we restrict ourselves to a DTD, rather than using - 
say - Schema, isn't "translating" the DTD more a matter of 
*structure* than one of content ? Documents with different DTDs will 
necessarily be of different types; we can *transform* one such type 
to another, but can we say that in doing so we have performed a 
translation ?

My answer is no - because the essence of 'translating' is not only to 
map "STL Tutorial" to "Une introduction à STL" but also to say that a 
"title" in English is the same thing as a "titre" in French. This 
can't be done, as far as I can tell, with a DTD since it does not 
have the means of expressing equivalence between structural 
vocabularies. This is why I think this problem would make a perfect 
test case for a more sophisticated schema language such as XML 
Schema.

> a) Example: translate from an XML document encoded with a French DTD into a
> new XML document encoded in German for trading. Should I mention here that
> this matter of fact will happen with a high probability mainly for exchange
> and trade within the European community.

I would argue that an important requirement here would be that either 
the French or the German version of such a document should pass 
validation by the same parser.

If you're hinting at a sort of "folder" of documents where one 
"multilingual" element could be the parent of a number of subelements 
each representing a different language version of the "same" content, 
then it seems to me that it would be desirable that each such 
subelement be, structurally, *equivalent* to any other, even if 
element names should differ.

Example :
<versions>
   <objet xml:lang="fr">
      <titre>Introduction à STL</titre>
   </objet>
   <item xml:lang="en">
      <title>STL Tutorial</title>
   </item>
</versions>

With a (very partial) Schema as follows (if I understand Schema at 
all, that is, which might be far from the case...) :
<schema targetNamespace="http://yo.com/polyglot">
  <element name="T" type="T" abstract="true"/>
  <element name="titre" equivClass="T"/>
  <element name="title" equivClass="T"/>
  <element name="O" type="O" abstract="true"/>
  <element name="objet" equivClass="O"/>
  <element name="item" equivClass="O"/>
</schema>

In this case a single XSL transform expressed with (say) French 
element names could be used to output the French version of any 
<objet> contained within a <versions> folder, even if this <objet> is 
in fact an <item>... A rose by any other name, etc. (An interesting 
question is how the equivalence classes themselves should be named; 
maybe Esperanto...)
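
To make the idea concrete outside of XSL, here is a minimal Python sketch of 
the same lookup. It is not Schema validation: the EQUIV table is a 
hand-written, hypothetical stand-in for the equivalence classes declared 
above, the title_for helper is my own invention, and I am assuming that 
xml:lang carries language codes such as "fr" and "en".

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predeclared XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical table: concrete element name -> abstract equivalence class.
# In the Schema sketch this role is played by the equivClass attributes.
EQUIV = {"objet": "O", "item": "O", "titre": "T", "title": "T"}

DOC = """<versions>
  <objet xml:lang="fr"><titre>Introduction à STL</titre></objet>
  <item xml:lang="en"><title>STL Tutorial</title></item>
</versions>"""


def title_for(root, lang):
    """Return the text of the T-class child of the O-class element in `lang`."""
    for child in root:
        # Match on the equivalence class, not on the concrete element name.
        if EQUIV.get(child.tag) == "O" and child.get(XML_LANG) == lang:
            for sub in child:
                if EQUIV.get(sub.tag) == "T":
                    return sub.text
    return None


root = ET.fromstring(DOC)
print(title_for(root, "fr"))
print(title_for(root, "en"))
```

The code never mentions "objet" or "item" outside the table, which is the 
point: the processing logic is written once against the classes O and T.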

> Please, use the accents since French includes accents. If I show you a
> Japanese DTD (unfortunately most mails won't be able to decode UTF-8
> Japanese characters) you'll notice that the elements are full Japanese words
> _not_ cut back ones. So please, include the accents so that it is French not
> a language between two chairs. If we speak of multi-lingual let's be
> multi-lingual. Anyway, don't bother, I'll add them.

Yeah, accents seem to be allowed - looks like I read the spec wrong. 
Excerpted from the XML 1.0 spec:

[45]  elementdecl ::=  '<!ELEMENT' S Name S contentspec S? '>' 
[5]  Name ::=  (Letter | '_' | ':') (NameChar)* 
[84]  Letter ::=  BaseChar | Ideographic 
[4]  NameChar ::=  Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender 

Then again, does the above mean an element name can't *start* with a 
diacritic ? That would rule out "éditeur"... It's all spelled out in 
the spec but I haven't gotten around to learning Unicode yet - I know 
I should !
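
For what it is worth, here is a small Python sketch of the check implied by 
the productions above. The range tables are only excerpts (the first few 
ranges of BaseChar and of CombiningChar from Appendix B of the XML 1.0 
spec), the helper names are mine, and Ideographic is left out entirely, so 
treat it as an illustration rather than a complete Name validator. The 
precomposed "é" (U+00E9) falls inside the BaseChar range #x00D8-#x00F6, so 
it counts as a Letter and may start a name; a bare combining acute accent 
(U+0301) is a CombiningChar and may not.

```python
import xml.etree.ElementTree as ET

# Excerpt only: the first ranges of BaseChar from XML 1.0, Appendix B.
BASECHAR_EXCERPT = [
    (0x0041, 0x005A),
    (0x0061, 0x007A),
    (0x00C0, 0x00D6),
    (0x00D8, 0x00F6),
    (0x00F8, 0x00FF),
]

# Excerpt only: the first range of CombiningChar from the same appendix.
COMBININGCHAR_EXCERPT = [(0x0300, 0x0345)]


def in_ranges(ch, ranges):
    """True if the code point of `ch` lies in one of the inclusive ranges."""
    return any(lo <= ord(ch) <= hi for lo, hi in ranges)


def can_start_name(ch):
    """Production [5]: Letter | '_' | ':' (Letter limited to the excerpt)."""
    return ch in ("_", ":") or in_ranges(ch, BASECHAR_EXCERPT)


def is_combining(ch):
    """True if `ch` is in the CombiningChar excerpt."""
    return in_ranges(ch, COMBININGCHAR_EXCERPT)


name = "\u00e9diteur"  # "éditeur" with a precomposed e-acute
print(can_start_name(name[0]))
print(is_combining("\u0301"), can_start_name("\u0301"))

# Cross-check with a real parser: the element name is accepted as written.
print(ET.fromstring("<\u00e9diteur/>").tag == name)
```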


========================================
Laurent Bossavit     -     Ingénieur R&D
>>>        laurent@mmania.com        <<<
>>            ICQ#39281367            <<
MultiMania     http://www.multimania.fr/
========================================
