Concerning Ethiopic and XML

One of my readers, Daniel Yacob from the Ethiopian News Headlines <http://www.ethiozena.net/>, heard about the Blueberry discussions and wrote to me for some more info. He's been using XML with Amharic for some time now. I originally met him (in cyberspace only so far) when he volunteered to help me out with some screen captures of Ethiopic sites for the XML Bible.

I asked Mr. Yacob to answer a few questions about his experiences with XML and Amharic as grist for ongoing discussions, and he graciously agreed. The quoted lines are my questions; each is followed by Mr. Yacob's answer. He doesn't subscribe to this list, but can be reached at yacob@geez.org.

> 1. Can you briefly describe what you and your colleagues use XML for in an Ethiopic context? i.e. what sort of XML documents do you produce in Ethiopic?

With HTML my experience is extensive, with XML just beginning.  My colleague Menasse Zaudou (Menasse.Zaudou@eng.sun.com) and I have toyed with XML structures for marking up an Amharic-English dictionary, an Amharic-Amharic dictionary, and an Amharic Bible.  We've been on and off this since '98; the work is just starting to mature now, actually.

Otherwise, the only thing we've used in a purely XML context is the news headlines, as I described previously.

Similarly, I've used extensions to the HTML markup, processed by my own interpreter, for embedding things like date, numeric and string conversions.  These extensions are used by a dozen or so web sites publishing in Amharic (Ethiopia's lingua franca). The resulting documents are not valid XML (they had to be editable with standard web publishing software), though the approach should extend easily to XML.  I've recently uploaded some proposed extensions to the system; the link below can give you the gist of it:


(The original spec is a bit dated now; I'm embarrassed to present it.)  I've started work on a SOAP interface for the same; a server is running now that can handle some of the requests.
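
As a rough illustration of the kind of numeric conversion mentioned above, here is a minimal Python sketch that renders integers from 1 to 99 in Ethiopic numerals. The function name `to_ethiopic_numeral` is hypothetical and is not taken from Mr. Yacob's system; the sketch relies only on the Unicode code points for the Ethiopic digits (U+1369 to U+1371) and tens (U+1372 to U+137A). Larger numbers need the hundred and ten-thousand signs and are deliberately left out.

```python
def to_ethiopic_numeral(n: int) -> str:
    """Render an integer in 1..99 with Ethiopic numerals (hypothetical helper)."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only handles 1..99")
    tens, ones = divmod(n, 10)
    out = ""
    if tens:
        # U+1372 is ETHIOPIC NUMBER TEN; twenty through ninety follow it.
        out += chr(0x1372 + tens - 1)
    if ones:
        # U+1369 is ETHIOPIC DIGIT ONE; two through nine follow it.
        out += chr(0x1369 + ones - 1)
    return out


if __name__ == "__main__":
    for value in (1, 10, 25, 99):
        print(value, to_ethiopic_numeral(value))
```

An interpreter like the one described could call such a helper wherever the markup asks for a number in Ethiopic form; how that request is written in the markup is not shown here because the original spec is not quoted in this message.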

> 2. Do you use standard XML vocabularies such as Docbook and CDF or do you make up your own vocabularies? or both?

In the case of the Bible, Menasse did find an XML specification that was suitable.  I can dig this up; I can't recall whether it was recognized as a standard.  For the dictionaries we found a spec, from GNU I believe, that was a bit larger than our needs; we're working on using a subset of it.

The zena.xml file followed the format used at slashdot.org; we had to add markup for <EthiopianDate> and <EuropeanDate> so that those reading the file could easily know the date context.
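
To make the two date elements concrete, here is a minimal Python sketch that reads both dates from a headline file of this general shape. Only the element names `EthiopianDate` and `EuropeanDate` come from the description above; the surrounding `headlines`, `story`, `title` and `url` elements, the date format, and all of the values are placeholders assumed for illustration, not the actual zena.xml layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real zena.xml structure may differ.
SAMPLE = """\
<headlines>
  <story>
    <title>Sample headline</title>
    <url>http://www.example.org/story/1</url>
    <EthiopianDate>1993-10-25</EthiopianDate>
    <EuropeanDate>2001-07-02</EuropeanDate>
  </story>
</headlines>
"""


def read_stories(xml_text):
    """Return one dict per <story>, keeping both calendar dates side by side."""
    root = ET.fromstring(xml_text)
    stories = []
    for story in root.iter("story"):
        stories.append({
            "title": story.findtext("title"),
            "ethiopian": story.findtext("EthiopianDate"),
            "european": story.findtext("EuropeanDate"),
        })
    return stories


if __name__ == "__main__":
    for item in read_stories(SAMPLE):
        print(item)
```

Because each date carries its calendar in the element name, a reader of the file never has to guess which calendar a bare date string belongs to.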

> 3. What language do you use for your tag names? If it's an Ethiopic language, what script do you write the characters in? i.e. do you use an ASCII transliteration?

The script is always ASCII.  I did try Ethiopic tags once, and Perl's XML::Parser choked on them (unless I had another error).  So I never tried again.

Come to think of it, I've always made tags in English translation but attributes in Amharic transcription.  The tags (like an object) are generally universal, the attributes seem to be what becomes culturally specific.  But this is likely my limited experience talking :)
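
The experiment described above is easy to repeat. The following Python sketch, which uses the standard library's `xml.etree.ElementTree` rather than Perl's XML::Parser, feeds a parser one document with an ASCII tag and Ethiopic content and one with an Ethiopic tag name. The helper `parses_ok` and both sample documents are hypothetical. Whether the second document is accepted depends on the parser and on which XML name rules it implements, so no result is shown for it.

```python
import xml.etree.ElementTree as ET

# ASCII element name, Ethiopic-script content.
ASCII_TAGS = "<word lang='am'>ሰላም</word>"
# Ethiopic-script element name; acceptance depends on the parser in use.
ETHIOPIC_TAGS = "<ቃል>ሰላም</ቃል>"


def parses_ok(xml_text):
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("ASCII tag, Ethiopic content:", parses_ok(ASCII_TAGS))
    print("Ethiopic tag name:", parses_ok(ETHIOPIC_TAGS))
```

Running the same two inputs through each parser in a toolchain is a quick way to tell a real limitation on tag names apart from an unrelated error in the document.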

> 4. Would it be useful to you and your colleagues to be able to write tags in Ethiopic script? How useful would it be? If it were possible, would you do this? 

When a UTF-8 aware version of vi is available I would love it :)

Emacs is Ethiopic aware (we use it for composing HTML), but UTF-8 is just now being introduced.  If it were possible I could see myself leaving vi for Emacs to compose XML.  So "yes" in short.

> 5. Do you see, now or in the future, a large desire for fully-native markup in Ethiopic languages? In particular, do you see a need for native Ethiopic language speakers who are not comfortably bilingual in a Latin-script language to write XML markup? 

The present state of Ethiopian and Eritrean language localization is one where learning some rudimentary English is a prerequisite to using a computer.  The XML manual count in .et and .er languages is also zero.  So given this, I can't pretend that the lack of Ethiopic tags is presently preventing people from using XML in environments that are strictly Western and largely ASCII-based.

I'm reminded of debates 10 years ago over writing programming languages like Pascal and C in Ethiopic script and in Amharic.

I think people would use it if it were available, but there is still a lot of foundation work that may need to come first.  The XML spec need not wait for localized interfaces; in fact it might be a way to push for them and make them easy to implement.  It is definitely worth pursuing.



| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      | 
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |