xml-dev - RE: [xml-dev] UTF-8+names

To: "'David Carlisle'" <davidc@nag.co.uk>
Subject: RE: [xml-dev] UTF-8+names
From: "Alessandro Triglia" <sandro@mclink.it>
Date: Sun, 19 Oct 2003 20:40:20 -0400
Cc: <xml-dev@lists.xml.org>
Importance: Normal
In-reply-to: <200310192358.AAA27469@e3000>

David Carlisle wrote:
> 
> 
> > If so, isn't this entirely a software issue?  Can't existing XML 
> > browsers and editors just be extended so as to support *names* for 
> > characters, and we leave the encodings alone?
> 
> they could, but requiring tools to enter data appears to be 
> completely contrary to the spirit of XML. 

I didn't say requiring.  There is absolutely nothing that I implied as being
required.  I said that the original user need seems to be satisfiable by
enhancing the current tools.  If user A has that need, she can choose any
tool that addresses her need.  If user B doesn't have that need, she is
still able to share documents with user A by using the tools she has always
used.  Both will be using UTF-8 or UTF-16 or ISO-xxxx and be happy.  There
will be no need to convert millions of existing XML documents to UTF-8+names
just to make them more readable, because the software can handle this.  The
impact on existing software will be zero, apart from those tools that will
voluntarily be enhanced.

What I am saying, again, is:  There is obviously a user need.  This user
need seems to be a need for better software, not a need for a new standard,
even in the form of a new encoding of Unicode, which is so invasive.

> If you're going to 
> always hide data entry behind some interface what's the point 
> of a textual format?

The data is Unicode.  It is Unicode today, it is Unicode in Tim's and John's
proposal, and it is Unicode in the suggestion I am now making.  My point is
that, if input/display of certain Unicode characters is difficult with
today's software, a good answer is probably to enhance the software to
facilitate input and display of "difficult" characters.

Again, it is NOT all of the software that needs to be enhanced.  Nor does
all of the enhanced software need to be enhanced in the same way and at the
same time.  Interoperability is guaranteed by the current XML and Unicode
standards, since the encoding itself won't change.
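
As a rough illustration of the tool-side approach, here is a minimal sketch of an editor helper that accepts a character name on input and stores the character itself, so the saved file stays ordinary UTF-8. The function names `expand_names` and `show_names` and the `\N{...}` input convention are assumptions made for this example, not part of any proposal in this thread; the name lookup relies on Python's standard `unicodedata` module.

```python
import re
import unicodedata

# Hypothetical input convention: the user types \N{UNICODE NAME}.
_NAME = re.compile(r"\\N\{([^}]+)\}")


def expand_names(typed: str) -> str:
    """Input side: replace each \\N{NAME} token with the named character."""
    def repl(match):
        try:
            return unicodedata.lookup(match.group(1))
        except KeyError:
            # Unknown name: leave the token exactly as typed.
            return match.group(0)
    return _NAME.sub(repl, typed)


def show_names(text: str) -> str:
    """Display side: render each non-ASCII character by its Unicode name."""
    out = []
    for ch in text:
        name = None if ord(ch) < 128 else unicodedata.name(ch, None)
        out.append("\\N{%s}" % name if name else ch)
    return "".join(out)


typed = r"<mo>\N{RIGHTWARDS ARROW}</mo>"
stored = expand_names(typed)     # contains the real U+2192 character
data = stored.encode("utf-8")    # what is written to disk: plain UTF-8
```

Only the tool that chooses to offer this needs to change. A user with any other editor still opens the same UTF-8 file.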

> 
> If I tell people that to get a right arrow in MathML they need to go
> 
> <mo>&rightarrow;</mo>
> 
> Then they know how to enter that (whatever system they are 
> using). If on the other hand I had typed the above using a 
> fancy API that entered the character directly into this email 
> (rather than emacs over a vt100 terminal emulator dialup 
> line:-) 

Here you seem to be saying that a software-specific solution won't address
100% of the use cases, because there is a fraction of users that have no
option but to use an old editor, but still need the ability to enter
characters by name.

How many are they?  Does their number justify the introduction of a new kind
of complexity that will affect the whole spectrum of XML tools?

Alessandro

> then what you would be seeing would be
> 
> <mo>-missing-glyph-marker-</mo>
> or
> <mo>-a-right-arrow-</mo>
> or 
> <mo>-random-latin-1-characters-corresponding-to-utf8-code-for-
> 8859-</mo>
> 
> depending on how your system is set up, and either way this 
> gives essentially no information about how you are supposed 
> to enter that character on your system (short of cutting and 
> pasting it from my message).
> 
> You don't need the entity names of course: & # 8 5 9 4 ; 
> would do as well, but people find it easier to remember names 
> than numbers.
> 
> The problem is of course that fragments such as the above are 
> not well formed, since they reference an undefined entity. If 
> you add a DOCTYPE it becomes well formed but no longer a 
> fragment that can be pasted into a document. This proposal 
> doesn't really address that issue either, as you still need a 
> declaration at the top of the file. A preferred solution for 
> MathML at least would have been for XML 1.1 to say that an 
> undefined entity was a validity (not WF) error (this is 
> already the case if the DOCTYPE has an external component 
> that has not been read).
> 
> If this were the case one could layer alternative entity 
> definition mechanisms over XML parsing.
> 
> Strangely XML core WG declared there wasn't really any 
> problem with existing uses of entity references. 
> http://www.w3.org/XML/Core/2002/10/charents-20021023
> 
> So according to that view, the current proposal appears to be 
> addressing a non-problem. (Strange as there appears to be an 
> overlap in authorship.)
> 
> David
>
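
The well-formedness point in the quoted message can be checked with a short sketch. It assumes Python's standard `xml.etree.ElementTree` parser (expat-based) and uses a hypothetical helper, `text_or_error`, introduced only for this example. The numeric reference parses in a bare fragment, the named reference does not, and the named reference parses once a DOCTYPE declares it, at which point the text is no longer a fragment that can be pasted into another document.

```python
import xml.etree.ElementTree as ET


def text_or_error(doc: str) -> str:
    """Return the root element's text, or a marker if the parse fails."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return "not well formed"


# Numeric character reference: no declaration needed.
numeric = "<mo>&#8594;</mo>"

# Named entity with no declaration: rejected by the parser.
bare = "<mo>&rightarrow;</mo>"

# Same reference, with the entity declared in an internal DTD subset.
declared = (
    '<!DOCTYPE mo [<!ENTITY rightarrow "&#8594;">]>'
    "<mo>&rightarrow;</mo>"
)

results = {
    "numeric": text_or_error(numeric),
    "bare": text_or_error(bare),
    "declared": text_or_error(declared),
}
```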
