- From: Chris Smith <smith@interlog.com>
- To: Don Park <donpark@quake.net>, Gavin McKenzie <gmckenzi@JetForm.com>
- Date: Wed, 11 Feb 1998 03:00:20 -0500 (EST)
The discussion has covered some good points up to now. I'll try to
build on it, and move forward.
Let's be clear about what we're trying to solve here. Unicode has
essentially solved the text problem. This note focuses on non-textual
data, or places where a different character encoding is required
inside your document.
For some applications, base64 will be easy to use. Binary data will
be present in particular locations in the XML tree, and the
applications will simply know to decode it. These don't really need
anything new, but will benefit if there is a common technique for
handling it.
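To make this simple case concrete, here is a minimal sketch in Python. The document shape and the 'thumbnail' element name are hypothetical; the only point is that the application knows by convention where the base64 data lives and decodes it itself.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical document: by convention, <thumbnail> always carries
# base64-encoded binary data, so no extra markup is needed to say so.
DOC = "<photo><caption>Sunset</caption><thumbnail>/9j/4AAQ</thumbnail></photo>"

def read_thumbnail(xml_text):
    """Return the decoded bytes held in the <thumbnail> element."""
    root = ET.fromstring(xml_text)
    encoded = root.findtext("thumbnail", default="")
    return base64.b64decode(encoded)

data = read_thumbnail(DOC)  # raw bytes, ready to hand to an image library
```

The parser is not asked to do anything special here; the decoding step lives entirely in the application.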
I think the real target is 'container' elements, where the designer
needs to allow for flexibility in content at runtime. It is possible
to do part of this with elements, but you run into two difficulties.
First, you eventually hit your non-text data, and you have to provide
some indication of the content and format. Second, you may have a real
need to allow for formats that have never been foreseen.
What we don't need to do is provide another mechanism for managing XML
markup and structure. XML parsers will not be asked to do anything
different. This is entirely about how developers will use XML's
features to resolve an often-encountered problem. (That's why this
still belongs on xml-dev.)
That said, the moment you move away from Unicode data content, you
face a number of issues. You will probably have to specify a wrapper
layer used to make the data XML-friendly. If that is removed, then you
will have to note what format or conventions apply to the next layer.
Ultimately you will reach either a text layer or a binary data layer,
which cannot be further unwrapped. That layer may need a descriptor,
to specify what type of data was carried with all this effort.
The question I still haven't completely resolved is whether there is a
need to allow an arbitrary number of layers, or whether three are
sufficient: the 'content encoding', 'content format', and 'content type'.
I'm not certain it's sufficient, but I can't see a use for much more
at the moment. (I'm not tightly attached to the labels, but I think
they work, and at least they're a start.) The most likely
implementation seems to be to carry these as attributes. Attributes
that are not present would default to a zero-length string.
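A rough sketch of how an application might peel off the three layers, written in Python. The function name 'unwrap' is hypothetical, and two assumptions are made: the value "none" is treated the same as an absent content-encoding, and the content-format label is handed straight to Python's codec lookup, which works for ISO-8859-1 but would not recognise every label one might invent.

```python
import base64
import binascii
import xml.etree.ElementTree as ET

def unwrap(element):
    """Peel the layers off a container element.

    Returns (payload, content_type). The payload is a str when a text
    layer is reached and bytes when the innermost layer is binary.
    """
    # Absent attributes default to a zero-length string.
    encoding = element.get("content-encoding", "")
    fmt = element.get("content-format", "")
    ctype = element.get("content-type", "")

    text = element.text or ""
    if encoding in ("", "none"):
        return text, ctype              # already ordinary XML text
    if encoding == "base64":
        raw = base64.b64decode(text)
    elif encoding == "hex":
        raw = binascii.unhexlify(text.strip())
    else:
        raise ValueError("unknown content-encoding: " + encoding)

    if fmt:
        # Assumption: the format label is also a valid Python codec name.
        return raw.decode(fmt), ctype
    return raw, ctype                   # binary layer; cannot unwrap further

elem = ET.fromstring(
    '<container content-encoding="base64" content-format="ISO-8859-1" '
    'content-type="mime:text/plain">SGVsbG8=</container>'
)
payload, ctype = unwrap(elem)           # payload is the str "Hello"
```

The content-type is returned untouched: it describes what was carried, and deciding what to do with it is left to the caller.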
Below, I've listed a number of items, in the interests of ensuring
that any proposed solution can handle them all. (Ultimately, such a
table would be useful to developers.)
What Is It?     Content    Content           Content
                Encoding   Format            Type
--------------  ---------  ----------------  ----------------------
JPEG image      base64                       mime:image/jpeg
ASCII text      base64     ISO-8859-1        mime:text/plain
HTML text       base64     ISO-8859-1        mime:text/html
XML content
XML carried                                  xml:
XML carried     base64     ISO-10646-UCS-2   mime:text/xml
XML data only                                xml:pcdata
private data    hex                          x-private:somedata
private text    base64     Commodore64       x-private:sometext
embedded item   base64     ISO-8859-1        rfc:822
embedded item   base64                       mime:application/x-zip
I thought about separating content-type from the content-domain, but
I can't see that you would specify them separately all that often.
The above seems to support several required ideas:
1) Standard XML content requires no settings at all. This is the
degenerate case, and it is good that it works this way.
2) Standard XML content could be structured using a DTD specified
using namespace techniques. This appears to be an available option
without changing any of the infrastructure around encoding.
3) It supports MIME types, but does not require them. Other domains
   can be used besides MIME, including completely private or
proprietary formats.
4) There is some consistency. Notice that whenever you specify a text
type, you must provide a content-format. Otherwise, the text is the
same as the surrounding XML. Whenever you specify any
content-format that is different than the surrounding XML, you must
use a content-encoding to restore XML friendliness.
5) So far, just about anything you can throw in there that has any
current structure looks to be workable.
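The consistency rules in point 4 could be checked mechanically. The Python sketch below is one reading of them, not a definitive statement. The function name 'check_container' is hypothetical, "none" is treated like an absent content-encoding, and only types beginning with "mime:text/" are recognised as text types, which is a simplifying assumption.

```python
def check_container(encoding="", fmt="", ctype=""):
    """Return a list of problems; an empty list means the settings agree."""
    problems = []
    encoded = encoding not in ("", "none")

    # A content-format that differs from the surrounding XML needs a
    # content-encoding to restore XML friendliness.
    if fmt and not encoded:
        problems.append("content-format given without content-encoding")

    # Assumption: only "mime:text/..." is treated as a text type here.
    # Encoded text with no content-format leaves its character set unknown.
    if encoded and ctype.startswith("mime:text/") and not fmt:
        problems.append("encoded text type given without content-format")

    return problems

# Settings that satisfy both rules:
ok = check_container("base64", "ISO-8859-1", "mime:text/plain")

# A format with no encoding breaks the second rule:
bad = check_container("", "ISO-8859-1", "mime:text/plain")
```

Standard XML content, with all three settings empty, passes with no problems, which matches the degenerate case in point 1.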
An example element using these, called 'container', could be defined as
shown below.
<!ELEMENT container ANY>
<!ATTLIST container
  content-encoding (base64|hex|none) "none"
  content-format   CDATA ""
  content-type     CDATA ""
>
I've limited the strings in content-encoding. Is this a good idea?
There would be some structure applied to the content-format and
content-type, but I don't think it would be effectively captured in
the DTD.
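Since the DTD can only say CDATA, any structure inside content-type would have to be checked by the application. As a small sketch in Python, assuming the "domain:name" shape used in the table, the value can be split on the first colon. The function name 'split_content_type' is hypothetical, and treating a value with no colon as having an empty domain is my assumption.

```python
def split_content_type(value):
    """Split a content-type such as "mime:image/jpeg" into (domain, name)."""
    domain, sep, name = value.partition(":")
    if not sep:
        # Assumption: a value with no colon has no domain at all.
        return "", value
    return domain, name

domain, name = split_content_type("mime:image/jpeg")
# domain is "mime", name is "image/jpeg"
```

Splitting only on the first colon keeps any later colons inside the name, so a private domain is free to use them.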
Comments aren't just welcome - they're essential!
---------------------------------------------------------------------------
Chris Smith <smith@interlog.com>
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)