OASIS Mailing List Archives

Re: [xml-dev] Why is Encoding Metadata (e.g. encoding="UTF-8") put Inside the XML Document?

Rick Jelliffe wrote:
> Jonathan Robie said:
>> Michael Kay wrote:
>>>> Why? Shouldn't metadata be external to a document?
>>> Sadly, most of us are using file systems based on 1960s thinking that
>>> don't allow metadata to be held anywhere other than in the content of
>>> the file (or potentially in its name).

> There has always been a split between systems based on "magic numbers" (in
> the UNIX sense) which the XML encoding header is an elaborate example of,
> systems based on richer file structures (e.g. old Mac) and systems using
> registries. But it is the file read and write APIs that are the weak links
> in the chain: information about encoding is lost when writing out a file,
> and the only way to maintain it is to write it somewhere. And the only
> place to write it that is cross-platform and cross-application and
> transparent is inside the file itself.
> Actually, it continues the trend of web resources being self-identifying
> rather than requiring external metadata;

> For XML we looked at two different mechanisms: Gavin Nicol suggested that
> we should just use the existing MIME header syntax at the start of the
> file. This had two drawbacks: first, when you use EBCDIC it means a file
> in two different encodings, and second the file was no longer an
> acceptable SGML entity. So the PI syntax was adopted instead, even though
> it meant a disconnect from MIME header syntax.
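To make the "magic number" point concrete, here is a minimal sketch of how a reader might recover the encoding from the file itself, in the spirit of the autodetection notes in the XML 1.0 specification. The name declared_encoding and the pattern table are hypothetical, not from this thread; cp037 is an assumed EBCDIC flavour, UTF-32 is left out, and a real parser does considerably more.

```python
import re

# Leading byte patterns: a byte-order mark, or "<?xm" in a given encoding family.
# Hypothetical table for illustration; UTF-32 is omitted for brevity.
_FAMILIES = [
    (b"\xef\xbb\xbf", "utf-8-sig"),   # UTF-8 with BOM
    (b"\xfe\xff", "utf-16-be"),       # UTF-16 BOM, big-endian
    (b"\xff\xfe", "utf-16-le"),       # UTF-16 BOM, little-endian
    (b"\x00<\x00?", "utf-16-be"),     # "<?" in UTF-16BE without BOM
    (b"<\x00?\x00", "utf-16-le"),     # "<?" in UTF-16LE without BOM
    (b"\x4c\x6f\xa7\x94", "cp037"),   # "<?xm" in EBCDIC; cp037 is an assumption
]

# The XML declaration with an encoding pseudo-attribute.
_DECL = re.compile(
    r'<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    r'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(head: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration found in `head`.

    `head` is the first few hundred bytes of the file. The byte patterns
    only tell us how to read the declaration; the declaration then names
    the actual encoding.
    """
    family = "latin-1"  # ASCII-compatible fallback that never fails to decode
    for magic, codec in _FAMILIES:
        if head.startswith(magic):
            family = codec
            break
    text = head.decode(family, errors="replace").lstrip("\ufeff")
    match = _DECL.match(text)
    if match:
        return match.group(1)
    # No declaration: a UTF-16 byte pattern still identifies the encoding.
    return "UTF-16" if family.startswith("utf-16") else default
```

The EBCDIC row shows why the PI syntax avoids the "file in two different encodings" problem: the declaration is written in the same encoding as the rest of the file, so the first four bytes are enough to know how to read it.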

Is there anything that could be proposed, based on these two ideas?
Clearly XML only half got it right with the PI notation plus metadata
internal to the file.
Mike rightly decries the ancient filesystems we're using for not
addressing encoding.
URIs extend to the file system; what (if anything) has been found
successful for addressing encoding when working with files?

Do any of the OSes use something else? I can't understand how this
problem (which must have bugged most readers on this list at one
time in the past) hasn't been faced up to in IETF or W3C or NISO.

What might it look like when solved? A directory-based meta container?
Rick's idea of something at the file read/write level?


Dave Pawson
XSLT, XSL-FO and Docbook FAQ

Copyright 1993-2007 XML.org. This site is hosted by OASIS