Re: [xml-dev] The illusion of simplicity and low cost in data design and computing



On 14 Aug 2022, at 14:45, Roger L Costello <costello@mitre.org> wrote:

Michael Kay wrote this regarding whether information about a file should be inside or outside the file:

 

  • Inside when viewed at one layer, outside when viewed at a different layer.
  • Or to put it another way, you don't need to know. You don't care whether the bits you get to see are contiguous on disk or not.

 

I need a concrete example please. Suppose I have a program that can only process XML documents that are encoded using the 8859-8 character set (Latin/Hebrew). An XML document arrives. How will my program determine whether or not it can process the XML document?

 



You've already muddled the layers. Your program only cares that it's XML; it doesn't care what the encoding is. Your program calls something like

if (file.hasContentType("application/xml")) {
  parseXml(file);
}

The XML parser does

Reader reader = file.decode();

The operating system knows the encoding of the file and decodes it as characters.

Of course, there's always a possibility that the operating system doesn't know the encoding of the file, because no-one told it. So you need some kind of API like

file.setEncoding("iso-8859-8");

which would normally be done automatically when you write a file using a character-based Writer.

Similarly there's a possibility the operating system doesn't know the media type of the file, so you need an API like

file.setContentType("application/xml");

Again, one would hope that applications that write XML to filestore will call this API to register the media type.
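
To make the layering concrete, here is a minimal sketch in Java. It is an in-memory stand-in, not a real filesystem API: the TaggedFile class, writeText and readXml are hypothetical names invented for illustration, while hasContentType, setEncoding, setContentType and decode simply mirror the calls sketched above. It also assumes the Java runtime ships the ISO-8859-8 charset, which full JDK builds normally do.

```java
import java.nio.charset.Charset;
import java.util.Optional;

class TaggedFileDemo {

    /** Hypothetical "file" that carries its metadata alongside its bytes. */
    static final class TaggedFile {
        private byte[] bytes = new byte[0];
        private String contentType; // null until some application registers it
        private Charset encoding;   // null until some writer registers it

        void setContentType(String type) { contentType = type; }
        void setEncoding(String name) { encoding = Charset.forName(name); }
        boolean hasContentType(String type) { return type.equals(contentType); }

        /** Character-based write: the encoding is recorded as a side effect. */
        void writeText(String text, String encodingName) {
            setEncoding(encodingName);
            bytes = text.getBytes(encoding);
        }

        /** Storage layer: decodes with the recorded encoding instead of guessing. */
        String decode() {
            if (encoding == null) {
                throw new IllegalStateException("encoding was never registered");
            }
            return new String(bytes, encoding);
        }
    }

    /** Application layer: checks only the media type, never the encoding. */
    static Optional<String> readXml(TaggedFile file) {
        if (!file.hasContentType("application/xml")) {
            return Optional.empty();
        }
        return Optional.of(file.decode());
    }

    public static void main(String[] args) {
        TaggedFile file = new TaggedFile();
        // Hebrew text, written through the character-based API.
        file.writeText("<greeting>\u05E9\u05DC\u05D5\u05DD</greeting>", "iso-8859-8");
        file.setContentType("application/xml");
        System.out.println(readXml(file).orElse("not XML"));
    }
}
```

Note that readXml never mentions iso-8859-8: the writer registered the encoding, the storage layer applies it, and the application sees only characters.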

Of course, this can be wrong, just as HTTP content headers can be wrong. But it's a lot more likely to be right than if you just use guesswork.

Doing this isn't fundamentally difficult; it just means making the inode data (which holds metadata about files) extensible. Only when you start trying to make things secure (for example, restricting access to a file to a particular application) does it start to affect the system architecture.
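
Some filesystems already offer a limited form of extensible per-file metadata through extended attributes, which Java exposes as UserDefinedFileAttributeView. The sketch below is one possible approximation of the idea, not the design described above: the class name, the helper names setAttr and getAttr, and the attribute names "mime_type" and "charset" are assumptions chosen for illustration, and the calls report failure on filesystems that do not support user-defined attributes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;
import java.util.Optional;

class XattrMetadataDemo {

    /** Stores value under a user-defined attribute; false if the filesystem refuses. */
    static boolean setAttr(Path path, String name, String value) {
        UserDefinedFileAttributeView view =
                Files.getFileAttributeView(path, UserDefinedFileAttributeView.class);
        if (view == null) {
            return false; // this filesystem has no user-defined attributes
        }
        try {
            view.write(name, StandardCharsets.UTF_8.encode(value));
            return true;
        } catch (IOException | UnsupportedOperationException e) {
            return false;
        }
    }

    /** Reads a user-defined attribute; empty if it is absent or unsupported. */
    static Optional<String> getAttr(Path path, String name) {
        UserDefinedFileAttributeView view =
                Files.getFileAttributeView(path, UserDefinedFileAttributeView.class);
        if (view == null) {
            return Optional.empty();
        }
        try {
            if (!view.list().contains(name)) {
                return Optional.empty();
            }
            ByteBuffer buffer = ByteBuffer.allocate(view.size(name));
            view.read(name, buffer);
            buffer.flip();
            return Optional.of(StandardCharsets.UTF_8.decode(buffer).toString());
        } catch (IOException | UnsupportedOperationException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("document", ".xml");
        try {
            boolean stored = setAttr(path, "mime_type", "application/xml")
                    && setAttr(path, "charset", "iso-8859-8");
            System.out.println(stored
                    ? "charset = " + getAttr(path, "charset").orElse("(missing)")
                    : "user-defined attributes are not supported here");
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

As with HTTP headers, nothing here stops an application from recording the wrong value; the attributes are only as reliable as the programs that set them.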

Michael Kay
Saxonica





