OASIS Mailing List Archives





   Re: Identifying XML Document Types (was XML media types revisited)

  • From: Walter Underwood <wunder@infoseek.com>
  • To: XML-Dev Mailing list <xml-dev@ic.ac.uk>
  • Date: Fri, 15 Jan 1999 13:57:10 -0800

At 03:12 PM 1/15/99 -0500, Simon St.Laurent wrote:
>With XML the expectations (for being able to process documents with both
>specific and generic tools) are much higher, yet the tools for identifying
>document types are actually weaker in many ways.

I'm not sure that things are all that bad. An Excel spreadsheet
can be a lot of different things, but it is always parsed the
same way. Word documents or FrameMaker documents may use different
templates, but the file format is the same. MIME types do a fine
job at that level.

More ambitious schemes for description become more and more
application-specific.

For example, my application is reading XML so that our search
engine can index it. The document features that are important
to a search engine are not specified in DTDs, stylesheets,
schemas, or anything else. We need to know which element is the
title, which is the description, and whether some parts of
the document are more important for search purposes (a bibliography
is less important, a problem description might be more important).

The search engine does not care whether the document is valid
or has a DTD at all, but it does care whether XLink is used
in the document (namespaces do help in this case).
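The reason namespaces help is that the check can key on a namespace URI rather than on whatever prefix a given author chose. A minimal sketch, assuming the namespace URI from the final XLink 1.0 Recommendation (`http://www.w3.org/1999/xlink`); the XLink drafts current when this message was written did not necessarily use that identifier. The function name `uses_xlink` is hypothetical.

```python
import xml.etree.ElementTree as ET

# XLink 1.0 namespace URI (assumption: the final Recommendation's value).
XLINK_NS = "http://www.w3.org/1999/xlink"


def uses_xlink(xml_text):
    """Return True if any element carries an attribute in the XLink namespace.

    ElementTree expands prefixed attribute names to "{uri}local", so the
    prefix the author picked (xlink:, xl:, ...) does not matter.
    """
    root = ET.fromstring(xml_text)
    expanded_prefix = "{" + XLINK_NS + "}"
    for elem in root.iter():
        for name in elem.attrib:
            if name.startswith(expanded_prefix):
                return True
    return False
```

Note that this says nothing about validity or about any DTD: a document with no DTD at all is handled the same way, which matches what the search engine needs.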

Documents are often put to unexpected uses--indexing for
search, legal discovery, corpus linguistics, whatever.
Committing to a document description too early can actually
make a document harder to use. 

In case you're curious, the search engine is a commercial 
product (Ultraseek Server), and has supported simple XML
searching since last September.


Walter R. Underwood
wunder@best.com (home)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)



Copyright 2001 XML.org. This site is hosted by OASIS