- From: Peter Murray-Rust <email@example.com>
- To: firstname.lastname@example.org
- Date: Fri, 17 Apr 1998 12:37:21
At 18:45 08/04/98 -0400, Tyler Baker wrote:
>One dilemma I have been trying to figure out with XML is the problem of
>handling unknown element types and what to do with their children.
>Anyone here got any better ideas on this?
Well I have some ideas ... :-)
The problem I address (in JUMBO2) is:
"what do I do when someone sends me an XML document without any/enough
accompanying material telling me what to do with it?"
If this is similar to your problem, read on :-)
(1) If the DTD is present it can tell you if the document is valid. There
is no agreed mechanism whereby a DTD can carry additional semantics. So
your DTD could tell you if a B element can contain mixed content including
an I element - it can't tell you what they mean.
(2) There is no universal generic mechanism for adding semantics to an XML
document.
(3) If the main purpose of the document is to be rendered for humans, then
stylesheets should be used. If the author creates their own tagset and
doesn't provide a stylesheet, many XML-aficionados will give up at this
stage. For example, a document:
This is a <FOO>bold <BAR>italic</BAR> phrase</FOO>
is as valid as one using B and I, but the reader has to do some detective
work. They'd probably give up on most such documents.
(4) If the main purpose of the document is for a machine to act upon it
(and not everyone realises the enormous potential of XML here), then
another way of communicating semantics has to be provided. The method I use
is to map Java classes onto elements. This can use a wide degree of
context-dependence and can be very powerful. Example:
<MOL><ATOMS> <ARRAY BUILTIN="X2">... </ARRAY></ATOMS></MOL>
will draw a chemical line drawing.
<MOL><ATOMS> <ARRAY BUILTIN="X3">... </ARRAY></ATOMS></MOL>
will draw a rotatable 3-D molecule.
The JUMBO-MOL software is (obviously) application-specific and uses
XPointers extensively to decide on context.
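
The mapping idea in (4) can be sketched roughly as follows. This is not
JUMBO code: JUMBO maps Java classes onto elements and uses XPointers for
context, whereas this sketch uses Python and a plain "ancestor path plus
BUILTIN value" key as a stand-in. The class names LineDrawing and
RotatableModel, the HANDLERS table and find_handlers() are all invented
for illustration.

```python
import xml.etree.ElementTree as ET


class LineDrawing:
    """Hypothetical handler for 2-D coordinate arrays."""

    def __init__(self, element):
        self.element = element

    def describe(self):
        return "2-D line drawing"


class RotatableModel:
    """Hypothetical handler for 3-D coordinate arrays."""

    def __init__(self, element):
        self.element = element

    def describe(self):
        return "rotatable 3-D molecule"


# Registry keyed on (path of element names from the root, BUILTIN value).
# The same ARRAY element gets a different class depending on its context.
HANDLERS = {
    ("MOL/ATOMS/ARRAY", "X2"): LineDrawing,
    ("MOL/ATOMS/ARRAY", "X3"): RotatableModel,
}


def find_handlers(root):
    """Instantiate a handler for every element whose context is registered."""
    found = []

    def walk(element, path):
        here = path + [element.tag]
        handler_class = HANDLERS.get(("/".join(here), element.get("BUILTIN")))
        if handler_class is not None:
            found.append(handler_class(element))
        for child in element:
            walk(child, here)

    walk(root, [])
    return found


doc = ET.fromstring('<MOL><ATOMS><ARRAY BUILTIN="X3">...</ARRAY></ATOMS></MOL>')
handlers = find_handlers(doc)
print([h.describe() for h in handlers])
```

An ARRAY that turns up outside MOL/ATOMS matches nothing in the table and
is simply left alone, which is one possible answer to "what do I do with
an element I don't know about".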
(5) To help with the first three problems JUMBO2 now has the following
*generic* facilities, which help with 'unstyled' random XML documents:
- search the document for all elements, attributes, attribute values, and
PCDATA content and uniquify them
- display this as a tree showing unique markup components. This is linked
to the original document (tree). Thus, I may find that <bibref> occurs in
rec.xml. What does it mean? I can use JUMBO2 to find all the occurrences
of <bibref> in the doc and highlight them all (almost instantaneous, now :-)
- find all 'whitespace' elements and delete them. This aids tree
navigation in some cases
- display the content of any node (whether mixed or element) in several
different styles. These include:
    - untagged event stream (e.g. similar to removal of unknown tags)
    - prettyprinted XML (indented)
    - whitespace specifically highlighted
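
For anyone who wants to try the "uniquify" idea without JUMBO2, here is a
minimal sketch in Python (not JUMBO's own code or data structures). It
assumes that "whitespace elements" means text runs containing only
whitespace, and the sample document is made up; it is not rec.xml. The
function names strip_whitespace_nodes, inventory and occurrences are
invented for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def strip_whitespace_nodes(root):
    """Drop text and tail strings that consist only of whitespace."""
    for element in root.iter():
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None


def inventory(root):
    """Count each distinct element name, attribute name, attribute value
    and non-blank text run in the document."""
    names, attrs, values, texts = Counter(), Counter(), Counter(), Counter()
    for element in root.iter():
        names[element.tag] += 1
        for key, value in element.attrib.items():
            attrs[key] += 1
            values[value] += 1
        for chunk in (element.text, element.tail):
            if chunk and chunk.strip():
                texts[chunk.strip()] += 1
    return names, attrs, values, texts


def occurrences(root, name):
    """Return every element with the given name, in document order."""
    return list(root.iter(name))


# Made-up sample document for illustration.
sample = ET.fromstring(
    '<spec>\n'
    '  <p>See <bibref ref="XML"/> and <bibref ref="SGML"/>.</p>\n'
    '  <p>Also <bibref ref="XML"/>.</p>\n'
    '</spec>'
)
strip_whitespace_nodes(sample)
names, attrs, values, texts = inventory(sample)
print(sorted(names))                        # the unique element names
print(len(occurrences(sample, "bibref")))   # how often <bibref> occurs
```

The counters give the list of unique markup components, and occurrences()
gives the links back into the original tree that a viewer would need for
highlighting.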
The default styling applies simple heuristics to display elements. Thus
is displayed as:
where the markup term is in a different font. This is useful for many
generic XML documents.
In addition JUMBO will allow you to add your own style to individual
elements. Thus <olist> in rec.xml would appear to be a list, so the user
can interactively add list-formatting to it. In your case you could arrange
that <B> was made bold and <I> was made italic. [I am not prepared to
'guess' the meaning of common tags - e.g. <A> - and the reader has to take
the responsibility for this. I would hope that the world might converge
towards common semantics for common terms, and XML-DEV is here if anyone
wishes. But if you want to use <PARA> for a chemical term rather than a
paragraph, you're perfectly welcome to - XML doesn't care :-)].
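
A rough sketch of reader-assigned styling, again in Python rather than
anything JUMBO actually does: the reader supplies a table from element
name to a pair of text markers, and any element not in the table is passed
through untagged with its children kept. The render() function, the
marker characters and the <P> wrapper around the earlier FOO/BAR fragment
are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def render(element, styles):
    """Return the element's content as plain text.

    Elements named in `styles` are wrapped in their (before, after)
    markers; unknown elements contribute only their text and children.
    """
    before, after = styles.get(element.tag, ("", ""))
    parts = [before, element.text or ""]
    for child in element:
        parts.append(render(child, styles))
        parts.append(child.tail or "")
    parts.append(after)
    return "".join(parts)


# The fragment from point (3), wrapped in a hypothetical <P> root so that
# it parses as a complete document.
doc = ET.fromstring("<P>This is a <FOO>bold <BAR>italic</BAR> phrase</FOO></P>")

# No styles known: the tags are dropped but the content survives.
print(render(doc, {}))

# The reader decides that FOO means bold and BAR means italic.
reader_styles = {"FOO": ("*", "*"), "BAR": ("/", "/")}
print(render(doc, reader_styles))
```

The point of keeping the table outside the renderer is that the software
never guesses what a tag means; the reader takes that responsibility.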
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
xml-dev: A list for W3C XML Developers. To post, mailto:email@example.com
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:firstname.lastname@example.org)