Hi Rick,
> Would it be more "XML"-ish to hang preprocessing off namespaces?
>
> So that you provide a preprocessor with a list of allowed namespaces
> or prefixes and strip the tags of the rest? "Ignore html1: and strip
> rdf:" for example.
Filtering by namespace is definitely what I believe people term
"low-hanging fruit" (that phrase always reminds me of a guy who
mistakenly talked about his "low-hanging plums" [1]).
Using different namespaces (or combinations of namespaces) is
certainly a very easy way of identifying different hierarchies, but I
don't think that it hits all the use cases.
The main problem is what happens when you have markup that should be
shared between two hierarchies. Taking the bible example, we might
want to extract two hierarchies:
/bible/testament/book/chapter/verse
/bible/testament/book/section/para
I guess that you could place anything that's "common" in yet another
namespace, so we end up with three:
/bib:bible/bib:testament/bib:book/log:chapter/log:verse
/bib:bible/bib:testament/bib:book/phys:section/phys:para
but (a) too many namespaces spoil the markup (make it harder to read
etc.) and (b) if you're making divisions like these, you're
effectively dictating from the outset which structures can be
extracted from the data, which I think goes against the principle of
descriptive markup (i.e. describe the data; let the applications
choose what to do with it).
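
To make the three-prefix arrangement concrete, here is a minimal sketch of the kind of preprocessor Rick describes: it keeps tags whose prefix is on an allowed list and strips the rest, leaving the text alone. The function name strip_tags, the sample string and the simplified tag pattern are my own assumptions for illustration (no comments, CDATA, processing instructions or namespace declarations are handled), and the input is deliberately not well-formed XML because the log: and phys: ranges overlap.

```python
import re

# Simplified tag pattern (an assumption for this sketch): captures an
# optional prefix and a local name; ignores comments, CDATA and PIs.
TAG = re.compile(r"<(/?)(?:([A-Za-z_][\w.-]*):)?([A-Za-z_][\w.-]*)([^<>]*)>")


def strip_tags(text, keep_prefixes):
    """Remove every tag whose prefix is not in keep_prefixes.

    Unprefixed tags are treated as having the empty prefix "".
    Character data is left untouched.
    """
    def repl(match):
        prefix = match.group(2) or ""
        return match.group(0) if prefix in keep_prefixes else ""

    return TAG.sub(repl, text)


# Hypothetical sample: verses (log:) and paragraphs (phys:) overlap.
sample = (
    "<bib:book><log:chapter><phys:section><phys:para>"
    "<log:verse>A</log:verse><log:verse>B</phys:para>"
    "<phys:para>C</log:verse></log:chapter>"
    "<log:chapter><log:verse>D</log:verse></log:chapter>"
    "</phys:para></phys:section></bib:book>"
)

logical = strip_tags(sample, {"bib", "log"})
physical = strip_tags(sample, {"bib", "phys"})
print(logical)
print(physical)
```

Each view comes out properly nested, but only because bib:, log: and phys: were assigned up front with exactly these two extractions in mind, which is the limitation described in (b).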
> There would then be kind of overlapping WF check easily possible,
> just checking that all elements of each prefix/namespace form a
> balanced tree: one tree per namespace. (With some scoping
> conventions for xmlns declarations.)
If we're "coming down from the trees" (to use Patrick's phrase), I'm
also uncomfortable with the notion of stating that the elements/ranges
in each namespace must form a tree. That would prevent, for example, a
"CommentaryML" namespace for comments that overlap, which I think is an
important use case for these technologies.
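
As a rough illustration of the one-tree-per-namespace check and where it bites, the sketch below keeps one stack of open element names per prefix and reports any prefix whose own tags do not nest. The function name unbalanced_prefixes, the com: prefix and both sample strings are assumptions of mine, and the tag pattern is simplified in the same way as any regex-based tokenizer would be (no comments, CDATA or processing instructions).

```python
import re
from collections import defaultdict

# Simplified tag pattern (an assumption for this sketch). Group 4 is "/"
# for an empty-element tag such as <log:milestone/>.
TAG = re.compile(r"<(/?)(?:([A-Za-z_][\w.-]*):)?([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


def unbalanced_prefixes(text):
    """Return the set of prefixes whose tags, taken alone, fail to nest."""
    stacks = defaultdict(list)  # prefix -> stack of open local names
    bad = set()
    for match in TAG.finditer(text):
        closing, prefix, name, empty = match.groups()
        prefix = prefix or ""
        if empty:
            continue  # empty-element tags cannot unbalance anything
        if not closing:
            stacks[prefix].append(name)
        elif stacks[prefix] and stacks[prefix][-1] == name:
            stacks[prefix].pop()
        else:
            bad.add(prefix)  # end tag does not match the innermost start
    # Anything still open at the end is also unbalanced.
    bad.update(prefix for prefix, stack in stacks.items() if stack)
    return bad


# Overlap across prefixes: each prefix is balanced on its own.
across = (
    "<bib:book><log:chapter><phys:para><log:verse>A</log:verse>"
    "<log:verse>B</phys:para></log:verse></log:chapter></bib:book>"
)

# Overlap within one prefix: two hypothetical comment ranges that cross.
within = "<com:note>a<com:reply>b</com:note>c</com:reply>"

print(unbalanced_prefixes(across))
print(unbalanced_prefixes(within))
```

The first sample passes because the overlap is between prefixes; the second is flagged because the two comment ranges cross inside a single prefix, which is exactly the case a per-namespace tree rule would forbid.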
[By the way, the scoping of namespace declarations is a tricky area
when you get into overlapping markup; in LMNL we manage it by
distinguishing namespaces-for-markup (which are declared with [!ns]
declarations that scope to "the rest of the document") and
namespaces-for-content (which are declared as normal annotations that
scope to "the content of the range" or whatever else is appropriate
for the particular application).]
Cheers,
Jeni
[1] 'plums' is slang for 'testes' in England.
---
Jeni Tennison
http://www.jenitennison.com/