This message I'm writing follows a simple convention for email notes: text
arranged in paragraphs that are separated by blank lines.

Like that. We all know that this blank line is supposed to indicate that what
follows is a new paragraph.
We also know that previous messages in a thread are quoted, how to recognise
the quoted paragraphs, and how their author is attributed.
I've never heard anybody suggest that email messages should be represented
by a type in XML. Everybody knows that things like <p></p> <quote></quote>
are the way to go. A good idea. Right?
Why is it that seemingly everybody wants to write 2-3-2002 or 2/3/2002 or 3
Feb 2002 or March 3, 2003 or feb 3 8:02 pm or whatever else we can think of
and then expect that this be understood as one of those special types that
should be understood by 'the system'?
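To make the ambiguity concrete, here is a minimal Python sketch using only the standard library. The two format strings are my own choice of "plausible conventions", not anything a schema language prescribes; the point is only that the same characters yield two different dates.

```python
from datetime import datetime

text = "2/3/2002"

# Two equally plausible readings of the same string.
# Both format strings are assumptions made for this illustration.
month_first = datetime.strptime(text, "%m/%d/%Y")  # month/day/year
day_first = datetime.strptime(text, "%d/%m/%Y")    # day/month/year

print(month_first.date())  # 2002-02-03
print(day_first.date())    # 2002-03-02
```

Nothing in the text itself says which reading was intended; that knowledge has to live somewhere outside the string.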
We can talk about numbers too, though they are simpler: 1, -1, 1.0, -1.0,
01, 0.1e1, 1e0, +1, +0.1e+1, 10e-1, 1e0.5 and on and on.
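As a rough illustration of how much notation hides behind "a number", the sketch below feeds the forms listed above to Python's built-in float(). This shows one parser's rules only; XSD or any other system may accept a different set. The helper name try_float is made up for this example.

```python
forms = ["1", "-1", "1.0", "-1.0", "01", "0.1e1", "1e0",
         "+1", "+0.1e+1", "10e-1", "1e0.5"]

def try_float(s):
    """Return the parsed value, or None if float() rejects the string."""
    try:
        return float(s)
    except ValueError:
        return None

# Map each lexical form to whatever this particular parser makes of it.
parsed = {s: try_float(s) for s in forms}

for s, value in parsed.items():
    print(f"{s!r:>10} -> {value}")
```

With this parser, every form except the last collapses to 1.0 or -1.0, and "1e0.5" is rejected because the exponent must be an integer.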
For that matter, I appear to be writing this in English. Nobody expects that
to be validated.
Is it because of frequency of use? I think I can make a pretty good argument
that simple email-like messages are written more frequently than dates are
written using a computer.
Is it because of some kind of notion of 'simple'? Surely we can see from the
date debates and the books written that dates are not simple in any usual
sense of the word 'simple'. Neither is our notation for numbers for that
matter.
Is it because of some kind of notion of 'atomic'? Markup has to stop
somewhere, doesn't it? But where?
Is it because we have it someplace in the back of our minds that we want to
represent this information 'efficiently' in a computer program?
Is it because we fear that at some point the predominance of tags starts to
obscure the meaning of the content?
Is it simply because XSD exists that we discuss this? Maybe because we
believe that the kind of validation done by XSD is sufficient for an
Is it because we want XQuery to work?
Is it because we *like* things to be complex?
Is it something I've missed?
Why shouldn't we write something like: "<number>1</number>"?
Then we could have truly simple types in there (like the kinds of thing
already in DTDs). And the complexity is exposed for everyone to see.
If we did this, then why couldn't we do something like define common markup
for this kind of thing (kind of what I imagine Dublin Core is all about)?
And then write software libraries that can understand it?
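A minimal sketch of what such a library routine might look like, in Python with the standard xml.etree module. The element names (date, year, month, day) and the function read_date are invented for this illustration; they are not taken from Dublin Core or any existing vocabulary.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical markup: the structure of the date is spelled out in tags,
# so only truly simple content (digit strings) is left inside them.
doc = "<date><year>2002</year><month>2</month><day>3</day></date>"

def read_date(xml_text):
    """Build a date from the explicitly tagged year, month and day."""
    root = ET.fromstring(xml_text)
    return date(
        int(root.findtext("year")),
        int(root.findtext("month")),
        int(root.findtext("day")),
    )

print(read_date(doc))  # 2002-02-03
```

The reader never has to guess an ordering convention, because the markup says which part is which.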
This may be too strict/simplistic. Maybe. Can somebody name a better
heuristic for deciding when something should be a 'type' and when it should
be marked with tags?
It seems very unclear to me what problem is being solved by adding types to
something like XSD. It is also unclear to me whether there is a consensus
on this that I am not aware of.
When *do* tags stop being a good idea?