
[xml-dev] Semantic Markup for prose

Good morning (OK, maybe not so good, since it's raining hard enough to obscure
the North Shore mountains here in Vancouver),

From my reading of various mailing list threads, when people talk about
semantics, it seems to typically be a matter of header metadata. 

I wonder if there is any generic way of applying arbitrary semantics to a
prose-type document. In most cases, a prose document has an overall
structure like either HTML (<html><head/><body/></html>) or NITF,
and has the prose represented mostly by text and <p> tags, perhaps with some
of the other block-level HTML elements. This framework should be sufficient
to represent almost any simple document one might write. Now let's say that I
have the following in my document:

<p>... and Acme Inc. has had a very successful year, producing...</p>

Clearly, Acme Inc. has some additional information that needs to be attached
to it, namely that it is an identifiable company. One could conceivably
create a semantic tag, as was done in NITF, to represent a company,
and so you'd have something like:

<p>... and <comp symbol="ACME">Acme Inc.</comp> has had a very successful
year, producing...</p>

but that isn't very meaningful unless you happen to know my schema. Odds are
you are going to simply ignore the tag and move on. 
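
A minimal sketch of that "ignore the tag and move on" behaviour, assuming
Python's standard xml.etree.ElementTree module. The helper name plain_text is
hypothetical and not part of any schema mentioned here. A consumer that does
not know the <comp> vocabulary can still recover the prose, but the fact that
Acme Inc. is a company with a symbol is lost:

```python
import xml.etree.ElementTree as ET

# The paragraph from the example above, as a single XML fragment.
FRAGMENT = (
    '<p>... and <comp symbol="ACME">Acme Inc.</comp> has had a very '
    'successful year, producing...</p>'
)

def plain_text(xml_text):
    """Flatten an element to its text, dropping every tag and attribute."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())

print(plain_text(FRAGMENT))
```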

Here's another example that is somewhat more complex. Let's say that I'm
creating a fill-in-the-blank type of document, say a generic marriage
ceremony. Somewhere you are going to have:

<p>Do you _______________ take _______________ as your wife? In sickness and
in health......</p>

Now there are several pieces of information that are not correctly conveyed
here. The first is that there are three parties involved in this: the bride,
the groom and the witnesses (the JP or priest isn't participating per se,
since he is "rendering" the document). The second is that the blanks have to
be filled in with the names of the participants (and not just any names, but
in a specific order and in a certain format). The third is that this is a
question posed to a participant, and it requires a response. If these
semantics were provided, it would be easy to create "Robo-JP, your very own
marriage officiating appliance." Joking aside, it seems to me that the
document is incomplete without them, since you would have a hard time
rendering this to a Braille display or an audio browser.

Now all of this is probably obvious to those who've thought about it. I
recall reading a paper on accessibility (it may have been the W3C
recommendation on the subject) which encouraged the use of semantic tags such
as <var> and <code>. But it seems to me that there is no end to the number of
semantic tags one could create, even though they are all effectively the same
thing. You need <var> and <code>; I need <org>, <person>, <event>.

Would it not be possible to come up with an extensible semantic mechanism
where everyone would use the same tag, but the meaning of the tag would
change based on information provided in the header and in the tag itself? For
example:

<doc xmlns:cs="schema.semantics.org/schemes/computerscience">
		<p>Make sure to set the <sem
scheme="cs:variable">classpath</sem> variable before compiling your Java

<doc xmlns:wed="schema.myplace.com/semantics/weddings">
	<ref id="bride" def="wed:bride"/>
	<ref id="groom" def="wed:groom"/>
	<!-- OK, I'm borrowing from RDF here; it's a great construct -->
	<p target="#groom">Do you <sem src="#groom" schema="sem:full-name"/>
	take <sem src="#bride" schema="sem:full-name"/> as your lawfully wedded
	<sem src="#bride" schema="sem:role"/>? Do you promise to love
	<sem src="#bride" schema="sem:pronoun"/> in spite of all
	<sem src="#bride" schema="sem:pronoun.possessive"/> foibles?</p>
	<p src="#groom">I do.</p>
	<p src="#groom" target="#bride">I, <sem src="#groom"
	schema="sem:full-name"/>, take you, <sem src="#bride"
	schema="sem:full-name"/>, as my lawfully wedded <sem src="#bride"
	schema="sem:role"/>. ...</p>
</doc>
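
The fill-in idea in the ceremony markup above can be sketched roughly as
follows, assuming Python's standard xml.etree.ElementTree module. Everything
named here is an assumption for illustration only: the render function, the
PEOPLE table and its sample names, and the reading of "src" as the intended
attribute on each <sem>. The sketch replaces every empty <sem> with a value
looked up by participant and by the local part of its schema attribute:

```python
import xml.etree.ElementTree as ET

# A shortened version of the ceremony document (assumed, for illustration).
CEREMONY = """<doc xmlns:wed="schema.myplace.com/semantics/weddings">
  <ref id="bride" def="wed:bride"/>
  <ref id="groom" def="wed:groom"/>
  <p target="#groom">Do you <sem src="#groom" schema="sem:full-name"/> take <sem src="#bride" schema="sem:full-name"/> as your lawfully wedded <sem src="#bride" schema="sem:role"/>?</p>
</doc>"""

# Hypothetical participant data, keyed by the ids used in the <ref> elements.
PEOPLE = {
    "bride": {"full-name": "Jane Doe", "role": "wife"},
    "groom": {"full-name": "John Smith", "role": "husband"},
}

def render(paragraph, people):
    """Return the paragraph text with each <sem> placeholder filled in."""
    parts = [paragraph.text or ""]
    for child in paragraph:
        if child.tag == "sem":
            who = child.get("src", "").lstrip("#")             # "#groom" -> "groom"
            field = child.get("schema", "").partition(":")[2]  # "sem:role" -> "role"
            parts.append(people[who][field])
        # Text after the child belongs to the paragraph; the content of any
        # other inline element is skipped in this sketch.
        parts.append(child.tail or "")
    return "".join(parts)

question = ET.fromstring(CEREMONY).find("p")
print(render(question, PEOPLE))
```

A renderer for a Braille display or an audio browser could use the same
lookup, together with the target attribute on <p>, to decide who is being
addressed.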

It seems to me that such a mechanism should be easier to extend, because
everyone knows that <sem> means some sort of semantic markup. A processor may
not know what a given <sem> means, but at least it knows it's something
interesting.
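
As a rough sketch of what such a generic consumer might look like, assuming
Python's standard xml.etree.ElementTree module: the function name collect_sem
and the completed sample document are assumptions made for illustration, and
the sketch keeps one flat prefix table instead of tracking namespace scope.
It finds every <sem>, whatever its scheme, and expands the prefix in the
scheme attribute using the xmlns declarations seen so far:

```python
import io
import xml.etree.ElementTree as ET

# A completed, assumed version of the computer-science example.
SAMPLE = """<doc xmlns:cs="schema.semantics.org/schemes/computerscience">
  <p>Make sure to set the <sem scheme="cs:variable">classpath</sem> variable.</p>
</doc>"""

def collect_sem(xml_text):
    """Return (namespace URI, local name, text) for every <sem> element."""
    prefixes = {}  # prefix -> URI; flat table, no scoping (a simplification)
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif payload.tag == "sem":
            prefix, _, local = payload.get("scheme", "").partition(":")
            # An unknown prefix yields None: still "something interesting",
            # even though its meaning cannot be resolved.
            found.append((prefixes.get(prefix), local, payload.text or ""))
    return found

print(collect_sem(SAMPLE))
```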

Does any of this even make any sense to someone other than me? I've already
done a number of things where this sort of thing has been useful, but I've
had to create my own way of doing things. Since I'm the only one using them,
it's not a problem.

What does everyone think?

Adam van den Hoven
Internet Software Developer
Blue Zone
tel. 604 685 4310 ext. 260
fax 604 685 4391

> Blue Zone makes you interactive. http://www.bluezone.net/