xml-dev - Re: [xml-dev] XML and XPATH: How do they work?

To: Alexander Johannesen <alexander.johannesen@gmail.com>
Subject: Re: [xml-dev] XML and XPATH: How do they work?
From: Joe Schaffner <schaffner.joe@gmail.com>
Date: Wed, 6 Jul 2005 21:56:38 +0300
Cc: xml-dev@lists.xml.org
Reply-to: Joe Schaffner <schaffner.joe@gmail.com>

Αλέξανδρε, Defender of Mankind...

> Not sure why you would want to do this in XSD, and I personally would use RELAX-NG, but even DTD is good enough for your simple stuff here?

Thanks for the tip. I don't like DTD; it's ugly, that's all (reminds me of ASN.1; maybe it's all the capital letters).

I didn't know about RELAX-NG, I'll have a look. Maybe it's simpler than XSD, which is too expressive (=complicated) for me, so I was experimenting with my own schema language. My only data type is "character string". It looks like this (I hope tripod will let you in):

http://modern-greek-verbs.tripod.com/mgv.xml

At the moment I only have two validators, 'kate' and 'ie6.0'. Kate will parse the xml and show me unbalanced tags, which is very helpful. (Gedit isn't much help).

Kate also has a button bar on the left side of the editing window which will collapse/expand my elements, but there is a bug: sometimes it works, sometimes it doesn't.

I was surprised: IE6.0 complains I can only have one top-level object per document, which seems counter-intuitive to me. I figured xml would naturally allow many such objects in a single document, acting, in effect, like a stream of objects.
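For what it's worth, the single top-level element is part of XML's definition of a well-formed document, not an IE quirk. Here is a minimal Python sketch of what the parser is objecting to; the element names `verb` and `verbs` are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Two top-level elements in one document: not well-formed XML.
stream = "<verb>agapao</verb><verb>atycho</verb>"

def parses(text):
    """Return True if text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Wrapping the "stream" in one (hypothetical) root element makes it legal.
wrapped = "<verbs>" + stream + "</verbs>"

print(parses(stream))   # False: the second root is rejected
print(parses(wrapped))  # True
print([v.text for v in ET.fromstring(wrapped)])
```

So a "stream of objects" is still possible; it just has to live inside one enclosing element.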

Here is an instance of my schema, a sample conjugation of αγαπώ, which I did by hand, from the html <table>:

http://modern-greek-verbs.tripod.com/agapao.xml

I can write my own validator in Perl, a variation of the "is_equal()" predicate in LISP, which just plunges through my schema with an argument that points to the instance, replacing the "refs" in the schema with the actual definitions, of course.

If they're "equal" in appearance i.e. look the same, the document is correct.
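A rough sketch of that idea, in Python rather than Perl. The schema format here (`<def name="...">` for a reusable definition, `<ref name="..."/>` to point at one) is an assumption invented for the example, not the format of mgv.xml, and the check only compares element names and their order:

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: <def> holds a definition, <ref> stands for its children.
SCHEMA = """
<schema>
  <def name="persons"><first/><second/><third/></def>
  <verb>
    <singular><ref name="persons"/></singular>
    <plural><ref name="persons"/></plural>
  </verb>
</schema>
"""

def expand(node, defs):
    """Return node's children with every <ref> replaced by its definition."""
    out = []
    for child in node:
        if child.tag == "ref":
            out.extend(expand(defs[child.get("name")], defs))
        else:
            out.append(child)
    return out

def same_shape(schema_node, instance_node, defs):
    """True if the instance has the same element names in the same order."""
    if schema_node.tag != instance_node.tag:
        return False
    expected = expand(schema_node, defs)
    actual = list(instance_node)
    if len(expected) != len(actual):
        return False
    return all(same_shape(s, i, defs) for s, i in zip(expected, actual))

def validate(schema_text, instance_text):
    schema = ET.fromstring(schema_text)
    defs = {d.get("name"): d for d in schema.findall("def")}
    top = [c for c in schema if c.tag != "def"][0]  # first non-<def> element
    return same_shape(top, ET.fromstring(instance_text), defs)

good = ("<verb><singular><first>a</first><second>b</second><third>c</third>"
        "</singular><plural><first>d</first><second>e</second>"
        "<third>f</third></plural></verb>")
bad = "<verb><singular><first>a</first><third>c</third></singular><plural/></verb>"

print(validate(SCHEMA, good))  # True
print(validate(SCHEMA, bad))   # False
```

Text content is ignored, since the only data type is "character string". A definition that refers to itself would recurse forever, so a real version would need a guard for that.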

>> I want the metadata to remain in the application domain where I see it.

> Umm, isn't this just about where you store your definitions?

Yes, right again. It's just a question of the three 'L's: location, location, location.

I don't want to learn another definition language, but I like yours. Your ontology is great. You really understand the problem.

What is your ontology language called? It's not XSD, is it?

(I also like the WYSIWYG approach of HTML.)

The semantics are built into *my* dictionary, not the validator's or the compiler's. Yes, I need a way to define the lists, to know which list my <Prev> and <Next> links belong to. For now they're grouped visually on the page. As I said, the names of the links are not important, but the list is real and must exist, if only to automatically generate the anchors in the HTML output. Otherwise I'll be parsing the simple text I'm using as labels on the page.
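Once a list exists as data, generating the anchors is mechanical. A small Python sketch, assuming (hypothetically) that each verb lives in a page named after its id and that the list wraps around at both ends:

```python
from html import escape

def prev_next_links(members, current):
    """Return HTML anchors for the neighbours of `current` on a circular list."""
    i = members.index(current)
    prev_id = members[i - 1]                   # index -1 wraps to the last entry
    next_id = members[(i + 1) % len(members)]  # wraps back to the first entry
    return ('<a href="%s.html">Prev</a> <a href="%s.html">Next</a>'
            % (escape(prev_id), escape(next_id)))

# Transliterated ids, invented for the example.
luck = ["atycho", "eutycho", "dystycho"]
print(prev_next_links(luck, "atycho"))
```

The `<id>.html` naming and the wrap-around behaviour are choices made for this sketch only.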

> Are you saying that you want to store your stuff in a C++ source file because that's where you want to maintain it?

No, I want to store my stuff as XML data, in the application structure itself.

> Then what do you need XML / schema for?

Just to validate that the documents I'm creating are valid, especially if I do a few by hand.

> Just the presentation?

I thought XML would make it easier to generate different presentations, in HTML, and PDF, for a printed version, using XSL, but I see that XSL is way too complicated.

I may get by writing my own "tools" just for this application, and stick with HTML as the modeling language. Back to the Future?

I like the recursive-descent plunger in XSL, but it will take me longer to read the documentation than to write one myself. I simply haven't figured out what the templates are doing, or what the execution sequence is.

<dig>This technology is really getting out of hand. None of the writers seem to understand it, at least the ones I've read, including W3C.</dig>

We're not speaking the same language.

I need to combine what I understand about XLink, XPointer, XPath and Namespaces into a coherent whole. For example:

http://modern-greek-verbs.tripod.com/agapao.xml is both a URI and a URL, because it refers to a real data object.

A URI becomes a namespace when it is combined in an XML document with a two- or three-letter mnemonic, using some syntax I'm always forgetting.

There seems to be no real difference between a URI and a namespace, unless you're talking about those cute little mnemonics I make up to use one.
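The syntax in question is the `xmlns:prefix="URI"` attribute. A short Python sketch of what a parser does with it; the prefixes `mgv` and `v` and the namespace URI are invented for illustration:

```python
import xml.etree.ElementTree as ET

NS = "http://modern-greek-verbs.tripod.com/ns"  # made-up namespace URI

doc = """
<mgv:verb xmlns:mgv="http://modern-greek-verbs.tripod.com/ns">
  <mgv:name>agapao</mgv:name>
</mgv:verb>
"""
root = ET.fromstring(doc)

# The parser replaces the prefix with the URI it is bound to.
print(root.tag)

# A different prefix bound to the same URI names the same element.
other = ET.fromstring('<v:verb xmlns:v="%s"/>' % NS)
print(other.tag == root.tag)

# Queries supply their own prefix-to-URI mapping.
print(root.find("m:name", {"m": NS}).text)
```

So the namespace is the URI itself, used purely as a name (nothing has to be retrievable at that address), and the mnemonic is local shorthand that only has meaning within the document that declares it.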

An XPointer points to an instance of the Schema -- a data structure -- inside the document, much like a label in HTML -- so I might like to address this node:

http://modern-greek-verbs.tripod.com/verbs.xml#agapao/active/indicative/present

(I'm reusing the # sign from HTML to distinguish the name spaces... I thought I read somewhere that XPointer uses attribute/value pairs -- like X.500 directories -- and plenty of parentheses, which turns me off.)

Evaluating this expression should return a structure which looks like this:

    <singular>
      <first>αγαπάω, αγαπώ</first>
      <second>αγαπάς</second>
      <third>αγαπάει, αγαπά</third>
    </singular>
    <plural>
      <first>αγαπάμε, αγαπούμε</first>
      <second>αγαπάτε</second>
      <third>αγαπάν(ε), αγαπούν(ε)</third>
    </plural>

So, an XPointer still looks like a "path" expression to me.

What is an XPath?

There must be some kind of tool out there that evaluates "path" expressions and returns valid XML, including simple text, but I have no idea what it might be called.
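As an illustration of such a tool, Python's standard xml.etree.ElementTree module evaluates a limited subset of XPath, which is enough for a plain slash-separated path. The layout of verbs.xml below is hypothetical, invented to match the path above:

```python
import xml.etree.ElementTree as ET

# Hypothetical layout for verbs.xml; only the present tense is filled in.
VERBS = """
<verbs>
  <agapao>
    <active>
      <indicative>
        <present>
          <singular>
            <first>αγαπάω, αγαπώ</first>
            <second>αγαπάς</second>
            <third>αγαπάει, αγαπά</third>
          </singular>
          <plural>
            <first>αγαπάμε, αγαπούμε</first>
            <second>αγαπάτε</second>
            <third>αγαπάν(ε), αγαπούν(ε)</third>
          </plural>
        </present>
      </indicative>
    </active>
  </agapao>
</verbs>
"""

root = ET.fromstring(VERBS)

# The path is evaluated relative to the root element <verbs>.
present = root.find("agapao/active/indicative/present")

# The result is a node whose children are the structure asked for.
for number in present:
    for person in number:
        print(number.tag, person.tag, person.text)
```

The path here selects one node, and the node's children are the XML that comes back. Full XPath adds predicates, axes and functions on top of this basic step-by-step descent.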

Thanks man, you've been great!

Joe

On 7/5/05, Alexander Johannesen <alexander.johannesen@gmail.com> wrote:

Hi Joe,

On 7/5/05, Joe Schaffner <schaffner.joe@gmail.com> wrote:
> Can you recommend an XSD validator?

Not sure why you would want to do this in XSD, and I personally would
use RELAX-NG, but even DTD is good enough for your simple stuff here?

> I want a simple one, a command line filter, one which works on stdio, the
> schema as the only argument. I hate rpms, so a small collection of
> statically-linked, command-line filters is what I want. No Java RTE, s'il
> vous plait.  I'm using SuSE 9.2. Linux.  Also, an XSL processor?

Look up libxml and libxslt, which is what I would recommend, unless you
want to do the whole Java thing, in which case Saxon would probably suit
your needs more than well enough.

...

> Today I was linking all the verbs which express "luck" - τύχη - like: {ατυχώ
> (I have no luck), ευτυχώ (I have good luck), δυστυχώ (I have bad luck),
> πετυχαίνω (I succeed)}.
>
> Over time, the conjugations have changed, so the verbs belong to different
> morphological models. Now they're all on the same list, linked on the
> (English) meaning!

Here's a part of your ontology ;

<item id="luck">
  <name>τύχη</name>
  <name lang="en">luck</name>
  <type>expression</type>
</item>

And here are verbs using that ontology (see '<type>luck</type>' ) ;

<item id="ατυχώ">
  <type>luck</type>
  <name>ατυχώ</name>
  <name lang="en">I have no luck</name>
</item>

<item id="ευτυχώ">
  <type>luck</type>
  <name>ευτυχώ</name>
  <name lang="en">I have good luck</name>
</item>

The advantage is that you could use multiple types to declare other
properties easily ;

<item id="δυστυχώ">
  <type>luck</type>
  <type>bad</type>
  <name>δυστυχώ</name>
  <name lang="en">I have bad luck</name>
</item>
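A minimal sketch of how an application might read such items and recover the "lists" from the types, in Python. The `<items>` wrapper element is an assumption added so the fragment parses as one document; the `<item>` shape follows the examples above:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DATA = """
<items>
  <item id="luck">
    <name>τύχη</name><name lang="en">luck</name><type>expression</type>
  </item>
  <item id="ατυχώ">
    <type>luck</type><name lang="en">I have no luck</name>
  </item>
  <item id="ευτυχώ">
    <type>luck</type><name lang="en">I have good luck</name>
  </item>
  <item id="δυστυχώ">
    <type>luck</type><type>bad</type><name lang="en">I have bad luck</name>
  </item>
</items>
"""

def index_by_type(root):
    """Map each type name to the ids of the items declaring it."""
    by_type = defaultdict(list)
    for item in root.iter("item"):
        for t in item.findall("type"):
            by_type[t.text].append(item.get("id"))
    return dict(by_type)

index = index_by_type(ET.fromstring(DATA))
for type_name, ids in index.items():
    print(type_name, "->", ", ".join(ids))
```

An item with two types simply shows up under both, so no list has to be maintained by hand.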

Now you can map your semantics between typed items to get that model
you were talking about earlier. But you can also take a different (or
additional) approach with this (very bogus) example to describe
multi-relationships ;

<relationship id="3945">
  <type refid="morphology" />
  <item refid="δυστυχώ" role="medio-passive" />
  <item refid="syntax" role="schema-old-greek" />
  <item refid="reflexive" role="verb-type" />
</relationship>

Now you can type your relationships, and by chucking ids on the
<relationship /> you can talk about these relationships in other
relationships, which would be *exactly* what you want if you wanted to
give examples.

> I want the metadata to remain in the application domain where I see it.

Umm, isn't this just about where you store your definitions? Are you
saying that you want to store your stuff in a C++ source file because
that's where you want to maintain it? Then what do you need XML /
schema for? Just the presentation?

> Yes, you're right, it's just relating(=linking) objects by property. The
> semantics are defined in the application domain by the members of the list,
> not in a data dictionary maintained by the compiler.

I'm sorry, I don't understand; what is maintained by the compiler, and
what is meant by 'application domain' here?

> Coming up with names for the lists can be hard. In fact, the name of the
> list is really the string of entries in list itself. Kind of reminds me of
> the name of God himself, which was the string of all the letters and words
> in the books of the Bible...

This is what's known as ontology work, really; you've got some items
and you try to pin a label on the group, defining what they're all
about. After you've defined and defined some more, you've got some kind
of model to show for it. It sometimes works, and often it doesn't work
at all. It depends on how good you are at flexibility.

> (God as the primary abstraction? Sometimes He is defined as "top" in the
> books, "top cat", "top sergeant"... the primary abstraction from which all
> others flow, a pure, virtual class.)

Nothing -> something -> Everything :)

> All the verbs having similar, opposite, or otherwise related meaning will be
> placed on the same list. A curious student is encouraged to push on the link
> to see what follows.

Why not create an interface that enables such behaviour by default?

> (It's sort of like an IQ test question, you know, which word does not belong
> to the list of choices?)

Again, if you can define a context, all relationships within it can be
described, but as long as you keep something in the compiler / C++
source files and something in XML, this can be very hard to sort out
and make useful.

> I want to keep the lists short, maybe 3, 4 or 5 entries each, otherwise the
> meanings start to diverge and you are left wondering what the idea was in
> the first place.

Again, I think the idea of lists is going to fail you in the long
run; there will be so many little lists with similar or opposite or
near or totally wrong meanings. Why not define each verb with its
relationships and let the application work out automagically what the
model might look like? This way you've got a stronger foundation, and
you don't have to worry about it at all.

> Opposites, like {love, hate}, {come, go}, {eat, drink}, {walk, run} are the
> most interesting categories and have separate lists.
> The case of "being" {is, becomes, exists, appears, remains} is pretty tight.
>
> γεννήθηκα - I was born.
> μεγάλωσα - I was raised
> πέθανα - I died
>
> All three belong to different models, but are related by meaning. What is
> that meaning? Who knows, they just go together.

This is an untyped relationship where roles are undefined;

<relationship id="2356">
  <name>Somewhat related</name>
  <item refid="γεννήθηκα" />
  <item refid="μεγάλωσα" />
  <item refid="πέθανα" />
</relationship>

If you are on the page talking about 'γεννήθηκα', it will also list
this relationship with it. If you now want to *talk* about this
relationship ;

<item id="reified-2356">
  <name lang="en">The 'somewhat related' relationship</name>
  ...
</item>

Simple.
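The page lookup described above could be sketched like this in Python. The `<data>` wrapper element and the function name `relationships_for` are assumptions made for the example:

```python
import xml.etree.ElementTree as ET

DATA = """
<data>
  <relationship id="2356">
    <name>Somewhat related</name>
    <item refid="γεννήθηκα" />
    <item refid="μεγάλωσα" />
    <item refid="πέθανα" />
  </relationship>
</data>
"""

def relationships_for(root, item_id):
    """List every relationship that mentions item_id, with its other members."""
    found = []
    for rel in root.iter("relationship"):
        members = [i.get("refid") for i in rel.findall("item")]
        if item_id in members:
            others = [m for m in members if m != item_id]
            found.append((rel.get("id"), rel.findtext("name"), others))
    return found

root = ET.fromstring(DATA)
for rel_id, name, others in relationships_for(root, "γεννήθηκα"):
    print(rel_id, name, ", ".join(others))
```

The page for any one verb can then show the relationship's name and link to the other members, without a hand-maintained list.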

> The curious thing about the morphology is that the first is medio-passive,
> because it refers to "self", like a reflexive verb, but the other two are
> active intransitives. (You'd expect them all to be medio-passive, but
> they're not.)

Define your ontology as such ;

<item id="self">
  <type>role</type>
  <name>reflexive</name>
</item>

<item id="active-intransitive">
  <type>role</type>
  <name>active intransitive</name>
</item>

Now you can talk about these things in your relationships if you want to ;

  <item refid="μεγάλωσα" role="active-intransitive" />

Might make a nifty user-interface to your dataset.

> I'm defining them as I go. It's a blast.

That's the key, isn't it; have fun! :)

Alex
--
"Ultimately, all things are known because you want to believe you know."
                                                        - Frank Herbert
__ http://shelter.nu/ __________________________________________________
